(Rothe) Topic From Relativity (2010)

Topics from Relativity 1
SEVERAL TOPICS FROM RELATIVITY
FRANZ ROTHE
2010 Mathematics subject classification: 51Fxx; 51Pxx; 70H40.

Keywords and phrases: Instructional exposition, General theory, Geometry and physics, special relativity.
Contents
1 Riemannian geometry
1.1 Curved coordinate systems
1.2 About differentiable manifolds
1.3 Tensors
1.4 Riemannian manifold
1.5 Lie derivative
2 Special Relativity
2.1 Relativity of time and length
2.2 Discovery of Aberration and Parallax
2.3 Aberration and the Doppler effect
2.4 The one-dimensional Doppler effect
2.5 Four-vectors and Minkowski metric
2.6 The relativistic Doppler effect
2.7 Four-velocity
2.8 The energy-momentum vector
2.9 The Compton effect
2.10 Collision of particles
2.11 The motion of particles
3 The Lorentz Group
3.1 Different aging of twins
3.2 The Lorentz transformations
3.3 Infinitesimal generators
4 The Poincaré Half-Plane Model
4.1 Poincaré half-plane and Poincaré disk
4.2 The Euler-Lagrange equation
4.3 The curve of minimal hyperbolic length
4.4 The minimum of hyperbolic length
4.5 Some useful reflections in the half-plane
5 Equation of motion
5.1 Affine geodesic
5.2 Metric geodesic
5.3 The quadratic Lagrangian
5.4 Null geodesics
5.5 The method
5.6 Killing vector
6 Geodesics in the Schwarzschild metric
6.1 The equation for the shape of relativistic orbits
6.2 Kepler’s classical nonrelativistic orbits
6.3 Scattering in Newtonian dynamics
6.4 Perturbation expansion for relativistic bounded orbits
6.5 The Mercury perihelion rotation
6.6 Perturbation expansion for the angle of deflection
6.7 The bending of light
7 Gauss’ Differential Geometry and the Pseudo-Sphere
7.1 Introduction
7.2 About Gauss’ differential geometry
7.3 Riemann metric of the Poincaré disk
7.4 Riemann metric of Klein’s model
7.5 A second proof of Gauss’ remarkable theorem
7.6 Principal and Gaussian curvature of rotation surfaces
7.7 The pseudo-sphere
7.8 Poincaré half-plane and Poincaré disk
7.9 Embedding the pseudo-sphere into Poincaré’s half-plane
7.10 Embedding the pseudo-sphere into Poincaré’s disk
7.11 About circle-like curves
7.12 Mapping the boundaries
References
[1] Max Born, Einstein’s Theory of Relativity, revised edition, Dover publications, 1924, 1965.
[2] , The Born-Einstein Letters 1916-1955, Friendship, Politics and Physics in Uncertain Times, Macmillan,
2005.
[3] Ta-Pei Cheng, Relativity, Gravitation and Cosmology, second edtion, Oxford University Press, 2010.
[4] Leo Corry, David Hilbert and the Axiomatization of Physics (1898-1918), from Grundlagen der Geometrie to
Grundlagen der Physik , Kluwer Academic Publishers, ISBN 1-4020-2777-X(HB), 2004.
[5] David Griffiths, Introduction to Elementary Particles, second edition, Wiley-VCH, 2008.
[6] Stephen G. Gasiorowicz Jeremy Bernstein, Paul M. Fishbane, Modern Physics, Prentice-Hall, 2000.
[7] G.Efstathiou M.P.Hobson and A.N.Lasenby, General Relativity, Cambridge University Press, 2006.
[8] Abraham Pais, Subtle is the Lord—the science and life of Albert Einstein, Oxford University Press, 1982.
[9] David Park, The Grand Contraption— the world in myth, number, and chance, second printing, Princeton
University Press, 2005.
[10] Matthew Sands Richard P. Feynman, Robert B. Leighton, The Feynman Lectures on Physics, Addison-Wesley,
1965.
[11] ’Gerald ’t Hooft, Introduction to General Relativity, Rinton Press Princeton, 2001.
[12] Hermann Weyl, Raum-Zeit-Materie, 5th revised edition, Springer, 1923.
Franz Rothe, Department of Mathematics and Statistics

University of North Carolina at Charlotte
Charlotte, NC 28223
1. Riemannian geometry
1.1. Curved coordinate systems The conversion of spherical coordinates (r, θ, φ) to Cartesian
coordinates (x, y, z) is
x = r sin θ cos φ (1.1)
y = r sin θ sin φ (1.2)
z = r cos θ (1.3)
These are the formulas normally used to define spherical coordinates, taking as their standard domain the values (r, θ, φ) with r ≥ 0, 0 ≤ θ ≤ π and −π < φ ≤ π. According to the multivariable chain rule, the conversion of the differentials becomes
\[
\begin{pmatrix} dx \\ dy \\ dz \end{pmatrix}
=
\begin{pmatrix}
\sin\theta\cos\varphi & r\cos\theta\cos\varphi & -r\sin\theta\sin\varphi \\
\sin\theta\sin\varphi & r\cos\theta\sin\varphi & r\sin\theta\cos\varphi \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}
\begin{pmatrix} dr \\ d\theta \\ d\varphi \end{pmatrix}
\tag{1.4}
\]
For example, consider a point particle moving along a path x = x(t), y = y(t), z = z(t), or, in spherical coordinates, r = r(t), θ = θ(t), φ = φ(t). The components of the velocity
\[
v^x = \frac{dx}{dt}, \qquad v^y = \frac{dy}{dt}, \qquad v^z = \frac{dz}{dt}
\]
respectively
\[
w^r = \frac{dr}{dt}, \qquad w^\theta = \frac{d\theta}{dt}, \qquad w^\varphi = \frac{d\varphi}{dt}
\]
have the same transformation law as given by formula (1.4). Indeed, one gets
\[
\begin{pmatrix} v^x \\ v^y \\ v^z \end{pmatrix}
=
\begin{pmatrix}
\sin\theta\cos\varphi & r\cos\theta\cos\varphi & -r\sin\theta\sin\varphi \\
\sin\theta\sin\varphi & r\cos\theta\sin\varphi & r\sin\theta\cos\varphi \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}
\begin{pmatrix} w^r \\ w^\theta \\ w^\varphi \end{pmatrix}
\tag{1.5}
\]
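The conversion matrix can be checked numerically. The following is a minimal sketch, not part of the original exposition: it compares the matrix of partial derivatives of (1.1)-(1.3) with central finite differences and then applies it to a velocity. The function names, the sample point and the step size h are assumptions chosen for illustration.

```python
import math

def to_cartesian(r, theta, phi):
    """Spherical -> Cartesian coordinates, formulas (1.1)-(1.3)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian(r, theta, phi):
    """Partial derivatives d(x, y, z)/d(r, theta, phi), rows x, y, z."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp, r * st * cp],
            [ct, -r * st, 0.0]]

def numerical_jacobian(r, theta, phi, h=1e-6):
    """The same matrix obtained by central finite differences."""
    point = [r, theta, phi]
    columns = []
    for k in range(3):
        plus, minus = point[:], point[:]
        plus[k] += h
        minus[k] -= h
        fp, fm = to_cartesian(*plus), to_cartesian(*minus)
        columns.append([(fp[i] - fm[i]) / (2 * h) for i in range(3)])
    # columns[k][i] = d x^i / d q^k; transpose to rows i, columns k
    return [[columns[k][i] for k in range(3)] for i in range(3)]

def velocity_cartesian(point, w):
    """Contravariant law: v = J w for spherical velocity components w."""
    J = jacobian(*point)
    return [sum(J[i][k] * w[k] for k in range(3)) for i in range(3)]

point = (2.0, 0.7, 1.1)  # arbitrary sample point (r, theta, phi)
J_exact = jacobian(*point)
J_num = numerical_jacobian(*point)
max_error = max(abs(J_exact[i][k] - J_num[i][k])
                for i in range(3) for k in range(3))

# A purely radial motion w = (1, 0, 0) gives the unit radial vector.
v_radial = velocity_cartesian(point, [1.0, 0.0, 0.0])
```

Here `max_error` measures the disagreement between the analytic and the finite-difference matrix, and `v_radial` should have Euclidean length one.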
The following is a totally different situation. A potential V is called a scalar if the invariance
\[
W(r, \theta, \varphi) = V(x, y, z)
\]
holds, where (r, θ, φ) and (x, y, z) label the same point. Usually one uses the same letter for both functions and indicates by a prime that the functions V and W are different; they nevertheless describe the same geometric or physical situation. According to the multivariable chain rule, the conversion between the gradients [∂r W, ∂θ W, ∂φ W] and [∂x V, ∂y V, ∂z V] becomes
\[
[\partial_r W, \partial_\theta W, \partial_\varphi W]
= [\partial_x V, \partial_y V, \partial_z V]
\begin{pmatrix}
\sin\theta\cos\varphi & r\cos\theta\cos\varphi & -r\sin\theta\sin\varphi \\
\sin\theta\sin\varphi & r\cos\theta\sin\varphi & r\sin\theta\cos\varphi \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}
\tag{1.6}
\]
One sees that the same matrix as in formula (1.5) occurs, but now on the other side of the equation. Note, too, that this coincidence is only made possible by using column vectors for the differentials and row vectors for the gradient. I use the symbol T, meaning "transpose", to convert row vectors to column vectors for typesetting convenience, and for the transposition of matrices.
Definition 1.1 (Components of a contravariant vector). The components of a contravariant

vector have the same conversion (1.4) as the differentials.
Definition 1.2 (Components of a covariant vector). The components of a covariant vector have
the same conversion (1.6) as the components of the gradient of a scalar.
For example, the electric field [E_x, E_y, E_z] in Cartesian coordinates gets in spherical coordinates the components [F_r, F_θ, F_φ] such that
\[
[F_r, F_\theta, F_\varphi]
= [E_x, E_y, E_z]
\begin{pmatrix}
\sin\theta\cos\varphi & r\cos\theta\cos\varphi & -r\sin\theta\sin\varphi \\
\sin\theta\sin\varphi & r\cos\theta\sin\varphi & r\sin\theta\cos\varphi \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}
\tag{1.7}
\]
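The covariant law can be tested in the same spirit. The sketch below is an illustration under assumptions: the scalar potential V = xy + z² and all function names are hypothetical choices, not taken from the text. It multiplies the Cartesian gradient, as a row vector, from the left onto the conversion matrix and compares the result with a finite-difference gradient of W(r, θ, φ) = V(x, y, z).

```python
import math

def to_cartesian(r, theta, phi):
    """Spherical -> Cartesian coordinates, formulas (1.1)-(1.3)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian(r, theta, phi):
    """Partial derivatives d(x, y, z)/d(r, theta, phi), rows x, y, z."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp, r * st * cp],
            [ct, -r * st, 0.0]]

def V(x, y, z):
    """Hypothetical scalar potential in Cartesian coordinates."""
    return x * y + z ** 2

def grad_V(x, y, z):
    """Cartesian gradient [d_x V, d_y V, d_z V] of the potential above."""
    return [y, x, 2 * z]

def W(r, theta, phi):
    """The same potential expressed in spherical coordinates."""
    return V(*to_cartesian(r, theta, phi))

def grad_W_from_rule(r, theta, phi):
    """Row vector times matrix: the covariant conversion (1.6)."""
    g = grad_V(*to_cartesian(r, theta, phi))
    J = jacobian(r, theta, phi)
    return [sum(g[i] * J[i][k] for i in range(3)) for k in range(3)]

def grad_W_numeric(r, theta, phi, h=1e-6):
    """Central finite differences of W in r, theta, phi."""
    point = [r, theta, phi]
    result = []
    for k in range(3):
        plus, minus = point[:], point[:]
        plus[k] += h
        minus[k] -= h
        result.append((W(*plus) - W(*minus)) / (2 * h))
    return result

point = (1.5, 0.8, -0.4)  # arbitrary sample point (r, theta, phi)
rule = grad_W_from_rule(*point)
numeric = grad_W_numeric(*point)
gradient_error = max(abs(a - b) for a, b in zip(rule, numeric))
```

The row-vector product is the only change compared with the contravariant case; the matrix itself is the same.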
 
Remark. The same conversion (1.6) as for a gradient holds for partial derivatives under any linear transformation, but under nonlinear transformations it holds only for the covariant derivatives.
1.2. About differentiable manifolds These laws for the conversion of coordinate systems apply in a more general context, indeed on any differential manifold.
Definition 1.3 (Differential manifold). A differential manifold M of dimension n is a topological locally compact Hausdorff space, with the following additional structure. For every point there exists a neighborhood with a coordinate system $(x^1, \dots, x^n)$.
The coordinate systems for intersecting neighborhoods are compatible. There exist bijective differentiable transformations between any two different coordinate systems, say $(x^1, \dots, x^n)$ and $(x'^1, \dots, x'^n)$, which are valid in the intersection of such neighborhoods and make them compatible.
Remark. Thus coordinate transformations are considered to be passive transformations, naming the same points with different coordinate labels, at least at first. We name the coordinates, respectively base vectors, of two systems S and S′ by the same letters and use primes for distinguishing between them.
Remark. The extension of such a transformation to the entire manifold is called a point transformation. Existence of such point transformations, indeed locally of any prescribed form, can be proved. The proof uses a tool called the partition of unity.
Definition 1.4 (Tangent plane and tangent manifold). For every point P of a differential manifold of dimension n, there exists a tangent plane $T_P$. This is an n-dimensional linear space, with some basis $[e_1, \dots, e_n]$. The contravariant differential $[dx^a]$ corresponds to the vector
\[
dx = \sum_{a=1}^{n} e_a \, dx^a
\]
These vectors are invariant under coordinate transformations, just like scalar quantities. The union
\[
T = \bigcup_{P \in M} T_P
\]
of all tangent planes is called the tangent manifold. This is a differential manifold of dimension 2n.
Lemma 1.1. Under any point transformation
\[
x'^a = x'^a(x^1, \dots, x^n) \quad \text{for } a = 1, \dots, n
\]
the contravariant components $[v^b]$ have the transformation law
\[
v'^a = \frac{\partial x'^a}{\partial x^b} \, v^b \quad \text{for } a = 1, \dots, n
\]
and the base vectors $e_b$ have the transformation law
\[
e_b = \frac{\partial x'^a}{\partial x^b} \, e'_a \quad \text{for } b = 1, \dots, n
\]
Hence the vector itself is invariant:
\[
v = v^b e_b = v'^a e'_a
\]
These are the vectors in the tangent plane $T_P$.
Remark. We have already used the Einstein sum convention: any index which appears both as an upper and as a lower index in an expression is summed over, unless otherwise stated. For example, the differential is simply written
\[
dx = e_a \, dx^a
\]
Remark. The existence of the tangent manifold can be proved in general by means of a rather abstract (and far-fetched) construction. The existence proof is easy under the assumption that the manifold is, at least locally, embedded into some higher-dimensional flat space $\mathbb{R}^N$, and no restrictions on the dimension N are imposed.¹ This situation occurs for general relativity.
Indeed, suppose an embedding of a manifold $M \subseteq \mathbb{R}^N$ is given by formulas
\[
X^i = X^i(x^1, \dots, x^n) \quad \text{for } i = 1, \dots, N \tag{1.8}
\]
The tangent space basis at any point $P = (x^1, \dots, x^n)$ is simply
\[
e_a = \left[ \frac{\partial X^1}{\partial x^a}, \dots, \frac{\partial X^N}{\partial x^a} \right]^T \quad \text{for } a = 1, \dots, n \tag{1.9}
\]
Definition 1.5 (Dual linear space). The dual $X^*$ of a real linear space X consists of all linear functionals $X \to \mathbb{R}$. We get the natural bilinear form $\langle \cdot, \cdot \rangle$ which maps $X^* \times X$ to $\mathbb{R}$ and assigns to any ordered pair of $x^* \in X^*$ and $x \in X$ the real number $\langle x^*, x \rangle$.
The cotangent plane $T_P^*$ is identified with the dual of $T_P$, which is the linear space of linear functionals $T_P \to \mathbb{R}$. One gets a convenient basis $[\omega^a]$ for $T_P^*$ by requiring
\[
\langle \omega^a, e_b \rangle = \delta^a_b =
\begin{cases}
1 & \text{if } a = b; \\
0 & \text{if } a \neq b
\end{cases}
\]
for a, b = 1 . . . n and extending by linearity.
Lemma 1.2. Under any point transformation
\[
x'^a = x'^a(x^1, \dots, x^n) \quad \text{for } a = 1, \dots, n
\]
the covariant components $[f_b]$ have the transformation law
\[
f_b = \frac{\partial x'^a}{\partial x^b} \, f'_a \quad \text{for } b = 1, \dots, n
\]
and the base vectors $\omega^b$ have the transformation law
\[
\omega'^a = \frac{\partial x'^a}{\partial x^b} \, \omega^b \quad \text{for } a = 1, \dots, n
\]
Hence the vector
\[
f = f_b \, \omega^b = f'_a \, \omega'^a
\]
itself is invariant. These are the vectors in the cotangent plane $T_P^*$.
¹ Allowing any large number N, this is not too strong an assumption. The situation is different once restrictions on the number N are imposed.

Lemma 1.3. For a particle moving in a scalar potential field S with velocity v, the rate of change of the potential felt by the particle is
\[
\frac{dS}{dt} = (\partial_a S) \, v^a = \langle \nabla S, v \rangle
\]
Proof. For the rate of change of the composite function $S(t) = S(x^1(t), \dots, x^n(t))$, the multivariable chain rule gives
\[
\frac{dS}{dt} = \frac{\partial S}{\partial x^a} \frac{dx^a}{dt}
\]
but this is just the functional $\nabla S = (\partial_a S) \, \omega^a \in T_P^*$ applied to the vector $v = v^b e_b$, since
\[
\langle \nabla S, v \rangle = \langle (\partial_a S) \, \omega^a, v^b e_b \rangle = (\partial_a S) \, v^b \delta^a_b = (\partial_a S) \, v^a
\]

Lemma 1.4. By the rule
\[
\langle b, a \rangle = \langle \mathrm{id}(a), b \rangle_* \tag{1.10}
\]
for all $a \in T_P$ and $b \in T_P^*$, the natural bijection $\mathrm{id} : T_P \to T_P^{**}$ is given. Here $\langle \cdot, \cdot \rangle_*$ denotes the natural bilinear form on $T_P^{**} \times T_P^*$. Via this mapping id, one may identify the double dual $T_P^{**}$ with the tangent plane $T_P$. Hence one may view $T_P$ as the space of linear functionals $T_P^* \to \mathbb{R}$.
Proof. The double dual $T_P^{**}$ consists by definition of the linear functionals $T_P^* \to \mathbb{R}$. A basis $[f_a]$ of $T_P^{**}$ is given by requiring
\[
\langle f_a, \omega^b \rangle_* = \delta_a^b =
\begin{cases}
1 & \text{if } a = b; \\
0 & \text{if } a \neq b
\end{cases}
\tag{1.11}
\]
and extending by linearity. An injective linear mapping $\mathrm{id} : T_P \to T_P^{**}$ is defined by setting $\mathrm{id}(e_a) = f_a$ for a = 1 . . . n and extending by linearity. Since the linear spaces $T_P$ and $T_P^{**}$ both have the same finite dimension n, one obtains even a bijection. This bijection gives the identification $T_P^{**} = T_P$. Since
\[
\langle f_a, \omega^b \rangle_* = \delta_a^b = \langle \omega^b, e_a \rangle
\]
linearity gives the rule (1.10) for all $a \in T_P = T_P^{**}$ and $b \in T_P^*$.
1.3. Tensors
Definition 1.6 (Tensor). Let q ≥ 0 and r ≥ 0 be integers. A tensor T of type (q, r) is a multilinear mapping¹
\[
T : \underbrace{T_P^* \times T_P^* \times \cdots \times T_P^*}_{q \text{ factors}} \times \underbrace{T_P \times T_P \times \cdots \times T_P}_{r \text{ factors}} \to \mathbb{R}
\]
The type of such a tensor is q-fold contravariant and r-fold covariant.

¹ Some authors used to talk about a "machine", but mathematicians still need coffee machines to convert their thoughts into theorems, nevertheless.

We see that a tangent vector $v \in T_P$ has become a tensor of type (1, 0). It has been "disguised" as the linear mapping $\langle \cdot, v \rangle : T_P^* \to \mathbb{R}$. A generic tensor T of type (q, r) is a linear combination
\[
T = T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \, e_{a_1} \otimes e_{a_2} \otimes \cdots \otimes e_{a_q} \otimes \omega^{b_1} \otimes \omega^{b_2} \otimes \cdots \otimes \omega^{b_r} \tag{1.12}
\]
Note the convenience of the Einstein sum convention. In shorthand, we often write $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ for such a tensor, since all information is already contained in the set of components.
The basis of the linear space of tensors of type (q, r) is the set of tensor products
\[
e_{a_1} \otimes e_{a_2} \otimes \cdots \otimes e_{a_q} \otimes \omega^{b_1} \otimes \omega^{b_2} \otimes \cdots \otimes \omega^{b_r} = e_{a_1 \dots a_q}{}^{b_1 \dots b_r}
\]
with any $a_1 \dots a_q$ and $b_1 \dots b_r$ in 1 . . . n. The basis tensors are defined by the requirements
\[
e_{a_1 \dots a_q}{}^{b_1 \dots b_r} \left( \omega^{c_1}, \dots, \omega^{c_q}, e_{d_1}, \dots, e_{d_r} \right)
= \delta^{c_1}_{a_1} \cdots \delta^{c_q}_{a_q} \cdot \delta^{b_1}_{d_1} \cdots \delta^{b_r}_{d_r}
= \begin{cases}
1 & \text{if } a_1 = c_1, \dots, a_q = c_q \text{ and } b_1 = d_1, \dots, b_r = d_r; \\
0 & \text{otherwise}
\end{cases}
\]
with any $a_1 \dots a_q$, $c_1 \dots c_q$ and $b_1 \dots b_r$, $d_1 \dots d_r$ in 1 . . . n and extending by linearity. The dimension of this linear space is $n^{q+r}$.
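Lemma 1.5. The rule
\[
\left( a_1 \otimes a_2 \otimes \cdots \otimes a_q \otimes b_1 \otimes b_2 \otimes \cdots \otimes b_r \right) \left( c_1, \dots, c_q, d_1, \dots, d_r \right)
= \langle c_1, a_1 \rangle \cdots \langle c_q, a_q \rangle \cdot \langle b_1, d_1 \rangle \cdots \langle b_r, d_r \rangle
\]
holds for any $a_1 \dots a_q \in T_P$, $c_1 \dots c_q \in T_P^*$ and $b_1 \dots b_r \in T_P^*$, $d_1 \dots d_r \in T_P$.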
Problem 1.1. In general, when transforming the components of a tensor of arbitrary type (q, r), the components for the S′-system are obtained from those of the S-system by putting for each superscript a Jacobian transformation matrix $\partial x'^a / \partial x^c$, and for each subscript an inverse Jacobian $\partial x^c / \partial x'^a$. Both Jacobians appear on the right-hand side together with the S-system tensor.
Apply these rules to get the components $t'_{ab}{}^{c}$ from the components $t_{de}{}^{f}$.
Answer.
\[
t'_{ab}{}^{c} = \frac{\partial x^d}{\partial x'^a} \frac{\partial x^e}{\partial x'^b} \frac{\partial x'^c}{\partial x^f} \, t_{de}{}^{f}
\]
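The index bookkeeping of this answer can be written out as nested sums. The sketch below is an illustration under assumptions: the function name, the linear map A and the sample components are hypothetical. It transforms a tensor with two subscripts and one superscript and then checks that contracting the superscript with the second subscript leaves a covariant vector, as the transformation law demands.

```python
def transform_tensor(t, jac, jac_inv):
    """Transform t_{de}^f to t'_{ab}^c: one inverse Jacobian per subscript,
    one Jacobian per superscript.

    t[d][e][f]    : components t_{de}^f in the S-system
    jac[c][f]     : dx'^c / dx^f
    jac_inv[d][a] : dx^d / dx'^a
    Returns result[a][b][c] = t'_{ab}^c.
    """
    rng = range(len(t))
    return [[[sum(jac_inv[d][a] * jac_inv[e][b] * jac[c][f] * t[d][e][f]
                  for d in rng for e in rng for f in rng)
              for c in rng] for b in rng] for a in rng]

# Hypothetical example: the linear map x' = A x in two dimensions.
A = [[2, 1], [1, 1]]        # Jacobian dx'/dx
A_inv = [[1, -1], [-1, 2]]  # inverse Jacobian dx/dx'
t = [[[1 + d + 2 * e + 3 * f for f in range(2)]
      for e in range(2)] for d in range(2)]
t_prime = transform_tensor(t, A, A_inv)

# Contract the superscript with the second subscript in both systems.
u = [sum(t[d][e][e] for e in range(2)) for d in range(2)]
u_prime = [sum(t_prime[a][b][b] for b in range(2)) for a in range(2)]

# A covariant vector transforms with one inverse Jacobian.
u_expected = [sum(A_inv[d][a] * u[d] for d in range(2)) for a in range(2)]
```

Integer entries were chosen so that `u_prime` and `u_expected` can be compared exactly, without rounding.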
Problem 1.2. Suppose we may only use the Jacobian $\partial x'^a / \partial x^c$ but not its inverse. Under that restriction, the superscripts are handled in the same way as in problem 1.1, but for each subscript the Jacobian $\partial x'^a / \partial x^c$ appears on the left-hand side together with the S′-system tensor.
Apply these rules to relate the components $t'_{ab}{}^{c}$ and $t_{de}{}^{f}$.
Answer.
\[
\frac{\partial x'^a}{\partial x^d} \frac{\partial x'^b}{\partial x^e} \, t'_{ab}{}^{c} = \frac{\partial x'^c}{\partial x^f} \, t_{de}{}^{f}
\]
Remark. It helps to remember that upper and lower indices on the same side of the equation are
always paired, when one counts the upper index in the denominator of a differential quotient like a
lower index.
Definition 1.7 (Affine connection). An affine connection defines the tangential part of the rate of change of the tangent plane and its base vectors. The connection coefficients are defined by
\[
\Gamma^a_{bc} = \langle \omega^a, \partial_c e_b \rangle \tag{1.13}
\]
Lemma 1.6. An affine connection yields
\[
\langle \partial_c \omega^a, e_b \rangle = -\Gamma^a_{bc} \tag{1.14}
\]
and hence determines, too, the cotangential part of the rate of change of the cotangent plane and its base vectors.
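As an illustration of the definition of the connection coefficients, the following sketch evaluates them for plane polar coordinates, viewed as the embedding X = (r cos θ, r sin θ) of the plane into itself. This example is an assumption made for illustration and does not appear in the text; the derivatives of the base vectors are approximated by central differences, and index 0 stands for r, index 1 for θ.

```python
import math

def basis(r, theta):
    """Tangent basis [e_r, e_theta] = dX/dr, dX/dtheta for polar coordinates."""
    return [(math.cos(theta), math.sin(theta)),
            (-r * math.sin(theta), r * math.cos(theta))]

def dual_basis(r, theta):
    """Dual basis [omega^r, omega^theta] with <omega^a, e_b> = delta^a_b.

    These are the rows of the inverse of the matrix whose columns are
    e_r and e_theta.
    """
    (a, c), (b, d) = basis(r, theta)  # matrix [[a, b], [c, d]]
    det = a * d - b * c
    return [(d / det, -b / det), (-c / det, a / det)]

def connection(r, theta, h=1e-5):
    """gamma[a][b][c] = <omega^a, d_c e_b>, with d_c by central differences."""
    omega = dual_basis(r, theta)
    point = [r, theta]
    gamma = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for c in range(2):
        plus, minus = point[:], point[:]
        plus[c] += h
        minus[c] -= h
        e_plus, e_minus = basis(*plus), basis(*minus)
        for b in range(2):
            d_e = [(e_plus[b][i] - e_minus[b][i]) / (2 * h) for i in range(2)]
            for a in range(2):
                gamma[a][b][c] = sum(omega[a][i] * d_e[i] for i in range(2))
    return gamma

gamma = connection(2.0, 0.6)  # arbitrary sample point r = 2, theta = 0.6
```

Up to discretization error, the only nonzero entries should be the coefficient with upper index r and both lower indices θ, equal to −r, and the two coefficients with upper index θ and mixed lower indices, equal to 1/r.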
Lemma 1.7. The rates of change of the tangent base vectors are
\[
\partial_c e_b = e_a \Gamma^a_{bc} + n_{bc}
\]
where the normal parts satisfy
\[
\langle \omega^a, n_{bc} \rangle = 0
\]
for all a, b, c = 1 · · · n.
Lemma 1.8. The rates of change of the cotangent base vectors are
\[
\partial_c \omega^a = -\omega^b \Gamma^a_{bc} + m^a{}_c
\]
where the normal parts satisfy
\[
\langle m^a{}_c, e_b \rangle = 0
\]
for all a, b, c = 1 · · · n.
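Definition 1.8 (Intrinsic derivatives). The intrinsic derivatives¹ of the base vectors are
\[
\partial_c e_b = e_a \langle \omega^a, \partial_c e_b \rangle = \Gamma^a_{bc} \, e_a \tag{1.15}
\]
\[
\partial_c \omega^a = \omega^b \langle \partial_c \omega^a, e_b \rangle = -\Gamma^a_{bc} \, \omega^b \tag{1.16}
\]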
Definition 1.9 (Covariant derivative). Let c be any index in 1 . . . n. The partial covariant derivative of a scalar function S is equal to the partial derivative:
\[
\nabla_c S = \partial_c S \tag{1.17}
\]
The partial covariant derivative $\nabla_c T$ of any tensor T, say of type (q, r) as in equation (1.12), is obtained by using linearity and the Leibniz product rule, with the partial derivatives of the components and the intrinsic partial derivatives of the base vectors.
Remark. Suppose a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given. Then the following situation occurs:
• The intrinsic part of any vector $X \in \mathbb{R}^N$ is given by the projection
\[
\mathrm{Proj}^{\parallel}_{T_P} X = \sum_{a=1}^{n} e_a \langle \omega^a, X \rangle \tag{1.18}
\]
• As explained in lemma 1.12 below, the connection turns out to be symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices a, b, c.
• In traditional Gaussian differential geometry the dimensions are n = 2 and N = 3. Here the normal parts can be calculated, and yield the Weingarten formulas. One needs to carefully distinguish partial derivatives, which refer to the embedding space $\mathbb{R}^3$, from the intrinsic derivatives used in this exposition.
Lemma 1.9. The following are the two simplest cases for a covariant derivative. Let $v = v^s e_s$ be a contravariant vector and c be any index in 1 . . . n. The c-th partial covariant derivative has the components
\[
\nabla_c v^a = \partial_c v^a + \Gamma^a_{sc} \, v^s \tag{1.19}
\]
Let $f = f_b \, \omega^b$ be a covariant vector. The c-th partial covariant derivative has the components
\[
\nabla_c f_b = \partial_c f_b - \Gamma^s_{bc} \, f_s \tag{1.20}
\]
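A small numerical experiment shows why the Christoffel term is needed in the contravariant case. The sketch below is an illustration under assumptions: it uses plane polar coordinates with their standard connection coefficients, which are quoted here rather than derived in the text, and the constant Cartesian field pointing along the x-axis. The polar components of this field vary from point to point, yet its covariant derivative vanishes.

```python
import math

def christoffel_polar(r):
    """Standard connection coefficients of plane polar coordinates.

    gamma[a][b][c] holds the coefficient with upper index a and lower
    indices b, c; index 0 is r, index 1 is theta.
    """
    gamma = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    gamma[0][1][1] = -r
    gamma[1][0][1] = gamma[1][1][0] = 1.0 / r
    return gamma

def field(r, theta):
    """Polar components (v^r, v^theta) of the constant Cartesian field e_x."""
    return [math.cos(theta), -math.sin(theta) / r]

def partial_derivatives(r, theta, h=1e-5):
    """d_v[c][a] = d v^a / d x^c by central differences, x = (r, theta)."""
    point = [r, theta]
    d_v = []
    for c in range(2):
        plus, minus = point[:], point[:]
        plus[c] += h
        minus[c] -= h
        vp, vm = field(*plus), field(*minus)
        d_v.append([(vp[a] - vm[a]) / (2 * h) for a in range(2)])
    return d_v

def covariant_derivative(r, theta):
    """nabla_v[c][a] = d_c v^a + Gamma^a_{sc} v^s."""
    gamma = christoffel_polar(r)
    v = field(r, theta)
    d_v = partial_derivatives(r, theta)
    return [[d_v[c][a] + sum(gamma[a][s][c] * v[s] for s in range(2))
             for a in range(2)] for c in range(2)]

r0, theta0 = 2.0, 0.6  # arbitrary sample point
d_v = partial_derivatives(r0, theta0)
nabla_v = covariant_derivative(r0, theta0)
```

At the sample point the partial derivatives in `d_v` are not all zero, while every entry of `nabla_v` should vanish up to discretization error.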
¹ In this exposition of differentiable manifolds, the intrinsic derivatives get no symbol of their own.
Lemma 1.10 (Rule to get covariant derivatives). In the general case of a tensor of type (q, r), the components of the partial covariant derivative are denoted by
\[
\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \quad \text{or even simpler} \quad T^{a_1 \dots a_q}{}_{b_1 \dots b_r ; c}
\]
They are obtained by the following rule: Each such component is the sum of 1 + q + r terms. The first term is the partial derivative
\[
\partial_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r}
\]
The remaining terms are all products of the tensor components with a Christoffel symbol. The next q terms are added. For each term, a Christoffel symbol "has robbed" a different one of the contravariant indices $a_1 \dots a_q$ and replaced this index by a contravariant summation index s. The Christoffel symbol gets the robbed index, the covariant summation index s, and c as last index.
The last r terms are subtracted. Once more, for each term, a Christoffel symbol "has robbed" a different one of the covariant indices $b_1 \dots b_r$ and replaced it by a covariant summation index s. The Christoffel symbol gets, as a contraction, the corresponding contravariant summation index s, the robbed index, and c as last index.
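The "index robbing" rule translates directly into a loop over index positions. The following is a minimal sketch under assumptions: the dictionary-based storage of components, the function name and the numerical test data are all illustrative choices, and the connection values are arbitrary numbers rather than coefficients of a specific manifold. As a consistency check, the sketch forms the outer product of a contravariant and a covariant vector and measures how far the result is from the Leibniz product rule.

```python
from itertools import product

def covariant_derivative(T, dT, gamma, q, r, n):
    """Components of nabla_c T for a tensor of type (q, r).

    T     : dict mapping index tuples (a1..aq, b1..br) to components
    dT    : dict mapping (a1..aq, b1..br, c) to partial derivatives d_c T
    gamma : gamma[a][b][c], upper index a, lower indices b, c
    Returns a dict mapping (a1..aq, b1..br, c) to the components.
    """
    result = {}
    for idx in product(range(n), repeat=q + r):
        for c in range(n):
            value = dT[idx + (c,)]  # first term: partial derivative
            for k in range(q):  # q added terms, one per superscript
                for s in range(n):
                    swapped = idx[:k] + (s,) + idx[k + 1:]
                    value += gamma[idx[k]][s][c] * T[swapped]
            for k in range(q, q + r):  # r subtracted terms, one per subscript
                for s in range(n):
                    swapped = idx[:k] + (s,) + idx[k + 1:]
                    value -= gamma[s][idx[k]][c] * T[swapped]
            result[idx + (c,)] = value
    return result

# Arbitrary test data at one point (hypothetical values).
n = 2
gamma = [[[0.1 * (a + 1) - 0.2 * b + 0.3 * c for c in range(n)]
          for b in range(n)] for a in range(n)]
v = {(0,): 1.5, (1,): -0.5}
dv = {(0, 0): 0.2, (0, 1): -0.7, (1, 0): 0.4, (1, 1): 1.1}
f = {(0,): 2.0, (1,): 0.25}
df = {(0, 0): -0.3, (0, 1): 0.6, (1, 0): 0.9, (1, 1): -1.2}

# Outer product v^a f_b and its partial derivatives by the ordinary product rule.
outer = {(a, b): v[(a,)] * f[(b,)] for a in range(n) for b in range(n)}
d_outer = {(a, b, c): dv[(a, c)] * f[(b,)] + v[(a,)] * df[(b, c)]
           for a in range(n) for b in range(n) for c in range(n)}

nabla_v = covariant_derivative(v, dv, gamma, 1, 0, n)
nabla_f = covariant_derivative(f, df, gamma, 0, 1, n)
nabla_outer = covariant_derivative(outer, d_outer, gamma, 1, 1, n)

leibniz_defect = max(
    abs(nabla_outer[(a, b, c)]
        - (nabla_v[(a, c)] * f[(b,)] + v[(a,)] * nabla_f[(b, c)]))
    for a in range(n) for b in range(n) for c in range(n))
```

Storing components in dictionaries keyed by index tuples keeps the code independent of q and r; for larger dimensions an array library would be the more practical choice.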
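Proof for a contravariant vector. Let $v = v^s e_s$ be a contravariant vector and c be any index in 1 . . . n. The c-th partial covariant derivative is
\[
\nabla_c v = e_a \left\langle \omega^a, \frac{Dv}{Dx^c} \right\rangle
\]
where the capital D means taking into account the derivatives of the base vectors $e_s$, too. But because of the projection along the tangent plane, only the tangential part of these derivatives, which is incorporated into the connection coefficients, is taken into account. Hence
\[
\begin{aligned}
\nabla_c v &= e_a \left\langle \omega^a, \frac{D(v^s e_s)}{Dx^c} \right\rangle
= e_a \left\langle \omega^a, \frac{\partial v^s}{\partial x^c} \, e_s + v^s \frac{\partial e_s}{\partial x^c} \right\rangle \\
&= e_a \frac{\partial v^a}{\partial x^c} + e_a v^s \left\langle \omega^a, \frac{\partial e_s}{\partial x^c} \right\rangle
= e_a \left[ \frac{\partial v^a}{\partial x^c} + \Gamma^a_{sc} \, v^s \right]
\end{aligned}
\]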
Lemma 1.11. For any product or contraction of tensors, the covariant derivatives are formed
following the Leibniz product rule.
Proof for the simplest case. Take the contraction $v^a f_a$ of a contravariant vector $v = v^a e_a$ and a covariant vector $f = f_b \, \omega^b$. Since the contraction is a scalar, its covariant derivative is just the partial derivative, and clearly satisfies the Leibniz product rule.
\[
\begin{aligned}
\nabla_c (v^a f_a) &= \partial_c (v^a f_a) = (\partial_c v^a) f_a + v^a (\partial_c f_a) \\
&= (\partial_c v^a + \Gamma^a_{sc} v^s) f_a + v^b (\partial_c f_b - \Gamma^s_{bc} f_s) = (\nabla_c v^a) f_a + v^b (\nabla_c f_b)
\end{aligned}
\]
One can add and subtract the Christoffel terms to get the corresponding covariant derivatives. Thus one sees that the covariant partial derivative satisfies the Leibniz product rule.
One sees that formula (1.17), together with lemmas 1.9 and 1.11, already uniquely determines the covariant derivatives of tensors of types (0, 0), (1, 0) and (0, 1). One may proceed inductively from tensors of type (q, r) to those of types (q + 1, r) and (q, r + 1). Indeed, take any tensor $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ of type (q, r), and $[v^s]$ of type (1, 0), as well as $[f_t]$ of type (0, 1).
The outer product $[v^s T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ is of type (q + 1, r). Similarly, the outer product $[T^{a_1 \dots a_q}{}_{b_1 \dots b_r} f_t]$ is of type (q, r + 1). Their covariant derivatives are to be obtained via the stipulated Leibniz rule. One obtains:
\[
\nabla_c (v^s T^{a_1 \dots a_q}{}_{b_1 \dots b_r}) = (\nabla_c v^s) \, T^{a_1 \dots a_q}{}_{b_1 \dots b_r} + v^s (\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r}) \tag{1.21}
\]
where the right-hand side may already be calculated using formula (1.19) and the rule for tensors of type (q, r). Thus one may proceed inductively to tensors of any type, and check that the rules given by definition 1.9 are always valid. Indeed, with a bit of additional work, one obtains the following result.
Theorem 1.1. Once a connection is specified, and the Leibniz rule is stipulated, the covariant derivatives are uniquely determined for all tensors of arbitrary types. Indeed, they are obtained by the rule from lemma 1.10, and moreover have the following properties:
• The covariant derivative of any contraction equals the contraction of the covariant derivative. As an example we take a tensor $[t^{ab}{}_{df}]$ of type (2, 2).
If $v^{ab}{}_{dfc} = \nabla_c t^{ab}{}_{df}$ and one contracts to $y^a{}_f = t^{ab}{}_{bf}$, then $\nabla_c y^a{}_f = v^{ab}{}_{bfc}$.
In the end, of course, both sides are called $\nabla_c t^{ab}{}_{bf}$.
• The Leibniz product rule holds for all possible products of tensors. As an example take a tensor $[t^{ab}]$ of type (2, 0) and a tensor $[f_d]$ of type (0, 1).
\[
\nabla_c (t^{ab} f_d) = (\nabla_c t^{ab}) f_d + t^{ab} (\nabla_c f_d)
\]
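Problem 1.3. Apply the rules from lemma 1.10 to get the covariant derivative $\nabla_d t_{ab}{}^{c}$.
Answer.
\[
\nabla_d t_{ab}{}^{c} = \partial_d t_{ab}{}^{c} - \Gamma^s_{ad} \, t_{sb}{}^{c} - \Gamma^s_{bd} \, t_{as}{}^{c} + \Gamma^c_{sd} \, t_{ab}{}^{s}
\]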
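Proposition 1.1 (Transformation of an affine connection). For any $C^2$-smooth point transformation between two coordinate systems, say $(x^1, \dots, x^n)$ and $(x'^1, \dots, x'^n)$, the Christoffel symbols are transformed following the rule
\[
\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d} \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{fg} + \frac{\partial x'^a}{\partial x^d} \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} \\
&= \frac{\partial x'^a}{\partial x^d} \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \frac{\partial^2 x'^a}{\partial x^d \, \partial x^f}
\end{aligned}
\tag{1.22}
\]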
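Proof. The connection coefficients in the S′-system are defined following the rule (1.13)
\[
\Gamma'^a_{bc} = \left\langle \omega'^a, \frac{\partial e'_b}{\partial x'^c} \right\rangle
\]
Now the point transformation gives
\[
\begin{aligned}
\Gamma'^a_{bc} &= \left\langle \frac{\partial x'^a}{\partial x^d} \, \omega^d, \frac{\partial}{\partial x'^c} \left( \frac{\partial x^f}{\partial x'^b} \, e_f \right) \right\rangle \\
&= \left\langle \frac{\partial x'^a}{\partial x^d} \, \omega^d, \frac{\partial x^f}{\partial x'^b} \frac{\partial e_f}{\partial x'^c} + \frac{\partial^2 x^f}{\partial x'^b \, \partial x'^c} \, e_f \right\rangle \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b} \left\langle \omega^d, \frac{\partial x^g}{\partial x'^c} \frac{\partial e_f}{\partial x^g} \right\rangle + \frac{\partial^2 x^f}{\partial x'^b \, \partial x'^c} \langle \omega^d, e_f \rangle \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{fg} + \frac{\partial^2 x^d}{\partial x'^b \, \partial x'^c} \right]
\end{aligned}
\]
thus proving the first formula. Inverse point transformations have inverse Jacobi matrices. Hence
\[
\begin{aligned}
\frac{\partial x'^a}{\partial x^d} \frac{\partial x^d}{\partial x'^c} &= \delta^a_c \\
\frac{\partial}{\partial x'^b} \left( \frac{\partial x'^a}{\partial x^d} \frac{\partial x^d}{\partial x'^c} \right) &= 0 \\
\frac{\partial^2 x'^a}{\partial x^d \, \partial x^f} \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} + \frac{\partial x'^a}{\partial x^d} \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} &= 0
\end{aligned}
\]
Thus we get the second formula from the first one.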
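The first form of the transformation rule can be tried out on a case where the answer is known in advance. The sketch below is an illustration under assumptions: the unprimed system is taken to be plane polar coordinates with their standard connection coefficients, which are quoted rather than derived in the text, and the primed system is Cartesian, where all connection coefficients of the flat plane must vanish. All function and variable names are hypothetical.

```python
import math

def transform_connection(gamma, jac, jac_inv, hess_inv):
    """First form of the transformation rule for connection coefficients.

    gamma[d][f][g]    : coefficients in the unprimed system (upper index d)
    jac[a][d]         : d x'^a / d x^d
    jac_inv[f][b]     : d x^f / d x'^b
    hess_inv[d][b][c] : d^2 x^d / d x'^b d x'^c
    """
    rng = range(len(gamma))
    new = [[[0.0 for _ in rng] for _ in rng] for _ in rng]
    for a in rng:
        for b in rng:
            for c in rng:
                total = 0.0
                for d in rng:
                    inner = hess_inv[d][b][c]
                    for f in rng:
                        for g in rng:
                            inner += jac_inv[f][b] * jac_inv[g][c] * gamma[d][f][g]
                    total += jac[a][d] * inner
                new[a][b][c] = total
    return new

# Unprimed = polar (r, theta), primed = Cartesian (x, y); arbitrary sample point.
x, y = 1.2, 0.9
r, theta = math.hypot(x, y), math.atan2(y, x)

gamma = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
gamma[0][1][1] = -r
gamma[1][0][1] = gamma[1][1][0] = 1.0 / r

# d(x, y)/d(r, theta)
jac = [[math.cos(theta), -r * math.sin(theta)],
       [math.sin(theta), r * math.cos(theta)]]
# d(r, theta)/d(x, y)
jac_inv = [[x / r, y / r],
           [-y / r ** 2, x / r ** 2]]
# Second derivatives of r and of theta with respect to x and y.
hess_inv = [
    [[y * y / r ** 3, -x * y / r ** 3],
     [-x * y / r ** 3, x * x / r ** 3]],
    [[2 * x * y / r ** 4, (y * y - x * x) / r ** 4],
     [(y * y - x * x) / r ** 4, -2 * x * y / r ** 4]],
]

gamma_cartesian = transform_connection(gamma, jac, jac_inv, hess_inv)
largest = max(abs(gamma_cartesian[a][b][c])
              for a in range(2) for b in range(2) for c in range(2))
```

The inhomogeneous second-derivative term is what cancels the polar coefficients; dropping it from `inner` would leave a nonzero result.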
Corollary 1. For a $C^2$-smooth manifold, the variation of the connection symbols $\delta \Gamma^a_{bc}$ is a tensor.
Definition 1.10 (Torsion tensor). For a $C^2$-smooth manifold, the antisymmetric part $T^a{}_{bc} = \Gamma^a_{bc} - \Gamma^a_{cb}$ is a tensor, called the torsion tensor.
Definition 1.11 (Symmetric connection). A connection is called symmetric iff $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices a, b, c.
Problem 1.4. Convince yourself that a $C^2$-smooth point transformation takes a symmetric connection to a symmetric one.
Lemma 1.12. Suppose there exists a $C^2$-smooth embedding $M \subseteq \mathbb{R}^N$ of the manifold. Then the connection coefficients are symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$ holds for all indices a, b, c.
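Proof. With the embedding given by formulas (1.8), the tangent space basis at any point $P = (x^1, \dots, x^n)$ has the vectors
\[
e_b = \left[ \frac{\partial X^1}{\partial x^b}, \dots, \frac{\partial X^N}{\partial x^b} \right]^T = \frac{\partial \vec{X}}{\partial x^b}
\]
for b = 1, . . . , n. Hence the connection coefficients are
\[
\Gamma^a_{bc} = \langle \omega^a, \partial_c e_b \rangle = \left\langle \omega^a, \frac{\partial^2 \vec{X}}{\partial x^b \, \partial x^c} \right\rangle = \Gamma^a_{cb}
\]
since the order of taking partial derivatives can be exchanged for $C^2$-smooth functions.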
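Proposition 1.2. Assume that the connection is symmetric. For any given point, there exists a point transformation which makes all connection coefficients zero at this point.
Proof. To transform the connection coefficients at point P to zero, the quadratic transformation
\[
x'^a = x^a - x^a(P) + \frac{\Gamma^a_{bc}(P)}{2} \left[ x^b - x^b(P) \right] \left[ x^c - x^c(P) \right] \quad \text{for } a = 1 \dots n \tag{1.23}
\]
will do. Since the connection is symmetric, the derivatives of the above transformation are
\[
\begin{aligned}
\frac{\partial x'^a}{\partial x^b} &= \delta^a_b + \Gamma^a_{bc}(P) \, (x^c - x^c(P)) \\
\frac{\partial^2 x'^a}{\partial x^b \, \partial x^c} &= \Gamma^a_{bc}(P)
\end{aligned}
\]
Now the second formula from equation (1.22) to transform the connection coefficients yields, writing h for the summation index of the linear term,
\[
\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d} \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \frac{\partial^2 x'^a}{\partial x^d \, \partial x^f} \\
&= \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \left[ \delta^a_d + \Gamma^a_{dh}(P) \, (x^h - x^h(P)) \right] \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \, \Gamma^a_{df}(P) \\
&= \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^a_{fg} + \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^a_{dh}(P) \, (x^h - x^h(P)) \, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \, \Gamma^a_{df}(P) \\
&= \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \left[ \Gamma^a_{fg} - \Gamma^a_{gf}(P) \right] + \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^a_{dh}(P) \, (x^h - x^h(P)) \, \Gamma^d_{fg}
\end{aligned}
\]
Both terms are zero at point P. If the connection coefficients are $C^1$-smooth, both terms are small near the point P. Indeed
\[
\Gamma'(x') = O(\| x - x(P) \|) = O(\| x' - x'(P) \|) = O(\| x' \|)
\]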
Problem 1.5. If the connection is not assumed to be symmetric, convince yourself that by the above quadratic transformation one may at least achieve that $\Gamma'^a_{bc}(P) = -\Gamma'^a_{cb}(P)$.
Problem 1.6. Assume that the connection is symmetric and the connection coefficients are $C^1$-smooth. Use the first formula from equation (1.22) and the quadratic transformation $x' \mapsto x$
\[
x^d = x^d(P) + x'^d - \frac{\Gamma^d_{bc}(P)}{2} \, x'^b x'^c \quad \text{for } d = 1 \dots n \tag{1.24}
\]
to achieve that the transformed connection satisfies $\Gamma'(P) = 0$ and moreover $\Gamma'(x') = O(\| x' \|)$ near the point P.
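Proof. Since the connection is symmetric, the derivatives of the above transformation (1.24) are
\[
\begin{aligned}
\frac{\partial x^d}{\partial x'^b} &= \delta^d_b - \Gamma^d_{bc}(P) \, x'^c \\
\frac{\partial^2 x^d}{\partial x'^b \, \partial x'^c} &= -\Gamma^d_{bc}(P)
\end{aligned}
\]
Now the first formula from equation (1.22) to transform the connection coefficients yields, writing h and e for the summation indices of the linear terms,
\[
\begin{aligned}
\Gamma'^a_{bc} &= \frac{\partial x'^a}{\partial x^d} \left[ \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{fg} + \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \left( \delta^f_b - \Gamma^f_{bh}(P) \, x'^h \right) \left( \delta^g_c - \Gamma^g_{ce}(P) \, x'^e \right) \Gamma^d_{fg} - \Gamma^d_{bc}(P) \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \Gamma^d_{bc} - \Gamma^g_{ce}(P) \, x'^e \, \Gamma^d_{bg} - \Gamma^f_{bh}(P) \, x'^h \, \Gamma^d_{fc} + \Gamma^f_{bh}(P) \, x'^h \, \Gamma^g_{ce}(P) \, x'^e \, \Gamma^d_{fg} - \Gamma^d_{bc}(P) \right] \\
&= \frac{\partial x'^a}{\partial x^d} \left[ \Gamma^d_{bc} - \Gamma^d_{bc}(P) + O(\| x' \|) \right] = O(\| x' \|)
\end{aligned}
\]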
Theorem 1.2. The partial covariant derivatives $(\nabla_c T) \, e^c$ of any tensor T of type (q, r) form a tensor of type (q, r + 1).
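Proof for the case (q, r) = (1, 0). Let $v = v^a e_a$ be a contravariant vector. The c-th partial covariant derivative has the components
\[
\nabla_c v^a = \frac{\partial v^a}{\partial x^c} + \Gamma^a_{sc} \, v^s
\]
obtained from formula (1.19). Under any point transformation $x'^a = x'^a(x^1, \dots, x^n)$ the contravariant components $[v^b]$ have the transformation law
\[
v'^a = \frac{\partial x'^a}{\partial x^b} \, v^b
\]
The partial derivatives are
\[
\frac{\partial v'^a}{\partial x'^e} = \frac{\partial x^c}{\partial x'^e} \frac{\partial v'^a}{\partial x^c}
= \frac{\partial x^c}{\partial x'^e} \frac{\partial}{\partial x^c} \left( \frac{\partial x'^a}{\partial x^b} \, v^b \right)
= \frac{\partial x^c}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^b \, \partial x^c} \, v^b + \frac{\partial x^c}{\partial x'^e} \frac{\partial x'^a}{\partial x^b} \frac{\partial v^b}{\partial x^c}
\]
But we need the covariant derivatives
\[
\nabla'_e v'^a = \frac{\partial v'^a}{\partial x'^e} + \Gamma'^a_{be} \, v'^b
\]
The connection coefficients are transformed by the second form of the rule (1.22)
\[
\begin{aligned}
\Gamma'^a_{be} &= \frac{\partial x'^a}{\partial x^d} \frac{\partial x^f}{\partial x'^b} \frac{\partial x^g}{\partial x'^e} \, \Gamma^d_{fg} - \frac{\partial x^f}{\partial x'^b} \frac{\partial x^d}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^d \, \partial x^f} \\
\Gamma'^a_{be} \, v'^b &= \left[ \frac{\partial x'^a}{\partial x^d} \frac{\partial x^g}{\partial x'^e} \, \Gamma^d_{fg} - \frac{\partial x^d}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^d \, \partial x^f} \right] \frac{\partial x^f}{\partial x'^b} \, v'^b \\
&= \left[ \frac{\partial x'^a}{\partial x^d} \frac{\partial x^c}{\partial x'^e} \, \Gamma^d_{fc} - \frac{\partial x^b}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^b \, \partial x^f} \right] v^f
\end{aligned}
\]
and addition of the formulas yields the covariant derivatives, and the terms with the second derivative cancel.
\[
\begin{aligned}
\nabla'_e v'^a &= \frac{\partial v'^a}{\partial x'^e} + \Gamma'^a_{be} \, v'^b \\
&= \frac{\partial x^c}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^b \, \partial x^c} \, v^b + \frac{\partial x^c}{\partial x'^e} \frac{\partial x'^a}{\partial x^b} \frac{\partial v^b}{\partial x^c} + \frac{\partial x'^a}{\partial x^b} \frac{\partial x^c}{\partial x'^e} \, \Gamma^b_{fc} \, v^f - \frac{\partial x^b}{\partial x'^e} \frac{\partial^2 x'^a}{\partial x^b \, \partial x^f} \, v^f \\
&= \frac{\partial x^c}{\partial x'^e} \frac{\partial x'^a}{\partial x^b} \left[ \frac{\partial v^b}{\partial x^c} + \Gamma^b_{fc} \, v^f \right] = \frac{\partial x^c}{\partial x'^e} \frac{\partial x'^a}{\partial x^b} \, \nabla_c v^b
\end{aligned}
\]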
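Proof for the case (q, r) = (0, 1). Let $f = f_a \, \omega^a$ be a covariant vector. The c-th partial covariant derivative has the components
\[
\nabla_c f_a = \frac{\partial f_a}{\partial x^c} - \Gamma^s_{ac} \, f_s
\]
obtained from formula (1.20). Under any point transformation $x^a = x^a(x'^1, \dots, x'^n)$ the transformed covariant components $[f'_b]$ are
\[
f'_b = \frac{\partial x^s}{\partial x'^b} \, f_s
\]
and taking the partial derivatives one gets
\[
\frac{\partial f'_b}{\partial x'^c} = \frac{\partial}{\partial x'^c} \left( \frac{\partial x^s}{\partial x'^b} \, f_s \right) = \frac{\partial^2 x^s}{\partial x'^b \, \partial x'^c} \, f_s + \frac{\partial x^a}{\partial x'^b} \frac{\partial f_a}{\partial x^d} \frac{\partial x^d}{\partial x'^c}
\]
But we need the covariant derivatives
\[
\nabla'_c f'_b = \frac{\partial f'_b}{\partial x'^c} - \Gamma'^s_{bc} \, f'_s
\]
The connection coefficients are transformed by the first form of the rule (1.22)
\[
\begin{aligned}
\Gamma'^s_{bc} &= \frac{\partial x'^s}{\partial x^d} \frac{\partial x^h}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{hg} + \frac{\partial x'^s}{\partial x^d} \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} \\
\Gamma'^s_{bc} \, f'_s &= \left[ \frac{\partial x^h}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{hg} + \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} \right] \frac{\partial x'^s}{\partial x^d} \, f'_s \\
&= \left[ \frac{\partial x^h}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{hg} + \frac{\partial^2 x^d}{\partial x'^c \, \partial x'^b} \right] f_d
\end{aligned}
\]
and subtraction of the formulas yields the covariant derivatives. The terms with the second derivative cancel. Some index shuffling is needed, and one gets
\[
\begin{aligned}
\nabla'_c f'_b &= \frac{\partial f'_b}{\partial x'^c} - \Gamma'^s_{bc} \, f'_s = \frac{\partial x^a}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \frac{\partial f_a}{\partial x^d} - \frac{\partial x^h}{\partial x'^b} \frac{\partial x^g}{\partial x'^c} \, \Gamma^d_{hg} \, f_d \\
&= \frac{\partial x^a}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \left[ \frac{\partial f_a}{\partial x^d} - \Gamma^s_{ad} \, f_s \right] = \frac{\partial x^a}{\partial x'^b} \frac{\partial x^d}{\partial x'^c} \, \nabla_d f_a
\end{aligned}
\]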
End of the proof of theorem 1.2. One now proceeds by induction. Because of formula (1.17) the covariant derivatives of tensors of type (0, 0) have type (0, 1). In other words, the derivative of a scalar is a covariant vector.
By the proof for the case (q, r) = (1, 0), the covariant derivative $[\nabla_c v^a]$ of a tensor $[v^a]$ of type (1, 0) has type (1, 1). The Leibniz product rule from lemma 1.11 implies that the covariant derivatives of tensors of type (0, 1) have type (0, 2). Indeed, for any contravariant vector $[v^a]$ and covariant vector $[f_a]$ the identity
\[
\nabla_c (v^a f_a) - (\nabla_c v^a) f_a = v^b (\nabla_c f_b)
\]
holds. Both terms on the left-hand side have type (0, 1). The contravariant vector $[v^b]$ has type (1, 0), and let the tensor $[\nabla_c f_b]$ have type (q, r). The contraction operation lowers the type from (q, r) to (q, r − 1) for the tensor $[v^b \nabla_c f_b]$. Since the types on both sides are equal, one concludes (0, 1) = (q, r − 1), and hence the tensor $[\nabla_c f_b]$ has type (0, 2).
Based on the fact that all covariant derivatives are to be obtained via the stipulated Leibniz rule (1.21)
\[
\nabla_c (v^s T^{a_1 \dots a_q}{}_{b_1 \dots b_r}) = (\nabla_c v^s) \, T^{a_1 \dots a_q}{}_{b_1 \dots b_r} + v^s (\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r})
\]
one may proceed inductively from tensors of type (q, r) to those of types (q + 1, r) and (q, r + 1), and check that indeed the tensor $[\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r}]$ is of type (q, r + 1).
Definition 1.12 (Intrinsic derivative of a vector along a curve). For a field $v = v^a e_a$ of contravariant vectors, the intrinsic derivative along the curve x = x(u) with any real parameter u is
\[ \frac{Dv^a}{du} = \frac{dv^a}{du} + \Gamma^{a}_{sc}\,v^s\,\frac{dx^c}{du} \tag{1.25} \]
\[ \frac{Dv}{du} = (\nabla_c v^a)\,e_a\,\frac{dx^c}{du} \tag{1.26} \]
For a field $f = f_b\,\omega^b$ of covariant vectors, the intrinsic derivative along the curve x = x(u) with any real parameter u is
\[ \frac{Df_b}{du} = \frac{df_b}{du} - \Gamma^{s}_{bc}\,f_s\,\frac{dx^c}{du} \tag{1.27} \]
\[ \frac{Df}{du} = (\nabla_c f_b)\,\omega^b\,\frac{dx^c}{du} \tag{1.28} \]
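Remark. The first formula (1.25) yields
\[ \frac{Dv^a}{du} = \frac{dv^a}{du} + \Gamma^{a}_{sc}\,v^s\,\frac{dx^c}{du} = \left[\frac{\partial v^a}{\partial x^c} + \Gamma^{a}_{sc}\,v^s\right]\frac{dx^c}{du} = \nabla_c v^a\,\frac{dx^c}{du} \]
Hence, after inserting the basis, the second formula (1.26) is obtained.
\[ \frac{Dv}{du} = \frac{D(v^a e_a)}{du} = \frac{Dv^a}{du}\,e_a = (\nabla_c v^a)\,e_a\,\frac{dx^c}{du} \]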
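Remark. Suppose a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given. The intrinsic part of any vector $X \in \mathbb{R}^N$ is given by the projection $\mathrm{Proj}_{\parallel T_P} X$ from equation (1.18). The intrinsic derivative of the vector v along a curve $x^c = x^c(u)$ becomes
\[ \frac{Dv}{du} = \mathrm{Proj}_{\parallel T_P}\,\frac{dv}{du} \tag{1.29} \]
Indeed
\[ \mathrm{Proj}_{\parallel T_P}\,\frac{dv}{du} = \mathrm{Proj}_{\parallel T_P}\,\frac{d(v^a e_a)}{du} = \mathrm{Proj}_{\parallel T_P}\left[\frac{dv^a}{du}\,e_a + v^s\,\frac{\partial e_s}{\partial x^c}\frac{dx^c}{du}\right] \]
\[ = \frac{dv^a}{du}\,e_a + v^s\,\mathrm{Proj}_{\parallel T_P}\,\frac{\partial e_s}{\partial x^c}\,\frac{dx^c}{du} = \frac{dv^a}{du}\,e_a + v^s\,e_a \left\langle \omega^a, \frac{\partial e_s}{\partial x^c} \right\rangle \frac{dx^c}{du} \]
\[ = \left[\frac{dv^a}{du} + v^s\,\Gamma^{a}_{sc}\,\frac{dx^c}{du}\right] e_a = \nabla_c v^a\,e_a\,\frac{dx^c}{du} \]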
Definition 1.13 (Intrinsic derivative along a curve). Given is a curve $x^c = x^c(u)$ and a tensor field T = T(x) of type (q, r). The intrinsic derivative $DT/du$ of the tensor T along this curve is defined by the identity
\[ \frac{DT}{du} = \left(\nabla_c T^{a_1 \dots a_q}{}_{b_1 \dots b_r}\right) e_{a_1 \dots a_q}{}^{b_1 \dots b_r}\,\frac{dx^c}{du} \tag{1.30} \]
Under the assumption that a specific embedding $M \subseteq \mathbb{R}^N$ of the manifold is given,
\[ \frac{DT}{du} = e_{a_1 \dots a_q}{}^{b_1 \dots b_r} \left\langle\!\left\langle \omega^{a_1 \dots a_q}{}_{b_1 \dots b_r}, \frac{dT(x(u))}{du} \right\rangle\!\right\rangle \tag{1.31} \]
Corollary 2. We assume an embedding $M \subseteq \mathbb{R}^N$ of the manifold exists. Then the intrinsic derivative $DT/du$ of the tensor T of type (q, r) is again a tensor of the same type (q, r).
Proof. The equation (1.31) is a coordinate free definition since the projection involved is coordinate free.¹
¹ Helmholtz' ants can feel tensors but no coordinates.
Corollary 3. We assume an embedding $M \subseteq \mathbb{R}^N$ of the manifold exists. The covariant derivative of the tensor T of type (q, r)
\[ \nabla T = (\nabla_c T) \otimes \omega^c \]
is a tensor of type (q, r + 1).
Proof. The contraction
\[ \frac{DT}{du} = (\nabla_c T)\,\frac{dx^c}{du} \]
is of type (q, r) and the tangential vector $dx^c/du$ is of type (1, 0). Hence the tensor $\nabla_c T$ is of type (q, r + 1).
1.4. Riemannian manifold
Definition 1.14 (Riemannian manifold). A Riemannian manifold M is a differentiable manifold with an additional metric structure. At every point and for every differential $dx = e_a\,dx^a$ at that point, the length ds is given by the Riemannian metric
\[ ds^2 = g_{ab}\,dx^a\,dx^b \tag{1.32} \]
The symmetric matrix $[g_{ab}]$ is assumed to be nonsingular
\[ g_{ab} = g_{ba} \quad\text{and}\quad g := \det g_{ab} \neq 0 \]
and to have the same number of positive and negative eigenvalues at all points of the manifold.¹
Definition 1.15 (Dot product). By putting $e_a \cdot e_b = g_{ab}$ for all a, b = 1 . . . n and extending by linearity, a commutative dot product is defined on a Riemann manifold.
Postulate. For a Riemannian manifold, the tangent space $T_P$ and the cotangent space $T_P^*$ are identified by the requirement that the inner product equals the bilinear form:
\[ b \cdot a = \langle b, a \rangle \quad\text{for all } b \in T_P^* \text{ and } a \in T_P \tag{1.33} \]
Lemma 1.13. The postulate to identify the tangent plane with the cotangent plane is equivalent to the rules for lifting and lowering an index:
\[ \omega^a = g^{ab}\,e_b \quad\text{and}\quad e_a = g_{ab}\,\omega^b \]
Here the matrices $[g_{ab}]$ and $[g^{ab}]$ are inverse to each other. The rules extend to any of the indices of any tensor of any type.
Moreover, under this postulate both
\[ \omega^a \cdot e_c = \delta^a_c \quad\text{and}\quad e_a \cdot e_c = g_{ac} \tag{1.34} \]
hold for all a, c = 1 . . . n. The postulate is compatible with the identification $T_P^{**} = T_P$ from lemma 1.4.
Proof. For the base vectors, the requirement (1.33) gives
\[ \omega^a \cdot e_c = \langle \omega^a, e_c \rangle = \delta^a_c \quad\text{for all } a, c = 1 \dots n \tag{1.35} \]
Since $[e_b]$ is a basis of the tangent plane, any identification $T_P \leftrightarrow T_P^*$ gives formulas $\omega^a = g^{ab} e_b$ with a, b = 1 . . . n. From equations (1.35) one obtains
\[ g^{ab}\,e_b \cdot e_c = g^{ab}\,g_{bc} = \delta^a_c \tag{1.36} \]
Hence the matrices $[g_{ab}]$ and $[g^{ab}]$ are inverse to each other, and one has obtained the rule to lift the index. The rule to lower the index is now easy to check.
¹ (Footnote to Definition 1.14.) The last requirement can be proved under the assumption that the manifold is connected.
Conversely, the rules to lower and lift indices imply the requirement (1.33) that the inner product equals the bilinear form.
The identifications $T_P^{**} = \mathrm{id}(T_P) = T_P$ from equation (1.10) and $T_P^* = T_P$ from equation (1.33) work together to produce
\[ b \cdot a = \langle b, a \rangle = \langle{}^{*}\,\mathrm{id}(a), b \rangle^{*} = \langle{}^{*}\,a, b \rangle^{*} = a \cdot b \]
for all $b \in T_P^*$ and $a \in T_P$. One puts $\langle \cdot, \cdot \rangle = \langle{}^{*}\,\cdot, \cdot \rangle^{*}$. No contradiction arises since the dot product is commutative.
Corollary 4. Corresponding rules to lift and lower indices apply to tensors of any type (q, r).
Problem 1.7. Apply the rules for lifting and lowering the indices from lemma 1.13 to get the components $t_{abc}$ in terms of the components $t^{d}{}_{e}{}^{f}$.
Answer.
\[ t_{abc} = g_{ad}\,g_{cf}\,t^{d}{}_{b}{}^{f} \]
Lemma 1.14. For any $C^1$-smooth Riemann manifold, the metric tensor has covariant derivatives zero, and indeed satisfies
\[ \partial_c g_{ab} = \Gamma_{bac} + \Gamma_{abc} \quad\text{and}\quad \nabla_c g_{ab} = 0 \tag{1.37} \]
Proof. The connection coefficients are defined by equation (1.13). By the rule from lemma 1.13, one may lower the first indices on both sides and obtain
\[ \Gamma^{a}_{bc} = \langle \omega^a, \partial_c e_b \rangle \]
\[ \Gamma_{abc} = \langle e_a, \partial_c e_b \rangle \]
Taking the intrinsic partial derivative $\partial_c$ on both sides of the second equation (1.34) and using the Leibniz rule and commutativity of the dot product, we get
\[ e_a \cdot e_b = g_{ab} \]
\[ e_b \cdot (\partial_c e_a) + e_a \cdot (\partial_c e_b) = \partial_c g_{ab} \]
From the postulate to identify the tangent space $T_P$ and the cotangent space $T_P^*$, the dot products are bilinear forms and
\[ \langle e_b, \partial_c e_a \rangle + \langle e_a, \partial_c e_b \rangle = \partial_c g_{ab} \]
For the partial and the covariant derivatives of the metric tensor, one gets
\[ \partial_c g_{ab} = \Gamma_{bac} + \Gamma_{abc} \]
\[ \nabla_c g_{ab} = \partial_c g_{ab} - g_{sb}\,\Gamma^{s}_{ac} - g_{as}\,\Gamma^{s}_{bc} = \partial_c g_{ab} - \Gamma_{bac} - \Gamma_{abc} = 0 \]

Theorem 1.3. For a $C^2$-smoothly embedded Riemann manifold, the metric tensor determines the connection symbols.
\[ \Gamma_{abc} = \frac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right) \tag{1.38} \]
\[ \Gamma^{a}_{bc} = \frac{g^{ad}}{2}\left(\partial_b g_{cd} + \partial_c g_{db} - \partial_d g_{bc}\right) \tag{1.39} \]
Proof. By lemma 1.12 the connection coefficients are symmetric: $\Gamma^{a}_{bc} = \Gamma^{a}_{cb}$ and hence $\Gamma_{abc} = \Gamma_{acb}$. Since the equations
\[ \Gamma_{bac} + \Gamma_{abc} = \partial_c g_{ab} \quad\text{and}\quad \Gamma_{abc} = \Gamma_{acb} \]
hold for all indices a, b, c = 1 . . . n and especially their permutations, they imply identity (1.38). The second identity (1.39) follows by the rule to lift an index.
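As a numerical illustration of formula (1.39), the following sketch evaluates the connection coefficients for spherical coordinates in flat $\mathbb{R}^3$. The metric diag(1, r², r² sin²θ), the finite-difference step and all function names are assumptions made for this example only; they are not taken from the text.

```python
import math

def metric(x):
    """Assumed example metric g_ab: flat R^3 in spherical coordinates x = (r, theta, phi)."""
    r, theta, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def d_metric(x, c, h=1e-5):
    """Central finite difference approximating the partial derivative d_c g_ab at x."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(3)] for a in range(3)]

def christoffel(x):
    """Gamma^a_{bc} from (1.39). The metric is diagonal, so only d = a
    contributes to the sum over d, with g^{aa} = 1 / g_aa."""
    g = metric(x)
    dg = [d_metric(x, c) for c in range(3)]  # dg[c][a][b] approximates d_c g_ab
    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for a in range(3):
        for b in range(3):
            for c in range(3):
                gamma[a][b][c] = (dg[b][c][a] + dg[c][a][b] - dg[a][b][c]) / (2 * g[a][a])
    return gamma

if __name__ == "__main__":
    G = christoffel((2.0, 0.7, 0.3))
    print(G[0][1][1], G[1][0][1], G[2][1][2])  # compare with -r, 1/r, cot(theta)
```

For this metric the nonzero results can be compared with the well-known values $\Gamma^{r}_{\theta\theta} = -r$, $\Gamma^{\theta}_{r\theta} = 1/r$ and $\Gamma^{\varphi}_{\theta\varphi} = \cot\theta$.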
Problem 1.8. A metric is called diagonal if $g_{ab} = 0$ for all a ≠ b. Convince yourself that for a diagonal metric and symmetric connection, the connection coefficients $\Gamma^{a}_{bc}$ are zero for a, b, c all three different. Check the formulas
\[ \Gamma^{a}_{ac} = \frac{\partial_c g_{aa}}{2 g_{aa}} \quad\text{for all } a, c = 1 \dots n; \]
\[ \Gamma^{a}_{bb} = -\frac{\partial_a g_{bb}}{2 g_{aa}} \quad\text{for all } a, b = 1 \dots n \text{ with } a \neq b; \]
where no summation is implied.
Let P be a point on a pseudo-Riemannian manifold, as used in general relativity. We know that there exists a point transformation such that $g_{ab}(P) = \eta_{ab}(P)$ at this one point P.
Problem 1.9. Explain, using linear algebra, how to get such a point transformation; it is even a linear one.
In the above situation, the cotangent and tangent base vectors satisfy
\[ \omega^0 = e_0 \quad\text{and}\quad \omega^i = -e_i \]
\[ e_0 \cdot e_0 = 1, \quad e_0 \cdot e_i = 0, \quad e_i \cdot e_k = -\delta_{ik} \]
\[ e_\alpha \cdot e_\beta = \eta_{\alpha\beta} \]
for i, k = 1, 2, 3 and α, β = 0, 1, 2, 3.
Definition 1.16 (Tetrad). The base vectors satisfying
\[ \hat{e}_\alpha \cdot \hat{e}_\beta = \eta_{\alpha\beta} \quad\text{for } \alpha, \beta = 0, 1, 2, 3 \]
are called a tetrad. They are denoted by carets.
I call an orthonormal basis in three dimensions a tetrad, too, and use the same notation.
Problem 1.10. Suppose the metric $g_{ab}$ is diagonal and positive definite, as occurs for example for spherical coordinates in $\mathbb{R}^3$. Write down, in terms of $g_{aa}$:
• the relations of the bases $e_a$ for the tangent space and $\omega^a$ for the cotangent space;
• and the relations to the corresponding tetrad $\hat{e}_a$.
Answer. With no summation implied,
\[ \omega^a = g^{ab}\,e_b = \frac{e_a}{g_{aa}} \]
\[ \hat{e}_a = \frac{e_a}{\sqrt{g_{aa}}} = \sqrt{g_{aa}}\;\omega^a \]
In the case of orthogonal coordinates, one obtains the same tetrad from the tangent basis as from the cotangent basis. Hence one may further simplify the notation. For example, take spherical coordinates (r, θ, φ) in $\mathbb{R}^3$. It is customary to denote the unit vector by the boldface name of the coordinate with a caret put above it:
\[ \hat{\mathbf{r}} = e_r = \omega^r \]
\[ \hat{\boldsymbol{\theta}} = r^{-1}\,e_\theta = r\,\omega^\theta \]
\[ \hat{\boldsymbol{\varphi}} = (r \sin\theta)^{-1}\,e_\varphi = r \sin\theta\;\omega^\varphi \]
Problem 1.11. Assume at some point, the electrical field has the covariant components $[E_r, E_\theta, E_\varphi] = [2, 3, 5]$. Calculate the components for the orthonormal basis $\hat{\mathbf{r}}, \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\varphi}}$.
Problem 1.12. Assume at some point, the velocity of a particle has the contravariant components $[v^r, v^\theta, v^\varphi] = [2, 3, 5]$. Calculate the components for the orthonormal basis $\hat{\mathbf{r}}, \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\varphi}}$.
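One possible way to organize the calculation for problems 1.11 and 1.12 is sketched below. It uses only the relations $\omega^\theta = \hat{\boldsymbol{\theta}}/r$, $e_\theta = r\,\hat{\boldsymbol{\theta}}$ and their φ-analogues stated above. The problems do not specify the point, so the values r = 2 and θ = π/2 used in the usage lines are assumptions, as are the function names.

```python
import math

def covariant_to_orthonormal(E_cov, r, theta):
    """Components of E = E_a omega^a in the orthonormal basis, using
    omega^r = r_hat, omega^theta = theta_hat / r, omega^phi = phi_hat / (r sin theta)."""
    E_r, E_theta, E_phi = E_cov
    return (E_r, E_theta / r, E_phi / (r * math.sin(theta)))

def contravariant_to_orthonormal(v_con, r, theta):
    """Components of v = v^a e_a in the orthonormal basis, using
    e_r = r_hat, e_theta = r theta_hat, e_phi = r sin(theta) phi_hat."""
    v_r, v_theta, v_phi = v_con
    return (v_r, r * v_theta, r * math.sin(theta) * v_phi)

if __name__ == "__main__":
    r, theta = 2.0, math.pi / 2          # assumed sample point, not given in the problems
    E_hat = covariant_to_orthonormal((2, 3, 5), r, theta)
    v_hat = contravariant_to_orthonormal((2, 3, 5), r, theta)
    print(E_hat, v_hat)
    # The contraction E_a v^a is basis independent, so it must equal the
    # ordinary dot product of the orthonormal components.
    print(sum(e * v for e, v in zip(E_hat, v_hat)))
```

The final line is a consistency check: the invariant $E_a v^a = 2\cdot 2 + 3\cdot 3 + 5\cdot 5$ does not depend on the chosen point.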
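Definition 1.17 (Christoffel symbols). For any smooth Riemann manifold, the Christoffel symbols are defined by
\[ M_{abc} = \frac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right) \tag{1.40} \]
\[ \left\{ {a \atop bc} \right\} = \frac{g^{ad}}{2}\left(\partial_b g_{cd} + \partial_c g_{db} - \partial_d g_{bc}\right) = g^{ad}\,M_{dbc} \tag{1.41} \]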
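Lemma 1.15. For any $C^2$-smooth Riemann manifold
\[ \Gamma^{a}_{bc} = \left\{ {a \atop bc} \right\} + \frac{1}{2}\left(T_{bc}{}^{a} + T_{cb}{}^{a} + T^{a}{}_{bc}\right) \tag{1.42} \]
\[ \Gamma^{a}_{bc} = \left\{ {a \atop bc} \right\} + \frac{1}{2}\left(T^{a}{}_{bc} - T_{c}{}^{a}{}_{b} + T_{bc}{}^{a}\right) \tag{1.43} \]
\[ \frac{1}{2}\left(\Gamma^{a}_{bc} + \Gamma^{a}_{cb}\right) = \left\{ {a \atop bc} \right\} + \frac{1}{2}\left(T_{bc}{}^{a} + T_{cb}{}^{a}\right) \tag{1.44} \]
where $T^{a}{}_{bc} = \Gamma^{a}_{bc} - \Gamma^{a}_{cb}$ is the torsion tensor.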
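Proof. Since the torsion is a tensor, it is sufficient to prove
\[ \Gamma_{abc} = M_{abc} + \frac{1}{2}\left(T_{bca} + T_{cba} + T_{abc}\right) \]
\[ \Gamma_{abc} = M_{abc} + \frac{1}{2}\left(T_{abc} - T_{cab} + T_{bca}\right) \]
\[ \frac{1}{2}\left(\Gamma_{abc} + \Gamma_{acb}\right) = M_{abc} + \frac{1}{2}\left(T_{bca} + T_{cba}\right) \]
Since $\partial_c g_{ab} = \Gamma_{bac} + \Gamma_{abc}$ by equation (1.37), the definition (1.40) yields
\[ M_{abc} = \frac{1}{2}\left(\partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc}\right) = \frac{1}{2}\left(\Gamma_{acb} + \Gamma_{cab} + \Gamma_{bac} + \Gamma_{abc} - \Gamma_{cba} - \Gamma_{bca}\right) \]
\[ M_{abc} - \Gamma_{abc} = \frac{1}{2}\left(\Gamma_{acb} + \Gamma_{cab} + \Gamma_{bac} - \Gamma_{abc} - \Gamma_{cba} - \Gamma_{bca}\right) = \frac{1}{2}\left(T_{acb} + T_{cab} + T_{bac}\right) \]
\[ \Gamma_{abc} - M_{abc} = \frac{1}{2}\left(T_{bca} + T_{cba} + T_{abc}\right) = \frac{1}{2}\left(T_{abc} - T_{cab} + T_{bca}\right) \]
\[ \left(\Gamma_{abc} + \Gamma_{acb}\right) - 2 M_{abc} = \frac{1}{2}\left(T_{bca} + T_{cba} + T_{abc} + T_{cba} + T_{bca} + T_{acb}\right) = T_{bca} + T_{cba} \]
which checks formulas (1.42), (1.43) and (1.44).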
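1.5. Lie derivative Any vector field $[v^a]$ defines infinitesimal point transformations with
\[ x'^a = x^a + \varepsilon v^a, \qquad x^a = x'^a - \varepsilon v^a + O(\varepsilon^2) \tag{1.45} \]
\[ \frac{\partial x'^a}{\partial x^c} = \delta^a_c + \varepsilon\,\partial_c v^a, \qquad \frac{\partial x^c}{\partial x'^a} = \delta^c_a - \varepsilon\,\partial_a v^c + O(\varepsilon^2) \tag{1.46} \]
One may even define on the manifold a (local) flow $\varepsilon \mapsto x' = \Phi(\varepsilon, x)$ as the (local) solution of the initial value problem
\[ \frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = v^a(x) \tag{1.47} \]
For any tensor field one gets the induced flow $t' = \tilde{\Phi}(\varepsilon, t)$. Here the tensor t′ has been transformed, but is still evaluated at the same coordinates x.¹
\[ t'(x') = \text{Product of Jacobians} \cdot t(x) \tag{1.48} \]
\[ \tilde{\Phi}(\varepsilon, t)(x) = t'(x) = \text{Product of Jacobians} \cdot t(\Phi(-\varepsilon, x)) \tag{1.49} \]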
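Definition 1.18 (Lie derivative). The Lie derivative $\mathcal{L}_v t$ of any tensor t along the vector field v is determined by the transformation of the tensor under the above flow. Either, in terms of the infinitesimal point transformations (1.45), one defines
\[ t(x') - t'(x') = \varepsilon\,\mathcal{L}_v t + O(\varepsilon^2), \qquad t'(x) - t(x) = -\varepsilon\,\mathcal{L}_v t + O(\varepsilon^2) \tag{1.50} \]
or, in terms of the flow $x' = \Phi(\varepsilon, x)$ and its induced flow $t' = \tilde{\Phi}(\varepsilon, t)$, for which $\tilde{\Phi}(\varepsilon, t)(x) = t'(x)$,
\[ \lim_{\varepsilon \to 0} \frac{\tilde{\Phi}(\varepsilon, t) - t}{\varepsilon} = -\mathcal{L}_v t \tag{1.51} \]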
Lemma 1.16 (Rule to get the Lie derivative). In the general case of a tensor T of type (q, r), the components of the Lie derivative are denoted by
\[ \mathcal{L}_v T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \]
They are obtained by the following rule: Each component is the sum of 1 + q + r terms. The first term is the directional derivative
\[ v^s\,\partial_s T^{a_1 \dots a_q}{}_{b_1 \dots b_r} \]
The remaining terms are all products of the tensor components with some partial derivative of
components from the vector field v along which the Lie derivative is taken. The terms corresponding
to the q upper indices are subtracted. For each term, the vector field has robbed a different one
of contravariant indices a1 . . . aq and replaced this index by a contravariant summation index s.
The partial derivative ∂ s is taken along the robbed index, of the vector component with the robbed
index.
The last r terms are added. Once more, for each term, a different one of covariant indices
b1 . . . br has been "robbed" and is replaced by a covariant summation index s. The partial
derivative of the vector field components v s is taken along the robbed index.
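For the simplest nontrivial case of the rule, a contravariant vector field $w^a$ (type (1, 0)), the rule gives one directional-derivative term and one subtracted term: $(\mathcal{L}_v w)^a = v^s \partial_s w^a - w^s \partial_s v^a$. The sketch below evaluates this numerically in two dimensions; the finite-difference helper, the sample fields and all names are assumptions chosen for illustration.

```python
def partial(field, x, s, h=1e-6):
    """Central finite difference: list of d_s field^a at the point x."""
    xp, xm = list(x), list(x)
    xp[s] += h
    xm[s] -= h
    fp, fm = field(xp), field(xm)
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

def lie_derivative_vector(v, w, x):
    """(L_v w)^a = v^s d_s w^a - w^s d_s v^a for a contravariant vector field w.
    The single upper index of w is 'robbed' once, and that term is subtracted."""
    n = len(x)
    dv = [partial(v, x, s) for s in range(n)]  # dv[s][a] approximates d_s v^a
    dw = [partial(w, x, s) for s in range(n)]  # dw[s][a] approximates d_s w^a
    vx, wx = v(x), w(x)
    return [sum(vx[s] * dw[s][a] - wx[s] * dv[s][a] for s in range(n))
            for a in range(n)]

# Hypothetical sample fields in the plane.
def rotation(x):
    return [-x[1], x[0]]

def radial(x):
    return [x[0], x[1]]

def constant(x):
    return [1.0, 0.0]

if __name__ == "__main__":
    point = [0.3, 0.4]
    print(lie_derivative_vector(rotation, radial, point))    # rotation leaves the radial field unchanged
    print(lie_derivative_vector(rotation, constant, point))  # a constant field is turned by the rotation
```

The radial field is invariant under rotations, so its Lie derivative along the rotation field vanishes, while the constant field (1, 0) gives (0, −1) by direct evaluation of the formula.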
Problem 1.13. Convince yourself that for any scalar function S = S(x), the Lie derivative is the directional derivative:
\[ \mathcal{L}_v S = v^a\,\partial_a S \tag{1.52} \]
¹ (Footnote to the induced flow in section 1.5.) The minus sign appears naturally, and is well motivated by the consistent use of passive transformations.
Answer. Since a scalar function transforms by the rule S′(x′) = S(x) under any point transformation, one gets from the first formula (1.50)
\[ S(x') - S(x) = S(x') - S'(x') = \varepsilon\,\mathcal{L}_v S + O(\varepsilon^2) \]
and from the Taylor expansion
\[ S(x') = S(x) + (x'^a - x^a)\,\partial_a S + O(|x - x'|^2) = S(x) + \varepsilon v^a\,\partial_a S + O(\varepsilon^2) \]
Thus comparison gives the formula (1.52). Hence the Lie derivative of a scalar is the directional derivative.
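Problem 1.14. Apply the rules for the Lie derivative, given by lemma 1.16, to get $\mathcal{L}_v t_{ab}{}^{c}$ of the tensor t along the vector field v.
Answer. The terms for the lower indices a and b are added, the term for the upper index c is subtracted:
\[ \mathcal{L}_v t_{ab}{}^{c} = v^d\,\partial_d t_{ab}{}^{c} + (\partial_a v^s)\,t_{sb}{}^{c} + (\partial_b v^s)\,t_{as}{}^{c} - (\partial_s v^c)\,t_{ab}{}^{s} \]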
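Proof of validity.
\[ t(x') - t'(x') = \left[t(x') - t(x)\right] - \left[t'(x') - t(x)\right] \]
The first term comes from partial derivatives:
\[ t_{ab}{}^{c}(x') - t_{ab}{}^{c}(x) = \varepsilon v^s\,\partial_s t_{ab}{}^{c} + O(\varepsilon^2) \]
The second term comes from the transformation flow:
\[ t'_{ab}{}^{c}(x') = \frac{\partial x^d}{\partial x'^a}\frac{\partial x^e}{\partial x'^b}\frac{\partial x'^c}{\partial x^f}\,t_{de}{}^{f}(x) \]
\[ = \left(\delta^d_a - \varepsilon\,\partial_a v^d\right)\left(\delta^e_b - \varepsilon\,\partial_b v^e\right)\left(\delta^c_f + \varepsilon\,\partial_f v^c\right) t_{de}{}^{f}(x) + O(\varepsilon^2) \]
\[ = t_{ab}{}^{c}(x) - \varepsilon\,\partial_a v^d\,t_{db}{}^{c} - \varepsilon\,\partial_b v^e\,t_{ae}{}^{c} + \varepsilon\,\partial_f v^c\,t_{ab}{}^{f} + O(\varepsilon^2) \]
\[ t'_{ab}{}^{c}(x') - t_{ab}{}^{c}(x) = -\varepsilon\,\partial_a v^s\,t_{sb}{}^{c} - \varepsilon\,\partial_b v^s\,t_{as}{}^{c} + \varepsilon\,\partial_s v^c\,t_{ab}{}^{s} + O(\varepsilon^2) \]
Subtraction of the second from the first yields
\[ t_{ab}{}^{c}(x') - t'_{ab}{}^{c}(x') = \varepsilon v^s\,\partial_s t_{ab}{}^{c} + \varepsilon\,\partial_a v^s\,t_{sb}{}^{c} + \varepsilon\,\partial_b v^s\,t_{as}{}^{c} - \varepsilon\,\partial_s v^c\,t_{ab}{}^{s} + O(\varepsilon^2) \]
The second formula is obtained since
\[ t_{ab}{}^{c}(x) - t'_{ab}{}^{c}(x) = t_{ab}{}^{c}(x') - t'_{ab}{}^{c}(x') + O(\varepsilon^2) \]
The third formula (1.51) is obtained from the definition (1.48).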
Corollary 5. The Lie derivative $\mathcal{L}_v$ of a tensor t is a tensor of the same type, provided the vector field v is transformed, too.
Proposition 1.3. The Lie derivative $\mathcal{L}_v$ applied to tensors of any type, with the vector field v fixed, obeys the Leibniz product rule.
Proof of the simplest case. For any contravariant vector $[h^a]$ and covariant vector $[k_a]$ one has
\[ (\mathcal{L}_v h^a)\,k_a + h^a\,\mathcal{L}_v(k_a) = \left[(v^s \partial_s h^a) - h^s (\partial_s v^a)\right] k_a + h^a \left[(v^s \partial_s k_a) + k_s (\partial_a v^s)\right] \]
\[ = (v^s \partial_s h^a)\,k_a + h^a\,(v^s \partial_s k_a) = v^s \partial_s (h^a k_a) = \mathcal{L}_v (h^a k_a) \]

2. Special Relativity
2.1. Relativity of time and length Let (ct, x, y, z) be the coordinates for any event, measured in the inertial system S. Let (ct′, x′, y′, z′) be the coordinates for the same event, measured in the inertial system S′. Assume that the origins of systems S and S′ coincide, and that the system S′ moves with velocity v in +x direction relative to system S. We want to determine the linear transformation
\[ \begin{aligned} t' &= At + Bx \\ x' &= Dt + Ex \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \tag{2.1} \]
In special relativity, it is customary to introduce the dimensionless parameters
\[ \beta := \frac{v}{c} \quad\text{and}\quad \gamma := \frac{1}{\sqrt{1 - \beta^2}} \]
Problem 2.1. From the relative velocity of the two systems, and from the postulate of constancy of the velocity of light c, one gets the three assumptions
• x = vt if and only if x′ = 0;
• x = ct if and only if x′ = ct′;
• x = −ct if and only if x′ = −ct′.
Use these three assumptions to determine the constants B, D and E in terms of A and the relativity parameters β and γ.
Answer.
• x = vt ⇔ x′ = 0 yields D = −Ev;
• x = ct ⇔ x′ = ct′ yields $cA - D + c^2 B - cE = 0$ since
\[ 0 = ct' - x' = (cA - D)\,t + (cB - E)\,x = (cA - D + c^2 B - cE)\,t \]
• x = −ct ⇔ x′ = −ct′ yields $-cA - D + c^2 B + cE = 0$ since
\[ 0 = ct' + x' = (cA + D)\,t + (cB + E)\,x = (cA + D - c^2 B - cE)\,t \]
Subtracting the last two relations yields 2cA − 2cE = 0, hence A = E. Adding them yields $-2D + 2c^2 B = 0$, hence $D = c^2 B$. After eliminating B, D and E one gets
\[ \begin{aligned} t' &= A\,t - A v x / c^2 \\ x' &= -A v\,t + A\,x \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \]
In 4d-matrix notation:
\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} A & -\beta A & 0 & 0 \\ -\beta A & A & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \]
Definition 2.1 (Isochronic Lorentz-, proper Lorentz transformation). An isochronic Lorentz

transformation maps the cone of light rays which point into the future into itself. A proper Lorentz
transformation maps the cone of light rays which point into the future into itself, and has the
determinant +1.
Problem 2.2. In many texts and lectures, it is customary to begin with the stronger postulate of invariance of the Minkowski metric:
\[ c^2 t^2 - x^2 - y^2 - z^2 = c^2 t'^2 - x'^2 - y'^2 - z'^2 \tag{2.2} \]
instead of the weaker postulate of constancy of the velocity of light.
Use the invariance of the Minkowski metric and the fact that x = vt ⇔ x′ = 0, to determine the constants A, B, D and E in the transformation (2.1), in terms of the relativity parameters β and γ. There are four solutions, with different signs of these constants. Write down all four solutions. What is the meaning of these solutions? Which one of these four solutions is a proper Lorentz transformation?
Answer. Invariance of the Minkowski metric implies $c^2 t^2 - x^2 = c^2 (At + Bx)^2 - (Dt + Ex)^2$. We compare the coefficients of $t^2$, $tx$ and $x^2$ and get
\[ \begin{aligned} c^2 &= c^2 A^2 - D^2 \\ 0 &= 2 c^2 AB - 2 DE \\ -1 &= c^2 B^2 - E^2 \end{aligned} \]
We still use that D = −Ev for a boost with relative velocity v in +x direction. Hence $c^2 AB = -vE^2$ from the second relation. Squaring and plugging in the first and third relation yields
\[ v^2 E^4 = (c^2 A^2)(c^2 B^2) = (c^2 + v^2 E^2)(E^2 - 1) \]
\[ 0 = -c^2 + (c^2 - v^2)\,E^2 \]
\[ E = \pm \frac{c}{\sqrt{c^2 - v^2}} = \pm\gamma \]
\[ A^2 = 1 + c^{-2} D^2 = 1 + c^{-2} v^2 E^2 = 1 + \frac{v^2}{c^2 - v^2} = \frac{c^2}{c^2 - v^2} \]
\[ A = \pm\gamma \]
First consider the solution with A = E = γ. One gets $B = -vE^2/(c^2 A) = -v\gamma/c^2$ and the transformation
\[ \begin{aligned} t' &= \gamma\,t - \gamma v x / c^2 \\ x' &= -\gamma v\,t + \gamma\,x \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \]
which is the Lorentz boost with relative velocity +v.
The solution with A = −γ and E = +γ is
\[ \begin{aligned} t' &= -\gamma\,t + \gamma v x / c^2 \\ x' &= -\gamma v\,t + \gamma\,x \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \]
which is the Lorentz boost followed by time reversal.
The solution with A = +γ and E = −γ is
\[ \begin{aligned} t' &= \gamma\,t - \gamma v x / c^2 \\ x' &= +\gamma v\,t - \gamma\,x \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \]
which is the Lorentz boost followed by space reflection. For the fourth solution
\[ \begin{aligned} t' &= -\gamma\,t + \gamma v x / c^2 \\ x' &= +\gamma v\,t - \gamma\,x \\ y' &= y \quad\text{and}\quad z' = z \end{aligned} \]
both the time is reversed and the space reflected.
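The four solutions can be checked numerically. The sketch below, restricted to the (ct, x) block, verifies that each of them preserves $c^2t^2 - x^2$ and picks out the one with determinant +1 that also keeps the direction of time. The function names and the sample value β = 0.6 are assumptions for this illustration.

```python
import math

def solutions(beta):
    """The four 2x2 matrices acting on (ct, x), keyed by (sign of A, sign of E)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    out = {}
    for sA in (+1, -1):
        for sE in (+1, -1):
            # ct' = sA * g * (ct - beta * x),  x' = sE * g * (x - beta * ct)
            out[(sA, sE)] = [[sA * g, -sA * g * beta],
                             [-sE * g * beta, sE * g]]
    return out

def interval(M, ct, x):
    """Squared Minkowski interval (ct')^2 - (x')^2 of the transformed event."""
    ct_p = M[0][0] * ct + M[0][1] * x
    x_p = M[1][0] * ct + M[1][1] * x
    return ct_p ** 2 - x_p ** 2

def is_proper(M):
    """Determinant +1 and time direction preserved (positive ct-ct entry)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return abs(det - 1.0) < 1e-12 and M[0][0] > 0

if __name__ == "__main__":
    for signs, M in solutions(0.6).items():
        print(signs, interval(M, 3.0, 1.0), is_proper(M))
```

Note that the solution with both signs reversed also has determinant +1, which is why the sketch tests the time direction separately.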
Here is another way to determine the constant A left open in problem 2.1. We begin with the assumptions
(i) The proper Lorentz transformations are a group.
(ii) Among the Lorentz transformations are the boosts as well as the usual 3-dimensional rotations.
(iii) The rotations without boost leave the time invariant.
(iv) Any isochronic Lorentz transformation that is diagonal leaves the time invariant.
Thus both the boost
\[ L = \begin{pmatrix} A & -\beta A & 0 & 0 \\ -\beta A & A & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
and the rotation
\[ S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
about the z-axis by 180° are Lorentz transformations.
Problem 2.3. Calculate the matrix products SLS and LSLS. Give a convincing argument that $A^2 (1 - \beta^2) = 1$ holds in the physically meaningful case. Determine the sign of A occurring for a proper Lorentz transformation. Once more, write down the matrix for a boost in +x direction.
Answer.
\[ SLS = \begin{pmatrix} A & \beta A & 0 & 0 \\ \beta A & A & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad LSLS = \begin{pmatrix} A^2 (1 - \beta^2) & 0 & 0 & 0 \\ 0 & A^2 (1 - \beta^2) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
This last matrix is a proper Lorentz transformation, and is diagonal, too. By the above assumption item (iv), it leaves the time invariant. Hence $A^2 (1 - \beta^2) = 1$ and A = ±γ. Only the case A = +γ is an isochronic Lorentz transformation.
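The two matrix products of problem 2.3 are easy to confirm by direct multiplication. A minimal sketch follows; the helper names and the sample values A = 2, β = 0.5 are arbitrary assumptions chosen so that all entries are exact in floating point.

```python
def matmul(P, Q):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def boost(A, beta):
    """The matrix L with the constant A still left open."""
    return [[A, -beta * A, 0, 0],
            [-beta * A, A, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

# Rotation about the z-axis by 180 degrees.
S = [[1, 0, 0, 0],
     [0, -1, 0, 0],
     [0, 0, -1, 0],
     [0, 0, 0, 1]]

if __name__ == "__main__":
    A, beta = 2.0, 0.5
    L = boost(A, beta)
    SLS = matmul(matmul(S, L), S)
    LSLS = matmul(L, SLS)
    for row in SLS:
        print(row)
    for row in LSLS:
        print(row)   # diagonal, with entries A^2 (1 - beta^2) = 3 in the (ct, x) block
```

With these sample values SLS is the boost with β replaced by −β, and LSLS is diagonal with $A^2(1 - \beta^2) = 3$ in the two upper entries, so the condition $A^2(1-\beta^2) = 1$ is clearly a restriction on A.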
For a graphic representation of the usual boost
\[ \begin{aligned} ct' &= \gamma\,ct - \beta\gamma\,x \\ x' &= -\beta\gamma\,ct + \gamma\,x \end{aligned} \tag{2.3} \]
one uses the same scale for the lengths ct and x, and a right angle between the ct-axis and the x-axis.
Problem 2.4. Convince yourself that
• the same angle occurs between the ct- and ct′-axis as between the x- and x′-axis;
• the angle between the x′- and the ct′-axis is acute.
Draw the future light ray x = ct, t ≥ 0. Draw the hyperbola of all points which have the invariant space-like squared distance −1 from the origin.
To illustrate the Lorentz contraction, we imagine a rigid tube of (comoving) length d with mirrors on both ends. Now the tube is moving with velocity v relative to the S-system. Thus the tube is at rest in the comoving S′ system.
I calculate at first with coordinates from the S-system, in which the tube has the length L. The mirrors at the ends of the tube move along the lines
\[ x = vt \quad\text{and}\quad x = vt + L \]
Let a light ray OA be sent from the mirror at the left end to the mirror at the right end, and reflected into a light ray AB from the mirror at the right end to the mirror at the left end. The equations of these light rays are x = ct and x = −ct + a.
Problem 2.5. Determine the constant a. Determine the S-system coordinates of the reflection events A and B.
Answer. Point A is the intersection of light ray x = ct with the world line of the right mirror x = vt + L. Hence $t = \frac{L}{c - v}$ and the coordinates are
\[ (ct, x)_A = \left(\frac{cL}{c - v}, \frac{cL}{c - v}\right) = \left(\frac{L}{1 - \beta}, \frac{L}{1 - \beta}\right) \]
and $a = 2ct_A = \frac{2cL}{c - v}$. Point B is the intersection of light ray x = a − ct with the world line of the left mirror x = vt. Hence $t = \frac{a}{c + v}$ and the coordinates are
\[ (ct, x)_B = \left(\frac{2c^2 L}{c^2 - v^2}, \frac{v}{c} \cdot \frac{2c^2 L}{c^2 - v^2}\right) = \left(\frac{2L}{1 - \beta^2}, \frac{2\beta L}{1 - \beta^2}\right) \]
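Problem 2.6. Use the Lorentz transformation (2.3) to determine the S′-system coordinates of the reflection events A and B.
Answer. From the equations (2.3) for the boost one gets the S′-coordinates of point A to be
\[ ct'_A = \gamma\,ct - \beta\gamma\,x = \gamma (1 - \beta) \cdot \frac{L}{1 - \beta} = \gamma L \]
\[ x'_A = -\beta\gamma\,ct + \gamma\,x = \gamma (-\beta + 1) \cdot \frac{L}{1 - \beta} = \gamma L \]
and the S′-coordinates of point B to be
\[ ct'_B = \gamma\,ct - \beta\gamma\,x = \gamma (1 - \beta^2) \cdot \frac{2L}{1 - \beta^2} = 2\gamma L \]
\[ x'_B = -\beta\gamma\,ct + \gamma\,x = \gamma (-\beta + \beta) \cdot \frac{2L}{1 - \beta^2} = 0 \]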
Problem 2.7. Convince yourself that
\[ L = \sqrt{1 - \beta^2}\;d < d \tag{2.4} \]
which is the famous Lorentz-FitzGerald contraction. There are no forces of any kind involved.
Answer. In the comoving system S′, the coordinates of points A and B are, by definition of the proper distance d,
\[ (ct', x')_A = (d, d) \quad\text{and}\quad (ct', x')_B = (2d, 0) \]
Comparing with the result of the previous problem 2.6 yields the relation (2.4) for the Lorentz contraction.
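The chain of problems 2.5 to 2.7 can be replayed numerically: compute the reflection events in the S-system, boost them with (2.3), and read off the comoving length. In the sketch below the function names and the sample values β = 0.6, L = 0.8 are assumptions for illustration.

```python
import math

def boost(beta, ct, x):
    """The boost (2.3): returns (ct', x')."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * ct - beta * g * x, -beta * g * ct + g * x

def reflection_events(beta, L):
    """S-system coordinates (ct, x) of the events A and B from problem 2.5."""
    A = (L / (1 - beta), L / (1 - beta))
    B = (2 * L / (1 - beta ** 2), 2 * beta * L / (1 - beta ** 2))
    return A, B

if __name__ == "__main__":
    beta, L = 0.6, 0.8
    A, B = reflection_events(beta, L)
    A_prime = boost(beta, *A)   # expected (d, d)
    B_prime = boost(beta, *B)   # expected (2d, 0)
    d = A_prime[1]              # comoving length read off from event A
    print(A_prime, B_prime)
    print(L, math.sqrt(1 - beta ** 2) * d)   # both sides of relation (2.4)
```

For β = 0.6 one has γ = 1.25, so the comoving length is d = γL = 1, in agreement with (2.4).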
To illustrate the time dilation, we imagine a rigid tube of (comoving) length d with mirrors on both ends. Again the tube is moving with velocity v relative to the S-system, but turned by 90°. Thus the tube is at rest in the comoving S′ system, and lying on the y′-axis.
I calculate at first with coordinates (ct′, x′, y′) from the comoving S′-system. The mirrors at the ends of the tube move along the lines
\[ x' = 0,\ y' = 0 \quad\text{and}\quad x' = 0,\ y' = d \]
Let a light ray OA be sent from the mirror at y′ = 0 to the mirror at y′ = d, and reflected into a light ray AB from the mirror at y′ = d back to the mirror at y′ = 0. The light goes on to be reflected back and forth, and each reflection gives a tick of this mirror-clock.
In the comoving system, the equations of these light rays are
\[ OA:\ x' = 0,\ y' = ct' \]
\[ AB:\ x' = 0,\ y' = 2d - ct' \]
The emission and reflection events have S′-coordinates (ct′, x′, y′)
O = (0, 0, 0), A = (d, 0, d), B = (2d, 0, 0). Thus d/c is the proper time interval between the ticks of the mirror clock.
Problem 2.8. Determine the S-system coordinates (ct, x, y) of the reflection events A and B. Determine the equations of the light rays OA and AB.
Answer. One needs the inverse of the boost (2.3). The S-coordinates of reflection event A are
\[ \begin{aligned} ct_A &= \gamma\,ct' + \beta\gamma\,x' = \gamma d \\ x_A &= \beta\gamma\,ct' + \gamma\,x' = \beta\gamma d \\ y_A &= y' = d \end{aligned} \]
The S-coordinates of reflection event B are
\[ \begin{aligned} ct_B &= \gamma\,ct' + \beta\gamma\,x' = 2\gamma d \\ x_B &= \beta\gamma\,ct' + \gamma\,x' = 2\beta\gamma d \\ y_B &= y' = 0 \end{aligned} \]
We determine the equations of the light rays OA and AB. In the S-system, the equations of light ray OA are
\[ \begin{aligned} ct &= \gamma\,ct' + \beta\gamma\,x' = \gamma\,ct' \\ x &= \beta\gamma\,ct' + \gamma\,x' = \gamma v t' \\ y &= y' = ct' \end{aligned} \]
Remark. One has to keep some parameter along the light ray. This cannot be the proper time. Such a parameter is also called an affine parameter. I have chosen t′ for that role.
In the S-system, the equations of light ray AB are
\[ \begin{aligned} ct &= \gamma\,ct' + \beta\gamma\,x' = \gamma\,ct' \\ x &= \beta\gamma\,ct' + \gamma\,x' = \gamma v t' \\ y &= y' = 2d - ct' \end{aligned} \]
Problem 2.9. Convince yourself that the times of the reflection events A and B in the S-system are γd/c and 2γd/c. Since γ > 1, we get longer time intervals between the ticks of the clock. The moving clock is slowed down by the relativity parameter γ.
Answer. From the expressions for $ct_A$ and $ct_B$ calculated in the previous problem 2.8, we found already that γd/c is the time interval between the ticks of the clock, as observed in the S-system.
In spite of the slowing down of the clock rate, the velocity of light remains the same. Using the theorem of Pythagoras, we find the distance the light has travelled from event O to A to be
\[ \sqrt{x_A^2 + y_A^2} = \sqrt{\beta^2\gamma^2 d^2 + d^2} = \gamma d \]
So the moving clock slows down since the light has to travel a longer distance between the reflection events.
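The light-clock argument can be checked with a few lines of arithmetic. The sketch below writes events as (ct′, x′, y′), so that the light reaches the far mirror at ct′ = d; it applies the inverse of the boost (2.3) and compares the elapsed ct in the S-system with the path length of the light. Function names and the sample values β = 0.6, d = 1 are assumptions for this example.

```python
import math

def inverse_boost(beta, ct_p, x_p):
    """Inverse of the boost (2.3): returns (ct, x) from (ct', x')."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * ct_p + beta * g * x_p, beta * g * ct_p + g * x_p

def first_tick_in_S(beta, d):
    """S-coordinates of reflection event A = (ct', x', y') = (d, 0, d),
    together with the distance travelled by the light from O to A."""
    ct_A, x_A = inverse_boost(beta, d, 0.0)
    y_A = d                      # transverse coordinate is unchanged
    path = math.hypot(x_A, y_A)  # Pythagoras in the S-system
    return ct_A, x_A, y_A, path

if __name__ == "__main__":
    ct_A, x_A, y_A, path = first_tick_in_S(0.6, 1.0)
    print(ct_A, x_A, y_A, path)  # ct_A and path agree: light still moves with speed c
```

That `ct_A` equals `path` expresses that the light covers the longer, slanted distance γd with the same speed c.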
2.2. Discovery of Aberration and Parallax In 1725, James Bradley, who held a position at
Oxford as astronomer and natural philosopher, began observations of γ Draconis at the home of
a friend, Samuel Molyneux. Using a telescope affixed to a chimney, so that it pointed nearly
vertically, he changed the position of the telescope very slightly, and very accurately measured its
change in position, using a screw and plumb-line; and over the course of a year or so, found that the
star did indeed vary in position during the course of the year by 40 arc-seconds, just like Polaris.
Figure 2.1. Aberration.
Figure 2.2. Parallax.
Stellar aberration produces an elliptical motion, circular at the Ecliptic poles, and linear at the Ecliptic plane, whose semi-major axis equals a constant, regardless of the distance or angular position of the star, equal to one radian multiplied by the ratio of the Earth's orbital velocity, to the
speed of light. This ratio is about 9.93·10⁻⁵ ≈ 20.48″. The first figure gives radian measure, the second figure gives the angle in arc-seconds (multiply by 3600 · 180/π). As the Earth moves,
the apparent positions of any star are shifted in the direction of the velocity of Earth’s motion.
Exactly as had already been known for Polaris, the change in motion was in the wrong direction for stellar parallax. Parallax was actually observed more than a hundred years later, for the first time by F. W. Bessel and W. Struve in 1838. They observed a shift of 0.292″ for the star 61 Cygni. This star has a distance of about 11 light years from the earth, and is one of the stars nearest to the earth. Roughly speaking, parallax is about 1/100 of aberration, or less.
Parallax produces a shift reciprocal to each star’s distance in parsecs, and is used to measure
the distance of the stars nearest to the earth. Parallax produces an elliptical motion of the star,
circular at the Ecliptic poles, and linear at the Ecliptic plane, whose semi-major axis equals the
reciprocal of each star’s distance in parsecs, which is of course different for different stars. As the
Earth moves, the apparent positions are shifted in the direction of the radius from the earth to the
sun. The parallax shift occurs a quarter cycle of the circular motion later than the aberration shift.
2.3. Aberration and the Doppler effect Here is the simple non-relativistic reasoning. Assume raindrops fall with the velocity c vertically. A pedestrian moves with velocity v. At which angle will he observe the rain? Seen in a frame at rest, the rain gives a right triangle with hypotenuse c, horizontal leg c cos α and vertical leg c sin α. In the moving frame, the right triangle has the same vertical leg c sin α and (non-relativistic!) horizontal leg c cos α + v. Hence
\[ \tan\alpha' = \frac{c \sin\alpha}{c \cos\alpha + v} \]
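To get a feeling for the size of the effect, the non-relativistic formula can be evaluated for starlight arriving at right angles to the Earth's motion. The value v/c ≈ 9.93·10⁻⁵ used below corresponds to an orbital speed of about 29.8 km/s and is an assumption of this sketch, as are the function names.

```python
import math

ARCSEC_PER_RADIAN = 3600 * 180 / math.pi

def apparent_angle(alpha, beta):
    """Non-relativistic aberration: tan(alpha') = sin(alpha) / (cos(alpha) + beta),
    with beta = v / c. atan2 keeps the angle in the correct quadrant."""
    return math.atan2(math.sin(alpha), math.cos(alpha) + beta)

def aberration_shift_arcsec(alpha, beta):
    """Shift alpha - alpha' in arc-seconds."""
    return (alpha - apparent_angle(alpha, beta)) * ARCSEC_PER_RADIAN

if __name__ == "__main__":
    beta_earth = 9.93e-5   # assumed ratio of Earth's orbital speed to c
    print(aberration_shift_arcsec(math.pi / 2, beta_earth))  # close to 20.5 arc-seconds
```

For α = 90° the shift is arctan β, which for small β is practically β itself, converted to arc-seconds.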
Figure 2.3. Aberration of the rain, and of the light.
Remark. Part of the information above is taken from the website of Courtney Seligman, Professor of Astronomy:
http://cseligman.com/text/history/bradley.htm
A second source is dtv Atlas der Astronomie.
Figure 2.4. The one-dimensional Doppler effect.
2.4. The one-dimensional Doppler effect Let the observer O be stationary in the inertial S-frame, with world-line CD. Let the light source or emitter E be stationary in the inertial S′-frame, with world-line AB. Let the S′-frame move with uniform velocity v along the positive x-axis. Suppose a light ray is sent in the negative x-direction. Let rays AC and BD be two successive crests of the light wave. I calculate in the S-system; see Figure 2.4. Subtracting
\[ (c + v)\,t_A + k = c\,t_A + x_A = c\,t_C + x_C \quad\text{and} \]
\[ (c + v)\,t_B + k = c\,t_B + x_B = c\,t_D + x_D = c\,t_D + x_C \quad\text{yields} \]
\[ (1 + \beta)\,\Delta t_{AB} = \Delta t_{CD} \]
To obtain the frequency shift, one needs to calculate with the proper times, both for the emitter and the observer. Since the phase of the wave is invariant, and differs by 2π for successive crests of the light wave,
\[ \nu_E\,\Delta\tau_{AB} = \nu_O\,\Delta\tau_{CD} = 2\pi \]
and taking into account the time dilation
\[ \Delta\tau_{AB} = \sqrt{1 - \beta^2}\;\Delta t_{AB} \quad\text{and}\quad \Delta\tau_{CD} = \Delta t_{CD} \]
For the ratio of the frequencies we obtain
\[ \frac{\nu_O}{\nu_E} = \frac{\Delta\tau_{AB}}{\Delta\tau_{CD}} = \sqrt{1 - \beta^2}\;\frac{\Delta t_{AB}}{\Delta t_{CD}} = \frac{\sqrt{1 - \beta^2}}{1 + \beta} = \sqrt{\frac{1 - \beta}{1 + \beta}} \]
For β > 0, we see that $\nu_O < \nu_E$. Thus a receding light source is redshifted, as expected.
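The steps of this derivation can be mirrored in a short calculation that builds the frequency ratio from the coordinate-time relation and the time dilation, and compares it with the closed form. The function names and the sample value β = 0.6 are assumptions of this sketch.

```python
import math

def doppler_ratio(beta):
    """Closed form nu_O / nu_E = sqrt((1 - beta) / (1 + beta)) for a receding source."""
    return math.sqrt((1 - beta) / (1 + beta))

def doppler_ratio_from_steps(beta, dt_AB=1.0):
    """Same ratio assembled step by step as in the derivation."""
    dt_CD = (1 + beta) * dt_AB                  # crests arrive further apart in S
    dtau_AB = math.sqrt(1 - beta ** 2) * dt_AB  # proper time of the moving emitter
    dtau_CD = dt_CD                             # the observer is at rest in S
    return dtau_AB / dtau_CD

if __name__ == "__main__":
    print(doppler_ratio(0.6), doppler_ratio_from_steps(0.6))  # both give 0.5
```

At β = 0.6 the observed frequency is half the emitted one; a negative β (approaching source) gives a ratio larger than one.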
2.5. Four-vectors and Minkowski metric The approach to special relativity we use here goes back to a lecture by Minkowski from 1908. I use a modern notation as in Hobson's book [7] General Relativity. I shall use space-time with 1 + 3 dimensions. Space-time vectors are either denoted by their contravariant components with upper Greek indices and put into square brackets, or by an invariant four-vector written in bold face. To begin with, to denote any position in space-time the four-vector x, or $[x^\mu]$, is used. Here $x^0 = ct$, and $x^1 = x$, $x^2 = y$, $x^3 = z$ are the components of the three-dimensional space vector $\vec{x}$. We assume t, x, y, z to be real. I endeavor throughout to give all components of a four-vector the same dimension. Any two space-time vectors x = (ct, x, y, z) and p = (s/c, p, q, r) have a Lorentz-invariant scalar product
\[ x^T \eta\,p = ts - xp - yq - zr \tag{2.5} \]
With the matrix notation from linear algebra this means that we put
\[ \eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]
\[ x \cdot p := x^T \eta\,p \tag{2.6} \]
for the Lorentz-invariant scalar product. In spite of its surprising properties, the Lorentz-invariant scalar product still has many properties in common with the ordinary scalar product. It is easy to check that the scalar product is
• commutative: x · p = p · x
• non-degenerate: If x · p = 0 for all vectors p, then x = 0.
Definition 2.2 (Perpendicular subspace). For any subspace $U \subseteq \mathbb{R}^4$, the Minkowski-perpendicular space is defined as
\[ U^\perp = \{ y \in \mathbb{R}^4 : y \cdot u = 0 \text{ for all } u \in U \} \]
Definition 2.3 (Present, future, future light-cone).
The future cone $\mathrm{Future} = \{ (ct, x, y, z) : c^2 t^2 - x^2 - y^2 - z^2 > 0 \text{ and } t > 0 \}$.
A vector such that x ∈ Future or −x ∈ Future is called time-like.
The future light-cone $\mathrm{Light}^+ = \{ (ct, x, y, z) : c^2 t^2 - x^2 - y^2 - z^2 = 0 \text{ and } t \geq 0 \}$.
A vector $x \in \mathrm{Light}^+$ is called future light-like. A vector such that $x \in \mathrm{Light}^+$ or $-x \in \mathrm{Light}^+$ is called light-like.
The present $\mathrm{Present} = \{ (ct, x, y, z) : c^2 t^2 - x^2 - y^2 - z^2 < 0 \}$.
A vector x ∈ Present is called space-like.
The proper time along a time-like vector x = (ct, x, y, z), with x ∈ Future or −x ∈ Future, is the Lorentz invariant quantity
\[ |x| = \sqrt{x \cdot x} = \sqrt{c^2 t^2 - x^2 - y^2 - z^2} \]
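The classification of definition 2.3 depends only on the sign of x · x. A minimal sketch follows, with four-vectors given as tuples (ct, x, y, z); the function names are hypothetical, and exact comparison with zero is only meaningful for exactly representable inputs such as the integers used here.

```python
import math

def minkowski_dot(x, p):
    """Lorentz-invariant scalar product x . p = x^T eta p with eta = diag(1, -1, -1, -1)."""
    return x[0] * p[0] - x[1] * p[1] - x[2] * p[2] - x[3] * p[3]

def classify(x):
    """Return 'time-like', 'light-like' or 'space-like' according to the sign of x . x."""
    s = minkowski_dot(x, x)
    if s > 0:
        return "time-like"
    if s == 0:
        return "light-like"
    return "space-like"

def invariant_length(x):
    """|x| = sqrt(x . x), defined here only for time-like or light-like vectors."""
    s = minkowski_dot(x, x)
    if s < 0:
        raise ValueError("space-like vector: x . x is negative")
    return math.sqrt(s)

if __name__ == "__main__":
    for vec in [(2, 1, 0, 0), (1, 1, 0, 0), (0, 1, 0, 0)]:
        print(vec, classify(vec))
    print(invariant_length((5, 3, 0, 0)))   # sqrt(25 - 9) = 4.0
```

Because x · x is unchanged by Lorentz transformations, the class of a vector is the same in every inertial system.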
2.6. The relativistic Doppler effect Now I give the correct 1 + 3-dimensional relativistic argument. The amplitude of a pure plane sine wave is a space-time function of the form $\cos(\omega t - \vec{k} \cdot \vec{x})$; general waves are superpositions of such plane waves. We consider any material- or light-wave but disregard polarization. The vector $\vec{k} \in \mathbb{R}^3$ is called the wave vector and ω > 0 is the angular frequency. The wave length λ is related to the wave vector by $\frac{2\pi}{\lambda} = |\vec{k}|$.
Concerning (passive) transformation by the Lorentz group, we know that $(ct, \vec x)$ is a four-vector
and the phase $\omega t - \vec k \cdot \vec x$ is invariant. Hence $[k^\mu] = [\omega/c, \vec k]$ is a four-vector, too. Thus the phase
has turned out to be the invariant scalar product $x \cdot k$ of two four-vectors. So far, the arguments
hold for any type of waves.
We now restrict ourselves to light waves in vacuum. In that case, the wave equation implies
$\vec k \cdot \vec k - \omega^2/c^2 = 0$ and hence the four-vector $(\omega/c, \vec k)$ is light-like. The wave length $\lambda$, the frequency
$\nu$ and the circular frequency $\omega$ are related by
\[ |\vec k| = \frac{2\pi}{\lambda} = \frac{\omega}{c} = \frac{2\pi\nu}{c} \]
Suppose a light ray lies in the xy-plane, and let $\theta$ be the angle between the direction of propagation
of light and the positive x-axis. In this setting, the wave four-vector is
\[ (\omega/c, \vec k) = \frac{2\pi}{\lambda}\,(1, \cos\theta, \sin\theta, 0) \]
Let the observer O be stationary in the inertial frame $S$. Let the light source or emitter E be
stationary in the inertial frame $S'$. Let the frame $S'$ move with uniform velocity $v$ along the positive
x-axis. The corresponding quantities referring to the frame $S'$ are denoted by primes.
The relative velocity is $\beta = \tanh \lambda_L$ with the Lobachevskij parameter $\lambda_L$. We use the
abbreviations, common in relativity,
\[ \beta = \frac{v}{c} \quad\text{and}\quad \gamma = \frac{1}{\sqrt{1 - \beta^2}} = \cosh \lambda_L \]
The Lorentz transformation between the two frames is
\[ ct' = \gamma ct - \beta\gamma x, \qquad x' = -\beta\gamma ct + \gamma x, \qquad y' = y \ \text{ and } \ z' = z \]
or in matrix notation
\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \]
Exactly the same Lorentz transformation applies to $(\omega/c, \vec k)$ since this is a space-time vector,
too. Hence we conclude
\[ \begin{pmatrix} \omega'/c \\ k'_x \\ k'_y \\ k'_z \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \omega/c \\ k_x \\ k_y \\ k_z \end{pmatrix} \]
So far, this transformation even holds for any type of waves. We specialize to light rays in the
xy-plane, and use polar coordinates to get
\[ \frac{2\pi}{\lambda'} \begin{pmatrix} 1 \\ \cos\theta' \\ \sin\theta' \\ 0 \end{pmatrix} = \frac{2\pi}{\lambda} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \cos\theta \\ \sin\theta \\ 0 \end{pmatrix} \]
Hence the Doppler shift of the wave length, respectively frequency, is
\[ \frac{\nu_E}{\nu_O} = \frac{\nu'}{\nu} = \frac{\lambda}{\lambda'} = \gamma(1 - \beta\cos\theta) = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}} \]
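The transformation of the spatial components gives the change of direction called aberration. We
obtain the formulas
\[ \cos\theta' = \frac{\cos\theta - \beta}{1 - \beta\cos\theta} \]
\[ \sin\theta' = \frac{\gamma^{-1}\sin\theta}{1 - \beta\cos\theta} \]
\[ \tan\theta' = \frac{\sin\theta}{\gamma\cos\theta - \gamma\beta} \]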
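The Doppler and aberration formulas can be cross-checked by applying the boost directly to the wave four-vector. The sketch below assumes the wave vector is measured in units of 2π/λ, so that it equals (1, cos θ, sin θ, 0); the function name `boost_wave_vector` is hypothetical.

```python
import math

def boost_wave_vector(theta, beta):
    """Boost (1, cos theta, sin theta, 0) along x; return (nu'/nu, theta')."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    k0, k1, k2 = 1.0, math.cos(theta), math.sin(theta)
    k0p = gamma * k0 - beta * gamma * k1      # time component: Doppler ratio
    k1p = -beta * gamma * k0 + gamma * k1     # x component in the primed frame
    k2p = k2                                  # y component is unchanged
    return k0p, math.atan2(k2p, k1p)

theta, beta = 1.0, 0.5
ratio, theta_p = boost_wave_vector(theta, beta)
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
print(ratio, gamma * (1.0 - beta * math.cos(theta)))                          # Doppler shift
print(math.cos(theta_p), (math.cos(theta) - beta) / (1 - beta * math.cos(theta)))  # aberration
```

Using `atan2` keeps the angle in the correct quadrant, which the tangent formula alone does not.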
For the motion of the earth around the sun $\beta \approx 10^{-4}$. Even the motion through the Milky Way gives
a relative velocity of the same magnitude. And even the recently measured motion of the Milky Way
against the average position of many spiral nebulae gives $\beta$ of the order of $2 \cdot 10^{-3} \ll 1$. Hence in
the case of astronomical aberration, the deflection angle $\theta' - \theta$ is small. After some calculation we
get
\[ \sin(\theta' - \theta) = \sin\theta'\cos\theta - \cos\theta'\sin\theta = \frac{\beta\sin\theta + \sin\theta\cos\theta\,(\gamma^{-1} - 1)}{1 - \beta\cos\theta} = \beta\sin\theta + O(\beta^2) \]
For the approximation in first order of $\beta$, we have confirmed the common formula
\[ \theta' = \theta + \beta\sin\theta + O(\beta^2) \]
Remark. With a "wave" vector pointing from the observer, we need to do the substitutions
$\theta \to \alpha + 180^\circ$ and $\theta' \to \alpha' + 180^\circ$. Hence the different sign of $\beta$ occurs in the intuitive argument
above.
2.7. Four-velocity
For a particle moving under arbitrary forces, one uses the world-line
$x^\mu = x^\mu(\tau)$ for components $\mu = 0, 1, 2, 3$ and the proper time $\tau$ as parameter. For the world-line
of a light ray $x^\mu = x^\mu(p)$, one has to use some other arbitrary parameter $p$ since the proper time
is constant along the light path. The four-velocity $[u^\mu]$ is defined as the derivative
\[ u^\mu = \frac{dx^\mu}{d\tau} \]
whereas the Newtonian velocity is
\[ \vec v = \frac{d\vec x}{dt} \]
Because of time dilation $dt = \gamma\, d\tau$, they are related by (see footnote 1)
\[ [u^\mu] = \gamma[c, \vec v\,] = \gamma[c, v_x, v_y, v_z] \]
Lemma 2.1. The four-velocity for a material particle is a time-like vector pointing to the future,
and has the Lorentz-invariant length $|u| = c$.
Reason.
\[ u \cdot u = \gamma^2 (c^2 - \vec v^{\,2}) = c^2\gamma^2(1 - \beta^2) = c^2 \]
and $u^0 = \gamma c \ge c > 0$.
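Lemma 2.1 is easy to confirm numerically. The following sketch assumes SI units and builds the four-velocity from a Newtonian velocity; the function names and the sample speed of 0.6 c are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_velocity(v):
    """Return [u^mu] = gamma [c, v_x, v_y, v_z] for a Newtonian velocity v."""
    v2 = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - v2 / C ** 2)
    return (gamma * C, gamma * v[0], gamma * v[1], gamma * v[2])

def minkowski_dot(a, b):
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

u = four_velocity((0.6 * C, 0.0, 0.0))
# The invariant length should equal c, and u^0 should be at least c.
print(math.sqrt(minkowski_dot(u, u)) / C, u[0] / C)
```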
2.8. The energy-momentum vector
For any particle of rest-mass $m$, the energy-momentum
vector $[p^\mu]$ is defined by
\[ p^\mu = m u^\mu = m\,\frac{dx^\mu}{d\tau} \]
The space components of this four-vector are
\[ \vec p = \gamma m\,\frac{d\vec x}{dt} \]
This turns out to be the momentum occurring in Newton’s law (see footnote 2)
\[ \vec F = \frac{d\vec p}{dt} \tag{Newton} \]
What is the meaning of the component $p^0$? To find out, one needs to use the relation
\[ W = \vec F \cdot \vec x \tag{work} \]
for the work $W$ done by a force $\vec F$ during a motion over a distance $\vec x$. We imagine that a particle
is moving along some path, under the influence of forces. These may be, for example, forces from
electric and magnetic fields, among other possibilities. The path is denoted as $x^\mu = x^\mu(\tau)$, with the
proper time $\tau$ as parameter. From the property of the four-velocity, checked in lemma 2.1, one gets
\[ (p^0)^2 - \vec p^{\,2} = p \cdot p = m^2c^2 \]
As long as the forces do not change the rest mass, which is true for electric forces anyway, we
have found the constant of motion $m^2c^2$. One differentiates by the parameter $\tau$ used on the particle
path and obtains
\[ p^0\,\frac{dp^0}{d\tau} = \vec p \cdot \frac{d\vec p}{d\tau} \]
Footnote 1. To avoid upper indices 1, 2, 3 for the components of a common vector, I use instead the lower indices x, y, z.
Footnote 2. I have to ask the reader to accept this statement at face value.
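The definition of the four-momentum implies
\[ \vec p = \gamma m \vec v = \frac{p^0}{c}\,\vec v \]
One uses this identity and cancels $p^0$:
\[ c\,p^0\,\frac{dp^0}{d\tau} = p^0\,\vec v \cdot \frac{d\vec p}{d\tau} \]
\[ c\,\frac{dp^0}{d\tau} = \vec v \cdot \frac{d\vec p}{d\tau} = \frac{d\vec x}{dt} \cdot \frac{d\vec p}{d\tau} = \frac{d\vec x}{d\tau} \cdot \frac{d\vec p}{dt} \]
Now Newton’s law (Newton) is used to obtain
\[ c\,\frac{dp^0}{d\tau} = \frac{d\vec x}{d\tau} \cdot \vec F \]
The right-hand side is the rate at which work is done by the force $\vec F$. This work is only used to
increase or decrease the kinetic energy $T$ of the particle. Hence
\[ c\,\frac{dp^0}{d\tau} = \frac{dT}{d\tau} \]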
Integration over the particle path yields $cp^0 = T + R$ with some constant of integration $R$. The
constant is determined by referring to a point $p^\mu(\tau_0)$ along the path where the particle is at rest. For
the parameter $\tau_0$ we obtain $p^0 = mc$ and $T = 0$, and hence conclude that $R = mc^2$. Since $R$ is
constant, one has obtained
\[ cp^0 = T + mc^2 \]
Following Einstein, the term $mc^2$ is interpreted as the energy equivalent of the rest mass, and
$E := T + mc^2$ as the total energy of the particle.
Theorem 2.1. The four-momentum of a particle moving in any field of forces is
\[ p^\mu = [E/c, \vec p\,] \]
The total energy $E$ and the rest mass $m$ are related by
\[ E = \sqrt{m^2c^4 + c^2\vec p^{\,2}} \tag{2.7} \]
The momentum and the velocity are related by
\[ \frac{E}{c^2}\,\vec v = \vec p \tag{2.8} \]
These formulas retain their meaning for $m = 0$, as does occur for a photon or another massless
particle.
Problem 2.10. A particle is non-relativistic if $c|\vec p\,| \ll mc^2$, or equivalently $\beta \ll 1$. Use the power
expansion
\[ \sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} \pm \dots \]
to get the approximation of the kinetic energy for a non-relativistic particle.
Answer. From the relation (2.7), the total energy $E = T + mc^2$ and the kinetic energy $T$ are
\[ E = \sqrt{m^2c^4 + c^2\vec p^{\,2}} = mc^2\sqrt{1 + \frac{p^2}{m^2c^2}} \]
\[ T = mc^2\left[\sqrt{1 + \frac{p^2}{m^2c^2}} - 1\right] = mc^2\left[\frac{p^2}{2m^2c^2} - \frac{p^4}{8m^4c^4} \pm \dots\right] \]
\[ T \simeq \frac{p^2}{2m} \]
as well known from the basics of classical mechanics.
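To see how good the approximation is, one can compare the exact kinetic energy with the first two terms of the expansion. The sketch below works in units where mc² = 1 and uses the hypothetical variable x = p²/(m²c²); the chosen value x = 0.01 is only an example.

```python
import math

def kinetic_exact(x):
    """T / (m c^2) = sqrt(1 + x) - 1 with x = p^2 / (m^2 c^2)."""
    return math.sqrt(1.0 + x) - 1.0

def kinetic_first_order(x):
    """Newtonian term p^2 / (2m), in units of m c^2."""
    return x / 2.0

def kinetic_second_order(x):
    """Newtonian term plus the first relativistic correction -p^4 / (8 m^3 c^2)."""
    return x / 2.0 - x ** 2 / 8.0

x = 0.01  # corresponds to |p| = 0.1 m c
print(kinetic_exact(x), kinetic_first_order(x), kinetic_second_order(x))
```

The error of the Newtonian term is of the size of the next term x²/8, as expected from an alternating expansion.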
Theorem 2.2. The four-momentum of a massive particle moving in a field of forces is, in terms of
its velocity,
\[ p^\mu = [E/c, \vec p\,] = [\gamma mc, \gamma m\vec v\,] \]
The total energy $E$ and the total mass $\gamma m$ are related by
\[ E = \gamma mc^2 \]
These formulas do not retain their meaning for $m = 0$.
Figure 2.5. The principal setup of Compton’s experiment.
2.9. The Compton effect
If light consists of photons, collisions between photons and particles
of matter should be possible. For photons and electrons, this quantum effect was discovered in
1922 by A. H. Compton at Washington University in St. Louis, Missouri. Figure 2.5 shows
the principal setup of Compton’s experiment. The monochromatic Mo-K$\alpha$ rays are scattered by
a graphite crystal. The wave-length spectrum $dW/d\lambda$ of the scattered x-radiation is measured, by
means of a Bragg crystal, for different scattering angles $\theta$. In addition to photons of the incident
wavelength $\lambda = 70 \cdot 10^{-12}$ m = 70 pm, photons with a longer wavelength $\lambda'$ occur. The shift
$\Delta\lambda(\theta) = \lambda'(\theta) - \lambda$ is an increasing function of the scattering angle $\theta$. For example, $\Delta\lambda(90^\circ) = 2.4$ pm.
Compton already gave the correct interpretation of his results as an elastic scattering of photons by
the quasi-free electrons inside the graphite.
Actually one finds that the scattered photons have two different wavelengths. One set of
photons gets its wavelength shifted, by a shift depending on the scattering angle. The shift comes
out as predicted below, by the scattering from electrons. A second set of photons has unshifted
wavelength. This set is due to scattering from the positively charged ions. The mechanism of
scattering is the same for both sets, except that for the ions the electron mass has to be replaced by
the ion mass, which is many thousand times larger. So the shift is tiny.
Figure 2.6. The kinematics of the collision of a photon with an electron, initially at rest.
In short-hand, the scattering process is written as $\gamma + e \to \gamma + e$. The situation in the lab system
is shown in Figure 2.6. For a relativistic calculation, we assume that the four-momenta
of the incident photon and electron are
\[ q = (\hbar\omega/c)[1, 1, 0, 0] \quad\text{and}\quad p = [mc, 0, 0, 0] \]
and for the scattered photon and electron (see footnote 1) are
\[ q' = (\hbar\omega'/c)[1, \cos\theta, \sin\theta, 0] \quad\text{and}\quad p' = (E'/c)[1, \beta'\cos\varphi, \beta'\sin\varphi, 0] \]
and use the conservation of the total four-momentum. Thus one gets four equations
\[ q + p = q' + p' \tag{2.9} \]
from which we want to eliminate the less interesting quantity $p'$, by means of the known relation (2.7),
which holds both for the incident and the scattered electron. From here on, it is perfectly
possible to proceed just by elementary means (see footnote 2). I prefer to take advantage of the invariant scalar
products as follows:
\[ (q + p)^2 = (q' + p')^2 \]
\[ q^2 + 2\,q \cdot p + p^2 = q'^2 + 2\,q' \cdot p' + p'^2 \]
We now use $q^2 = q'^2 = 0$ and $p^2 = p'^2 = m^2c^2$, and next eliminate the variable $p'$:
\[ q \cdot p = q' \cdot p' = q' \cdot (p + q - q') = q' \cdot (p + q) \]
Footnote 1. I prefer to use unwound angles from polar coordinates. The drawing in Figure 2.6 is an example with $\varphi > 0$ and $\theta < 0$.
Footnote 2. An up-to-date presentation along these lines is given for example in the textbook [6], p. 114.
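We put the assumed vectors into the last equation to get
\[ \hbar\omega\, mc^2 = \hbar\omega'(\hbar\omega + mc^2) - \hbar^2\omega\omega'\cos\theta \]
\[ \frac{\lambda'}{\lambda} = \frac{\omega}{\omega'} = \frac{\hbar\omega(1 - \cos\theta) + mc^2}{mc^2} \]
\[ \lambda' - \lambda = \frac{\hbar\omega\lambda(1 - \cos\theta)}{mc^2} = \frac{h}{mc}(1 - \cos\theta) \]
The factor
\[ \frac{h}{mc} = \lambda_C = 2.4263 \text{ pm} \]
is called the Compton wavelength of the electron. A photon of the wavelength $\lambda_C$ has the energy
$mc^2 = 511000$ eV, equivalent to the rest mass of the electron. Thus we have obtained the shift of
the wavelength to be
\[ \lambda' - \lambda = \lambda_C(1 - \cos\theta) \tag{2.10} \]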
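Formula (2.10) is simple enough to tabulate directly. The following sketch uses the value of the Compton wavelength quoted above; the function name `compton_shift_pm` is a hypothetical helper.

```python
import math

LAMBDA_C_PM = 2.4263  # Compton wavelength of the electron in pm, as quoted in the text

def compton_shift_pm(theta_deg):
    """Wavelength shift lambda' - lambda in pm for a scattering angle in degrees."""
    return LAMBDA_C_PM * (1.0 - math.cos(math.radians(theta_deg)))

for angle in (0, 45, 90, 135, 180):
    print(angle, round(compton_shift_pm(angle), 4))
```

At 90° the shift equals the Compton wavelength itself, which matches the measured value of about 2.4 pm mentioned for Compton's experiment.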
Problem 2.11. Check that the kinetic energy of the scattered electron is
\[ T' = mc^2\,\frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta) \]
Take as an example the monochromatic Mo-K$\alpha$ rays from A. H. Compton’s experiment of 1922. The
incident wavelength is $\lambda = 70$ pm. Calculate the kinetic energy of the scattered electron.
Answer. From the 0-component of the four-momentum conservation (2.9), the kinetic energy of
the electron is $T' = cq^0 - cq'^0$ and converted to wavelengths becomes
\[ T' = cq^0 - cq'^0 = \hbar\omega - \hbar\omega' = hc\left(\frac{1}{\lambda} - \frac{1}{\lambda'}\right) = mc^2\left(\frac{\lambda_C}{\lambda} - \frac{\lambda_C}{\lambda'}\right) \]
The equation (2.10) for the shift of the wave length yields
\[ T' = mc^2\,\frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta) \]
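With the data from Compton’s experiment of 1922, one obtains $T' = (1 - \cos\theta)\,(\lambda/\lambda') \cdot 613.9$ eV, since $mc^2\lambda_C^2/\lambda^2 = 511000 \text{ eV} \cdot (2.4263/70)^2 \approx 613.9$ eV; the factor $\lambda/\lambda'$ is close to one.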
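The kinetic energy of the recoil electron can be evaluated without neglecting the difference between λ′ and λ. The sketch below uses the constants quoted in the text (mc² = 511000 eV, λ_C = 2.4263 pm); the function names are hypothetical.

```python
import math

MC2_EV = 511000.0     # rest energy of the electron in eV, as quoted in the text
LAMBDA_C_PM = 2.4263  # Compton wavelength in pm

def scattered_wavelength_pm(lam_pm, theta_deg):
    """lambda' from formula (2.10)."""
    return lam_pm + LAMBDA_C_PM * (1.0 - math.cos(math.radians(theta_deg)))

def electron_kinetic_energy_ev(lam_pm, theta_deg):
    """T' = m c^2 (lambda_C / lambda - lambda_C / lambda')."""
    lam_prime = scattered_wavelength_pm(lam_pm, theta_deg)
    return MC2_EV * LAMBDA_C_PM * (1.0 / lam_pm - 1.0 / lam_prime)

# Incident Mo-K-alpha wavelength of 70 pm, scattering angle 90 degrees.
print(electron_kinetic_energy_ev(70.0, 90.0))
```

For 70 pm and 90° this gives a recoil energy of a few hundred eV, tiny compared with the photon energy of roughly 18 keV.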
Problem 2.12. Determine the direction of the scattered electron, from the y-component of the
four-momentum conservation (2.9).
(a) Write down the y-component of the four-momentum conservation.
(b) Get $\sin\varphi$ in terms of $\sin\theta$, the wave length $\lambda'$ and the relativity parameter $\gamma'$ for the scattered
electron.
(c) From the kinetic energy $T'$ obtained in the previous problem see that
\[ \gamma' - 1 = \frac{\lambda_C^2}{\lambda\lambda'}\,(1 - \cos\theta) \]
Use this expression to simplify. After a somewhat lengthy calculation one gets $\sin\varphi = -A\cos(\theta/2)$
with a factor $A < 1$ which approaches one for a non-relativistic electron.
(d) Express the result $\sin\varphi = -\cos(\theta/2)$, to be expected in the experiments, in simple geometric
terms.
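Answer. (a) The y-components $q'_y = (\hbar\omega'/c)\sin\theta$ and $p'_y = mc\gamma'\beta'\sin\varphi$ add up to zero. One
gets $cp'\sin\varphi + \hbar\omega'\sin\theta = 0$.
(b)
\[ \sin\varphi = -\frac{\hbar\omega'}{cp'}\sin\theta = -\frac{h}{\lambda' p'}\sin\theta = -\frac{h}{mc\lambda'\beta'\gamma'}\sin\theta = -\frac{\lambda_C}{\lambda'\sqrt{\gamma'^2 - 1}}\sin\theta \]
(c)
\[ \sin\varphi = -\frac{\lambda_C}{\lambda'\sqrt{\gamma' + 1}\sqrt{\gamma' - 1}}\sin\theta = -\frac{\lambda_C}{\lambda'\sqrt{\gamma' + 1}} \cdot \frac{\sqrt{\lambda\lambda'}\,\sin\theta}{\lambda_C\sqrt{1 - \cos\theta}} = -\sqrt{\frac{\lambda}{\lambda'}}\,\sqrt{\frac{2}{\gamma' + 1}}\,\cos(\theta/2) \]
(d) The non-relativistic electron is scattered along the ray opposite to the angle bisector of the
incident and scattered photon. The deviation from this rule is of first order in $v/c$.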
Remark. We see that the kinetic energy of the scattered electron increases when using harder x-rays.
The scattered electrons were made visible by means of a Wilson cloud chamber, for the
first time by Compton in 1925. Moreover, Bothe and Geiger in 1925 used an electronic
coincidence circuit connecting two Geiger counters, and demonstrated that the scattered electron
and x-ray appear simultaneously. Decreasing the intensity of the incoming x-ray, one obtains
in the counters no continuous intensity, but discrete pulses. Similar to observations with a
photomultiplier, these observations cannot be explained by considering light as a wave. Light acts,
in these experiments, as a flow of discrete particles with all mechanical properties of a particle.
During the same experiment, in the interaction with the Bragg crystal, diffraction is used for the
measurement of the wave length. Here the same light acts as a wave!
Part of the information above is taken from the script Wellen und Quanten, by K.R. Schubert
(2004/5), professor of physics.
http://hep.phy.tu-dresden/~schubert/physik3.html
Problem 2.13. Rotate the lab-system by the angle $\varphi$ such that the scattered electron moves along
the positive x-axis. Transform the components of the four-momenta for the incoming and scattered
photon and electron to this system. Convince yourself that
\[ \frac{\sin(\varphi - \theta)}{\sin\varphi} = \frac{\omega}{\omega'} = 1 + \frac{\lambda_C}{\lambda}(1 - \cos\theta) \]
Conclude that for an experiment with $\lambda \gg \lambda_C$, one gets indeed $\varphi \approx \theta/2 - 90^\circ$.
Answer. One applies the rotation matrix
\[ R = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi & 0 \\ 0 & -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
to get transformed components of four-momenta. For the incident photon and electron
\[ (\hbar\omega/c)\,R\,[1, 1, 0, 0]^T = (\hbar\omega/c)[1, \cos\varphi, -\sin\varphi, 0]^T \quad\text{and}\quad (mc)\,R\,[1, 0, 0, 0]^T = (mc)[1, 0, 0, 0]^T \]
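and for the scattered photon and electron
\[ (\hbar\omega'/c)\,R\,[1, \cos\theta, \sin\theta, 0]^T = (\hbar\omega'/c)[1, \cos(\theta - \varphi), \sin(\theta - \varphi), 0]^T \quad\text{and}\quad (E'/c)\,R\,[1, \beta'\cos\varphi, \beta'\sin\varphi, 0]^T = (E'/c)[1, \beta', 0, 0]^T \]
Only the photons have a momentum along the new y-axis. By conservation of momentum one gets
$-\hbar\omega\sin\varphi = \hbar\omega'\sin(\theta - \varphi)$. Together with the formula for the shift of the wave length this gives
\[ \frac{\sin(\varphi - \theta)}{\sin\varphi} = \frac{\omega}{\omega'} = 1 + \frac{\lambda_C}{\lambda}(1 - \cos\theta) \]
For an experiment with $\lambda \gg \lambda_C$, one gets indeed $\sin(\varphi - \theta) \approx \sin\varphi$. Hence $\varphi - \theta \approx 180^\circ - \varphi$, $\varphi \approx \theta/2 + 90^\circ$
(modulo $180^\circ$) and $|\varphi| \approx 90^\circ - |\theta|/2$.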
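The relation between the two scattering angles can be checked from plain momentum conservation in the lab system. The sketch below assumes photon momenta measured in units of h per picometre, so that the momentum is simply 1/λ; the function name `recoil_angle` is hypothetical.

```python
import math

LAMBDA_C_PM = 2.4263  # Compton wavelength in pm, as quoted in the text

def recoil_angle(lam_pm, theta):
    """Return (phi, lambda') for a photon scattered by the angle theta (radians)."""
    lam_prime = lam_pm + LAMBDA_C_PM * (1.0 - math.cos(theta))
    k, k_prime = 1.0 / lam_pm, 1.0 / lam_prime   # photon momenta in units of h/pm
    px = k - k_prime * math.cos(theta)           # electron momentum, x-component
    py = -k_prime * math.sin(theta)              # electron momentum, y-component
    return math.atan2(py, px), lam_prime

theta = math.radians(60.0)
phi, lam_prime = recoil_angle(70.0, theta)
print(math.degrees(phi))                                      # close to 60/2 - 90 = -60 degrees
print(math.sin(phi - theta) / math.sin(phi), lam_prime / 70.0)  # both equal omega / omega'
```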
2.10. Collision of particles
Problem 2.14. Inverse Compton scattering (see footnote 1) occurs whenever a photon scatters off a particle
moving with almost the speed of light. Suppose that a particle with rest mass $M$ and total energy $E$
collides head on with a photon of energy $E_\gamma$. For simplicity, assume that the scattered particles
move back along the same axis.
(i) Convince yourself that the conservation of the four-momentum implies
\[ E_\gamma(E + cp) = E'_\gamma(E - cp) + 2E_\gamma E'_\gamma \]
(ii) Solve for $E'_\gamma$ and use that $Mc^2 \ll E$ and hence $E + cp \approx 2E$ holds with any accuracy needed.
Show that under this assumption the scattered photon has the energy
\[ E'_\gamma = \frac{4E^2 \cdot E_\gamma}{M^2c^4 + 4E \cdot E_\gamma} \]
(iii) Such a situation may occur for example for a collision of an ultra-relativistic cosmic ray with
a photon from the cosmic background radiation. How much energy can such a cosmic ray
proton transfer to a microwave background photon?
Energies up to $E = 10^{20}$ eV may occur in cosmic rays. The proton has rest energy
$Mc^2 = 938.272$ MeV. A typical energy of a photon from the cosmic background radiation
is $E_\gamma = 2.7 \text{ K} \cdot 8.617 \cdot 10^{-5}$ eV/K.
Answer. The four-momenta of the incident photon and proton are
\[ cq = E_\gamma[1, 1, 0, 0] \quad\text{and}\quad cp = [E, -cp, 0, 0] \]
the four-momenta for the scattered photon and proton are
\[ cq' = E'_\gamma[1, -1, 0, 0] \quad\text{and}\quad cp' = [E', -cp', 0, 0] \]
From the conservation of four-momentum (2.9) we eliminate the less interesting quantity $p'$. I
prefer to take advantage of the invariant scalar products and get, with the same calculation as above,
\[ q \cdot p = q' \cdot p' = q' \cdot (p + q - q') = q' \cdot (p + q) \]
Footnote 1. See Hobson [7], p. 132, problem 5.14.
We put the assumed vectors into the last equation and obtain
\[ E_\gamma(E + cp) = E'_\gamma(E - cp) + 2E_\gamma E'_\gamma \]
\[ E'_\gamma = \frac{E_\gamma(E + cp)}{E - cp + 2E_\gamma} = \frac{E_\gamma(E + cp)^2}{E^2 - c^2p^2 + 2E_\gamma(E + cp)} = \frac{E_\gamma(E + cp)^2}{M^2c^4 + 2E_\gamma(E + cp)} \]
We use that $Mc^2 \ll E$ and hence $E + cp \approx 2E$ holds with any accuracy needed to simplify
\[ E'_\gamma = \frac{4E_\gamma \cdot E^2}{M^2c^4 + 4E_\gamma \cdot E} \]
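Part (iii) of the problem asks for a number, which the final formula delivers at once. The sketch below plugs in the data given in the problem statement; the function name `scattered_photon_energy` is hypothetical, and all energies are in eV.

```python
def scattered_photon_energy(E, Mc2, E_gamma):
    """E'_gamma = 4 E_gamma E^2 / (M^2 c^4 + 4 E_gamma E), all energies in eV."""
    return 4.0 * E_gamma * E ** 2 / (Mc2 ** 2 + 4.0 * E_gamma * E)

E_proton = 1e20               # cosmic ray proton energy in eV
MC2_PROTON = 938.272e6        # proton rest energy in eV
E_PHOTON = 2.7 * 8.617e-5     # microwave background photon energy in eV

E_out = scattered_photon_energy(E_proton, MC2_PROTON, E_PHOTON)
print(E_out, E_out / E_proton)
```

With these data the photon leaves with roughly a tenth of the proton energy, an enormous gain over its initial fraction of a milli-electron-volt.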
Definition 2.4 (Center of mass system). For any set of colliding particles, the center of mass
system, or CMS, is the inertial system for which the total momentum of the incoming particles is
zero. Thus their four-momentum is $[Mc, 0, 0, 0]$ where $M$ is the total rest mass.
Problem 2.15. Consider the Compton scattering process in the CMS, with the incoming photon
moving in the positive x-direction. Write down the four-momenta of the incident photon and
electron, and the possible four-momenta for the scattered photon and electron.
Answer. The four-momenta of the incident photon and electron are
\[ q = (\hbar\omega/c)[1, 1, 0, 0] \quad\text{and}\quad p = (\gamma mc)[1, -\beta, 0, 0] \]
with $\hbar\omega = mc^2\beta\gamma$, and the possible four-momenta for the scattered photon and electron are
\[ q' = (\hbar\omega/c)[1, \cos\theta, \sin\theta, 0] \quad\text{and}\quad p' = (\gamma mc)[1, -\beta\cos\theta, -\beta\sin\theta, 0] \]
with any angle $\theta$.
Problem 2.16. The Bevatron at Berkeley was built with the idea of producing antiprotons (see footnote 1), by the
reaction $p + p \to p + p + p + \bar p$. It was designed to give about 6.2 GeV kinetic energy to the protons it
accelerates. Thus one intended to let a high-energy proton strike a proton at rest; at that point in
history, it was not yet possible to send two proton beams against each other. By known conservation
laws, it was clear that only an additional pair of proton and antiproton could be expected. Find
the threshold for the energy $E$, and kinetic energy $T$, of the incoming proton at which this reaction
becomes possible.
(i) Calculate the total energy-momentum four-vector $p_{tot}$ of the incoming particles in the lab
system.
(ii) At the threshold, the created four particles cannot have any additional kinetic energy. Thus
they need to be at rest in the CMS. Calculate the total energy-momentum four-vector
$p'_{tot}$ of the scattered particles in the CMS.
(iii) From the conservation of the four-momentum and Lorentz invariance, we know that
\[ p_{tot} \cdot p_{tot} = p'_{tot} \cdot p'_{tot} \]
(iv) Determine $E$ and $c|\vec p\,|$ from the two equations
\[ (E + Mc^2)^2 - c^2\vec p^{\,2} = (4Mc^2)^2 \]
\[ E^2 - c^2\vec p^{\,2} = M^2c^4 \]
Footnote 1. See also R. Feynman [10], vol. II, ch. 25 and D. Griffiths [5], p. 106.
(v) Determine the kinetic energy $T$ and the speed of the incoming protons at threshold.
Answer. (i) $p_{tot} = [(E + Mc^2)/c, p, 0, 0]$ is the total four-momentum for the incoming two protons,
in the lab system.
(ii) $p'_{tot} = [4Mc, 0, 0, 0]$ is the total energy-momentum four-vector of the four scattered particles,
in the CMS.
(iii) The Lorentz invariants of the total four-momenta are
\[ p_{tot} \cdot p_{tot} = (E/c + Mc)^2 - \vec p^{\,2} \]
\[ p'_{tot} \cdot p'_{tot} = 16M^2c^2 \]
(iv) From the conservation of the four-momentum and Lorentz invariance, we get the first equation.
The second one refers to the incoming proton from the ray.
\[ (E + Mc^2)^2 - c^2\vec p^{\,2} = (4Mc^2)^2 \]
\[ E^2 - c^2\vec p^{\,2} = M^2c^4 \]
One gets $E = 7Mc^2$ and $c|\vec p\,| = \sqrt{48}\,Mc^2$.
(v) $T = E - Mc^2 = 6Mc^2$. Indeed the antiprotons were discovered when the machine reached
about 6000 MeV. At threshold, the speed of the proton is $\beta = \sqrt{48}/7$ times the speed of
light.
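The two equations of part (iv) can be solved in closed form for any number of equal-mass particles at rest in the final state. The sketch below is an illustration under that assumption; the function name `threshold` and the parameter `n_final` are hypothetical, and subtracting the second equation from the first gives E = (n² − 2)Mc²/2.

```python
import math

MC2_MEV = 938.272  # proton rest energy in MeV, as quoted in problem 2.14

def threshold(n_final, mc2):
    """Solve (E + mc2)^2 - (cp)^2 = (n mc2)^2 together with E^2 - (cp)^2 = mc2^2.

    Returns (E, cp, T, beta) for the incoming particle.
    """
    E = (n_final ** 2 - 2) / 2.0 * mc2   # from subtracting the two equations
    cp = math.sqrt(E ** 2 - mc2 ** 2)
    return E, cp, E - mc2, cp / E

E, cp, T, beta = threshold(4, MC2_MEV)
print(E / MC2_MEV, T, beta)
```

For four final particles the kinetic energy comes out a little above 5.6 GeV, consistent with the design energy of the machine.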
2.11. The motion of particles
Definition 2.5 (Four-force). The vector
\[ f^\mu = \left[ (\gamma/c)\,\vec v \cdot \vec F,\ \gamma\vec F \,\right] \]
is called the four-force.
Proposition 2.1. We imagine that a particle is moving along some path, under the influence of
forces of whatever origin. Newton’s law
\[ \vec F = \frac{d\vec p}{dt} \tag{Newton} \]
together with the relation
\[ W = \vec F \cdot \vec x \tag{work} \]
for the work $W$ done by a force $\vec F$ during a motion over a distance $\vec x$ are equivalent to the equation
of motion
\[ f^\mu = \frac{dp^\mu}{d\tau} \tag{2.11} \]
Under the assumptions made above, the four-force is a four-vector.
Reason. The relation (work) gives the rate at which work is done on the particle, which is equal to
the rate of energy increase for the particle. Hence
\[ \frac{dE}{dt} = \vec F \cdot \frac{d\vec x}{dt} \]
Now the definition of the four-force and Newton’s law (Newton) yield
\[ f^\mu = \gamma\left[ \frac{d\vec x}{d\,ct} \cdot \vec F,\ \vec F \,\right] = \gamma\left[ \frac{dE}{d\,ct},\ \frac{d\vec p}{dt} \right] \]
We convert the derivatives by coordinate time back to derivatives by the proper time, and finally
use the definition of the four-momentum. Hence
\[ f^\mu = \left[ \frac{dE}{d\,c\tau},\ \frac{d\vec p}{d\tau} \right] = \frac{dp^\mu}{d\tau} \]
The converse statement is checked in the same way.
Remark. The conservation of the four-momentum, both for the free motion and in collision
processes, is well confirmed experimentally and justified theoretically. Already for this reason, it
is sound to formulate Newton’s equation of motion together with the rate of work, and in terms of
a differential equation for the four-momentum $p^\mu$. Moreover, the actual studies about the motion
of electrons and other particles under the influence of electromagnetic forces confirm the equation
of motion
\[ \frac{d\vec p}{dt} = q[\vec E + \vec v \times \vec B\,] \tag{Lorentz} \]
for a charged particle with charge $q$, both in the non-relativistic as well as in the relativistic regime.
Based on these facts, the occurrence of the acceleration in the traditional form of Newton’s law
seems to me to be no more than an artifact. The complicated distinction between the parallel and
transverse mass in the corrected form of Newton’s law is a further hint supporting that point of
view.
Definition 2.6 (Four-acceleration). For any particle with four-velocity $[u^\mu]$, and moving along
any path $x^\mu = x^\mu(\tau)$, with the proper time $\tau$ as parameter, the four-acceleration vector $[a^\mu]$ is
defined by
\[ a^\mu = \frac{du^\mu}{d\tau} \]
Lemma 2.2. In the Minkowski metric, the four-velocity and the four-acceleration are perpendicular.
For a material particle, the four-acceleration is a space-like vector.
Reason. We know from lemma 2.1 that the four-velocity has length $c$. Differentiating by the proper
time yields
\[ u \cdot u = c^2 \]
\[ 0 = \frac{d(u \cdot u)}{d\tau} = 2\,u \cdot a \]
For a material particle the four-velocity is a time-like vector. As shown in lemma 3.3, any vector
perpendicular to a time-like vector is space-like.
Proposition 2.2. We assume the equations of motion (2.11) to be valid. The following are equivalent:
(i) The equations of motion have the form $f^\mu = ma^\mu$;
(ii) the four-force $[f^\mu]$ does not change the rest mass $m$;
(iii) the four-force is perpendicular to the four-velocity: $f \cdot u = f^\mu u_\mu = 0$.

Proof. The equations of motion (2.11) imply
\[ f^\mu = \frac{d(mu^\mu)}{d\tau} = \frac{dm}{d\tau}\,u^\mu + ma^\mu \tag{2.12} \]
\[ (f^\mu - ma^\mu)\,u_\mu = \frac{dm}{d\tau}\,u^\mu u_\mu = c^2\,\frac{dm}{d\tau} \tag{2.13} \]
Assume item (i) holds. We conclude $0 = (f^\mu - ma^\mu)u_\mu = c^2\,\frac{dm}{d\tau}$ and thus item (ii) holds. Also, item
(i) and lemma 2.2 yield $f^\mu u_\mu = ma^\mu u_\mu = 0$ and thus item (iii) holds.
Conversely assume item (iii) to hold. Hence lemma 2.2 and equation (2.13) imply
$0 = f^\mu u_\mu = (f^\mu - ma^\mu)u_\mu = c^2\,\frac{dm}{d\tau}$ and thus item (ii) holds.
Assume now item (ii) to hold. Hence we get the equation of motion in the form
$f^\mu = \frac{d(mu^\mu)}{d\tau} = \frac{dm}{d\tau}\,u^\mu + ma^\mu = ma^\mu$ of item (i).
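Lemma 2.3. In terms of the velocity $\vec v$, acceleration $\vec a$ and relativity parameter $\gamma$, the
four-acceleration has the components
\[ a^0 = c\gamma\,\frac{d\gamma}{dt}, \qquad a^i = \gamma^2\,\vec a_i + \gamma\,\frac{d\gamma}{dt}\,\vec v_i \quad\text{for } i = 1, 2, 3 \]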
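Lemma 2.2 and Lemma 2.3 can be checked against each other numerically. The sketch below assumes units with c = 1 and uses the identity dγ/dt = γ³ (v · a)/c², which follows from differentiating γ = (1 − v²/c²)^(−1/2) and is not stated in the text; the function name and sample values are hypothetical.

```python
import math

def four_velocity_and_acceleration(v, a, c=1.0):
    """Return ([u^mu], [a^mu]) from Newtonian velocity v and acceleration a."""
    v2 = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - v2 / c ** 2)
    # Assumed identity: d(gamma)/dt = gamma^3 (v . a) / c^2
    dgamma_dt = gamma ** 3 * sum(vi * ai for vi, ai in zip(v, a)) / c ** 2
    u = (gamma * c,) + tuple(gamma * vi for vi in v)
    acc = (c * gamma * dgamma_dt,) + tuple(
        gamma ** 2 * ai + gamma * dgamma_dt * vi for vi, ai in zip(v, a)
    )
    return u, acc

def minkowski_dot(x, y):
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

u, acc = four_velocity_and_acceleration((0.3, 0.4, 0.0), (0.1, -0.2, 0.05))
print(minkowski_dot(u, acc), minkowski_dot(acc, acc))  # about 0, and negative
```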
3. The Lorentz Group

3.1. Different aging of twins
Problem 3.1 (Twin paradox or travelling keeps young). For this problem, I put $c = 1$ and
consider only one space dimension. We do a series of simplifying calculations, and prove the
conjecture at first in 1 + 1 dimensions. Confirm that for any two time-like vectors $x = (t, x) \in \mathrm{Future}$
and $p = (E, p) \in \mathrm{Future}$, the reversed triangle inequality
\[ |x + p| \ge |x| + |p| \]
holds, even with strict inequality $>$ unless they are proportional.
Answer. All the following inequalities are equivalent. The two squaring steps are allowed because
both sides are non-negative; in particular $tE - xp > 0$ for vectors in the future cone.
\begin{align*}
|x| + |p| &\le |x + p| \\
|x|^2 + |p|^2 + 2|x||p| &\le |x + p|^2 \\
t^2 - x^2 + E^2 - p^2 + 2\sqrt{(t^2 - x^2)(E^2 - p^2)} &\le (t + E)^2 - (x + p)^2 \\
2\sqrt{(t^2 - x^2)(E^2 - p^2)} &\le 2tE - 2xp \\
(t^2 - x^2)(E^2 - p^2) &\le (tE - xp)^2 \\
-t^2p^2 - x^2E^2 &\le -2tExp \\
0 &\le (tp - xE)^2
\end{align*}
Problem 3.2. At least the calculation in problem 3.1 above is a guide, as I go now back to 1 + 3
dimensions. Check that any two vectors $x, p \in \mathrm{Future} \cup \mathrm{Light}^+$ have Minkowski scalar product
$x \cdot p \ge 0$.
Answer. Let $x = (t, x, y, z)$ and $p = (E, p, q, r)$ be the two vectors in $\mathrm{Future} \cup \mathrm{Light}^+$. To calculate
the Minkowski scalar product, we use the common Cauchy-Schwarz inequality
\[ xp + yq + zr \le \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} \]
Since the vectors are assumed to be in the future or light cone,
$0 \le \sqrt{x^2 + y^2 + z^2} \le t$ and $0 \le \sqrt{p^2 + q^2 + r^2} \le E$. We obtain as claimed
\[ -x \cdot p = -tE + xp + yq + zr \le \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} - tE \le 0 \tag{3.1} \]
Problem 3.3. Assume that two nonzero vectors $x, p \in (\mathrm{Future} \cup \mathrm{Light}^+) \setminus \{0\}$ have Minkowski
scalar product $x \cdot p = 0$. Check that $x = \alpha p \in \mathrm{Light}^+$ holds with $\alpha > 0$. Thus they are light-like
and linearly dependent.
Answer. The assumption that the vectors are nonzero implies $t > 0$, $E > 0$ and $tE > 0$. Since
$x \cdot p = 0$, we get equality everywhere in formula (3.1). Hence $\sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} = tE > 0$,
and so $(x, y, z) \ne 0$ and $(p, q, r) \ne 0$. The vectors
$(x, y, z)$ and $(p, q, r)$ are proportional, as can be seen from the following calculation:
\begin{align*}
(xp + qy + rz)^2 &= (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) \\
2xpyq + 2xprz + 2qyrz &= (x^2q^2 + y^2p^2) + (x^2r^2 + z^2p^2) + (y^2r^2 + z^2q^2) \\
(xq - yp)^2 + (xr - zp)^2 + (yr - zq)^2 &= 0 \\
xq = yp, \quad xr &= zp, \quad yr = zq
\end{align*}
Since $(x, y, z) \ne 0$ and $(p, q, r) \ne 0$, we conclude that $x = \alpha p$, $y = \alpha q$, $z = \alpha r$ with some $\alpha \ne 0$.
Furthermore,
\[ \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} = tE, \qquad \sqrt{x^2 + y^2 + z^2} \le t \quad\text{and}\quad \sqrt{p^2 + q^2 + r^2} \le E \]
imply $\sqrt{x^2 + y^2 + z^2} = t$ and $\sqrt{p^2 + q^2 + r^2} = E$. Hence $x, p \in \mathrm{Light}^+$ and $t = \alpha E$ with $\alpha > 0$.
Together, we have confirmed $x = \alpha p \in \mathrm{Light}^+$ as conjectured.

Problem 3.4. Assume the sum of two light-like vectors is light-like. Prove that they are linearly
dependent.
Answer. We assume $x, p \in \mathrm{Light}$ and $x + p \in \mathrm{Light}$. Calculate the Minkowski scalar products
\[ 0 = (x + p) \cdot (x + p) = x \cdot x + p \cdot p + 2\,x \cdot p = 2\,x \cdot p \]
Hence the two vectors are orthogonal. By the last problem, they are linearly dependent.
Lemma 3.1 (Reversed inequalities in the future cone). For any two future time-like or future
light-like vectors $x, p \in \mathrm{Future} \cup \mathrm{Light}^+$ we get the inequalities
\[ 0 \le (x \cdot x)(p \cdot p) \le (x \cdot p)^2 \tag{3.2} \]
\[ |x||p| \le x \cdot p \tag{3.3} \]
\[ |x| + |p| \le |x + p| \tag{3.4} \]
Equality occurs in any one of these formulas if and only if the vectors $x$ and $p$ are linearly
dependent. Note that these inequalities in the future cone are the reversed versions of the
corresponding inequalities from Euclidean geometry.
Proof. Take vectors $x = (t, x, y, z)$ and $p = (E, p, q, r) \in \mathrm{Future} \cup \mathrm{Light}^+$ with $p \ne 0$.
\begin{align*}
(x \cdot x)(p \cdot p) - (x \cdot p)^2
&= (t^2 - x^2 - y^2 - z^2)(E^2 - p^2 - q^2 - r^2) - (tE - xp - yq - zr)^2 \\
&\le (t^2 - x^2 - y^2 - z^2)(E^2 - p^2 - q^2 - r^2) - \left[ tE - \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} \right]^2 \\
&= t^2E^2 + (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) - t^2(p^2 + q^2 + r^2) - (x^2 + y^2 + z^2)E^2 \\
&\quad - t^2E^2 - (x^2 + y^2 + z^2)(p^2 + q^2 + r^2) + 2tE\sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)} \\
&= -t^2(p^2 + q^2 + r^2) - E^2(x^2 + y^2 + z^2) + 2\sqrt{t^2(p^2 + q^2 + r^2)}\,\sqrt{E^2(x^2 + y^2 + z^2)} \\
&= -\left[ \sqrt{t^2(p^2 + q^2 + r^2)} - \sqrt{E^2(x^2 + y^2 + z^2)} \right]^2 \le 0
\end{align*}
We have confirmed formula (3.2). To check under which assumptions equality occurs, assume
formula (3.2) holds with equality. Then equality occurs in the common Cauchy-Schwarz inequality,
$xp + yq + zr = \sqrt{(x^2 + y^2 + z^2)(p^2 + q^2 + r^2)}$, hence $x = \alpha p$, $y = \alpha q$, $z = \alpha r$ with some factor
$\alpha$. The last line implies $\sqrt{t^2(p^2 + q^2 + r^2)} = \sqrt{E^2(x^2 + y^2 + z^2)}$ and hence $t = \alpha E$ and $\alpha \ge 0$.
Together, we see that $x = \alpha p$.
Formula (3.3) follows by taking roots since $x \cdot p \ge 0$ by Problem 3.2. Finally we check the
reversed triangle inequality (3.4):
\begin{align*}
-|x + p|^2 + (|x| + |p|)^2 &= -|x + p|^2 + |x|^2 + |p|^2 + 2|x||p| \\
&= -(x + p) \cdot (x + p) + x \cdot x + p \cdot p + 2|x||p| \\
&= -2\,x \cdot p + 2|x||p| \le 0
\end{align*}
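The three reversed inequalities of Lemma 3.1 can be spot-checked on random vectors. The sketch below assumes c = 1, a fixed random seed and a small tolerance for rounding; the helper names are hypothetical, and a passing check is of course no substitute for the proof.

```python
import math
import random

def dot(x, p):
    """Minkowski scalar product with signature (+, -, -, -)."""
    return x[0] * p[0] - x[1] * p[1] - x[2] * p[2] - x[3] * p[3]

def norm(x):
    """|x| = sqrt(x . x), clipped at zero to absorb rounding on the light cone."""
    return math.sqrt(max(dot(x, x), 0.0))

def random_future_vector(rng):
    """Sample a vector with t >= sqrt(x^2 + y^2 + z^2), i.e. in Future or Light+."""
    sx, sy, sz = (rng.uniform(-1.0, 1.0) for _ in range(3))
    t = math.sqrt(sx * sx + sy * sy + sz * sz) + rng.uniform(0.0, 1.0)
    return (t, sx, sy, sz)

def reversed_inequalities_hold(x, p, tol=1e-9):
    """Check (3.2), (3.3) and (3.4) up to the tolerance tol."""
    xx, pp, xp = dot(x, x), dot(p, p), dot(x, p)
    s = tuple(a + b for a, b in zip(x, p))
    return (-tol <= xx * pp <= xp * xp + tol
            and norm(x) * norm(p) <= xp + tol
            and norm(x) + norm(p) <= norm(s) + tol)

rng = random.Random(0)
all_ok = all(
    reversed_inequalities_hold(random_future_vector(rng), random_future_vector(rng))
    for _ in range(1000)
)
print(all_ok)
```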

Lemma 3.2 (Convexity).
• The convex combination of any two independent future light-like vectors is in the future cone.
• The future cone $\mathrm{Future}$ is convex.
• The union $\mathrm{Future} \cup \mathrm{Light}^+$ of the future cone and the future light cone is convex.
Proof. For given linearly independent vectors $x, p \in \mathrm{Light}^+$ the inequality (3.4) holds in the strict
form. Hence with any $0 < \alpha < 1$
\[ 0 = |\alpha x| + |(1 - \alpha)p| < |\alpha x + (1 - \alpha)p| \]
and $\alpha x + (1 - \alpha)p \in \mathrm{Future}$. The other claims follow similarly.
Proposition 3.1 (Twin paradox). For any two future time-like or future light-like vectors
$x, p \in \mathrm{Future} \cup \mathrm{Light}^+$ which are not linearly dependent, the reversed strict triangle inequality holds (see footnote 1):
\[ |x + p| > |x| + |p| \]
Lemma 3.3 (Orthogonal complement of a one-dimensional space). Let $e \ne 0$ be any vector. The
orthogonal complement $e^\perp$ is always three-dimensional.
If $e \ne 0$ is time-like, the orthogonal complement $e^\perp$ consists only of space-like vectors.
If $e \ne 0$ is light-like, the orthogonal complement $e^\perp$ is spanned by $e$ itself and two space-like
vectors.
If $e \ne 0$ is space-like, the orthogonal complement $e^\perp$ contains both space-like and time-like
vectors. There is a basis of two light-like vectors and a space-like vector.
We use the abbreviation $\langle \hat x, \hat y \rangle$ for the Minkowski scalar product $x \cdot y$.

Imagine that a radar signal is sent from a station on earth to the moon and the reflected signal
is received. The duration between the sending time t1 and the receiving time t2 is measured. In
which sense can we conclude the distance from the station on earth to the reflector on the moon is
c(t2 − t1 )/2?
I assume for simplicity that both the station on earth and the reflector on the moon move without
acceleration, and do not discuss the effects of gravity. So I deal with the following simplified
picture: In space-time, there is a sender active at the space-time point $\hat s$, a reflector on the moon hit at the
space-time point $\hat m$, and a receiver getting the signal back at the space-time point $\hat r$. The vectors $\hat m - \hat s$ and $\hat r - \hat m$ along
the radar signal are light-like. The proper time $|\hat r - \hat s|$ is measured.
Question. What is the distance between the world line $\hat s\hat r$ and the reflector $\hat m$? How can the spatial
distance be specified precisely?
Answer. Indeed we can say that the distance to the moon, seen in the frame with sender and receiver on
the same spot, is $|\hat r - \hat s|/2$. Let
\[ \hat h = \frac{\hat s + \hat r}{2} \]
be the midpoint of the segment $\hat s\hat r$ from sender to receiver. We get the following three statements
specifying a spatial distance:
(a) The vectors $\hat m - \hat h$ and $\hat r - \hat s$ are perpendicular:
\[ \langle \hat m - \hat h, \hat r - \hat s \rangle = 0 \]
Footnote 1. That is why traveling keeps young!
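(b) The space-like distance from the midpoint to the moon is greater than the distance from any
other point on the world line sender-receiver to the moon. Since space-like vectors have a negative
Minkowski square, this reads
\[ \langle \hat m - \hat h, \hat m - \hat h \rangle \le \langle \hat m - \hat p, \hat m - \hat p \rangle \quad\text{for any point } \hat p \text{ on the world line } \hat s\hat r. \]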
(c) The squared space-like distance from the midpoint to the moon is the (negative) square of half the reflection time:
\[ \langle \hat m - \hat h, \hat m - \hat h \rangle = -\frac{1}{4}\,\langle \hat s - \hat r, \hat s - \hat r \rangle \]
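Statements (a) and (c) can be illustrated with concrete events. The sketch below assumes c = 1 and hypothetical events: a station at the origin, a reflector at spatial distance 5, and the whole picture boosted by β = 0.6 so that the station is not at rest in the frame used for the computation.

```python
import math

def dot(x, y):
    """Minkowski scalar product with signature (+, -, -, -), c = 1."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def sub(x, y):
    return tuple(a - b for a, b in zip(x, y))

def boost_x(v, beta):
    """Lorentz boost along the x-axis."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return (g * v[0] - beta * g * v[1], -beta * g * v[0] + g * v[1], v[2], v[3])

beta = 0.6
s = boost_x((0.0, 0.0, 0.0, 0.0), beta)    # signal sent
m = boost_x((5.0, 3.0, 4.0, 0.0), beta)    # reflection at spatial distance 5
r = boost_x((10.0, 0.0, 0.0, 0.0), beta)   # signal received
h = tuple((a + b) / 2 for a, b in zip(s, r))

perp = dot(sub(m, h), sub(r, s))                       # statement (a): should vanish
lhs = dot(sub(m, h), sub(m, h))                        # statement (c), left side
rhs = -0.25 * dot(sub(s, r), sub(s, r))                # statement (c), right side
half_time = math.sqrt(dot(sub(r, s), sub(r, s))) / 2   # |r - s| / 2
print(perp, lhs, rhs, half_time)
```

Because all quantities are Lorentz invariant, the boosted events give the same distance 5 as the rest frame of the station.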

3.2. The Lorentz transformations
Definition 3.1. A Lorentz-transformation is a linear mapping from space-time to space-time that

leaves the Minkowski metric (2.5) invariant.
Let $A$ be the matrix for the Lorentz transformation. Any two vectors $\hat x, \hat y$ and their images $A\hat x, A\hat y$ satisfy
$$\langle A\hat x, A\hat y\rangle = x \cdot y \tag{3.5}$$
In other words, the Lorentz transformations are the isometries for the Minkowski metric. We convert this requirement to a matrix equation characterizing the Lorentz transformations. For all vectors $\hat x, \hat y$
$$\hat x^T A^T G A\,\hat y = (A\hat x)^T G\, A\hat y = \hat x^T G\,\hat y$$
yields the matrix equation
$$A^T G A = G \tag{3.6}$$
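The matrix equation (3.6) is easy to test numerically. The sketch below assumes 2+1 dimensions with $G = \mathrm{diag}(1, 1, -1)$; the helper names `boost` and `lorentz_defect` are illustrative and not part of the text.

```python
import math

# Assumed Minkowski metric in 2+1 dimensions, signature (+, +, -).
G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def boost(lam):
    # Lorentz boost mixing the first space coordinate with time
    c, s = math.cosh(lam), math.sinh(lam)
    return [[c, 0, s], [0, 1, 0], [s, 0, c]]

def lorentz_defect(A):
    # largest entry of A^T G A - G; zero up to rounding for a Lorentz map
    M = matmul(matmul(transpose(A), G), A)
    return max(abs(M[i][j] - G[i][j]) for i in range(3) for j in range(3))

defect_boost = lorentz_defect(boost(0.7))
defect_scaling = lorentz_defect([[2, 0, 0], [0, 1, 0], [0, 0, 1]])
```

A boost passes the test, while a plain scaling of one axis does not.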
Proposition 3.2. The set of all Lorentz transformations is a group.
Definition 3.2 (orthochronous and time-reversing transformations).

• A Lorentz transformation which maps the future cone to itself is called orthochronous.
• A Lorentz transformation which maps the future cone to the past cone is called antichronous
or time-reversing.
Problem 3.5. Give a reason why any Lorentz transformation either maps future and past cones to
themselves, or exchanges them.
Reason. The invariance of the Minkowski metric implies
$$A(\mathrm{Future}) \subseteq \mathrm{Future} \cup (-\mathrm{Future})$$
Since Future is a convex set, the image $A(\mathrm{Future})$ is convex, too. There exist convex combinations of a time-like vector in the future and one in the past which lie in the present. Hence any convex subset of the union $\mathrm{Future} \cup (-\mathrm{Future})$ is either a subset of Future or a subset of $-\mathrm{Future}$. Since a Lorentz transformation is a bijection, and the inverse is Lorentz, too, we conclude that either $A(\mathrm{Future}) = \mathrm{Future}$ or $A(\mathrm{Future}) = -\mathrm{Future}$. These two cases lead to the orthochronous and time-reversing transformations.
From the matrix equation (3.6), we see that $(\det A)^2 = 1$ and hence $\det A = +1$ or $\det A = -1$.
Definition 3.3. An orthochronous Lorentz transformation with determinant one is called proper.
The subgroup of proper Lorentz transformations is the proper Lorentz group.
Problem 3.6. Fix an arbitrary vector $\hat a$. Show that the transformation
$$\hat x' = \frac{-\hat x + (x \cdot x)\,\hat a}{N}, \qquad N = 1 - 2\,x \cdot a + (a \cdot a)(x \cdot x) \tag{3.7}$$
is its own inverse. It leaves the light-cone invariant. This is a nonlinear conformal transformation.
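Before proving Problem 3.6, one may test the claim on numbers. The sketch assumes 2+1 dimensions with signature (+, +, -) and reads the transformation as $\hat x' = (-\hat x + (x\cdot x)\,\hat a)/N$; the sample vectors are arbitrary.

```python
# Minkowski product, signature (+, +, -) (an assumption).
def mdot(x, y):
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def conformal_inversion(x, a):
    # x' = (-x + (x.x) a) / N  with  N = 1 - 2 x.a + (a.a)(x.x)
    xx = mdot(x, x)
    N = 1 - 2 * mdot(x, a) + mdot(a, a) * xx
    return tuple((-xi + xx * ai) / N for xi, ai in zip(x, a))

a = (0.5, -0.25, 0.125)          # arbitrary fixed vector
x = (1.0, 2.0, 0.5)              # arbitrary test vector
x_twice = conformal_inversion(conformal_inversion(x, a), a)

light = (3.0, 4.0, 5.0)          # light-like: 9 + 16 - 25 = 0
light_image = conformal_inversion(light, a)
```

Applying the map twice returns the starting vector, and the image of a light-like vector is again light-like, at least for these sample values.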
Problem 3.7. Prove that any linear conformal transformation $x \mapsto Ax$ such that $x \cdot x = \langle A\hat x, A\hat x\rangle$ for all vectors $\hat x$ is a Lorentz transformation.
Lemma 3.4. If a subspace $U \subset \mathbb{R}^3$ is invariant under a Lorentz transformation, then its orthogonal complement is invariant, too. Especially, the orthogonal complement of an eigenvector is invariant.
Proof. Assume the subspace $U \subset \mathbb{R}^3$ is invariant for the Lorentz transformation $A$. This means by definition $A(U) \subseteq U$, and hence $A(U) \subseteq U = A^{-1}A(U) \subseteq A^{-1}(U)$. Since the Lorentz transformation is invertible, these spaces have equal finite dimension and hence $A(U) = U = A^{-1}(U)$.
Given is any vector $\hat x \in U^\perp$. Hence $x \cdot u = 0$ for all $\hat u \in U$. The Lorentz invariance implies
$$\langle A\hat x, A\hat u\rangle = x \cdot u = 0$$
for all $\hat u \in U$. Hence
$$A\hat x \in A(U)^\perp = U^\perp$$
as to be shown.
Lemma 3.5. Assume a Lorentz transformation $A$ has two linearly independent light-like eigenvectors $\hat l, \hat m \in \mathrm{Light}^+$. Then their eigenvalues $\nu, \mu$ have product $\nu\mu = 1$. The vector $\hat b$ orthogonal to their span is a space-like eigenvector, indeed $A\hat b = (\det A)\,\hat b$.
Proof. Let $U = \mathrm{span}(\hat l, \hat m)$ be the linear span of the two light-like eigenvectors. The orthogonal complement space $U^\perp$ is spanned by one space-like vector $\hat b$. Since the orthogonal complement of an invariant space is invariant, this is an eigenvector.
$$A\hat l = \nu\hat l, \qquad A\hat m = \mu\hat m \qquad\text{and}\qquad A\hat b = \beta\hat b$$
The Lorentz invariance implies
$$\langle A\hat l, A\hat m\rangle = l \cdot m$$
$$\nu\mu\; l \cdot m = l \cdot m$$
Since the two light-like vectors are independent, we know by Lemma 3.1, inequality (3.3) that $l \cdot m < 0$, and hence $\nu\mu = 1$ and $\nu = \mu^{-1}$. Since $\det A = \nu\mu\beta$, we get $\beta = \det A$.
Question. Explain these facts geometrically in Klein’s model.
Lemma 3.6. Assume an orthochronous Lorentz transformation A has a space-like eigenvector.

Then there exist
(i) either two linearly independent light-like eigenvectors;
(ii) or a time-like eigenvector.

If both (i) and (ii) happen together, the transformation is the identity A = I, or det A = −1, and
these three vectors lie in a plane and the transformation A is a reflection across this plane.
Proof. Let $\hat b \in \mathrm{Present}$ be the eigenvector and $A\hat b = \beta\hat b$. The orthogonal complement $\hat b^\perp$ is invariant, too. It contains two linearly independent light-like vectors $\hat l, \hat m \in \mathrm{Light}^+$. Now the invariance of $\hat b^\perp$ together with the Lorentz invariance leaves us with two possibilities:
(i) They are both eigenvectors: $A\hat l = \nu\hat l$ and $A\hat m = \mu\hat m$
(ii) They are switched: $A\hat l = \nu\hat m$ and $A\hat m = \mu\hat l$
Since the Lorentz transformation $A$ is orthochronous, in both cases $\nu, \mu > 0$. Here are the further conclusions for case (ii). Indeed
$$A(\sqrt{\mu}\,\hat l + \sqrt{\nu}\,\hat m) = \sqrt{\nu\mu}\,(\sqrt{\mu}\,\hat l + \sqrt{\nu}\,\hat m)$$
shows that $\sqrt{\mu}\,\hat l + \sqrt{\nu}\,\hat m$ is an eigenvector. It is a combination with positive coefficients of vectors in the future light-cone, and hence by Problem ?? it is time-like.
Now we assume that both (i) and (ii) happen together. Hence $\nu = \mu > 0$ and, by Lemma 3.5, $\nu\mu = 1$, which implies $\nu = \mu = 1$. By Lemma 3.5, the orthogonal complement $U^\perp = \mathrm{span}(\hat l, \hat m)^\perp$ is spanned by a space-like eigenvector satisfying $A\hat b = (\det A)\,\hat b$. Altogether, we see that either $\det A = 1$ and $A = I$, or $\det A = -1$ and the transformation $A$ is a reflection across the plane $U = \mathrm{span}(\hat l, \hat m)$.
Theorem 3.1 (Structure of an orthochronous Lorentz transformation). Each proper Lorentz transformation $A$ has an eigenvector with eigenvalue 1. Except for the identity, this eigenvector $\hat e$ is unique. There are three mutually exclusive cases:
Rotation If the eigenvector $\hat e$ is time-like, the orthogonal complement is an invariant plane spanned by two space-like vectors. The restriction of $A$ to this plane is a rotation by some angle $\alpha$. There exists a proper Lorentz transformation $S$ such that
$$S^{-1}AS = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and hence $\mathrm{Tr}\,A = 1 + 2\cos\alpha$.

Lorentz boost If the eigenvector $\hat e$ is space-like, the orthogonal complement has an orthonormal basis of a space-like and a time-like unit vector. The restriction of $A$ to this plane is a Lorentz boost with some Lobachevskij parameter $\lambda$. There exists a proper Lorentz transformation $S$ such that
$$S^{-1}AS = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & \det A & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$
and hence $\mathrm{Tr}\,A = \det A + 2\cosh\lambda$.

Rotation about a light ray If the eigenvector $\hat e$ is light-like, the orthogonal complement has a basis of a space-like unit vector and a light-like vector. The restriction of $A$ to this plane is a shearing map. The transformation $A$ has the eigenvalue one with geometric multiplicity one and algebraic multiplicity three. There exists a proper Lorentz transformation $S$ such that
$$S^{-1}AS = \begin{pmatrix} 1 - \frac{\alpha^2}{2} & -\alpha & \frac{\alpha^2}{2} \\ \alpha & 1 & -\alpha \\ -\frac{\alpha^2}{2} & -\alpha & 1 + \frac{\alpha^2}{2} \end{pmatrix} \tag{3.8}$$
Indeed, the parameter $\alpha > 0$ or $\alpha < 0$ can be chosen arbitrarily, except for its sign!
Identity The existence of three linearly independent light-like eigenvectors implies that the
transformation is the identity.
Each orthochronous Lorentz transformation $A$ with $\det A = -1$ has an eigenvector with eigenvalue $-1$. This eigenvector $\hat b$ is unique, and it is always space-like.
The orthogonal complement $\hat b^\perp$ has an orthonormal basis of a space-like and a time-like unit vector. The restriction of $A$ to this plane is a Lorentz boost with some Lobachevskij parameter $\lambda$. Only the Lorentz boost case can occur, where $\lambda = 0$ yields a common reflection.
Proof. Given is an orthochronous Lorentz transformation A. Since dimension 3 is odd, there exists
a real eigenvalue and eigenvector. It is obvious from logic to state the following mutually exclusive
cases:
(a) There exists a time-like eigenvector.
(b) There does not exist any time-like eigenvector, but there exist two light-like eigenvectors.
(c) There exist no time-like and only one light-like eigenvector.
(d) There exist three linearly independent light-like eigenvectors.
(e) There exist only space-like eigenvectors.
Consider case (a) and let $\hat e \in \mathrm{Future}$ be a time-like eigenvector, hence $A\hat e = \alpha\hat e$. The Lorentz invariance (3.6) implies $\alpha^2 = 1$. Because the transformation is orthochronous, we know that $\alpha = +1$. By Lemma 3.4, the orthogonal complement $\hat e^\perp$ is invariant, too. By Lemma 3.3, the orthogonal complement is spanned by two space-like vectors $\hat a$ and $\hat b$. We may choose $\hat a, \hat b, \hat e$ all to be orthogonal unit vectors.
The matrix $S$ for the change of basis has the new basis vectors as columns:
$$S = \big[\,\hat a,\ \hat b,\ \hat e\,\big]$$
Question. Check that our choice of the basis makes $S$ an orthochronous Lorentz transformation, and indeed the sign of $\hat a$ can be chosen to make it a proper one.
Too, a straightforward matrix calculation (one that each student needs to do at least ten times) tells that the above explanations mean
$$AS = SN \quad\text{with the normal form}\quad N = \begin{pmatrix} n_{11} & n_{12} & 0 \\ n_{21} & n_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
 
Since the proper Lorentz transformations are a group, the normal form is an orthochronous Lorentz
transformation and det N = det A. Because of the zeros and one occurring in N, the 2 × 2 matrix on
the upper left corner is a two-dimensional rotation or reflection. In the latter case, we can choose
the space-like base vectors along and perpendicular to the axis of reflection, and get even case (b)
with λ = 0.
Now we consider case (b). Let $\hat l, \hat m \in \mathrm{Light}^+$ be two linearly independent light-like eigenvectors. The eigenvalues are $\mu > 0$, since the transformation is orthochronous, and $\mu^{-1}$ by Lemma 3.5:
$$A\hat m = \mu\hat m \qquad\text{and}\qquad A\hat l = \mu^{-1}\hat l$$
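Any linear combination of $\hat l$ and $\hat m$ with positive coefficients is in the future cone, as shown in Problem ??. Hence the span $U = \mathrm{span}(\hat l, \hat m)$ contains a normalized time-like vector $\hat e \in \mathrm{Future}$, from which we get an orthogonal space-like vector $\hat a \in U$. Here is the most simple choice:
$$\hat a = \frac{\hat l - \hat m}{\sqrt{-2\, l \cdot m}}, \qquad \hat e = \frac{\hat l + \hat m}{\sqrt{-2\, l \cdot m}}$$
The action of the transformation $A$ on these vectors is
$$A(\hat a + \hat e) = \mu^{-1}(\hat a + \hat e)$$
$$A(\hat a - \hat e) = \mu\,(\hat a - \hat e)$$
$$A\hat a = \frac{\mu + \mu^{-1}}{2}\,\hat a + \frac{\mu^{-1} - \mu}{2}\,\hat e$$
$$A\hat e = \frac{\mu^{-1} - \mu}{2}\,\hat a + \frac{\mu + \mu^{-1}}{2}\,\hat e$$
In terms of the Lobachevskij parameter $\lambda$, the matrix elements are
$$\frac{\mu + \mu^{-1}}{2} = \cosh\lambda \qquad\text{and}\qquad \frac{\mu^{-1} - \mu}{2} = \sinh\lambda$$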
The orthogonal complement space $U^\perp$ is spanned by one space-like vector $\hat b$. Since the orthogonal complement of an invariant space is invariant, this is an eigenvector with eigenvalue $\det A$. The matrix $S = [\hat a, \hat b, \hat e]$ for the change of basis has these new basis vectors as columns. Our choice of an orthonormal basis makes $S$ an orthochronous Lorentz transformation. Indeed, switching the sign of $\hat b$ if needed makes $\det S = +1$ and hence $S$ a proper Lorentz transformation. The above calculations are summarized in a matrix equation
$$AS = SN \quad\text{with the normal form}\quad N = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & \det A & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$
Consider case (d) and assume there exist three linearly independent light-like eigenvectors.
Using just any two of them, we argue as in case (b). The existence of the third eigenvector implies
µ = 1 and hence Lobachevskij parameter λ = 0. Hence the transformation is the identity.
Finally we discuss the case (e), with the only purpose to rule it out. Assume there exist only space-like eigenvectors. Let $\hat b \in \mathrm{Present}$ and $A\hat b = \beta\hat b$. By Lemma 3.6 two cases can occur:
(i) either two linearly independent light-like eigenvectors exist—which has already been consid-
ered in case (b);
(ii) or there exists a time-like eigenvector—which has already been considered in case (a).
The case (c) is possible for det A = 1, and interesting. The argument is continued in the next
section.
3.3. Infinitesimal generators
Let
$$A(x) = e^{xB} \tag{3.9}$$
in the sense of the matrix exponential function. The set of $A(x)$ for all real $x$ is called a one-parameter group and $B$ is called the generator.
Lemma 3.7. Formula (3.9) generates a one-parameter group of Lorentz transformations if and only if
$$B^T G + G B = 0 \tag{3.10}$$
Proof. We need to confirm that
$$e^{xB^T} G\, e^{xB} = G \tag{3.11}$$
Differentiating by $x$ yields
$$\frac{d}{dx}\left( e^{xB^T} G\, e^{xB} \right) = e^{xB^T} (B^T G + G B)\, e^{xB}$$
Equation (3.11) holds for all $x$ if and only if the derivative of the left-hand side is zero (both sides agree at $x = 0$), which happens if and only if the relation (3.10) holds for the generator.
In 2 + 1 dimensions with the Minkowski metric as above, relation (3.10) holds if and only if
$$B = \begin{pmatrix} 0 & -b_3 & b_2 \\ b_3 & 0 & -b_1 \\ b_2 & -b_1 & 0 \end{pmatrix} = b_1 S_1 + b_2 S_2 + b_3 S_3 \tag{3.12}$$
where I have defined
$$S_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}, \qquad S_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \qquad S_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
We need to assume $\hat b \neq 0$.
Lemma 3.8. The eigenvectors of the Lorentz transformation $e^{xB}$ are eigenvectors of the generator $B$, too. If the generator $B$ has the eigenvalue $\beta$, the Lorentz transformation $A = e^{xB}$ has the eigenvalue $e^{x\beta}$ with the same eigenvector.
The characteristic polynomial of $B$ is
$$\det(B - \lambda I) = -\lambda^3 + (b_1^2 + b_2^2 - b_3^2)\,\lambda$$
It is easy to see that $B$ has the eigenvalue 0 with eigenvector $\hat b = (b_1, b_2, b_3)^T$. Hence $\hat b$ is the eigenvector with eigenvalue one for the Lorentz transformation $e^{xB}$. The other two eigenvalues of $B$ are $\pm\sqrt{b_1^2 + b_2^2 - b_3^2}$.
For the evaluation of the exponential series, one needs the powers
$$B^2 = \begin{pmatrix} -b_3^2 + b_2^2 & -b_1 b_2 & b_1 b_3 \\ -b_1 b_2 & -b_3^2 + b_1^2 & b_2 b_3 \\ -b_1 b_3 & -b_2 b_3 & b_1^2 + b_2^2 \end{pmatrix} \tag{3.13}$$
$$B^3 = (b_1^2 + b_2^2 - b_3^2)\,B$$

The last formula can be obtained from the Hamilton-Cayley theorem. By the Hamilton-Cayley
theorem, one gets zero by plugging the matrix into its own characteristic polynomial.
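The algebraic facts about the generator can be confirmed with exact integer arithmetic. The sketch below assumes $G = \mathrm{diag}(1, 1, -1)$ and picks arbitrary integer components `b1, b2, b3`; it checks relation (3.10), the eigenvector $\hat b$ with eigenvalue 0, and the Hamilton-Cayley consequence $B^3 = (b_1^2 + b_2^2 - b_3^2)B$.

```python
# Assumed Minkowski metric in 2+1 dimensions, signature (+, +, -).
G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [[X[j][i] for j in range(3)] for i in range(3)]

def generator(b1, b2, b3):
    # the matrix B = b1 S1 + b2 S2 + b3 S3 of formula (3.12)
    return [[0, -b3, b2], [b3, 0, -b1], [b2, -b1, 0]]

b = (2, 3, 5)                      # arbitrary integers, exact arithmetic
B = generator(*b)
k = b[0] ** 2 + b[1] ** 2 - b[2] ** 2

# relation (3.10): B^T G + G B should be the zero matrix
BtG, GB = matmul(transpose(B), G), matmul(G, B)
lie_condition = [[BtG[i][j] + GB[i][j] for j in range(3)] for i in range(3)]

# B applied to the vector b should vanish (eigenvalue 0)
B_times_b = [sum(B[i][j] * b[j] for j in range(3)) for i in range(3)]

# Hamilton-Cayley: B^3 = k B
B3 = matmul(matmul(B, B), B)
kB = [[k * entry for entry in row] for row in B]
```

With these values $k = -12$, so the sample generator has a time-like vector $\hat b$; other integer choices can be substituted freely.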
I now consider the case (c) of Theorem 3.1. In this case $\hat b$ is a light-like vector. Hence $B^3 = 0$. This implies that the exponential series of formula (3.9) has only three terms:
$$e^{xB} = I + xB + \frac{x^2}{2}B^2$$
Problem 3.8. Calculate the example replacing $\hat b$ by $\hat f = (1, 0, 1)^T$ and the matrix $B$ by
$$F = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} = S_1 + S_3$$
Confirm you get back the right-hand side of formula (3.8) in Theorem 3.1, the special case. Hence
$$N = e^{\alpha F}$$
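Problem 3.8 can be cross-checked numerically. The sketch below builds the three-term exponential series for the matrix $F$ and compares it with the matrix on the right-hand side of formula (3.8); the function names are illustrative only.

```python
F = [[0, -1, 0], [1, 0, -1], [0, -1, 0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_alpha_F(alpha):
    # F^3 = 0, so the exponential series stops after the quadratic term
    F2 = matmul(F, F)
    return [[IDENTITY[i][j] + alpha * F[i][j] + alpha ** 2 / 2 * F2[i][j]
             for j in range(3)] for i in range(3)]

def normal_form(alpha):
    # matrix on the right-hand side of formula (3.8)
    h = alpha ** 2 / 2
    return [[1 - h, -alpha, h], [alpha, 1, -alpha], [-h, -alpha, 1 + h]]

F3 = matmul(matmul(F, F), F)       # expected to be the zero matrix
E = exp_alpha_F(0.5)
N = normal_form(0.5)
# image of the light-like vector f = (1, 0, 1) under N
N_times_f = [N[i][0] + N[i][2] for i in range(3)]
```

The vector $\hat f = (1, 0, 1)^T$ is reproduced by `N`, in line with its role as the eigenvector with eigenvalue 1.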
Proof of Theorem 3.1 for case (c). Let $A$ be any proper Lorentz transformation in the case (c) of Theorem 3.1. Let $\hat f = (1, 0, 1)^T$ be the eigenvector for the normal form $N$ in formula (3.8). There exists a proper Lorentz transformation $R$ such that
$$\hat b = R\hat f$$
The matrices $N$ and $R^{-1}AR$ have the eigenvector $\hat f$ in common. Are the matrices $N$ and $R^{-1}AR$ equal? No, one cannot expect such a coincidence, because of the free parameter $\alpha$ appearing in formula (3.8). But there is a similarity transformation to adjust this parameter. Let
$$L(\lambda) = \begin{pmatrix} \cosh\lambda & 0 & \sinh\lambda \\ 0 & 1 & 0 \\ \sinh\lambda & 0 & \cosh\lambda \end{pmatrix}$$
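With a bit of calculation one sees that
$$L(\lambda)^{-1} F L(\lambda) = e^{-\lambda} F$$
Hence the exponential series implies
$$L(\lambda)^{-1} e^{\alpha F} L(\lambda) = e^{\alpha e^{-\lambda} F}$$
Thus we get a transformation $\alpha \mapsto \alpha e^{-\lambda}$. Since $\lambda$ ranges over all real numbers, any positive rescaling of $\alpha$ can be achieved.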
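The rescaling of $F$ under conjugation by a boost can be read off numerically. The sketch below uses the matrix $L(\lambda)$ as defined above together with the fact $L(\lambda)^{-1} = L(-\lambda)$; with these conventions the computation gives the factor $e^{-\lambda}$ for $L(\lambda)^{-1} F L(\lambda)$ and $e^{\lambda}$ for $L(\lambda) F L(\lambda)^{-1}$.

```python
import math

F = [[0, -1, 0], [1, 0, -1], [0, -1, 0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def boost(lam):
    # the matrix L(lambda); its inverse is boost(-lam)
    c, s = math.cosh(lam), math.sinh(lam)
    return [[c, 0, s], [0, 1, 0], [s, 0, c]]

def max_deviation(X, factor):
    # largest entry of X - factor * F
    return max(abs(X[i][j] - factor * F[i][j])
               for i in range(3) for j in range(3))

lam = 0.4
inner = matmul(matmul(boost(-lam), F), boost(lam))   # L^-1 F L
outer = matmul(matmul(boost(lam), F), boost(-lam))   # L F L^-1

dev_inner = max_deviation(inner, math.exp(-lam))
dev_outer = max_deviation(outer, math.exp(lam))
```

Either order of conjugation therefore rescales the parameter $\alpha$ by a positive factor, which is all the argument needs.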

We choose $\lambda$ such that the matrices $L(\lambda)^{-1} N L(\lambda)$ and $R^{-1}AR$ have equal restrictions to the invariant subspace $\hat f^\perp$. After all the work, these two matrices have the following common properties:
• Both are proper Lorentz transformations.
• Both have the eigenvector $\hat f$ with eigenvalue 1.
• Both have equal restrictions to the invariant subspace $\hat f^\perp$.
• Both have the same characteristic polynomial.
• Both have the same Jordan normal form with triple eigenvalue 1, and only a one-dimensional eigenspace spanned by $\hat f$.
Question. Give a detailed reasoning with linear algebra to prove that any two matrices with all the
common properties mentioned above are equal.
Question. Given are two automorphic collineations, and of both one knows
• They preserve the orientation.
• They have the same fixpoint on the line of infinity.

• Neither one of them has a second fixpoint
Give a simple geometric argument why these two automorphic collineations are equal.
In the end, with the appropriate choice of $\lambda$, we have confirmed that $R^{-1}AR = L(\lambda)^{-1} N L(\lambda)$. Hence $S^{-1}AS = N$ holds with the proper Lorentz transformation $S = R\,L(\lambda)^{-1}$.
Problem 3.9. Convince (at least) yourself that the rotation angle $\alpha \bmod 2\pi$, the Lobachevskij parameter $\lambda$ in Theorem 3.1, and the shift $\alpha$ along a horocycle are unique, once you require that the transformation $S$ is a proper Lorentz transformation.
4. The Poincaré Half-Plane Model

In the first subsection, we construct the half-plane model via an isometric mapping of the disk
to the half plane. We obtain the hyperbolic lines and distances as expected. Both the distance of
two points and the Riemann metric of Poincaré’s half-plane are calculated via the isometry.
The next section explains the Euler-Lagrange equation from the calculus of variations. In the following sections, we reconstruct all features of Poincaré’s half-plane model, taking the Riemann metric as starting point.
At first, the curve of minimal distance between any two given points is calculated from the
Euler-Lagrange equation. It turns out to be a circular arc with center on the boundary line of the
half-plane, or—in a special exceptional case—a Euclidean line perpendicular to the boundary line.
These minimizing lines specify the hyperbolic line connecting the two given points. The minimum
of the hyperbolic length of any connecting curve determines the hyperbolic distances between
these two points. Using the Riemann metric, the length of the hyperbolic segment is obtained by
integration. The hyperbolic distance turns out to be the logarithm of the cross ratio of the two
endpoints of the given segment, and the ideal endpoints of the hyperbolic line through these two
points.
4.1. Poincaré half-plane and Poincaré disk
The open unit disk is denoted by $D = \{(x, y) : x^2 + y^2 < 1\}$, and its boundary is $\partial D = \{(x, y) : x^2 + y^2 = 1\}$. We denote the upper open half-plane by $H = \{(u, v) : v > 0\}$. Its boundary is just the real axis $\partial H = \{(u, v) : v = 0\}$. We shall construct the half-plane model via an isometric mapping of the disk to the half plane. It is convenient to use the complex variables $z = x + iy$ and $w = u + iv$. We use the notation $\bar w = u - iv$ for the complex conjugate.
Proposition 4.1 (Isometric mapping of the half-plane to the disk). The linear fractional function
$$w = i\,\frac{1 - z}{1 + z} \tag{4.1}$$
is a conformal mapping, and a bijection from $\mathbb{C} \cup \{\infty\}$ to $\mathbb{C} \cup \{\infty\}$. The inverse mapping is
$$z = \frac{i - w}{i + w} \tag{4.2}$$
These bijective mappings preserve angles, the cross ratio, the orientation, and map generalized circles to generalized circles.
The unit disk $D = \{z = x + iy : x^2 + y^2 < 1\}$ is mapped bijectively to the upper half-plane $H = \{w = u + iv : v > 0\}$. Especially, one easily checks that
$$z = -1 \mapsto w = \infty, \qquad z = i \mapsto w = 1, \qquad z = 1 \mapsto w = 0, \qquad z = 0 \mapsto w = i$$
Problem 4.1.
(a) We check whether the mapping (4.1) indeed maps the boundaries $\partial D \mapsto \partial H$ and find the restriction of the mapping to the boundary. Confirm that a point $z = e^{i\theta}$ is mapped to $w = \tan\frac{\theta}{2}$. Do not separate real and imaginary parts.
(b) Only now separate real and imaginary parts. Use the mapping (4.1) to confirm the identities
$$\tan\frac{\theta}{2} = \frac{\sin\theta}{1 + \cos\theta} = \frac{1 - \cos\theta}{\sin\theta}$$
(c) Use the inverse mapping (4.2) to confirm the identities
$$\cos\theta = \frac{1 - \tan^2\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}} \qquad\text{and}\qquad \sin\theta = \frac{2\tan\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}}$$
Solution of part (a). We plug $z = e^{i\theta}$ into formula (4.1) and get
$$w = i\,\frac{1 - z}{1 + z} = i\,\frac{1 - e^{i\theta}}{1 + e^{i\theta}} = i\,\frac{e^{-i\theta/2} - e^{i\theta/2}}{e^{-i\theta/2} + e^{i\theta/2}} = \frac{2\sin(\theta/2)}{2\cos(\theta/2)} = \tan\frac{\theta}{2}$$
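The boundary behaviour of the mapping (4.1) can also be checked with complex floating-point arithmetic. The following sketch uses Python's built-in complex numbers; the function names `to_half_plane` and `to_disk` are illustrative.

```python
import cmath
import math

def to_half_plane(z):
    # mapping (4.1): w = i (1 - z) / (1 + z)
    return 1j * (1 - z) / (1 + z)

def to_disk(w):
    # inverse mapping (4.2): z = (i - w) / (i + w)
    return (1j - w) / (1j + w)

theta = 1.2                                    # arbitrary angle
w_boundary = to_half_plane(cmath.exp(1j * theta))

z_inside = 0.3 + 0.4j                          # a point of the unit disk
w_inside = to_half_plane(z_inside)
round_trip = to_disk(w_inside)
```

The image of the boundary point is real and equals $\tan(\theta/2)$ up to rounding, an interior point lands in the upper half-plane, and (4.2) undoes (4.1).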

The Poincaré half-plane model of hyperbolic geometry is constructed from the disk model via
this isometry. One translates the definitions from the section on the Poincaré disk model to the
half-plane model and arrives at the following conventions:
The points of H are the "points" for Poincaré’s model. The points of ∂H are called "ideal
points" or "endpoints". Those are not points for the hyperbolic geometry. The "lines" for
Poincaré’s model are circular arcs, or—in a special case—Euclidean lines perpendicular to ∂H.
The "angles" for Poincaré’s model are the usual Euclidean angles between tangents to the circular
arcs.
Remark. The inversion image of any point $P = (u, v)$ obtained by reflection across the real axis is $P' = (u, -v)$. In complex notation, reflection by the real axis is complex conjugation: the point $P = w = u + iv$ is reflected to $P' = \bar w = u - iv$.
Remark. The one-point compactification $\mathbb{C}^* = \mathbb{C} \cup \{\infty\}$ of the complex plane is well-known and useful in complex analysis, especially it is possible to define regularity and power series of functions in a neighborhood of $\infty$. Only linear fractional mappings, as for example the mapping (4.1) and its inverse, are naturally extended to bijective continuous mappings¹
$$\mathbb{C} \cup \{\infty\} \to \mathbb{C} \cup \{\infty\}$$
Hence the half-plane of Poincaré’s model gets just one point $\infty$. This point can occur as ideal end of a line, circle, equidistance line or horocycle.
Especially for the half-plane, this state of affairs is a bit contrary to the common imagination. Indeed, we have a totally different definition and usage of improper points in the projective completion from projective geometry.
Given are any two points A and B. Let P and Q be the ideal endpoints of the hyperbolic line
through A and B. These points are named in a way that A, B, P, Q occur in this order during an
entire turn around the circle.
For the definition of a hyperbolic "distance" and of "congruence of segments", the Definition ?? and the preservation of the cross ratio are used as starting points. Thus we arrive at the following
Definition 4.1 (Distance and Congruence). The hyperbolic distance of points $A$ and $B$ is given by
$$s(A, B) := \ln(AB, PQ) = \ln\frac{|AP| \cdot |BQ|}{|AQ| \cdot |BP|} \tag{4.3}$$
Two segments $AB$ and $XY$ are called "congruent" iff $s(A, B) = s(X, Y)$.
1
They are indeed the only analytic mappings with such an extension
Since the mapping (4.2) provides an isometry between the half-plane and the disk model, we
can directly calculate the Riemann metric for Poincaré’s half-plane:
Proposition 4.2 (Riemann Metric for Poincaré’s half-plane). In the Poincaré half-plane, the infinitesimal hyperbolic distance $ds$ of points with coordinates $(u, v)$ and $(u + du, v + dv)$ is
$$(ds_H)^2 = \frac{du^2 + dv^2}{v^2} \tag{4.4}$$
Proof. The metric is determined by the requirement that
$$z = \frac{i - w}{i + w} \tag{4.2}$$
provides an isometry from the half-plane to the disk:
$$ds_D = ds_H \tag{4.5}$$
Hence we need to convert the known metric
$$ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} \tag{??}$$
of the Poincaré disk model to a metric in the half-plane. We calculate the denominator
$$1 - |z|^2 = \frac{|i + w|^2 - |i - w|^2}{|i + w|^2} = \frac{(w + i)(\bar w - i) - (i - w)(-i - \bar w)}{|w + i|^2} = \frac{4v}{|w + i|^2}$$
and the derivative of the mapping (4.2):
$$\frac{dz}{dw} = -\frac{2i}{(w + i)^2}$$
Putting these two results into formula (??) yields
$$ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} = \frac{4\,|dz|^2}{(1 - |z|^2)^2} = 4\left|\frac{dz}{dw}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2 = 4\left|\frac{2}{(w + i)^2}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2 = \frac{|dw|^2}{v^2} = \frac{du^2 + dv^2}{v^2}$$
Thus formula (4.4) arises from the isometry (4.5) of the half-plane and the disk.
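The conversion of the metric can be tested at sample points. The sketch below pulls the disk scale factor $2/(1 - |z|^2)$ back through the mapping (4.2) and compares it with the half-plane scale factor $1/v$. The derivative $dz/dw$ is approximated by a central difference, which is an illustrative shortcut and not part of the proof.

```python
def to_disk(w):
    # inverse mapping (4.2): z = (i - w) / (i + w)
    return (1j - w) / (1j + w)

def disk_scale(z):
    # ds / |dz| in the disk model, from ds^2 = 4 |dz|^2 / (1 - |z|^2)^2
    return 2 / (1 - abs(z) ** 2)

def pulled_back_scale(w, step=1e-6):
    # ds / |dw| obtained by pulling the disk metric back through (4.2);
    # dz/dw is approximated by a central difference
    dz_dw = (to_disk(w + step) - to_disk(w - step)) / (2 * step)
    return disk_scale(to_disk(w)) * abs(dz_dw)

w1 = 0.7 + 1.3j        # arbitrary points of the upper half-plane
w2 = -2.0 + 0.25j
scale1 = pulled_back_scale(w1)
scale2 = pulled_back_scale(w2)
```

Both sample values agree with $1/v$ to within the accuracy of the finite difference.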
4.2. The Euler-Lagrange equation
The basic problem of the calculus of variations is to determine the curve $y = y(x)$ between two given points $(a, y(a))$ and $(b, y(b))$ for which the prescribed functional
$$L[y] := \int_a^b F(x, y, y')\,dx$$
assumes an extremum (minimum or maximum), or simply becomes stationary.¹ It turns out that the stationary curves for the functional $L[y]$ satisfy the Euler-Lagrange equation
$$\frac{d}{dx}\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = 0$$
¹ In physical applications, the functional is obtained from first principles of physics.
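To derive the Euler-Lagrange equation, we take a pencil of connecting curves $y = y(x, p)$ depending smoothly on a parameter $p$, and differentiate the functional $L[y(\cdot, p)]$ by the parameter $p$. It is customary to denote the derivative of any quantity by $p$ with the symbol $\delta$ and call it the variation of this quantity. One obtains
$$\delta L[y] = \frac{d}{dp} L[y] = \int_a^b \frac{d\,F(x, y, y')}{dp}\,dx$$
$$= \int_a^b \left[ \frac{\partial F(x, y, y')}{\partial y} \cdot \frac{\partial y}{\partial p} + \frac{\partial F(x, y, y')}{\partial y'} \cdot \frac{\partial y'}{\partial p} \right] dx$$
$$= \left[ \frac{\partial F(x, y, y')}{\partial y'} \cdot \frac{\partial y}{\partial p} \right]_a^b + \int_a^b \left[ \frac{\partial F(x, y, y')}{\partial y} \cdot \frac{\partial y}{\partial p} - \frac{d}{dx}\!\left( \frac{\partial F(x, y, y')}{\partial y'} \right) \cdot \frac{\partial y}{\partial p} \right] dx$$
The last step is an integration by parts. The boundary terms vanish for a problem with prescribed endpoints $(a, y(a))$ and $(b, y(b))$ of the curve. Hence we obtain
$$\delta L[y] = \int_a^b \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\!\left( \frac{\partial F}{\partial y'} \right) \right] \delta y\; dx$$
Since the variation $\delta y$ of the curve can be chosen to be any smooth function of $x$, the Lemma of the calculus of variations shows that the expression in the bracket has to vanish identically. Thus we obtain the Euler-Lagrange differential equation.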
Lemma 4.1 (Lemma of the calculus of variations). Let $g(x)$ be a piecewise continuous function and suppose that
$$\int_0^1 \eta(x) g(x)\,dx = 0$$
for all functions $\eta \in C^\infty$. Then the function $g$ is identically zero.
Proof. We show that for a continuous function $g \neq 0$ the assertion
$$\int_0^1 \eta(x) g(x)\,dx = 0$$
does not hold for all functions $\eta \in C^\infty$.
Assume $g \neq 0$ and $g$ continuous. Hence there exist $0 < a < 1$ and $\varepsilon > 0$ such that $g(a) > 2\varepsilon > 0$. (If $g$ takes no positive value, apply the following reasoning to the function $-g$.)
Because of the continuity of $g$ there exists $\delta > 0$ such that $|x - a| < 2\delta$ implies $|g(x) - g(a)| < \varepsilon$ and hence $g(x) > g(a) - \varepsilon > \varepsilon$.
There exists a continuous, and even $C^\infty$, function $\eta \geq 0$ such that $\eta(x) = 0$ for $|x - a| > 2\delta$ and $\eta(x) = 1$ for $|x - a| < \delta$. Hence
$$\int_0^1 \eta(x) g(x)\,dx = \int_{a - 2\delta}^{a + 2\delta} \eta(x) g(x)\,dx \geq \int_{a - \delta}^{a + \delta} \eta(x) g(x)\,dx \geq 2\delta\varepsilon > 0$$
Hence the assumption that
$$\int_0^1 \eta(x) g(x)\,dx = 0$$
for all functions $\eta \in C^\infty$ does not hold.
By contraposition, a continuous function $g$ that satisfies
$$\int_0^1 \eta(x) g(x)\,dx = 0$$
for all functions $\eta \in C^\infty$ has $g(x) = 0$ for all $x$.
4.3. The curve of minimal hyperbolic length
We want to find the curve of minimal hyperbolic length connecting two given points. In this problem, it turns out to be more convenient to use the right half plane $\{(x, y) : x > 0\}$ as model of hyperbolic geometry. The corresponding Riemann metric is
$$ds = \frac{\sqrt{dx^2 + dy^2}}{x} \tag{4.6}$$
The hyperbolic length of any curve $y = y(x)$ between two given points $(a, y(a))$ and $(b, y(b))$ is given by the functional
$$L[y] := \int_a^b \frac{\sqrt{dx^2 + dy^2}}{x}$$
Choosing the variable $x$ as independent, we get
$$\int_a^b \frac{\sqrt{dx^2 + dy^2}}{x} = \int_a^b \frac{\sqrt{1 + y'^2}}{x}\,dx$$
In the variational problem occurs the function
$$F(x, y, y') = \frac{\sqrt{1 + y'^2}}{x}$$
The Euler-Lagrange equation becomes particularly simple. Since the variable $y$ does not occur in the function $F$, we can immediately perform one integration and get
$$\frac{d}{dx}\left( \frac{\partial}{\partial y'}\,\frac{\sqrt{1 + y'^2}}{x} \right) = 0$$
$$\frac{\partial}{\partial y'}\,\frac{\sqrt{1 + y'^2}}{x} = c$$
$$\frac{y'}{x\sqrt{1 + y'^2}} = c$$
$$y'^2 = (1 + y'^2)\,c^2 x^2$$
$$y'^2\,(1 - c^2 x^2) = c^2 x^2$$
$$y' = \frac{cx}{\sqrt{1 - c^2 x^2}}$$
Here $c$ denotes a constant independent of $x$. Of course the value of $c$ can still depend on the coordinates of the endpoints. The last line is a first order differential equation. If $c = 0$, we get the solution $y = \text{const}$. The minimizing curve is a Euclidean line perpendicular to the boundary. If $c \neq 0$, we do a further integration and obtain
$$y = y_0 + \int \frac{cx\,dx}{\sqrt{1 - c^2 x^2}}$$
We substitute $v = 1 - c^2 x^2$ and $dv = -2c^2 x\,dx$ to obtain
$$y = y_0 - \frac{1}{2c}\int \frac{dv}{\sqrt{v}} = y_0 - \frac{\sqrt{v}}{c} = y_0 \mp \sqrt{c^{-2} - x^2}$$
This is the equation of a circular arc with center $(0, y_0)$ and radius $|c|^{-1}$.
4.4. The minimum of hyperbolic length
I go now back to the more commonly used upper half-plane. For the convenience of the reader, I use variables $x$ and $y$. The upper half plane is $\{(x, y) : y > 0\}$ and has the metric
$$ds = \frac{\sqrt{dx^2 + dy^2}}{y} \tag{4.7}$$
The minimum of the hyperbolic length of any connecting curve determines the hyperbolic distance between two points. Given are two points $A$ with Euclidean coordinates $(x_A, y_A)$ and $B$ with coordinates $(x_B, y_B)$. In the case $x_A \neq x_B$, the minimizing curve of connection is a circular arc with center on the $x$-axis.¹ The equation of such an arc is
$$y = +\sqrt{r^2 - (x - x_0)^2} \tag{4.8}$$
The radius $r > 0$ and the center $(x_0, 0)$ are to be determined from the coordinates of the two points $A$ and $B$.
Problem 4.2. We can check directly that the function (4.8) is a solution of the Euler-Lagrange equation for the function
$$F(x, y, y') = \frac{\sqrt{1 + y'^2}}{y}$$
(a) Confirm that
$$\frac{d}{dx}\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial y} = \frac{1 + y'^2 + y''y}{y^2\,(1 + y'^2)\sqrt{1 + y'^2}}$$
(b) Check that the derivatives of function (4.8) for the upper half-circle satisfy
$$y'^2 + y y'' + 1 = 0.$$
The hyperbolic length of the arc is
$$s(A, B) = \int_{x_A}^{x_B} \frac{\sqrt{dx^2 + dy^2}}{y} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}}{y}\,dx$$
We need to differentiate the square root composite function (4.8) occurring inside the integral and get
$$y' = -\frac{x - x_0}{\sqrt{r^2 - (x - x_0)^2}}$$
$$1 + y'^2 = \frac{r^2}{r^2 - (x - x_0)^2} = \frac{r^2}{y^2}$$
We plug into the distance functional and obtain
$$s(A, B) = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}}{y}\,dx = \int_{x_A}^{x_B} \frac{r\,dx}{y^2} = \int_{x_A}^{x_B} \frac{r\,dx}{r^2 - (x - x_0)^2}$$
1
I leave the special case xA = xB as an exercise.
This integral can be calculated by means of the partial fraction decomposition
$$\frac{r}{r^2 - (x - x_0)^2} = \frac{1}{2(r - x + x_0)} + \frac{1}{2(r + x - x_0)}$$
$$s(A, B) = \int_{x_A}^{x_B} \frac{dx}{2(r + x - x_0)} + \int_{x_A}^{x_B} \frac{dx}{2(r - x + x_0)} = \left[ \frac{1}{2}\ln(r + x - x_0) - \frac{1}{2}\ln(r - x + x_0) \right]_{x_A}^{x_B}$$
It remains to check that this result agrees with formula (4.3). We calculate the logarithm of the cross ratio of the two endpoints $A$ and $B$ of the given segment, and the ideal endpoints $P$ and $Q$ of the hyperbolic line through these two points. We assume $x_A < x_B$. The points $A, B, P, Q$ occur during a clockwise turn around the circle. The Euclidean coordinates of the endpoints are $(x_P, y_P) = (x_0 + r, 0)$ and $(x_Q, y_Q) = (x_0 - r, 0)$. Hence the cross ratio, and its logarithm are
$$(AB, PQ) = \frac{|AP| \cdot |BQ|}{|AQ| \cdot |BP|}$$
$$\ln(AB, PQ) = \frac{1}{2}\ln\frac{|AP|^2}{|AQ|^2} - \frac{1}{2}\ln\frac{|BP|^2}{|BQ|^2}$$
$$= \left[ \frac{1}{2}\ln\frac{(r - x + x_0)^2 + y^2}{(r + x - x_0)^2 + y^2} \right]_{x_B}^{x_A}$$
$$= \left[ \frac{1}{2}\ln\frac{-2(x - x_0)r + 2r^2}{2(x - x_0)r + 2r^2} \right]_{x_B}^{x_A}$$
$$= \left[ \frac{1}{2}\ln\frac{-x + x_0 + r}{x - x_0 + r} \right]_{x_B}^{x_A}$$
in agreement with the result (??). In the special case $x_A = x_B$, the minimizing curve is a Euclidean line perpendicular to the $x$-axis. We leave the calculation of the distance in the special case as an exercise.
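The agreement of the three expressions for the hyperbolic distance can be observed numerically. The sketch below compares a midpoint-rule approximation of $\int r\,dx/(r^2 - (x - x_0)^2)$, the closed form from the partial fractions, and the logarithm of the cross ratio (4.3); the values of `xA`, `xB`, `x0`, `r` are arbitrary, and the midpoint rule is just one convenient quadrature.

```python
import math

def length_numeric(xA, xB, x0, r, n=20000):
    # midpoint rule for the integral of r / (r^2 - (x - x0)^2)
    h = (xB - xA) / n
    total = 0.0
    for i in range(n):
        x = xA + (i + 0.5) * h
        total += r / (r * r - (x - x0) ** 2)
    return total * h

def length_closed(xA, xB, x0, r):
    # antiderivative obtained from the partial fraction decomposition
    def antiderivative(x):
        return 0.5 * math.log(r + x - x0) - 0.5 * math.log(r - x + x0)
    return antiderivative(xB) - antiderivative(xA)

def log_cross_ratio(xA, xB, x0, r):
    # ln(|AP| |BQ| / (|AQ| |BP|)) with P = (x0 + r, 0), Q = (x0 - r, 0)
    def on_arc(x):
        return (x, math.sqrt(r * r - (x - x0) ** 2))
    def dist(U, V):
        return math.hypot(U[0] - V[0], U[1] - V[1])
    A, B = on_arc(xA), on_arc(xB)
    P, Q = (x0 + r, 0.0), (x0 - r, 0.0)
    return math.log(dist(A, P) * dist(B, Q) / (dist(A, Q) * dist(B, P)))

xA, xB, x0, r = -0.5, 1.0, 0.2, 2.0     # hypothetical sample arc
s_numeric = length_numeric(xA, xB, x0, r)
s_closed = length_closed(xA, xB, x0, r)
s_cross = log_cross_ratio(xA, xB, x0, r)
```

All three numbers coincide up to the quadrature error, and the distance is positive for $x_A < x_B$.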
4.5. Some useful reflections in the half-plane
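Problem 4.3. Check how the mapping
$$w = i\,\frac{1 - z}{1 + z} \tag{4.1}$$
maps the boundaries $\partial D \mapsto \partial H$. Confirm that a point $z = e^{i\theta}$ is mapped to $w = \tan\frac{\theta}{2}$.
Use the inverse mapping
$$z = \frac{i - w}{i + w} \tag{4.2}$$
to confirm the identities
$$\cos\theta = \frac{1 - \tan^2\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}} \qquad\text{and}\qquad \sin\theta = \frac{2\tan\frac{\theta}{2}}{1 + \tan^2\frac{\theta}{2}}$$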
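Problem 4.4. Let $S_\alpha$ denote the reflection across the line with ends $\alpha \in \mathbb{R}$ and $\infty$. Confirm that
$$S_\alpha(w) = 2\alpha - \bar w \tag{4.9}$$
for any $\alpha \in \mathbb{R}$. Use this result for an easy check that
$$S_{\alpha + \beta} = S_\alpha S_0 S_\beta \tag{4.10}$$
holds for any $\alpha, \beta \in \mathbb{R}$.
Problem 4.5. Let $M_\gamma$ denote the reflection across the line with ends $\gamma > 0$ and $-\gamma$. Confirm that
$$M_\gamma(w) = \frac{\gamma^2 w}{|w|^2} \tag{4.11}$$
for any $\gamma > 0$. Use this result for an easy check of
$$M_{\gamma\delta} = M_\gamma M_1 M_\delta \tag{4.12}$$
Problem 4.6. Confirm that $S_0 M_\gamma = M_\gamma S_0$ and
$$M_\gamma M_1 S_\alpha M_1 M_\gamma = S_{\alpha\gamma^2} \tag{4.13}$$
for any $\alpha, \gamma \in \mathbb{R}$.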
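The composition rules of Problems 4.4 to 4.6 can be tried on a sample point before proving them. The sketch assumes the formulas $S_\alpha(w) = 2\alpha - \bar w$ and $M_\gamma(w) = \gamma^2 w / |w|^2$, with $\bar w$ the complex conjugate; the helpers `S`, `M` and `compose` are illustrative names.

```python
def S(alpha):
    # reflection across the vertical line with ends alpha and infinity
    return lambda w: 2 * alpha - w.conjugate()

def M(gamma):
    # reflection across the half-circle with ends gamma and -gamma
    return lambda w: gamma ** 2 * w / abs(w) ** 2

def compose(*maps):
    # compose(f, g, h)(w) = f(g(h(w))), matching the operator notation
    def composed(w):
        for f in reversed(maps):
            w = f(w)
        return w
    return composed

w = 0.3 + 1.7j                     # arbitrary point of the upper half-plane
alpha, beta, gamma, delta = 0.8, -1.5, 1.3, 0.6

lhs_410, rhs_410 = S(alpha + beta)(w), compose(S(alpha), S(0), S(beta))(w)
lhs_412, rhs_412 = M(gamma * delta)(w), compose(M(gamma), M(1), M(delta))(w)
lhs_413 = compose(M(gamma), M(1), S(alpha), M(1), M(gamma))(w)
rhs_413 = S(alpha * gamma ** 2)(w)
commute_gap = abs(compose(S(0), M(gamma))(w) - compose(M(gamma), S(0))(w))
```

A single sample point does not replace the proof, but a mismatch here would reveal a misread formula immediately.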
5. Equation of motion
The motion of a particle or photon in a gravitational field can in principle be determined by
either one of the following three principles:
(A) "The trajectory is even on itself".
(B) "The geodesic is an extremal for the proper time or distance."
(C) The quadratic Lagrangian integral is stationary.

5.1. Affine geodesic
Alternative (A) leads to the equation of parallel transport to be postulated for the tangent of the trajectory. With any parameter $u$, consider the trajectory $u \mapsto x^a(u)$ with tangent $t^a = \frac{dx^a}{du}$. Using the definition 1.12 of the intrinsic derivative of a vector along a curve, postulate (A) gives the equation of motion
$$\frac{Dt}{du} = \lambda(u)\,t \tag{5.1}$$
The arbitrary function $\lambda(u)$ depends on the choice of the parameter. Using equation (1.26), we may write out (5.1) in coordinates:
$$\frac{d^2 x^a}{du^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{du}\frac{dx^c}{du} = \lambda(u)\,\frac{dx^a}{du} \tag{5.2}$$
Definition 5.1. The solutions of the equation of motion (5.2) with $\left[\frac{dx^a}{du}\right] \neq 0$ everywhere define the affine geodesics. A parameter with $\lambda(u) \equiv 0$ is called an affine parameter.
Lemma 5.1. There exist affine parameters. Any two affine parameters $v$ and $v'$ are related by $v' = Av + B$ with constants $A \neq 0$ and $B$. One may find the bijective substitution (reparametrization) $v = v(u)$ for which $\lambda(v) = 0$ holds along the entire curve as a solution of the linear differential equation
$$\frac{d^2 v}{du^2} = \lambda(u)\,\frac{dv}{du} \tag{5.3}$$
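Proof. Putting $v = v(u)$ into the equation (5.2) for the affine geodesic, one obtains
$$0 = \frac{d^2 x^a}{du^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{du}\frac{dx^c}{du} - \lambda(u)\,\frac{dx^a}{du}$$
$$= \left[ \frac{d^2 x^a}{dv^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{dv}\frac{dx^c}{dv} \right]\left( \frac{dv}{du} \right)^2 + \frac{dx^a}{dv}\left[ \frac{d^2 v}{du^2} - \lambda(u)\,\frac{dv}{du} \right]$$
A nonconstant solution of the linear differential equation (5.3) exists and has $\frac{dv}{du} \neq 0$ everywhere. Hence one obtains
$$\frac{d^2 x^a}{dv^2} + \Gamma^a{}_{bc}\,\frac{dx^b}{dv}\frac{dx^c}{dv} = 0 \qquad\text{and}\qquad \left[\frac{dx^a}{dv}\right] \neq 0$$
as expected.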
Lemma 5.2. For an affine parameter, the affine equation of motion (5.2) has the following equivalent forms
$$\ddot x^a + \Gamma^a{}_{bc}\,\dot x^b \dot x^c = 0 \tag{5.4}$$
$$\frac{d\,(g_{ab}\dot x^b)}{du} - \Gamma^c{}_{ab}\,\dot x_c \dot x^b = 0 \tag{5.5}$$
where the Christoffel symbol is defined by equation (1.41). In the second form, we use the covariant velocity (or momentum) $\dot x_c = g_{cd}\,\dot x^d$.
5.2. Metric geodesic Following alternative (B), the geodesic (more precisely, the metric geodesic)
is defined to be an extremal for the proper time or distance. The Euler-Lagrange equation for this
variational problem is the geodesic equation. Let $x^a = x^a(u)$ be a parametric equation for the curve
using any, not necessarily affine, parameter $u$. For fixed endpoints $P$ and $Q$, we have to solve the
variational problem
\[
  \delta \int_P^Q ds = 0 \quad \text{or} \quad \delta \int_P^Q d\tau = 0 \tag{5.6}
\]
with the line element $ds = \sqrt{L}\, du$ or $c\, d\tau = \sqrt{L}\, du$ and the Lagrangian
\[
  L := g_{ab}\, \dot x^a \dot x^b \tag{5.7}
\]
Here the derivative by any arbitrary parameter $u$ is denoted by a dot. Without any further
assumption, the quantity
\[
  \dot s = \frac{ds}{du} = \sqrt{L}
\]
may depend on the parameter $u$. We see immediately that only trajectories may be considered for
which the tangent vector is either everywhere time-like (material particles), or everywhere space-like.
In the second case one needs to put $\dot s = \sqrt{-L}$. Only the first alternative will be pursued here.
The Euler-Lagrange equation for the variational problem (5.6) is
\[
  \frac{d}{du} \frac{\partial \sqrt{L}}{\partial \dot x^a} - \frac{\partial \sqrt{L}}{\partial x^a} = 0 \tag{5.8}
\]
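Lemma 5.3. Even without any further assumption on the parameter $u$, the Euler-Lagrange
equation (5.8) has the following equivalent forms
\[
  \frac{d}{du} \left( \frac{g_{ab} \dot x^b}{\sqrt{L}} \right) - \frac{(\partial_a g_{bc})\, \dot x^b \dot x^c}{2 \sqrt{L}} = 0 \tag{5.9}
\]
\[
  \frac{d (g_{ab} \dot x^b)}{du} - \frac{1}{2} (\partial_a g_{bc})\, \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab} \dot x^b \tag{5.10}
\]
\[
  \ddot x^a + \begin{Bmatrix} a \\ b\,c \end{Bmatrix} \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, \dot x^a \tag{5.11}
\]
where the Christoffel symbol is defined by equation (1.41).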
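Checking the calculations.
\begin{align*}
  & \frac{d}{du} \left( \frac{g_{ab} \dot x^b}{\sqrt{L}} \right) - \frac{(\partial_a g_{bc})\, \dot x^b \dot x^c}{2 \sqrt{L}} = 0 \\
  & \frac{\frac{d}{du}(g_{ab} \dot x^b)\, \dot s - (g_{ab} \dot x^b)\, \ddot s}{\dot s^2} - \frac{(\partial_a g_{bc})\, \dot x^b \dot x^c}{2 \dot s} = 0
    && \text{from } \sqrt{L} = \dot s \text{ and the quotient rule;} \\
  & \frac{d (g_{ab} \dot x^b)}{du} - \frac{1}{2} (\partial_a g_{bc})\, \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab} \dot x^b \\
  & (\partial_c g_{ab})\, \dot x^c \dot x^b + g_{ab} \ddot x^b - \frac{1}{2} (\partial_a g_{bc})\, \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab} \dot x^b
    && \text{from the product and chain rule;} \\
  & g_{ab} \ddot x^b + \frac{1}{2} (\partial_c g_{ab} + \partial_b g_{ac} - \partial_a g_{bc})\, \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, g_{ab} \dot x^b
    && \text{multiplying by } g^{da} \text{ one gets} \\
  & \ddot x^d + \begin{Bmatrix} d \\ b\,c \end{Bmatrix} \dot x^b \dot x^c = \frac{\ddot s}{\dot s}\, \dot x^d
\end{align*}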

Let the rest mass be $m > 0$ and let $c$ denote the speed of light. We assume the tangent vector is
everywhere time-like along the geodesic. This case occurs for the motion of massive particles. Here
the most convenient choice of curve parameter is the proper time $\tau$, defined by requiring
\[
  L := g_{ab} \frac{dx^a}{d\tau} \frac{dx^b}{d\tau} = c^2 \tag{5.12}
\]
to hold.
Lemma 5.4. There exist parameters for which $\ddot s \equiv 0$ and hence the Lagrangian is constant along
the geodesic. Any two such parameters $v$ and $v'$ are related by $v' = Av + B$ with any constants
$A \neq 0$ and $B$. One may find the bijective substitution (reparametrization) $\tau = \tau(u)$ by integrating
\[
  c\, \frac{d\tau}{du} = \sqrt{L} \tag{5.13}
\]
and thus introduce the proper time $\tau$ as curve parameter.
5.3. The quadratic Lagrangian We now consider alternative (C), which turns out to be a very
attractive possibility. For fixed endpoints $P$ and $Q$, we have to solve the variational problem
\[
  \delta \int_P^Q L\, du = 0 \tag{5.14}
\]
for the quadratic Lagrangian $L = g_{ab}\, \dot x^a \dot x^b$ from equation (5.7).
One obtains the Euler-Lagrange equation
\[
  \frac{d}{du} \frac{\partial L}{\partial \dot x^a} - \frac{\partial L}{\partial x^a} = 0 \tag{5.15}
\]
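Lemma 5.5. The Euler-Lagrange equation (5.15) has the following equivalent forms
\[
  \frac{d (g_{ab} \dot x^b)}{du} - \frac{1}{2} (\partial_a g_{bc})\, \dot x^b \dot x^c = 0 \tag{5.16}
\]
\[
  \ddot x^a + \begin{Bmatrix} a \\ b\,c \end{Bmatrix} \dot x^b \dot x^c = 0 \tag{5.17}
\]
\[
  \frac{d \dot x_a}{du} - \begin{Bmatrix} c \\ a\,b \end{Bmatrix} \dot x_c \dot x^b = 0 \tag{5.18}
\]
where the Christoffel symbol is defined by equation (1.41). In the third form, we use the covariant
velocity (or momentum) $\dot x_c = g_{cd} \dot x^d$.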
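Proposition 5.1. The Euler-Lagrange equation (5.15) is equivalent to the affine geodesic equation (5.4)
if and only if the torsion tensor satisfies
\[
  T_{bc}{}^{a} + T_{cb}{}^{a} = 0 \tag{5.19}
\]
Especially, for a symmetric connection, the Euler-Lagrange equation (5.15) is equivalent to the
affine geodesic equation (5.4).
Proof. From formula (1.44) one concludes
\[
  \ddot x^a + \Gamma^a{}_{bc}\, \dot x^b \dot x^c
  = \ddot x^a + \begin{Bmatrix} a \\ b\,c \end{Bmatrix} \dot x^b \dot x^c + T_{bc}{}^{a}\, \dot x^b \dot x^c
\]
The two equations agree for all velocities exactly when the part of $T_{bc}{}^{a}$ symmetric in $b$ and $c$
vanishes, which is condition (5.19).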

Lemma 5.6. Along the trajectories of equation (5.16), the quadratic Lagrangian
\[
  L \equiv g_{ab}\, \dot x^a \dot x^b = \text{Const} \tag{5.20}
\]
is a constant of motion. Under the assumption that this constant of motion $L > 0$ is positive, the
parameter $u$ is automatically $u = As + B$ or $u = A\tau + B$ with some constants $A \neq 0$ and $B$. Thus we
have obtained a solution of equation (5.10) with $\ddot s \equiv 0$, too.
Proof. We differentiate the quantity $L$ by the parameter $u$, denoting derivatives by $u$ with a
dot. The Leibniz product rule yields
\begin{align*}
  \frac{d (g_{ab} \dot x^a \dot x^b)}{du}
  &= \frac{d g_{ab}}{du}\, \dot x^a \dot x^b + g_{ab} \ddot x^a \dot x^b + g_{ab} \dot x^a \ddot x^b \\
  &= 2\, \frac{d (g_{ab} \dot x^b)}{du}\, \dot x^a - \frac{d g_{ab}}{du}\, \dot x^a \dot x^b
   = (\partial_a g_{bc})\, \dot x^b \dot x^c \dot x^a - (\partial_c g_{ab})\, \dot x^a \dot x^b \dot x^c = 0
\end{align*}
and confirms that $L$ is a constant of motion.
The situation with $L > 0$ always holds for a positive definite metric, as usually assumed in
classical differential geometry. Also, in general relativity one gets $L > 0$ for the time-like geodesics,
which are the paths of massive particles. In all these cases the parameter is automatically the
arc-length or proper time, up to a linear transformation. Indeed, under the assumption that $L > 0$, the
definition of arc-length, respectively proper time, gives
\[
  \frac{d^2 s}{du^2} = \frac{d \sqrt{L}}{du} = \frac{1}{2 \sqrt{L}} \frac{dL}{du} = 0
\]
and hence $u = As + B$, or $u = A\tau + B$, respectively.
Theorem 5.1. Each solution of the variational problem
\[
  \delta \int_P^Q L\, du = 0 \tag{5.14}
\]
with the quadratic Lagrangian $L$ has the Lagrangian as a constant of motion. Under the additional
assumption that this constant of motion $L > 0$ is positive, the curve parameter $u$ is automatically
proportional to the arc-length or proper time; and the trajectory satisfies the variational problem
\[
  \delta \int_P^Q ds = 0 \tag{5.6}
\]
for the arc-length or proper time, too.

Conversely, from a solution of the variational problem (5.6), by introducing the arc-length or
proper time as parameter, one obtains a solution of the variational problem (5.14) for the quadratic
Lagrangian $L$.
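The geodesic equation has the following two equivalent forms, which I nickname the
"physical form" and the "Christoffel form":
\[
  \frac{d}{d\lambda} \left( g_{ab} \dot x^b \right) - \frac{1}{2} (\partial_a g_{bc})\, \dot x^b \dot x^c = 0 \tag{5.21}
\]
\[
  \frac{d \dot x^a}{d\lambda} + \Gamma^a{}_{bc}\, \dot x^b \dot x^c = 0 \tag{5.22}
\]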
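Problem 5.1. As a first benefit, one can use the equivalence of the two forms (5.21) and (5.22)
for the geodesic equation in order to calculate the connection coefficients $\Gamma^a{}_{bc}$ of a symmetric
connection. As an example, we take the spherical coordinates $(r, \theta, \phi)$ of the Euclidean $\mathbb{R}^3$. They
have the metric
\[
  ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2
\]
Get all non-zero connection coefficients.
Answer. The $r$-component of the physical form of the geodesic equation yields
\begin{align*}
  & \frac{d}{d\lambda} (g_{rr} \dot r) - \frac{1}{2} (\partial_r g_{\theta\theta})\, \dot\theta^2 - \frac{1}{2} (\partial_r g_{\phi\phi})\, \dot\phi^2 = 0 \\
  & \ddot r - r \dot\theta^2 - r \sin^2\theta\, \dot\phi^2 = 0 \\
  & \Gamma^r{}_{rr} = 0 , \quad \Gamma^r{}_{\theta\theta} = -r , \quad \Gamma^r{}_{\phi\phi} = -r \sin^2\theta
\end{align*}
The $\theta$-component of the physical form of the geodesic equation yields
\begin{align*}
  & \frac{d}{d\lambda} \left( g_{\theta\theta} \dot\theta \right) - \frac{1}{2} (\partial_\theta g_{\phi\phi})\, \dot\phi^2 = 0 \\
  & r^2 \ddot\theta + 2 r \dot r \dot\theta - r^2 \sin\theta \cos\theta\, \dot\phi^2 = 0 \\
  & \Gamma^\theta{}_{\theta r} + \Gamma^\theta{}_{r\theta} = 2 r^{-1} , \quad \Gamma^\theta{}_{\phi\phi} = -\sin\theta \cos\theta
\end{align*}
One has to use that the connection is symmetric to end up with $\Gamma^\theta{}_{\theta r} = r^{-1}$.
The $\phi$-component of the physical form of the geodesic equation yields
\begin{align*}
  & \frac{d}{d\lambda} \left( g_{\phi\phi} \dot\phi \right) = 0 \\
  & 2 r \sin^2\theta\, \dot r \dot\phi + 2 r^2 \sin\theta \cos\theta\, \dot\theta \dot\phi + r^2 \sin^2\theta\, \ddot\phi = 0 \\
  & \Gamma^\phi{}_{r\phi} = r^{-1} , \quad \Gamma^\phi{}_{\theta\phi} = \cot\theta
\end{align*}
Again one has used that the connection is symmetric to cancel the factor 2.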
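The coefficients found in this answer can be cross-checked numerically from the metric alone. The following is a minimal sketch, assuming the standard formula $\Gamma^a{}_{bc} = \frac{1}{2} g^{ad} (\partial_b g_{dc} + \partial_c g_{db} - \partial_d g_{bc})$ for the symmetric metric connection. The function names, the sample point and the finite-difference step `h` are choices made for this illustration and do not appear in the text.

```python
import math

def metric(x):
    """Diagonal metric g_ab of spherical coordinates x = (r, theta, phi) in Euclidean R^3."""
    r, theta, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def d_metric(x, c, h=1e-5):
    """Central finite difference of g_ab with respect to the coordinate x^c."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(3)] for a in range(3)]

def christoffel(x):
    """gamma[a][b][c] approximates Gamma^a_bc; valid for a diagonal metric only."""
    g = metric(x)
    dg = [d_metric(x, c) for c in range(3)]  # dg[c][a][b] = d_c g_ab
    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for a in range(3):
        for b in range(3):
            for c in range(3):
                # For a diagonal metric only d = a contributes to the sum over d.
                gamma[a][b][c] = 0.5 / g[a][a] * (dg[b][a][c] + dg[c][a][b] - dg[a][b][c])
    return gamma

R, TH, PH = 0, 1, 2          # index names for readability
r, theta = 2.0, 0.7          # arbitrary sample point away from the coordinate singularities
gam = christoffel((r, theta, 0.3))

print("Gamma^r_theta,theta :", gam[R][TH][TH], "expected", -r)
print("Gamma^theta_r,theta :", gam[TH][R][TH], "expected", 1 / r)
print("Gamma^phi_theta,phi :", gam[PH][TH][PH], "expected", 1 / math.tan(theta))
```

The comparison values are the closed forms from the answer above; agreement is limited only by the finite-difference error.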
5.4. Null geodesics In general relativity the null geodesics with $L \equiv 0$ are the paths of photons.
Here are two among several possibilities to find and justify the equation of motion for the photon:
• The affine geodesic equation (5.4) may be applied, and an affine parameter is still available;
• In the variational problem
\[
  \delta \int_P^Q d\tau = 0 \tag{5.6}
\]
for the proper time, one takes the limit of rest mass to zero.
I define the physical affine parameter as
\[
  \lambda = \frac{\tau}{m}
\]
where $m$ denotes the rest mass. The equations governing the four-momentum
\[
  p^a = \frac{dx^a}{d\lambda} \quad \text{and} \quad p^0 = \frac{dx^0}{d\lambda} = \frac{E}{c} = c\, \frac{dt}{d\lambda}
  \quad \text{and} \quad p^a p_a = m^2 c^2 \tag{5.23}
\]
hold independently of the rest mass. In the limit m → 0, they remain valid and are still physically
meaningful. Indeed, in this way one obtains correct dynamical equations for the photon.
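The affine geodesic equation (5.4) may be written as the system
\[
  \frac{dx^a}{d\lambda} = p^a , \qquad
  \frac{d p_a}{d\lambda} = \Gamma^c{}_{ab}\, p_c\, \frac{dx^b}{d\lambda} \tag{5.24}
\]
which is an initial value problem for the functions $x^a(\lambda)$, $p_a(\lambda)$. Quite similarly, at least for $m > 0$,
the Euler-Lagrange equation (5.15) may be written as
\[
  \frac{dx^a}{d\lambda} = p^a , \qquad
  \frac{d p_a}{d\lambda} = \frac{1}{2} (\partial_a g_{bc})\, p^c\, \frac{dx^b}{d\lambda} \tag{5.25}
\]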
Proposition 5.2 (Local smoothness). The solutions of both initial value problems (5.24) and
(5.25) depend smoothly on the rest mass and the physical affine parameter. Especially, both
problems are well posed for rest mass $m = 0$.
More precisely: assume the respective initial value problem has a solution for some mass $m \geq 0$
and some initial value $[x^a]$, $[p^a]$ such that $p^a p_a = m^2 c^2$ and $[p^a] \neq 0$, existing on an open interval
containing the closed interval $[\Lambda_1, \Lambda_2] \ni 0$, and having everywhere nonzero four-momentum
$[p^a] \neq 0$. Then there exist open neighborhoods of this initial value, this mass $m$ and
the interval $[\Lambda_1, \Lambda_2]$ on which the initial value problem is solvable. Moreover, the solution depends
smoothly on these parameters and the four-momentum $[p^a] \neq 0$ is nonzero everywhere.
Remark. Still a blow-up of the solution after a finite "parameter-lifetime" is possible. The end of
existence may occur for a photon even slightly sooner than for a massive particle.
Proof. Because of the conservation law
\[
  g_{ab}\, p^a p^b = m^2 c^2 = \text{Const}
\]
and the signature of the metric, one obtains bounds $c p^0 \in [E_{\min}, E_{\max}]$ with $E_{\min} > 0$ for all the energy values
occurring within the parameter ranges $m \in [0, M]$ and $\lambda \in [-\Lambda, +\Lambda]$. Hence
\[
  \frac{dt}{d\lambda} = p^0 / c \in [c^{-2} E_{\min}, c^{-2} E_{\max}]
\]
which allows one to use the coordinate time as independent variable. With this substitution, the
Euler-Lagrange equations (5.15) are transformed into
\[
  \frac{dx^a}{dt} = c\, \frac{p^a}{p^0} , \qquad
  \frac{d p_a}{dt} = \frac{1}{2} (\partial_a g_{bc})\, p^c\, \frac{dx^b}{dt}
\]
We have set up a system which remains meaningful for rest mass $m = 0$. Choose $T \geq c^{-2} E_{\max}$.
The smooth dependence of the solution for $m \in [0, M]$ and $t \in [-T, +T]$ is a standard theorem
from differential equations. Hence smooth dependence for the solution of equation (5.25) and
the Euler-Lagrange equation (5.15) holds within any bounded parameter ranges $m \in [0, M]$ and
$\lambda \in [-\Lambda, +\Lambda]$, under the only assumption that $p^a \neq 0$ holds for all four-momentum values
occurring. The proof for the affine geodesic equation (5.4) is similar.
5.5. The method The equation of motion can be derived either from
(A) parallel transport;
(B) the extremum for proper time;
(C) the variational principle for the quadratic Lagrangian integral.

Methods (A) and (C) lead to well-posed initial value problems, with solutions which depend
smoothly on the rest mass $m \geq 0$ and the physical affine parameter $\lambda$. Moreover, under the
additional assumption of a symmetric connection, we have shown that methods (A) and (C) lead to
the same initial value problem.
For method (B), in some ways the most attractive one, we have obtained a well-posed initial value
problem only for massive particles. Indeed, for positive rest mass, the solution of the initial
value problem depends smoothly on $m > 0$ and the proper time $\tau$. Moreover, under the restriction
of positivity of mass, methods (B) and (C) turn out to be essentially equivalent.
Method (C) allows one to include the case of zero rest mass continuously. On the other
hand, method (B) may be physically more justified because of its relation to the principles of
superposition of waves. These principles of interference are well known to hold both for massive
particles and for light. Hence there are reasons to believe that method (B) can be extended
smoothly to include the case of massless particles, too.
If one accepts such a postulate, the unique smooth extension to rest mass $m \to 0$ obtained
from method (C) is the physically meaningful extension of method (B) to rest mass $m \to 0$, too,
since for positive mass methods (B) and (C) are essentially equivalent.
Corollary 6 ("The method"). Assuming the connection is symmetric, the same equations of
motion are obtained either from parallel transport, the extremum of proper time, or the variational
principle for the quadratic Lagrangian. Both for photons and massive particles, the equations of
motion may be obtained as solution of the variational problem
\[
  \delta \int_P^Q L\, d\lambda = 0 \tag{5.26}
\]
with the quadratic Lagrangian
\[
  L := g_{ab} \frac{dx^a}{d\lambda} \frac{dx^b}{d\lambda} \tag{5.27}
\]
In place of $u$, we have to use the physical affine parameter $\lambda = \tau / m$, and denote the derivative by $\lambda$
with a dot. The physical momentum of the particle or photon is $p^a = \dot x^a$. It may be used, and
indeed shows up, both in its original contravariant form and in the covariant form $p_a = g_{ab}\, p^b$.
One obtains (and easily checks) the equations of motion
\[
  \frac{dx^a}{d\lambda} = p^a , \qquad
  \frac{d p_a}{d\lambda} = \frac{1}{2} (\partial_a g_{bc})\, p^c p^b \tag{5.28}
\]
Automatically, the Lagrangian is a constant of motion. Because of the choice of the physical affine
parameter, its value is fixed to be
\[
  L = p_a p^a = m^2 c^2 \tag{5.29}
\]
5.6. Killing vector
Definition 5.2 (Isometry). A point transformation $x' = x'(x)$ on a Riemannian manifold is called
an isometry iff $g'_{ab}(x) = g_{ab}(x)$ holds for all coordinates $x$ and indices $a, b$.
Definition 5.3 (Killing vector). A vector field $[K^a]$ is a Killing vector field or simply Killing vector
iff the flow
\[
  \frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = K^a(x) \tag{5.30}
\]
produces isometries.
Proposition 5.3. A vector field $K$ is a Killing vector field if and only if the Lie derivative of the
metric along the vector field vanishes:
\[
  \mathcal{L}_K\, g_{ab} = 0 \tag{5.31}
\]
\[
  K^c \partial_c g_{ab} + (\partial_a K^s)\, g_{sb} + (\partial_b K^s)\, g_{as} = 0 \tag{5.32}
\]
Independent proof. The flow defined by the initial value problem (5.30) gives for small $\varepsilon$ the
infinitesimal transformations
\[
  x'^a = \Phi^a(\varepsilon, x) = x^a + \varepsilon K^a(x) + O(\varepsilon^2)
\]
Under these point transformations the metric is transformed by
\[
  g'_{ab}(x') = \frac{\partial x^d}{\partial x'^a} \frac{\partial x^e}{\partial x'^b}\, g_{de}(x)
  = g_{ab}(x) - \varepsilon (\partial_a K^d)\, g_{db} - \varepsilon (\partial_b K^e)\, g_{ae} + O(\varepsilon^2)
\]
At the same coordinate $x$ one gets as transformation of the metric
\begin{align*}
  g'_{ab}(x) - g_{ab}(x) &= [g'_{ab}(x) - g'_{ab}(x')] + [g'_{ab}(x') - g_{ab}(x)] \\
  &= -\varepsilon K^c \partial_c g_{ab} - \varepsilon (\partial_a K^d)\, g_{db} - \varepsilon (\partial_b K^e)\, g_{ae} + O(\varepsilon^2)
\end{align*}
Assuming that the flow (5.30) of the vector field $K$ induces isometries for all $\varepsilon$, we conclude that
the first order term is zero and get equation (5.32). Conversely, let us assume that equation (5.32)
holds identically on the entire manifold. Then the Killing flow induces transformations of the
metric such that
\begin{align*}
  \tilde\Phi(\varepsilon, g)(x) = g'(x) &= \text{Product of Jacobians} \cdot g(\Phi(-\varepsilon, x)) \\
  &= g(x) - \varepsilon \left[ K^c \partial_c g_{ab} + (\partial_a K^d)\, g_{db} + (\partial_b K^e)\, g_{ae} \right] \omega^a \otimes e^b + O(\varepsilon^2) \\
  &= g(x) + O(\varepsilon^2)
\end{align*}
for all $\varepsilon$ and hence $g' = g$ for all $\varepsilon$ on the entire manifold. This means that the Killing flow induces
isometries.
Proposition 5.4. Let a vector field $[K^a]$ on a Riemannian manifold be given. The scalar product
$S = K_a \dot x^a$ is a constant of motion along all geodesics $\lambda \mapsto x^a(\lambda)$ if and only if the Killing equation
holds:
\[
  \nabla_a K_b + \nabla_b K_a = 0 \tag{5.33}
\]
Proof. The covariant derivative of the scalar product $S = K_a \dot x^a = K^a \dot x_a$ along the geodesic
is calculated. Since Leibniz' rule holds for covariant derivatives, and the geodesic equation is
assumed,
\begin{align*}
  \dot S = \frac{dS}{d\lambda} &= \nabla_b (K^a \dot x_a)\, \dot x^b = (\nabla_b K^a)\, \dot x_a \dot x^b + K^a (\nabla_b \dot x_a)\, \dot x^b \tag{5.34} \\
  &= (\nabla_b K^a)\, \dot x_a \dot x^b + K^a \frac{D \dot x_a}{d\lambda} = \frac{1}{2} (\nabla_a K_b + \nabla_b K_a)\, \dot x^a \dot x^b \tag{5.35}
\end{align*}
Hence the scalar product $S = K_a \dot x^a$ is a constant of motion along all geodesics if and only if the
Killing equation (5.33) holds for the vector field $[K^a]$.
Remark. Note that for the velocities and corresponding momenta, the index is lowered and raised
by the rules
\[
  \dot x_a = g_{ab}\, \dot x^b \quad \text{and} \quad \dot x^a = g^{ab}\, \dot x_b , \qquad
  p_a = g_{ab}\, p^b \quad \text{and} \quad p^a = g^{ab}\, p_b
\]
even if the metric is not constant.
Corollary 7. Assume the metric does not depend on the coordinate $x^c$, and the connection is symmetric.
Under that assumption, the vector field $K^a := \delta^a_c$ is a Killing vector. Hence the corresponding
covariant velocity $\dot x_c = g_{ca} \dot x^a$ and momentum $p_c = g_{ca}\, p^a$ is a constant of motion.
Proof.
\begin{align*}
  \nabla_a K^b &= \Gamma^b{}_{sa} K^s = \Gamma^b{}_{ca} \\
  \nabla_a K_b &= \Gamma_{bca} \\
  \nabla_a K_b + \nabla_b K_a &= \Gamma_{bca} + \Gamma_{acb} = \Gamma_{bac} + \Gamma_{abc} = \partial_c g_{ab} = 0
\end{align*}
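Both conservation laws obtained so far, the constant Lagrangian of Lemma 5.6 and the constant covariant momentum of Corollary 7, can be observed numerically. The sketch below uses the unit sphere with metric $ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$ as a hypothetical test case (not an example from the text); the metric does not depend on $\phi$, so $\dot x_\phi = \sin^2\theta\, \dot\phi$ should stay constant. The Runge-Kutta integrator, the initial data and the step size are assumptions of this illustration.

```python
import math

def rhs(state):
    """Geodesic equation (5.17) on the unit sphere, written as a first-order system."""
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2           # -Gamma^theta_phi,phi * dphi^2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi  # -2 Gamma^phi_theta,phi * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def lagrangian(state):
    """Quadratic Lagrangian L = g_ab xdot^a xdot^b."""
    theta, _, dtheta, dphi = state
    return dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2

def covariant_phi_velocity(state):
    """xdot_phi = g_phi,phi * dphi, conserved because the metric is independent of phi."""
    theta, _, _, dphi = state
    return math.sin(theta) ** 2 * dphi

state = (1.0, 0.0, 0.3, 0.8)   # (theta, phi, dtheta, dphi), chosen away from the poles
L0, S0 = lagrangian(state), covariant_phi_velocity(state)
for _ in range(2000):
    state = rk4_step(state, 1e-3)
L1, S1 = lagrangian(state), covariant_phi_velocity(state)

print("drift of L        :", abs(L1 - L0))
print("drift of xdot_phi :", abs(S1 - S0))
```

Both drifts stay at the level of the integration error, while $\theta$ and $\phi$ themselves change by amounts of order one over the same parameter range.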

Corollary 8. Assume the metric satisfies $K^c \partial_c g_{ab} = 0$ for a constant vector $[K^c]$, and the
connection is symmetric. Under that assumption, the product
\[
  S = g_{ab} K^a \dot x^b
\]
and the corresponding momentum is a constant of motion.
Proof.
\begin{align*}
  \nabla_a K^b &= \Gamma^b{}_{sa} K^s \\
  \nabla_a K_b &= \Gamma_{bsa} K^s = \Gamma_{bas} K^s \\
  \nabla_a K_b + \nabla_b K_a &= K^s \partial_s g_{ab} = 0
\end{align*}
Hence $K$ is a Killing vector and the scalar product $S = g_{ab} K^a \dot x^b$ is a constant of motion.
Theorem 5.2. Let $K$ be a vector field on a Riemannian manifold with symmetric connection.
The following are equivalent:
(i) $K$ is a Killing vector, which means the flow
\[
  \frac{\partial \Phi^a(\varepsilon, x)}{\partial \varepsilon} = K^a(x) \tag{5.30}
\]
produces isometries.
(ii) The infinitesimal point transformations $x'^a = x^a + \varepsilon K^a(x) + O(\varepsilon^2)$ are isometries up to order
$O(\varepsilon^2)$.
(iii) The Lie derivative of the metric along the vector field vanishes: $\mathcal{L}_K\, g_{ab} = 0$.
(iv) $K^c \partial_c g_{ab} + g_{bc}\, \partial_a K^c + g_{ac}\, \partial_b K^c = 0$
(v) The Killing equation $\nabla_a K_b + \nabla_b K_a = 0$ holds identically.
(vi) The scalar product $S = K_a \dot x^a = K^a \dot x_a$ is constant along all geodesics.
Proof. We still need to check that the Killing equation is equivalent to item (iv).
\begin{align*}
  \nabla_a K_b + \nabla_b K_a &= \nabla_a (g_{bc} K^c) + \nabla_b (g_{ac} K^c) = g_{bc} \nabla_a K^c + g_{ac} \nabla_b K^c \\
  &= g_{bc} (\partial_a K^c + K^s \Gamma^c{}_{sa}) + g_{ac} (\partial_b K^c + K^s \Gamma^c{}_{sb}) \\
  &= g_{bc}\, \partial_a K^c + K^s \Gamma_{bsa} + g_{ac}\, \partial_b K^c + K^s \Gamma_{asb}
   = g_{bc}\, \partial_a K^c + g_{ac}\, \partial_b K^c + K^s (\Gamma_{bas} + \Gamma_{abs}) \\
  &= K^s \partial_s g_{ab} + g_{bc}\, \partial_a K^c + g_{ac}\, \partial_b K^c
\end{align*}

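Problem 5.2. Check that the metric
\[
  ds^2 = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2
\]
has the Killing vector $K = r^2 \sin\phi\, d\theta + r^2 \cos\theta \sin\theta \cos\phi\, d\phi$.
Answer. We may directly check the Killing equation (5.33). The non-zero Christoffel symbols have
been calculated in problem 5.1:
\begin{align*}
  & \Gamma^r{}_{rr} = 0 , \quad \Gamma^r{}_{\theta\theta} = -r , \quad \Gamma^r{}_{\phi\phi} = -r \sin^2\theta \\
  & \Gamma^\theta{}_{\theta r} = r^{-1} , \quad \Gamma^\theta{}_{\phi\phi} = -\sin\theta \cos\theta \\
  & \Gamma^\phi{}_{r\phi} = r^{-1} , \quad \Gamma^\phi{}_{\theta\phi} = \cot\theta
\end{align*}
To check whether $\nabla_a K_b + \nabla_b K_a = 0$, calculate
\begin{align*}
  \nabla_r K_\theta + \nabla_\theta K_r &= \partial_r K_\theta - \Gamma^s{}_{\theta r} K_s + \partial_\theta K_r - \Gamma^s{}_{r\theta} K_s
    = \partial_r K_\theta - 2 \Gamma^\theta{}_{\theta r} K_\theta = 2 r \sin\phi - 2 r^{-1} r^2 \sin\phi = 0 \\
  \nabla_r K_\phi + \nabla_\phi K_r &= \partial_r K_\phi - 2 \Gamma^s{}_{\phi r} K_s
    = 2 r \cos\theta \sin\theta \cos\phi - 2 r^{-1} r^2 \cos\theta \sin\theta \cos\phi = 0 \\
  \nabla_\theta K_\theta &= \partial_\theta K_\theta - \Gamma^s{}_{\theta\theta} K_s = \partial_\theta K_\theta = 0 \\
  \nabla_\phi K_\phi &= \partial_\phi K_\phi - \Gamma^\theta{}_{\phi\phi} K_\theta
    = -r^2 \cos\theta \sin\theta \sin\phi + \sin\theta \cos\theta\, r^2 \sin\phi = 0 \\
  \nabla_\theta K_\phi + \nabla_\phi K_\theta &= \partial_\theta K_\phi + \partial_\phi K_\theta - 2 \cot\theta\, K_\phi = 0
\end{align*}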
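The same check can be done without Christoffel symbols by evaluating item (iv) of Theorem 5.2 numerically. The following is a minimal sketch: it raises the index of the given covector, which gives $K^\theta = \sin\phi$ and $K^\phi = \cot\theta \cos\phi$, and approximates all partial derivatives by central differences. The helper names, the sample point and the step size are assumptions of this illustration.

```python
import math

def metric(x):
    """Metric g_ab of spherical coordinates x = (r, theta, phi)."""
    r, theta, _ = x
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def killing_up(x):
    """Contravariant components K^a = g^{ab} K_b of the covector from Problem 5.2."""
    _, theta, phi = x
    return [0.0, math.sin(phi), math.cos(theta) / math.sin(theta) * math.cos(phi)]

def partial(f, x, c, h=1e-5):
    """Central difference along coordinate c of a vector- or matrix-valued function f."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    fp, fm = f(xp), f(xm)
    if isinstance(fp[0], list):
        return [[(p - m) / (2 * h) for p, m in zip(rp, rm)] for rp, rm in zip(fp, fm)]
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

def lie_derivative_of_metric(x):
    """K^c d_c g_ab + g_bc d_a K^c + g_ac d_b K^c, the left-hand side of item (iv)."""
    g, K = metric(x), killing_up(x)
    dg = [partial(metric, x, c) for c in range(3)]      # dg[c][a][b] = d_c g_ab
    dK = [partial(killing_up, x, a) for a in range(3)]  # dK[a][c]    = d_a K^c
    return [[sum(K[c] * dg[c][a][b] + g[b][c] * dK[a][c] + g[a][c] * dK[b][c]
                 for c in range(3))
             for b in range(3)] for a in range(3)]

residual = lie_derivative_of_metric((1.7, 0.9, 0.4))
max_residual = max(abs(v) for row in residual for v in row)
print("largest component of the Lie derivative of g:", max_residual)
```

All nine components vanish up to the finite-difference error, in agreement with the calculation above.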
Problem 5.3. Find a more inspired solution of problem 5.2 without use of Christoffel symbols.
Problem 5.4. Fear Schopenhauer's mousetraps and do no mousetrap proofs! Give the reason
behind the last calculation. Why does the Killing flow map geodesics into geodesics, and
why do these even have the same two constants of motion, both $K_a \dot x^a$ and $g_{ab} \dot x^a \dot x^b$?
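Lemma 5.7. For any vector field $[K^a]$, $\mathcal{L}_K K^a = 0$ holds. For a Killing vector field, $\mathcal{L}_K K_a = 0$ holds,
too.
Proof.
\begin{align*}
  \mathcal{L}_K K^a &= (K^s \partial_s) K^a - K^s (\partial_s K^a) = 0 \\
  \mathcal{L}_K K_a &= \mathcal{L}_K (g_{ab} K^b) = (\mathcal{L}_K\, g_{ab}) K^b + g_{ab}\, \mathcal{L}_K K^b = 0
\end{align*}
In the second line, the first term is zero by proposition 5.3 and the second term is zero by the first line.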
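Lemma 5.8. Let $x^a = x^a(\lambda, \varepsilon)$ be a family of curves satisfying
\[
  \frac{\partial x^a(\lambda, \varepsilon)}{\partial \varepsilon} = v^a(x) \tag{5.36}
\]
In other words, the curve $x^a = x^a(\lambda, 0)$ is transported by the flow of the vector field $[v^a]$. Then $\mathcal{L}_v \dot x^a = 0$,
where the dot denotes the partial derivative by the curve parameter $\lambda$.
Proof.
\[
  \mathcal{L}_v \dot x^a = (v^s \partial_s)\, \dot x^a - \dot x^s (\partial_s v^a)
  = \frac{\partial^2 x^a}{\partial \varepsilon\, \partial \lambda} - \dot x^s (\partial_s v^a)
  = \frac{d v^a}{d\lambda} - \frac{\partial x^s}{\partial \lambda} (\partial_s v^a) = 0
\]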

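Proposition 5.5. Let $x^a = x^a(\lambda, \varepsilon)$ be a family of curves satisfying
\[
  \frac{\partial x^a(\lambda, \varepsilon)}{\partial \varepsilon} = K^a(x) \tag{5.37}
\]
Moreover, assume the curve $x^a = x^a(\lambda, 0)$ is a geodesic and is transported by the flow of the Killing
field $[K^a]$.
(i) The curves $\lambda \mapsto x^a(\lambda, \varepsilon)$ are geodesics for all $\varepsilon$.
(ii) The Lagrangian $L = g_{ab} \dot x^a \dot x^b$ is independent of both $\lambda$ and $\varepsilon$.
(iii) The quantity $S = g_{ab} K^a \dot x^b$ is independent of both $\lambda$ and $\varepsilon$.
Proof of item (i). This is left to the reader.
Proof of item (ii).
\[
  \frac{\partial L}{\partial \varepsilon} = \mathcal{L}_K (g_{ab} \dot x^a \dot x^b)
  = (\mathcal{L}_K\, g_{ab})\, \dot x^a \dot x^b + 2 g_{ab} \dot x^a\, \mathcal{L}_K \dot x^b = 0
\]
The first term is zero by proposition 5.3. The second term is zero by lemma 5.8.
Proof of item (iii).
\[
  \frac{\partial S}{\partial \varepsilon} = \mathcal{L}_K (g_{ab} K^a \dot x^b)
  = (\mathcal{L}_K\, g_{ab}) K^a \dot x^b + g_{ab} (\mathcal{L}_K K^a)\, \dot x^b + g_{ab} K^a\, \mathcal{L}_K \dot x^b = 0
\]
The first term is zero by proposition 5.3. The second term is zero by lemma 5.7. The third term is
zero by lemma 5.8.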
6. Geodesics in the Schwarzschild metric

The Schwarzschild metric is the solution of Einstein's field equation for a mass $M$ at the center.
\[
  ds^2 = c^2 \left( 1 - \frac{r^*}{r} \right) dt^2 - \left( 1 - \frac{r^*}{r} \right)^{-1} dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta\, d\phi^2 \tag{6.1}
\]
Here
\[
  r^* = \frac{2GM}{c^2} \tag{6.2}
\]
is the Schwarzschild radius. For the sun $r^* \approx 2.95$ km.
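The numerical value for the sun follows directly from definition (6.2). The following is a minimal sketch; the constants are rounded textbook values and, like the function name, are assumptions of this illustration rather than data from the text.

```python
G_NEWTON = 6.674e-11   # gravitational constant in m^3 kg^-1 s^-2 (rounded)
M_SUN = 1.989e30       # solar mass in kg (rounded)
C_LIGHT = 2.998e8      # speed of light in m/s (rounded)

def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r* = 2 G M / c^2 in metres, equation (6.2)."""
    return 2.0 * G_NEWTON * mass_kg / C_LIGHT ** 2

r_star_sun = schwarzschild_radius(M_SUN)
print(f"Schwarzschild radius of the sun: {r_star_sun / 1000:.2f} km")
```

Because the formula is linear in the mass, the value for any other body is obtained by scaling with its mass in solar units.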
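Problem 6.1. Write down the "physical form" of the geodesic equations in the Schwarzschild
metric, with proper time as parameter.
\[
  \frac{d}{d\tau} \left[ \left( 1 - \frac{r^*}{r} \right) \frac{dt}{d\tau} \right] = 0 \tag{6.3}
\]
\[
  \frac{d}{d\tau} \left[ \left( 1 - \frac{r^*}{r} \right)^{-1} \frac{dr}{d\tau} \right] = -\frac{1}{2} (\partial_r g_{bc})\, \dot x^b \dot x^c \tag{6.4}
\]
\[
  \frac{d}{d\tau} \left[ r^2 \frac{d\theta}{d\tau} \right] = r^2 \sin\theta \cos\theta \left( \frac{d\phi}{d\tau} \right)^2 \tag{6.5}
\]
\[
  \frac{d}{d\tau} \left[ r^2 \sin^2\theta\, \frac{d\phi}{d\tau} \right] = 0 \tag{6.6}
\]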
Remark. The right-hand side of the radial equation is a nuisance, but can be avoided.
The orbits lie in a plane. For simplicity I may choose the plane $\theta \equiv 90^\circ$. Since the Schwarzschild
metric does not depend on time $t$ nor angle $\phi$, one gets two constants of motion
\[
  \left( 1 - \frac{r^*}{r} \right) \frac{d(ct)}{d\tau} = \frac{E_{\mathrm{esc}}}{mc} \equiv \text{const} \tag{6.7}
\]
\[
  r^2 \frac{d\phi}{d\tau} = \frac{l}{m} \equiv \text{const} \tag{6.8}
\]
The physical meaning of these integration constants can be determined from the limit $r \to \infty$
where special relativity holds. It turns out that $E_{\mathrm{esc}}$ is the energy of the particle or planet escaped to
infinity. The relativity parameter is $\gamma = E_{\mathrm{esc}} / (mc^2)$. One gets the slow-down of proper time $t = \gamma \tau$
as known from special relativity.
Let $p_{\mathrm{esc}}$ be the momentum of the escaped particle. From special relativity we know the
important formula
\[
  E_{\mathrm{esc}}^2 = m^2 c^4 + c^2 p_{\mathrm{esc}}^2
\]
It turns out that $l$ is the angular momentum of the particle or planet around the $z$-axis. Let us
introduce the impact parameter $b$ to be the perpendicular distance of the line of straight motion of
the particle from the central mass. Suppose that the particle escapes along a line $x(\tau)\, e_x + b\, e_y$. With
$x = r \cos\phi$, $y = r \sin\phi$, one gets the angular momentum
\[
  l = m r^2 \dot\phi = -x m \dot y + y m \dot x \simeq b m \dot x = b\, p_{\mathrm{esc}} \tag{6.9}
\]
as expected.
Remark. For bounded orbits we may still use the parameters $p_{\mathrm{esc}} \in i[0, \infty)$ and $b = l / p_{\mathrm{esc}} \in i[0, \infty)$,
but with imaginary values. The boundary case of parabolic motion has $p_{\mathrm{esc}} = 0$ and any value
$A \in [0, \infty)$ for the semilatus rectum and $l \in [0, \infty)$.
The integration of the equations of motion becomes possible since the Lagrangian is a
further constant of motion. With the proper time $\tau$ as curve parameter,
\[
  L := g_{ab} \frac{dx^a}{d\tau} \frac{dx^b}{d\tau} = c^2 \tag{5.12}
\]
is the constant value of the Lagrangian. For the Schwarzschild metric this identity reduces to
\[
  c^2 \left( 1 - \frac{r^*}{r} \right) \left( \frac{dt}{d\tau} \right)^2
  - \left( 1 - \frac{r^*}{r} \right)^{-1} \left( \frac{dr}{d\tau} \right)^2
  - r^2 \left( \frac{d\theta}{d\tau} \right)^2
  - r^2 \sin^2\theta \left( \frac{d\phi}{d\tau} \right)^2 = c^2
\]
and is further simplified by use of $\theta \equiv 90^\circ$. Also, I use from now on the simpler dot notation.
\[
  c^2 \left( 1 - \frac{r^*}{r} \right) \dot t^2 - \left( 1 - \frac{r^*}{r} \right)^{-1} \dot r^2 - r^2 \dot\phi^2 = c^2 \tag{6.10}
\]
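A bit of arithmetic is still needed. Then I further simplify by use of the constants of motion from
equations (6.7) and (6.8).
\begin{align*}
  c^2 \left( 1 - \frac{r^*}{r} \right)^2 \dot t^2 - \dot r^2 - r^2 \left( 1 - \frac{r^*}{r} \right) \dot\phi^2 &= c^2 \left( 1 - \frac{r^*}{r} \right) \\
  \dot r^2 + r^2 \left( 1 - \frac{r^*}{r} \right) \dot\phi^2 - \frac{c^2 r^*}{r} &= c^2 \left( 1 - \frac{r^*}{r} \right)^2 \dot t^2 - c^2 \\
  \dot r^2 + \left( 1 - \frac{r^*}{r} \right) \frac{l^2}{m^2 r^2} - \frac{c^2 r^*}{r} &= \frac{E_{\mathrm{esc}}^2}{m^2 c^2} - c^2 \equiv \frac{p_{\mathrm{esc}}^2}{m^2}
\end{align*}
The last equation may be multiplied by $m/2$ to get the energy balance for the Kepler motion, with
just one extra term! Also, we shall need the angular equation of motion.
\[
  \frac{m}{2} \dot r^2 + \left( 1 - \frac{r^*}{r} \right) \frac{l^2}{2 m r^2} - \frac{GMm}{r} = \frac{p_{\mathrm{esc}}^2}{2m} \tag{6.11}
\]
\[
  m r^2 \dot\phi = l \equiv b\, p_{\mathrm{esc}} \tag{6.12}
\]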
For the discussion of the motion of photons, I use the physical affine parameter $\lambda = \tau / m$, for which,
as expected, one gets equations which are meaningful in the limit $m \to 0$. Equation (6.11)
is multiplied by $2m$ and the affine parameter $\lambda$ is introduced. Also, we need the equation of
motion (6.7) for the time and the angular equation of motion (6.8). Altogether we obtain as
equations of motion for photons and massless particles
\[
  \left( \frac{dr}{d\lambda} \right)^2 + \left( 1 - \frac{r^*}{r} \right) \frac{l^2}{r^2} = p_{\mathrm{esc}}^2 \equiv \frac{l^2}{b^2} \tag{6.13}
\]
\[
  \left( 1 - \frac{r^*}{r} \right) \frac{d(ct)}{d\lambda} = \frac{E_{\mathrm{esc}}}{c} \equiv p_{\mathrm{esc}} \tag{6.14}
\]
\[
  r^2 \frac{d\phi}{d\lambda} = l \equiv b\, p_{\mathrm{esc}} \tag{6.15}
\]
Problem 6.2. Write a paragraph on the rotation of the perihelion of Mercury.
• How does one proceed from the equations of motion (6.11) and (6.12) to get an equation for
the shape of the orbit?
• One does an expansion in powers of $r^*$, respectively $c^{-2}$, which is a small parameter. What is
the zeroth order term obtained with $r^* = 0$?
• One puts $r^{-1} =: u = u_0 + u_1 + \dots$. Which equation does one get for $u_1$?
• Which formula for the perihelion rotation is obtained?
6.1. The equation for the shape of relativistic orbits For the equation for the shape of the
orbit, we use the variable $u := r^{-1}$ and need an equation of motion for the function $u = u(\phi)$. Since
\[
  \frac{du}{d\phi} = -\frac{1}{r^2} \frac{dr}{d\phi} = -\frac{\dot r}{r^2 \dot\phi} = -\frac{m \dot r}{l} \tag{6.16}
\]
we multiply equation (6.11) by $2m l^{-2}$ and substitute to obtain
\[
  \left( \frac{du}{d\phi} \right)^2 + (1 - r^* u)\, u^2 - \frac{2GMm^2}{l^2}\, u = \frac{p_{\mathrm{esc}}^2}{l^2}
\]
This step excludes the case $l = 0$ of radial motion. As a convenient geometric quantity, one may
introduce the semilatus rectum
\[
  A := \frac{l^2}{GMm^2} \geq 0 \tag{6.17}
\]
and gets
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 - \frac{2u}{A} - r^* u^3 = \frac{p_{\mathrm{esc}}^2}{l^2} \tag{6.18}
\]
Remark. Equation (6.18) is valid both for unbounded and for bounded orbits, as long as $l \neq 0$.
Note that in the latter case, one needs to use imaginary values $p_{\mathrm{esc}} \in i[0, \infty)$ for the escape
momentum.
For hyperbolic motion the impact parameter $b = l / p_{\mathrm{esc}}$ from equation (6.9) and the escape
momentum $p_{\mathrm{esc}}$ are convenient parameters. Since equations (6.17), (6.9) and (6.2) imply
\[
  A = \frac{l^2}{GMm^2} = \frac{b^2 p_{\mathrm{esc}}^2}{GMm^2} = \frac{2 b^2}{r^*} \left( \frac{p_{\mathrm{esc}}}{mc} \right)^2
\]
one obtains
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 - \frac{r^*}{b^2} \left( \frac{mc}{p_{\mathrm{esc}}} \right)^2 u - r^* u^3 = \frac{1}{b^2} \tag{6.19}
\]
Remark. Again the equation (6.19) is valid both for unbounded orbits with $p_{\mathrm{esc}} > 0$, and for
bounded orbits. But in the latter case, one needs to use imaginary values $p_{\mathrm{esc}} \in i(0, \infty)$ and
$b = l / p_{\mathrm{esc}} \in i(0, \infty)$.
Remark. The boundary case of parabolic motion occurs for $p_{\mathrm{esc}} = 0$ and any value of the angular
momentum $l \in [0, \infty)$. The semilatus rectum from equation (6.17) still exists and takes any value
$A \in [0, \infty)$. For $A \neq 0$, one gets formally the impact parameter $b = \infty$. For the parabolic motion and
$A \neq 0$, one gets the equation of shape
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 - \frac{2u}{A} - r^* u^3 = 0 \tag{6.20}
\]
6.2. Kepler's classical nonrelativistic orbits One does an expansion in powers of $r^*$, respectively
$c^{-2}$, which is a small parameter. The zeroth order term is obtained by putting $r^* = 0$ into
equation (6.18). One obtains the classical approximation of Kepler motion
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 - \frac{2u}{A} = \text{Const} \equiv \frac{p_{\mathrm{esc}}^2}{l^2} \tag{6.21}
\]
from which the orbits are obtained to be conic sections. The equation of motion (6.21) is also called
Binet equation. All cases lead to the orbits
\[
  u(\phi) = \frac{1 + e \cos\phi}{A} \tag{6.22}
\]
with the eccentricity $e \in [0, \infty)$ and the semilatus rectum $A \in (0, \infty)$ as convenient parameters. Only
the purely radial motion with $l = 0$ needs to be treated separately. Plugging the solution (6.22) into
the equation of motion (6.21), the reader should check that the equation of motion holds. Moreover
one obtains
\[
  \frac{e^2 - 1}{A^2} = \frac{p_{\mathrm{esc}}^2}{l^2} = \frac{1}{b^2} \tag{6.23}
\]
to be the relation between the geometrical parameters, the physical parameters and the impact parameter.
The first equation holds in all cases with $l > 0$. For $e \neq 1$ the major half-axis $a > 0$ satisfies
\[
  b^2 = \frac{A^2}{e^2 - 1} = \mp A a = (e^2 - 1)\, a^2 \tag{6.24}
\]
with the upper minus sign for the ellipse. Hence the equation
\[
  b = a \sqrt{e^2 - 1} \tag{6.25}
\]
holds wonderfully in all cases.
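The check left to the reader, that the conic (6.22) satisfies the Binet equation (6.21) with the constant given by (6.23), can also be done numerically. The following is a minimal sketch with arbitrarily chosen sample values of $e$ and $A$; the function names are assumptions of this illustration.

```python
import math

def binet_residual(e, A, phi):
    """(du/dphi)^2 + u^2 - 2u/A - (e^2 - 1)/A^2 for the conic u = (1 + e cos phi)/A."""
    u = (1.0 + e * math.cos(phi)) / A
    du = -e * math.sin(phi) / A
    return du ** 2 + u ** 2 - 2.0 * u / A - (e ** 2 - 1.0) / A ** 2

def impact_parameter_squared(e, A):
    """b^2 = A^2 / (e^2 - 1) from (6.24); negative for ellipses, where b is imaginary."""
    return A ** 2 / (e ** 2 - 1.0)

# circle, ellipse and two hyperbolas; the values are arbitrary
cases = [(0.0, 1.0), (0.5, 2.0), (1.5, 0.7), (3.0, 1.2)]
worst = max(abs(binet_residual(e, A, 0.1 * k)) for e, A in cases for k in range(63))
print("largest residual of the Binet equation:", worst)

e, A = 1.5, 0.7
a = A / (e ** 2 - 1.0)   # major half-axis of the hyperbola
print("b^2 from (6.24):", impact_parameter_squared(e, A), " (e^2 - 1) a^2:", (e ** 2 - 1.0) * a ** 2)
```

The residual vanishes up to rounding for every angle, and the two expressions for $b^2$ in (6.24) agree.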

These are the different shapes of orbits for $l > 0$:
• The case $p_{\mathrm{esc}}^2 < 0$ and $u \equiv \text{const}$ gives circles. The eccentricity is $e = 0$ and the radius is
\[
  1/u \equiv r = A = \frac{|l|}{|p_{\mathrm{esc}}|}
\]
• The case $p_{\mathrm{esc}}^2 < 0$ but $u$ not constant gives ellipses. The eccentricity lies in the range $e \in (0, 1)$.
Reusing relation (6.23) and the almost magical semilatus rectum from equation (6.17), the
major half axis turns out to be
\[
  a = \frac{A}{1 - e^2} = -\frac{l^2}{A\, p_{\mathrm{esc}}^2} = -\frac{GMm^2}{p_{\mathrm{esc}}^2} = -\frac{r^*}{2} \left( \frac{p_{\mathrm{esc}}}{mc} \right)^{-2}
\]
into which formula even the Schwarzschild radius from (6.2) fits!
• The case $p_{\mathrm{esc}}^2 = 0$ gives parabolas, here $e = 1$. The impact parameter is formally $b = \infty$.
• The case $p_{\mathrm{esc}}^2 > 0$ gives hyperbolas, here the eccentricity lies in the range $e \in (1, \infty)$. The
major half axis is
\[
  a = \frac{A}{e^2 - 1} = \frac{l^2}{A\, p_{\mathrm{esc}}^2} = \frac{r^*}{2} \left( \frac{p_{\mathrm{esc}}}{mc} \right)^{-2}
\]
6.3. Scattering in Newtonian dynamics From the equation (6.25), we see that the impact
parameter is the minor half axis of the hyperbola. The scattering angle $\theta$ is supplementary to the
angle between the asymptotes of the hyperbola. Hence the Cartesian equation
\[
  \frac{y^2}{b^2} = \frac{x^2}{a^2} - 1
\]
implies
\[
  \cot \frac{\theta}{2} = \frac{b}{a}
\]
I shall use spherical coordinates with the incoming beam in $+z$ direction. One wants to express
the ray properties in terms of the impact parameter $b$ and the escape momentum $p_{\mathrm{esc}}$. To this end, we need
$b^2 = aA$ from good old conics (6.24), the semilatus rectum from equation (6.17), and $l = b\, p_{\mathrm{esc}}$, and
produce
\[
  \cot \frac{\theta}{2} = \frac{b}{a} = \frac{A}{b} = \frac{l^2}{GMm^2\, b} = \frac{b\, p_{\mathrm{esc}}^2}{GMm^2}
\]
For small angles, we get the first order approximation
\[
  \theta = 2 \tan \frac{\theta}{2} = \frac{2GMm^2}{b\, p_{\mathrm{esc}}^2} = \frac{r^*}{b} \frac{m^2 c^2}{p_{\mathrm{esc}}^2}
\]
which is only half of the physical value calculated in equation (6.32) below.
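For arbitrary scattering angles, but nonrelativistic dynamics, we may obtain the Rutherford
scattering formula. The argument below works both for attractive and for repulsive
interactions.
For an incoming ray of intensity $I$, some fraction $dI$ is scattered into the space angle $d\Omega =
\sin\theta\, d\theta\, d\phi$, and is measured at large distance $R$ from the scatterer as an intensity $I R^{-2} d\sigma$. Thus one
defines the differential cross section $d\sigma$. These scattered particles correspond to impact parameters
in $[b, b + db]$ and rotation angles in $[\phi, \phi + d\phi]$. Hence
\begin{align*}
  d\sigma &= b\, db\, d\phi \\
  \frac{d\sigma}{d\Omega} &= \frac{b}{\sin\theta} \frac{db}{d\theta}
\end{align*}
Simply differentiate
\begin{align*}
  b &= \frac{GMm^2}{p_{\mathrm{esc}}^2} \cot \frac{\theta}{2} \\
  \frac{db}{d\theta} &= -\frac{GMm^2}{2 p_{\mathrm{esc}}^2} \sin^{-2} \frac{\theta}{2} \\
  b\, \frac{db}{d\theta} &= -\frac{G^2 M^2 m^4}{2 p_{\mathrm{esc}}^4} \cos \frac{\theta}{2}\, \sin^{-3} \frac{\theta}{2} \\
  \frac{d\sigma}{d\Omega} &= \frac{b}{\sin\theta} \frac{db}{d\theta} = -\frac{G^2 M^2 m^4}{4 p_{\mathrm{esc}}^4} \sin^{-4} \frac{\theta}{2}
\end{align*}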
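The differentiation above can be checked numerically by taking the derivative $db/d\theta$ as a finite difference. The following is a minimal sketch; the constant `K` stands for the combination $GMm^2 / p_{\mathrm{esc}}^2$ and is set to a hypothetical value of 1 in arbitrary length units, and the function names are assumptions of this illustration. The sign convention of the text is kept, so both expressions are negative because $b$ decreases with increasing $\theta$.

```python
import math

K = 1.0  # hypothetical value of G*M*m^2 / p_esc^2, arbitrary length units

def impact_parameter(theta):
    """b(theta) = K cot(theta/2)."""
    return K / math.tan(theta / 2.0)

def cross_section_numeric(theta, h=1e-6):
    """(b / sin theta) * db/dtheta with db/dtheta from a central difference."""
    db = (impact_parameter(theta + h) - impact_parameter(theta - h)) / (2.0 * h)
    return impact_parameter(theta) / math.sin(theta) * db

def cross_section_closed(theta):
    """Closed form -K^2 / (4 sin^4(theta/2)) from the last line of the calculation."""
    return -K ** 2 / (4.0 * math.sin(theta / 2.0) ** 4)

for theta in (0.3, 1.0, 2.5):
    print(theta, cross_section_numeric(theta), cross_section_closed(theta))
```

The physical cross section is the absolute value of these numbers; its steep growth for small angles is the familiar $\sin^{-4}(\theta/2)$ behaviour.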
6.4. Perturbation expansion for relativistic bounded orbits One expands in powers of the
small parameter $r^*$ and puts $u = u_0 + u_1 + \dots$ into the equation of shape (6.18). The zeroth order
term is the classical Kepler ellipse, given by equation (6.22). For the next term of order $r^*$ we get a
linear equation, which one may solve:
\begin{align*}
  & 2 \frac{du_0}{d\phi} \frac{du_1}{d\phi} + 2 u_0 u_1 - \frac{2 u_1}{A} = r^* u_0^3 \\
  & -e \sin\phi\, \frac{du_1}{d\phi} + e \cos\phi\, u_1 = \frac{r^* (1 + e \cos\phi)^3}{2 A^2} \\
  & u_1 = \frac{3 r^* e\, \phi \sin\phi}{2 A^2} + \text{periodic terms with only } \sin\phi, \cos\phi, \sin^2\phi, \cos^2\phi
\end{align*}
One needs only the secular term proportional to $\phi \sin\phi$. The periodic terms with $\sin\phi$, $\cos\phi$, $\sin^2\phi$, $\cos^2\phi$
are below the accuracy of observation.
Problem 6.3. Since expansions work best for linear equations, some people may want to see the
following approach. We differentiate the equation of shape (6.18) by the angle $\phi$ and obtain
\[
  \frac{d^2 u}{d\phi^2} + u - \frac{1}{A} - \frac{3 r^*}{2}\, u^2 = 0 \tag{6.26}
\]
One expands in powers of the small parameter $r^*$ and puts $u = u_0 + u_1 + \dots$. The zeroth order
term is the classical Kepler ellipse, given by equation (6.22). Determine and solve the first order
equation. Get once more the secular term of first order and thus check the above calculation.
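Independently of the perturbation expansion, equation (6.26) can be integrated numerically and the angle between two successive perihelia measured directly. The following is a minimal sketch using a classical Runge-Kutta step; the deliberately exaggerated ratio $r^*/A = 10^{-3}$, the eccentricity and the step size are assumptions of this illustration. The measured advance is compared with the first-order value $3\pi r^*/A$ obtained in section 6.5.

```python
import math

def perihelion_shift(e, A, r_star, h=1e-3):
    """Integrate u'' = -u + 1/A + 1.5 r* u^2 from one perihelion to the next.

    Returns the angle of the next maximum of u minus 2*pi.
    """
    def rhs(u, du):
        return du, -u + 1.0 / A + 1.5 * r_star * u * u

    # Start at a perihelion: u is maximal there, so u' = 0.
    phi, u, du = 0.0, (1.0 + e) / A, 0.0
    while phi < 4.0 * math.pi:
        k1 = rhs(u, du)
        k2 = rhs(u + h / 2 * k1[0], du + h / 2 * k1[1])
        k3 = rhs(u + h / 2 * k2[0], du + h / 2 * k2[1])
        k4 = rhs(u + h * k3[0], du + h * k3[1])
        u_new = u + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        du_new = du + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if du > 0.0 and du_new <= 0.0:
            # u' changes sign from + to -: a maximum of u; interpolate linearly.
            return phi + h * du / (du - du_new) - 2.0 * math.pi
        phi, u, du = phi + h, u_new, du_new
    raise RuntimeError("no second perihelion found")

A, r_star, e = 1.0, 1e-3, 0.2
shift = perihelion_shift(e, A, r_star)
predicted = 3.0 * math.pi * r_star / A
print("numerical advance  :", shift)
print("first-order formula:", predicted)
```

The two values differ only by terms of higher order in $r^*/A$, which is the accuracy one expects from a first-order expansion.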
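6.5. The Mercury perihelion rotation The solution
\[
  u = \frac{1 + e \cos\phi}{A} + \frac{3 r^* e\, \phi \sin\phi}{2 A^2}
\]
has a tiny perihelion advance $\varepsilon$ per rotation. Successive maxima of the solution take place for $\phi = 0$
and $\phi = 2\pi + \varepsilon$.
\begin{align*}
  u' &= -\frac{e \sin\phi}{A} + \frac{3 r^* e\, (\sin\phi + \phi \cos\phi)}{2 A^2} \\
  \frac{A}{e}\, u'(2\pi + \varepsilon) &= -\sin\varepsilon + \frac{3 r^*}{2A} \left( \sin\varepsilon + (2\pi + \varepsilon) \cos\varepsilon \right) \\
  &\simeq -\varepsilon + \frac{3 r^*}{2A} \left[ \varepsilon + (2\pi + \varepsilon) \right] + O(\varepsilon^2)
\end{align*}
We need to put this derivative to zero and solve for $\varepsilon$. The terms of order $r^{*2}$ are neglected, once
more. One gets
\[
  \varepsilon = \frac{3\pi r^*}{A} = \frac{3\pi r^*}{a (1 - e^2)}
\]
For Mercury this value is approximately $\varepsilon = 3\pi r^* / A \approx 5 \cdot 10^{-7}$. Next I introduce the major half axis and the
eccentricity, easily to be obtained from astronomical tables. For the sun the Schwarzschild radius
is approximately $r^* \approx 2.95$ km. But we may instead use the known velocity $c$ of light, and obtain
$r^*$ from Kepler's third law. Finally, one needs to convert to arcseconds per century and gets
\begin{align*}
  \frac{4\pi^2 a^3}{T^2} &= GM = \frac{c^2 r^*}{2} \\
  \varepsilon &= \frac{3\pi r^*}{a (1 - e^2)} = \frac{6\pi}{a (1 - e^2)} \cdot \frac{4\pi^2 a^3}{c^2 T^2} = \frac{24 \pi^3 a^2}{(1 - e^2)\, c^2 T^2} \\
  \varepsilon\ [\text{arcseconds per century}] &= \frac{3600 \cdot 180}{\pi} \cdot \frac{T_{\mathrm{century}}}{T} \cdot \frac{24 \pi^3 a^2}{(1 - e^2)\, c^2 T^2}
\end{align*}
which you may now check with easily obtainable data. The famous value is $\varepsilon \approx 43''$ per century.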
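The check with tabulated data takes only a few lines. The following is a minimal sketch; the orbital elements of Mercury and the speed of light are rounded reference values, and they, like the names used, are assumptions of this illustration rather than data from the text.

```python
import math

A_MERCURY = 5.7909e10          # major half axis of Mercury in m (rounded)
E_MERCURY = 0.2056             # eccentricity of Mercury (rounded)
T_MERCURY = 87.969 * 86400.0   # orbital period of Mercury in s
C_LIGHT = 2.998e8              # speed of light in m/s (rounded)
T_CENTURY = 36525.0 * 86400.0  # Julian century in s

def advance_per_orbit(a, e, period):
    """Perihelion advance in radians per orbit: 24 pi^3 a^2 / ((1 - e^2) c^2 T^2)."""
    return 24.0 * math.pi ** 3 * a ** 2 / ((1.0 - e ** 2) * C_LIGHT ** 2 * period ** 2)

eps = advance_per_orbit(A_MERCURY, E_MERCURY, T_MERCURY)
orbits_per_century = T_CENTURY / T_MERCURY
arcsec_per_century = eps * orbits_per_century * (3600.0 * 180.0 / math.pi)

print(f"advance per orbit  : {eps:.3e} rad")
print(f"advance per century: {arcsec_per_century:.1f} arcseconds")
```

The per-orbit value reproduces the order of magnitude $5 \cdot 10^{-7}$ quoted above, and the accumulated advance comes out close to the famous 43 arcseconds per century.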
6.6. Perturbation expansion for the angle of deflection We start from the equation (6.19)
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 - \frac{r^*}{b^2} \left( \frac{mc}{p_{\mathrm{esc}}} \right)^2 u - r^* u^3 = \frac{1}{b^2} \tag{6.19}
\]
for the shape of the orbit. We assume that the Schwarzschild radius $r^* \ll b$ is a small quantity
compared to the impact parameter, and the particle is either a photon, or at least moving fast
enough such that say $mc / p_{\mathrm{esc}} \leq 10$. To start a perturbation expansion, we put $r^* := 0$ and obtain
for the zeroth order term the equation of motion
\[
  \left( \frac{du}{d\phi} \right)^2 + u^2 = \frac{1}{b^2}
\]
which has the solution
\[
  u_0 = \frac{\cos\phi}{b}
\]
This is simply a straight line with impact parameter $b$. The goal is to set up the expansion
$u = u_0 + u_1 + \dots$ by powers of $r^*$. We need the first order term $u_1$ in order to calculate the
deflection angle. From the terms linear in $r^*$, we get a linear differential equation for $u_1$, which one
has to solve.
\begin{align*}
  2 \frac{du_0}{d\phi} \frac{du_1}{d\phi} + 2 u_0 u_1 &= \frac{r^*}{b^2} \left( \frac{mc}{p_{\mathrm{esc}}} \right)^2 u_0 + r^* u_0^3 \\
  -2 \sin\phi\, \frac{du_1}{d\phi} + 2 \cos\phi\, u_1 &= \frac{r^*}{b^2} \left( \frac{mc}{p_{\mathrm{esc}}} \right)^2 \cos\phi + \frac{r^*}{b^2} \cos^3\phi
\end{align*}
I take the Ansatz u1 = A + B cos2 φ and compare coefficients of A and B. The reader should check
the following result:
!2
r∗ mc r∗
(2A + 4B) cos φ − 2B cos φ = 2
3
cos φ + 2 cos3 φ
b pesc b
cos φ
u= + A + B cos2 φ
b
!2
cos φ r∗ r∗ mc r∗
u= + 2+ 2 − 2 cos2 φ
b b 2b pesc 2b
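The ingoing and outgoing rays correspond to $r \to \infty$ and hence $u = 0$. They occur for the polar angles $\pm\varphi = 90^\circ + \varepsilon/2$, where $\varepsilon$ is the total deflection. Since $\cos(90^\circ + \varepsilon/2) \simeq -\varepsilon/2$ for small $\varepsilon \ll 1$, one gets
$$0 = -\frac{\varepsilon}{2b} + \frac{r^*}{b^2} + \frac{r^*}{2b^2}\left(\frac{mc}{p_{esc}}\right)^2 + O(\varepsilon^2)$$
and finally the first order approximation for the small bending angle
$$\varepsilon \simeq \frac{2r^*}{b}\left[1 + \frac{1}{2}\left(\frac{mc}{p_{esc}}\right)^2\right] + O(r^{*2}) \tag{6.27}$$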
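As a hedged sanity check of the coefficient comparison, the following sketch evaluates the linear equation for $u_1$ at many angles with $A = r^*/b^2 + \tfrac{r^*}{2b^2}(mc/p_{esc})^2$ and $B = -r^*/(2b^2)$. The function names and the sample values $r^* = 10^{-3}$, $b = 1$, $mc/p_{esc} = 2$ are assumptions chosen only for illustration.

```python
import math

def first_order_coefficients(r_star, b, k):
    """Coefficients A, B of u1 = A + B cos^2(phi); k stands for m c / p_esc."""
    A = r_star / b**2 + r_star * k**2 / (2.0 * b**2)
    B = -r_star / (2.0 * b**2)
    return A, B

def residual(phi, r_star, b, k):
    """Left side minus right side of the linear equation for u1."""
    A, B = first_order_coefficients(r_star, b, k)
    u1 = A + B * math.cos(phi)**2
    du1 = -2.0 * B * math.cos(phi) * math.sin(phi)   # derivative of u1 by phi
    lhs = -2.0 * math.sin(phi) * du1 + 2.0 * math.cos(phi) * u1
    rhs = (r_star / b**2) * (k**2 * math.cos(phi) + math.cos(phi)**3)
    return lhs - rhs

def deflection(r_star, b, k):
    """First order bending angle in radians, as in formula (6.27)."""
    return 2.0 * r_star / b * (1.0 + 0.5 * k**2)

# Scan one full turn of phi with illustrative sample parameters.
worst = max(abs(residual(0.01 * n, 1e-3, 1.0, 2.0)) for n in range(629))
print("largest residual:", worst)
print("photon deflection (k = 0):", deflection(1e-3, 1.0, 0.0))
```

The residual vanishes up to rounding, and for $k = 0$ the deflection reduces to $2r^*/b$.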
6.7. The bending of light The equation of shape (6.19) is valid for photons, too. One may simply put $m = 0$ and obtains a "Binet-type" equation for the inverse radius $u = r^{-1}$ as a function of $\varphi$
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - r^* u^3 = \frac{1}{b^2} \tag{6.28}$$
The bending of light is obtained from the perturbation expansion, similarly as in the last section. One obtains the small bending angle in first approximation to be
$$\varepsilon \simeq \frac{2r^*}{b} + O(r^{*2}) \tag{6.29}$$
For illustration of this important result, I give the entire reasoning independently, once more.
Clearly we may start with equations of motion (6.13) and (6.15) and eliminate the proper time
by means of
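$$\frac{du}{d\varphi} = -\frac{1}{r^2}\frac{dr}{d\varphi} = -\frac{dr}{d\lambda}\left(r^2\frac{d\varphi}{d\lambda}\right)^{-1} = -\frac{1}{l}\frac{dr}{d\lambda} = -\frac{1}{b\,p_{esc}}\frac{dr}{d\lambda} \tag{6.30}$$
Thus one arrives at the same equation (6.28) for the shape of the orbit. An expansion in powers of the small parameter $r^*$ is needed. The zeroth order term is obtained by putting $r^* = 0$
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 = \frac{1}{b^2}$$
which has the solution
$$u_0 = \frac{\cos\varphi}{b}$$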
As to be expected, this is simply a straight line with impact parameter b. We set up the expansion
u = u0 + u1 + . . . by powers of r∗ . We need only the first order term u1 in order to calculate the
small deflection angle. I illustrate a variant method which is sometimes useful since expansions
work best for linear equations. We differentiate the equation of shape (6.28) by the angle φ and
obtain
$$\frac{d^2u}{d\varphi^2} + u - \frac{3r^*}{2}u^2 = 0 \tag{6.31}$$
From the terms linear in $r^*$, we get a linear differential equation for $u_1$, which one has to solve. To this end, I make the Ansatz $u_1 = A + B\cos^2\varphi$ and compare coefficients to determine $A$ and $B$.
$$\frac{d^2u_1}{d\varphi^2} + u_1 = \frac{3r^*}{2}u_0^2$$
$$2B(1 - 2\cos^2\varphi) + A + B\cos^2\varphi = \frac{3r^*}{2b^2}\cos^2\varphi$$
$$u = \frac{\cos\varphi}{b} + A + B\cos^2\varphi$$
$$u = \frac{\cos\varphi}{b} + \frac{r^*}{b^2} - \frac{r^*}{2b^2}\cos^2\varphi$$
The reader should check the above calculation.
We may now calculate the total deflection $\varepsilon$ of the light ray. The ingoing and outgoing rays correspond to $r \to \infty$ and hence $u = 0$. They occur for the polar angles $\pm\varphi = 90^\circ + \varepsilon/2$. Since $\cos(90^\circ + \varepsilon/2) \simeq -\varepsilon/2$ for small $|\varepsilon| \ll 1$, one has to solve
$$0 = -\frac{\varepsilon}{2b} + \frac{r^*}{b^2} + O(\varepsilon^2)$$
and finally the first order approximation for the small bending angle
$$\varepsilon \simeq \frac{2r^*}{b} + O(r^{*2}) \tag{6.32}$$
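To get a feeling for the size of this angle, here is a minimal sketch for a light ray grazing the sun. The Schwarzschild radius 2.96 km is the value quoted earlier in these notes; the solar radius 6.96·10^5 km used as impact parameter is a value I assume for illustration, and the function name is hypothetical.

```python
import math

def light_deflection_arcsec(r_star_km, b_km):
    """First order bending angle eps = 2 r*/b, converted to arcseconds."""
    eps = 2.0 * r_star_km / b_km          # radians
    return math.degrees(eps) * 3600.0

# r* = 2.96 km for the sun; impact parameter = assumed solar radius.
print(round(light_deflection_arcsec(2.96, 6.96e5), 2), "arcsec")
```

The result is about 1.75 arcseconds, the value usually quoted for light grazing the solar limb.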
7. Gauss’ Differential Geometry and the Pseudo-Sphere

7.1. Introduction Through the work of Gauss on differential geometry, it became clear, after a painfully slow historic process, that there is a model of hyperbolic geometry on surfaces of constant negative Gaussian curvature. One particularly simple such surface is the pseudo-sphere.
According to Morris Kline, it is not clear whether Gauss himself already saw this non-Euclidean interpretation of his geometry of surfaces. Continuing Gauss’ work, Riemann and Minding thought about surfaces of constant negative curvature. Neither Riemann nor Minding related curved surfaces to hyperbolic geometry (Morris Kline III, p.888 etc). But, independently of Riemann, Beltrami finally recognized that surfaces of constant curvature are non-Euclidean spaces. Due to the ideas put forward by Gauss, mathematicians in the end advanced to the concept of a curved surface as a space of interest in its own right. Gauss’ work implies that there are non-Euclidean geometries on surfaces regarded as spaces in themselves. An obvious and important idea is finally spelled out!
As we explain in detail below, Beltrami shows that one can realize a piece of the hyperbolic
plane on a rotation surface of negative constant curvature. This surface is called a pseudo-sphere.
But this new discovery comes with a disappointment: by a result of Hilbert, there is no regular
analytic surface of constant negative curvature on which the geometry of the entire hyperbolic
plane is valid (see Hilbert’s Foundations of Geometry, appendix V).
Concerning models of hyperbolic geometry, the final outcome turns out to be a trade off
between the pseudo-sphere and the Poincaré disk. Both have their strengths and weaknesses. The
pseudo-sphere is a model for a limited portion of the hyperbolic plane. Both angle and length are
represented correctly. The arc length of a geodesic is the correct hyperbolic distance. Furthermore,
because of the constant Gaussian curvature, on the pseudo-sphere a figure may be shifted about and
just bending will make it conform to the surface. The situation is similar to the more familiar case
of Euclidean geometry on a circular cylinder or cone. As everybody knows, on a circular cylinder,
a plane figure can be fitted by simply bending it, without stretching and shrinking.
On the other hand, only the Poincaré disk is a model for the entire hyperbolic plane. Here only
angles are still represented correctly, but the price one finally has to pay is that hyperbolic distances
are distorted. The hyperbolic lines become circular arcs, perpendicular to the ideal boundary. One
can see the distortion easily in Escher’s superb artwork, based on tiling of the hyperbolic plane with
congruent figures.
The trade-off just explained makes the isometry from the pseudo-sphere into the Poincaré disk especially interesting. One such isometric mapping is explicitly constructed below. Hilbert’s result becomes rather natural, too. As explained below, in the sense of hyperbolic geometry, the boundary of the pseudo-sphere turns out to be an arc of a horocycle.
7.2. About Gauss’ differential geometry Karl Friedrich Gauss devoted an immense amount of work to geodesy and map making, starting in 1816. This stimulus led to his definitive paper on differential geometry of 1827: "Disquisitiones Generales circa Superficies Curvas". In
this work, Gauss introduces the basics of curved surfaces, and goes far beyond. The real benefit is
that, due to the ideas forwarded by Gauss, mathematicians have in the end advanced to the concept
of a curved surface as a space of its own interest.
To begin with, one imagines a curved surface to be embedded into three dimensional space $\mathbb{R}^3$, and given by some parametric equations
$$x = x(u,v),\quad y = y(u,v),\quad z = z(u,v) \tag{7.1}$$
The distance $ds$ of neighboring points on the surface with parameters $(u,v)$ and $(u+du, v+dv)$ is given by the first fundamental form
$$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2 \tag{7.2}$$
The first fundamental form is straightforward to calculate from the parametric equations (7.1) since
$$\begin{aligned} E &= x_u^2 + y_u^2 + z_u^2\\ F &= x_u x_v + y_u y_v + z_u z_v\\ G &= x_v^2 + y_v^2 + z_v^2 \end{aligned} \tag{7.3}$$
follows from elementary vector calculus.

The geodesics on curved surfaces are defined to be the shortest curves lying on the given surface, connecting any two given points. Gauss’ work sets up the differential equation for the geodesics. Gauss introduces the two principal curvatures, called $c_1, c_2$. They turn out to be simply the extremal curvatures of normal sections of the surface. A new important feature is the Gaussian curvature, called $K$. Gauss shows that $K = \frac{LN - M^2}{EG - F^2}$, the quotient of the determinants of the second and first fundamental form. But, even simpler, the Gaussian curvature turns out to be the product of the two principal curvatures:
$$K = c_1 c_2 \tag{7.4}$$
Gauss shows the remarkable fact that this curvature is preserved during the process of bending the curved surface inside a higher dimensional space, without stretching, contracting or tearing it. On the contrary, the two principal curvatures are changed by flexing the surface.
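There are actually at least two different proofs for this fact contained in Gauss’ work. The first one depends on Gauss’ characteristic equation
$$K = \frac{1}{2H}\frac{\partial}{\partial u}\left[\frac{F}{EH}\frac{\partial E}{\partial v} - \frac{1}{H}\frac{\partial G}{\partial u}\right] + \frac{1}{2H}\frac{\partial}{\partial v}\left[\frac{2}{H}\frac{\partial F}{\partial u} - \frac{1}{H}\frac{\partial E}{\partial v} - \frac{F}{EH}\frac{\partial E}{\partial u}\right] \tag{7.5}$$
where $H = \sqrt{EG - F^2}$. Obviously, any such equation implies that the Gaussian curvature depends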
only on the first fundamental form. The first fundamental form is preserved, if one bends the
curved surface in three space, without stretching, contracting or tearing it. Therefore the functions
E, F, G, H which determine the first fundamental form depend only on the parameters (u, v), but do
not depend at all on how—or even whether at all—the surface lies in a three dimensional space.
Because of the Gauss’ characteristic equation (7.5), the same is true for the Gaussian curvature
K. Because of all that, one says that the Gaussian curvature is an intrinsic property of the curved
surface.
7.3. Riemann metric of the Poincaré disk
Proposition 7.1 (Riemann Metric for Poincaré’s Model). In the Poincaré model, the infinitesimal
hyperbolic distance ds of points with coordinates (x, y) and (x + dx, y + dy) is
$$(ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.6}$$
Reason. The fact that angles are measured in the usual Euclidean way implies that $ds^2 = E(x,y)(dx^2 + dy^2)$. The rotational symmetry around the center $O$ implies that $E(x,y) = E(\sqrt{x^2+y^2}, 0)$. Hence
$$ds^2 = E\bigl(\sqrt{x^2+y^2}, 0\bigr)(dx^2 + dy^2) \tag{7.7}$$
Now it is enough to calculate the distance of the points $(x, 0)$ and $(x + dx, 0)$. The hyperbolic distance of a point $(x, 0)$ from the center $(0, 0)$ is $2\tanh^{-1}x$, as we have derived in Proposition ?? in the section on the Poincaré disk model. See formula (??) there, which is of course the primary origin of the hyperbolic distance! Taking the derivative by the variable $x$ yields
$$\frac{ds}{dx} = \frac{d}{dx}\bigl(2\tanh^{-1}x\bigr) = \frac{d}{dx}\ln\frac{1+x}{1-x} = \frac{1}{1+x} + \frac{1}{1-x} = \frac{2}{1-x^2}, \qquad ds^2 = \frac{4}{(1-x^2)^2}\,dx^2 \tag{7.8}$$
Hence $E(x, 0) = \frac{4}{(1-x^2)^2}$ and
$$E\bigl(\sqrt{x^2+y^2}, 0\bigr) = \frac{4}{(1 - x^2 - y^2)^2} \tag{7.9}$$
Now formulas (7.7) and (7.9) imply the claim (7.6).
Problem 7.1 (Hyperbolic circumference of a circle). Calculate the circumference of a circle of hyperbolic radius $R$. We use the Poincaré disk, put the center of the circle at the center $O$ of the disk. In polar coordinates, the Riemann metric is
$$ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = 4\,\frac{dr^2 + r^2 d\theta^2}{(1 - r^2)^2}$$
(a) Calculate the hyperbolic length $R$ of a segment $OA$ with Euclidean length $|OA| = r < 1$.
(b) Get the circumference $C = \oint ds$ of the circle around $O$, at first in terms of the Euclidean radius $|OA| = r < 1$.
(c) Get the circumference $C$ of this circle in terms of the hyperbolic radius $R$.
Solution. We take $O$ as center of the circle, and point $A$ on the circumference. Let $r = |OA|$ denote the Euclidean radius, and $R = s(O, A)$ be the hyperbolic radius.
The hyperbolic radius $R$ can be found directly from the Riemann metric (7.14). One needs partial fractions to do the integral.
$$R = \int_0^r ds = \int_0^r \frac{2\,dr}{1 - r^2} = \int_0^r\left[\frac{1}{1-r} + \frac{1}{1+r}\right]dr = \bigl[-\ln(1-r) + \ln(1+r)\bigr]_0^r = 2\tanh^{-1}r$$
Remark. Of course, we can go back once more to Proposition ??, formula (??) from the section on
the Poincaré disk model and get R = s(O, A) = 2 tanh−1 |OA| = 2 tanh−1 r.
We solve R = 2 tanh−1 r for the Euclidean radius and get
$$r = \tanh\frac{R}{2} = \frac{e^{R/2} - e^{-R/2}}{e^{R/2} + e^{-R/2}} = \frac{e^R - 1}{e^R + 1}$$
For the usual Euclidean polar coordinates $(r, \theta)$ we get the Euclidean arc length:
$$\begin{aligned} L_{Eucl} &= \int_0^{2\pi}\sqrt{dx^2 + dy^2} = \int_0^{2\pi}\sqrt{dr^2 + r^2 d\theta^2}\\ &= r\int_0^{2\pi} d\theta = 2\pi r \end{aligned} \tag{7.10}$$
The first line holds for any smooth curve. In the second line, we go to the special case of a circle. For a circle, the coordinate $r$ is constant and hence $dr = 0$, and the factor $r$ can be pulled out of the integral.
Now the distance along the circumference is measured in the hyperbolic metric (7.14) from Proposition 7.1. Hence the calculation above is modified to
$$\begin{aligned} L_{hyp} &= \int_0^{2\pi}\frac{2\sqrt{dx^2 + dy^2}}{1 - x^2 - y^2} = \int_0^{2\pi}\frac{2\sqrt{dr^2 + r^2 d\theta^2}}{1 - r^2}\\ &= \frac{2r}{1 - r^2}\int_0^{2\pi} d\theta = \frac{4\pi r}{1 - r^2} \end{aligned} \tag{7.11}$$
We have found the correct hyperbolic arc length. But still, one needs to use the formula $r = \frac{e^R - 1}{e^R + 1}$ to express $r$ in terms of the hyperbolic distance $R$.
$$\begin{aligned} L_{hyp} &= \frac{4\pi r}{1 - r^2} = \frac{4\pi(e^R - 1)}{\left[1 - \left(\frac{e^R - 1}{e^R + 1}\right)^2\right](e^R + 1)}\\ &= \frac{4\pi(e^R - 1)(e^R + 1)}{(e^R + 1)^2 - (e^R - 1)^2} = \frac{4\pi(e^{2R} - 1)}{4e^R}\\ &= \pi(e^R - e^{-R}) = 2\pi\sinh R \end{aligned} \tag{7.12}$$
Proposition 7.2 (The circumference of a circle). In hyperbolic geometry, the circumference of a

circle of hyperbolic radius R is 2π sinh R.
Problem 7.2. The hyperbolic circumference of a circle is much larger than the Euclidean circum-
ference. Let R = 1, 2, 5, 10 and estimate how many times the radius fits around the circumference
of a circle of that radius.
Answer. A simple calculation yields
R (2πsinh R)/R
1 7.38
2 11.39
5 93.25
10 6919.82
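The table can be reproduced with a few lines. This is only a sketch of the "simple calculation", with a hypothetical function name.

```python
import math

def circumference(R):
    """Hyperbolic circumference 2 pi sinh R of a circle of hyperbolic radius R."""
    return 2.0 * math.pi * math.sinh(R)

# How many times does the radius fit around the circumference?
for R in (1, 2, 5, 10):
    print(R, round(circumference(R) / R, 2))
```

For comparison, the Euclidean ratio is always $2\pi \approx 6.28$, independent of the radius.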
Problem 7.3 (Hyperbolic area of a circle). For a circle of hyperbolic radius $R$, calculate the area $A$. Again, we use the Poincaré disk, put the center of the circle at the center $O$ of the disk. For the area, we use the formula from differential geometry
$$A = \int_0^{2\pi}\!\!\int_0^r \sqrt{EG - F^2}\;dr\,d\theta$$
The first fundamental form is the Riemann metric. It has been already given by formula (7.14), and transformed to polar coordinates in the previous problem.
$$ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = 4\,\frac{dr^2 + r^2 d\theta^2}{(1 - r^2)^2} = E\,dr^2 + 2F\,dr\,d\theta + G\,d\theta^2 \tag{7.13}$$
(a) Get the area of the circle, at first in terms of the Euclidean radius $|OA| = r < 1$.
(b) Get the area $A$ of this circle in terms of the hyperbolic radius $R$.
(c) Check that
$$\frac{dA}{dR} = C$$
(a) The first fundamental form (7.13) yields
$$H = \sqrt{EG - F^2} = \frac{4r}{(1 - r^2)^2}$$
and hence the hyperbolic area of a circle of Euclidean radius $r$ is
$$A = \int_0^{2\pi}\!\!\int_0^r \sqrt{EG - F^2}\;dr\,d\theta = 2\pi\int_0^r \frac{4r\,dr}{(1 - r^2)^2}$$
This integral is solved with the substitution $u = r^2$ and $du = 2r\,dr$.
$$A = 2\pi\int_0^{r^2}\frac{2\,du}{(1-u)^2} = \left[\frac{4\pi}{1-u}\right]_0^{r^2} = \frac{4\pi}{1 - r^2} - 4\pi = \frac{4\pi r^2}{1 - r^2}$$
This is the area in terms of the Euclidean radius $|OA| = r < 1$.
(b) The hyperbolic radius $R$ has already been calculated in the previous problem. We solve $R = 2\tanh^{-1}r$ for the Euclidean radius and get
$$r = \tanh\frac{R}{2} = \frac{e^{R/2} - e^{-R/2}}{e^{R/2} + e^{-R/2}} = \frac{e^R - 1}{e^R + 1}$$
$$r^2 = \frac{(e^R - 1)^2}{(e^R + 1)^2} \quad\text{and}\quad 1 - r^2 = \frac{(e^R + 1)^2 - (e^R - 1)^2}{(e^R + 1)^2} = \frac{4e^R}{(e^R + 1)^2}$$
Now plug this formula into the result from part (a) and get
$$A = \frac{4\pi r^2}{1 - r^2} = \frac{\pi(e^R - 1)^2}{e^R} = \pi(e^R - 2 + e^{-R}) = 2\pi(\cosh R - 1)$$
An alternative formula is
$$A = \frac{\pi(e^R - 1)^2}{e^R} = \pi\bigl(e^{R/2} - e^{-R/2}\bigr)^2 = 4\pi\sinh^2\frac{R}{2}$$
(c) We have obtained in Proposition 7.2 that the hyperbolic circle of radius $R$ has the circumference $C = 2\pi\sinh R$. On the other hand, differentiating the result of (b) gives
$$\frac{dA}{dR} = 2\pi\frac{d\cosh R}{dR} = 2\pi\sinh R = C$$
as to be shown.
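The three parts of this problem can be cross-checked numerically. The sketch below compares the area formula in terms of the Euclidean radius with the one in terms of the hyperbolic radius, and approximates $dA/dR$ by a central difference; the function names and the step size are assumptions for illustration.

```python
import math

def area_from_euclidean_radius(r):
    """Hyperbolic area 4 pi r^2 / (1 - r^2) of a disk of Euclidean radius r < 1."""
    return 4.0 * math.pi * r**2 / (1.0 - r**2)

def area_from_hyperbolic_radius(R):
    """Hyperbolic area 2 pi (cosh R - 1) of a disk of hyperbolic radius R."""
    return 2.0 * math.pi * (math.cosh(R) - 1.0)

def circumference(R):
    """Hyperbolic circumference 2 pi sinh R."""
    return 2.0 * math.pi * math.sinh(R)

def area_derivative(R, h=1e-5):
    """Central difference approximation of dA/dR."""
    return (area_from_hyperbolic_radius(R + h) - area_from_hyperbolic_radius(R - h)) / (2.0 * h)

R = 1.5
r = math.tanh(R / 2.0)   # Euclidean radius belonging to hyperbolic radius R
print(area_from_euclidean_radius(r), area_from_hyperbolic_radius(R))
print(area_derivative(R), circumference(R))
```

Both pairs of printed numbers should agree up to rounding and discretization error.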
Problem 7.4. Use the fundamental form for the Poincaré disk model
$$(ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.14}$$
to calculate its Gaussian curvature.
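Answer. Formula (7.14) implies that the functions in the first fundamental form are $E = G = H = 4(1 - x^2 - y^2)^{-2}$ and $F = 0$. Hence, with $u = x$ and $v = y$, we get from formula (7.5)
$$\begin{aligned} K &= \frac{1}{2E}\frac{\partial}{\partial x}\left[-\frac{1}{E}\frac{\partial E}{\partial x}\right] + \frac{1}{2E}\frac{\partial}{\partial y}\left[-\frac{1}{E}\frac{\partial E}{\partial y}\right] = -\frac{1}{2E}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\ln E\\ &= +\frac{1}{E}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\ln(1 - x^2 - y^2) = \frac{1}{E}\left(\frac{\partial}{\partial x}\frac{-2x}{1 - x^2 - y^2} + \frac{\partial}{\partial y}\frac{-2y}{1 - x^2 - y^2}\right)\\ &= \frac{-2}{E}\left(\frac{1 - x^2 - y^2 + 2x^2}{(1 - x^2 - y^2)^2} + \frac{1 - x^2 - y^2 + 2y^2}{(1 - x^2 - y^2)^2}\right) = \frac{(-2)\cdot 2}{4} = -1 \end{aligned}$$
By the way, the result $K = -1$ motivates the annoying factor 4 in formula (7.14).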
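For a metric of the form $ds^2 = E\,(dx^2 + dy^2)$ the computation above reduces to $K = -\frac{1}{2E}\,\Delta \ln E$. A rough numerical check, assuming a five-point finite difference stencil for the Laplacian and an arbitrarily chosen sample point, might look as follows; the function names are hypothetical.

```python
import math

def E(x, y):
    """Conformal factor of the Poincare disk metric (7.14)."""
    return 4.0 / (1.0 - x * x - y * y)**2

def gaussian_curvature(x, y, h=1e-3):
    """K = -(1/(2E)) * Laplacian(ln E), Laplacian by a five-point stencil."""
    f = lambda p, q: math.log(E(p, q))
    laplacian = (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
                 - 4.0 * f(x, y)) / h**2
    return -laplacian / (2.0 * E(x, y))

# Sample points inside the unit disk; each value should be close to -1.
for point in [(0.0, 0.0), (0.3, -0.2), (0.5, 0.5)]:
    print(point, gaussian_curvature(*point))
```

The values differ from $-1$ only by the discretization error of the stencil.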
7.4. Riemann metric of Klein’s model
Proposition 7.3 (Hilbert-Klein Metric). In the Klein model, the infinitesimal hyperbolic distance
ds of points with coordinates (X, Y) and (X + dX, Y + dY) is
$$ds^2 = \frac{dX^2 + dY^2 - (X\,dY - Y\,dX)^2}{(1 - X^2 - Y^2)^2} \tag{7.15}$$
Proof. We shall derive this metric using the transformation from the Poincaré to the Klein model. As stated in Proposition ??, the mapping from a point $P$ in Poincaré’s model to a point $K$ in Klein’s model is
$$|OK| = \frac{2\,|OP|}{1 + |OP|^2} \tag{??}$$
requiring that the rays $\overrightarrow{OP} = \overrightarrow{OK}$ are identical. We use Cartesian coordinates and put $P = (x, y)$ for Poincaré’s model and $K = (X, Y)$ for the points in Klein’s model. Finally we put $r^2 = x^2 + y^2$ and $R^2 = X^2 + Y^2$. From the mapping (??), we get
$$X = \frac{2x}{1 + r^2} \quad\text{and}\quad Y = \frac{2y}{1 + r^2} \tag{7.16}$$
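The Riemann metric for Poincaré’s model has been derived in Proposition 7.1 to be
$$ds^2 = 4\,\frac{dx^2 + dy^2}{(1 - x^2 - y^2)^2} = E\,dx^2 + 2F\,dx\,dy + G\,dy^2$$
Here $E, F, G$ denote the fundamental form for the Poincaré model in terms of $(x, y)$. In the following we shall use the matrix
$$\begin{bmatrix} E & F\\ F & G \end{bmatrix} = \frac{4}{(1 - x^2 - y^2)^2}\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \tag{7.17}$$
From the fact that the transformation from Poincaré’s to Klein’s model is a passive coordinate transformation, we know that the infinitesimal hyperbolic distance $ds$ of points is left invariant. Because of the invariance, the fundamental form for the Klein model, written $\tilde E, \tilde F, \tilde G$ here, has to satisfy
$$ds^2 = \tilde E\,dX^2 + 2\tilde F\,dX\,dY + \tilde G\,dY^2 = E\,dx^2 + 2F\,dx\,dy + G\,dy^2$$
We take for now $(x, y)$ as independent variables. From calculus, we know that by means of the chain rule
$$\begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial Y}{\partial x}\\ \frac{\partial X}{\partial y} & \frac{\partial Y}{\partial y} \end{bmatrix}\begin{bmatrix} \tilde E & \tilde F\\ \tilde F & \tilde G \end{bmatrix}\begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial X}{\partial y}\\ \frac{\partial Y}{\partial x} & \frac{\partial Y}{\partial y} \end{bmatrix} = \begin{bmatrix} E & F\\ F & G \end{bmatrix} \tag{7.18}$$
It now remains to carry out the arithmetic. The superscript $T$ denotes transposition of matrices and the superscript $-1$ denotes inversion of matrices. As usual, we use
$$\frac{DX}{Dx} = \begin{bmatrix} \frac{\partial X}{\partial x} & \frac{\partial X}{\partial y}\\ \frac{\partial Y}{\partial x} & \frac{\partial Y}{\partial y} \end{bmatrix}$$
as shorthand for the Jacobi matrix of the transformation (7.16). We need to solve the equation (7.18) for the new fundamental form $\tilde E, \tilde F, \tilde G$ to obtain
$$\left(\frac{DX}{Dx}\right)^{T}\begin{bmatrix} \tilde E & \tilde F\\ \tilde F & \tilde G \end{bmatrix}\frac{DX}{Dx} = \begin{bmatrix} E & F\\ F & G \end{bmatrix}$$
$$\begin{bmatrix} \tilde E & \tilde F\\ \tilde F & \tilde G \end{bmatrix} = \left(\frac{DX}{Dx}\right)^{T,-1}\begin{bmatrix} E & F\\ F & G \end{bmatrix}\left(\frac{DX}{Dx}\right)^{-1}$$
The Jacobi matrix of the transformation (7.16) is obtained explicitly from equations (7.16) to be
$$\frac{DX}{Dx} = \frac{2}{(1 + x^2 + y^2)^2}\begin{bmatrix} 1 - x^2 + y^2 & -2xy\\ -2xy & 1 + x^2 - y^2 \end{bmatrix}$$
The determinant is
$$\operatorname{Det}\frac{DX}{Dx} = \frac{4}{(1 + r^2)^4}\left[1 - (x^2 - y^2)^2 - 4x^2y^2\right] = \frac{4}{(1 + r^2)^4}\left[1 - r^4\right] = \frac{4(1 - r^2)}{(1 + r^2)^3}$$
Hence the inverse turns out to be
$$\left(\frac{DX}{Dx}\right)^{-1} = \frac{(1 + r^2)^3}{4(1 - r^2)}\cdot\frac{2}{(1 + r^2)^2}\begin{bmatrix} 1 + x^2 - y^2 & 2xy\\ 2xy & 1 - x^2 + y^2 \end{bmatrix} = \frac{1 + r^2}{2(1 - r^2)}\begin{bmatrix} 1 + x^2 - y^2 & 2xy\\ 2xy & 1 - x^2 + y^2 \end{bmatrix}$$
With the fundamental form from formula (7.17) and the inverse Jacobi matrix just obtained plugged into equation (??), we calculate
$$\begin{aligned} \begin{bmatrix} \tilde E & \tilde F\\ \tilde F & \tilde G \end{bmatrix} &= \frac{(1 + r^2)^2}{4(1 - r^2)^2}\cdot\frac{4}{(1 - r^2)^2}\begin{bmatrix} 1 + x^2 - y^2 & 2xy\\ 2xy & 1 - x^2 + y^2 \end{bmatrix}^2\\ &= \frac{(1 + r^2)^2}{(1 - r^2)^4}\begin{bmatrix} 1 + 2x^2 - 2y^2 + (x^2 - y^2)^2 + 4x^2y^2 & 4xy\\ 4xy & 1 - 2x^2 + 2y^2 + r^4 \end{bmatrix}\\ &= \frac{(1 + r^2)^2}{(1 - r^2)^4}\begin{bmatrix} (1 + r^2)^2 - 4y^2 & 4xy\\ 4xy & (1 + r^2)^2 - 4x^2 \end{bmatrix} \end{aligned}$$
This is the new fundamental form. We still need to introduce the new coordinates $(X, Y)$. We use the shorthands $r^2 = x^2 + y^2$ and $R^2 = X^2 + Y^2$. By means of equation (7.16) we get
$$1 - X^2 = \frac{(1 + r^2)^2 - 4x^2}{(1 + r^2)^2}, \qquad 1 - Y^2 = \frac{(1 + r^2)^2 - 4y^2}{(1 + r^2)^2},$$
$$XY = \frac{4xy}{(1 + r^2)^2}, \qquad 1 - R^2 = \frac{(1 - r^2)^2}{(1 + r^2)^2}$$
Thus the new fundamental form miraculously simplifies to be
$$\begin{bmatrix} \tilde E & \tilde F\\ \tilde F & \tilde G \end{bmatrix} = \frac{1}{(1 - R^2)^2}\begin{bmatrix} 1 - Y^2 & XY\\ XY & 1 - X^2 \end{bmatrix} \tag{7.19}$$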
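For the line element we get from this fundamental form, writing $\tilde E, \tilde F, \tilde G$ for the coefficients in the Klein model,
$$\begin{aligned} ds^2 &= \tilde E\,dX^2 + 2\tilde F\,dX\,dY + \tilde G\,dY^2\\ &= \frac{(1 - Y^2)\,dX^2 + 2XY\,dX\,dY + (1 - X^2)\,dY^2}{(1 - R^2)^2}\\ &= \frac{dX^2 + dY^2 - (X\,dY - Y\,dX)^2}{(1 - X^2 - Y^2)^2} \end{aligned}$$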
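The arithmetic of this proof can be cross-checked numerically in the opposite direction: pull the Hilbert-Klein form (7.19) back along the map (7.16) and compare with the Poincaré form (7.17). The following is only a sketch with hypothetical function names, and the sample points are arbitrary.

```python
def poincare_to_klein(x, y):
    """Map (7.16) from Poincare coordinates (x, y) to Klein coordinates (X, Y)."""
    s = 1.0 + x * x + y * y
    return 2.0 * x / s, 2.0 * y / s

def klein_form(X, Y):
    """Coefficients of the Hilbert-Klein form (7.19) at the point (X, Y)."""
    d = (1.0 - X * X - Y * Y)**2
    return (1.0 - Y * Y) / d, X * Y / d, (1.0 - X * X) / d

def pulled_back_form(x, y):
    """J^T [Klein form] J, with J the Jacobi matrix of (7.16)."""
    r2 = x * x + y * y
    c = 2.0 / (1.0 + r2)**2
    Xx, Xy = c * (1.0 - x * x + y * y), c * (-2.0 * x * y)
    Yx, Yy = c * (-2.0 * x * y), c * (1.0 + x * x - y * y)
    Et, Ft, Gt = klein_form(*poincare_to_klein(x, y))
    E = Et * Xx * Xx + 2.0 * Ft * Xx * Yx + Gt * Yx * Yx
    F = Et * Xx * Xy + Ft * (Xx * Yy + Xy * Yx) + Gt * Yx * Yy
    G = Et * Xy * Xy + 2.0 * Ft * Xy * Yy + Gt * Yy * Yy
    return E, F, G

def poincare_form(x, y):
    """Coefficients of the Poincare form (7.17)."""
    e = 4.0 / (1.0 - x * x - y * y)**2
    return e, 0.0, e

for point in [(0.3, 0.4), (-0.6, 0.1), (0.0, 0.0)]:
    print(point, pulled_back_form(*point), poincare_form(*point))
```

At every sample point the two triples should coincide up to rounding.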

Problem 7.5 (Gaussian curvature of the Hilbert-Klein metric). Use Gauss’ characteristic equation (7.5) and check directly that the Hilbert-Klein metric (7.15) from Proposition 7.3 has constant Gaussian curvature $K = -1$. We use polar coordinates $X = r\cos\theta$, $Y = r\sin\theta$ and convert formula (7.15) to
$$ds^2 = \frac{dr^2 + r^2(1 - r^2)\,d\theta^2}{(1 - r^2)^2} \tag{7.20}$$
since this simplifies the calculation considerably.
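Answer. We have to put $u = r$ and $v = \theta$. The first fundamental form and its coefficients become
$$ds^2 = E\,dr^2 + 2F\,dr\,d\theta + G\,d\theta^2$$
$$E = (1 - r^2)^{-2}, \quad F = 0, \quad G = r^2(1 - r^2)^{-1}$$
$$H = \sqrt{EG - F^2} = r(1 - r^2)^{-3/2}$$
Hence we get for the Gaussian curvature $K$ from the characteristic equation
$$\begin{aligned} 2HK &= -\frac{\partial}{\partial r}\left[H^{-1}\frac{\partial G}{\partial r}\right] = -\frac{\partial}{\partial r}\left[r^{-1}(1 - r^2)^{3/2}\frac{\partial}{\partial r}\frac{r^2}{1 - r^2}\right]\\ &= -\frac{\partial}{\partial r}\left[r^{-1}(1 - r^2)^{3/2}\,\frac{2r(1 - r^2) - r^2(-2r)}{(1 - r^2)^2}\right] = -\frac{\partial}{\partial r}\left[2(1 - r^2)^{-1/2}\right]\\ &= (-2)(-1/2)(1 - r^2)^{-3/2}(-2r) = -2r(1 - r^2)^{-3/2} = -2H \end{aligned}$$
We get the constant Gaussian curvature $K = -1$, as expected.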
Problem 7.6 (Distortion of angles by the Hilbert-Klein metric). We use polar coordinates to simplify the calculation, and the corresponding contravariant components for the tangent vectors. At a point $K$ with polar coordinates $X = r\cos\theta$, $Y = r\sin\theta$ are attached the radial tangent vector $(a_r, a_\theta) = (1, 0)$ and any other tangent vector $(b_r, b_\theta)$. Hence the apparent angle $\alpha$ satisfies
$$\cos\alpha = \frac{b_r}{\sqrt{b_r^2 + r^2 b_\theta^2}} \quad\text{and}\quad \tan\alpha = \frac{r\,b_\theta}{b_r}$$
Check with the Hilbert-Klein metric (7.20) that the angle $\omega$ between the two vectors in Klein’s model satisfies
$$\tan\omega = \tan\alpha\,\sqrt{1 - r^2} \tag{7.21}$$
Answer. The apparent angle $\alpha$ and the hyperbolic angle $\omega$ between the two tangent vectors $(a_r, a_\theta)$ and $(b_r, b_\theta)$ satisfy
$$\cos\alpha = \frac{a_r b_r + r^2 a_\theta b_\theta}{\sqrt{a_r^2 + r^2 a_\theta^2}\,\sqrt{b_r^2 + r^2 b_\theta^2}}$$
$$\cos\omega = \frac{a_r b_r + r^2(1 - r^2)\,a_\theta b_\theta}{\sqrt{a_r^2 + r^2(1 - r^2)\,a_\theta^2}\,\sqrt{b_r^2 + r^2(1 - r^2)\,b_\theta^2}}$$
In the given example with $a_r = 1$, $a_\theta = 0$ the expressions simplify
$$\cos\alpha = \frac{b_r}{\sqrt{b_r^2 + r^2 b_\theta^2}} \quad\text{and}\quad \tan^2\alpha = \frac{r^2 b_\theta^2}{b_r^2}$$
$$\cos\omega = \frac{b_r}{\sqrt{b_r^2 + r^2(1 - r^2)\,b_\theta^2}} \quad\text{and}\quad \tan^2\omega = \frac{r^2(1 - r^2)\,b_\theta^2}{b_r^2}$$
Hence
$$\tan\omega = \tan\alpha\,\sqrt{1 - r^2}$$
7.5. A second proof of Gauss’ remarkable theorem The most enlightening proof that the Gaussian curvature is an intrinsic property of the surface uses Gauss’ notion of integral curvature. For any domain $G$ on a given curved surface, the integral curvature is defined as the integral $\iint_G K\,dA$, where $dA$ denotes the area element of the surface.
Take a geodesic triangle $\triangle ABC$. Let $T$ denote the region bounded by the geodesics between any three given points $A, B, C$ on the surface. Let $\alpha, \beta, \gamma$ be the angles (between tangents) to the geodesics at the three vertices. Gauss proves
$$\iint_T K\,dA = \alpha + \beta + \gamma - \pi \tag{7.22}$$
where angles are to be measured in radians. The quantity on the right hand side is the deviation of the angle sum $\alpha + \beta + \gamma$ from the Euclidean value $180^\circ$, respectively $\pi$.¹ The quantity $\alpha + \beta + \gamma - \pi$ is called the excess of the triangle $\triangle ABC$. For the hyperbolic case, the excess is negative. In that case, one calculates using the excess times $-1$, which is called the defect. In words, Gauss’ theorem tells the following:
For a geodesic triangle, the integral curvature equals the excess of its angle sum.
This theorem, Gauss says, ought to be counted as a most elegant theorem.
I discuss a few immediate, but important consequences of (7.22). First of all, instead of the complicated characteristic equation (7.5), one has a simple property of a geodesic triangle from which to derive the Gaussian curvature in a limiting process. Secondly, as an immediate implication of (7.22), the Gaussian curvature is an intrinsic property of a curved surface. Recall that geodesics, as well as the measurement of area, depend only on the first fundamental form. Hence, because of (7.22), the same is true for the Gaussian curvature.
Another easy consequence of (7.22) is obtained from the special case of a sphere. For this surface, the Gaussian curvature is constant, and equal to $K = R^{-2}$ where $R$ is the radius of the sphere. Hence one obtains for the area of a spherical triangle $A = (\alpha + \beta + \gamma - \pi)R^2$, as was already known before Gauss, e.g. to Lambert.
¹ In radian measure, the Euclidean angle sum is π.
Problem 7.7. Tile a sphere by equilateral triangles. It can be done in three ways:
(i) Four triangles with α = β = γ = 120◦ .
(ii) Eight triangles with α = β = γ = 90◦ .
(iii) N triangles with α = β = γ = 72◦ .
Explain and draw these tilings. To which Platonic bodies do the vertices correspond? Determine
the surface area of the sphere from (i) and (ii), then get the number N in (iii).
Answer. From item (i), the area of the sphere is
$$A = 4\cdot\left(3\cdot\frac{2\pi}{3} - \pi\right)R^2 = 4\pi R^2$$
The vertices of the four triangles form a tetrahedron. Similarly, item (ii) yields
$$A = 8\cdot\left(3\cdot\frac{\pi}{2} - \pi\right)R^2 = 4\pi R^2$$
The vertices of the eight triangles form an octahedron.
We can now calculate the number $N$ of triangles in the tiling (iii). Because of
$$A = N\cdot\left(3\cdot\frac{2\pi}{5} - \pi\right)R^2 = 4\pi R^2$$
one gets $N = 20$. The vertices of the twenty triangles form an icosahedron.
Here is a further important consequence of equation (7.22).
Corollary 9 (A common bound for the area of all triangles). On a surface with negative constant Gaussian curvature $K < 0$, the area of any triangle is less than $\frac{\pi}{-K}$.
On December 17, 1799, Gauss wrote to his friend, the Hungarian mathematician Wolfgang
Farkas Bolyai (1775-1856):
As for me, I have already made some progress in my work. However, the path I have
chosen does not lead at all to the goal which we seek [deduction of the parallel
axiom], and which you assure me you have reached. It seems rather to compel me
to doubt the truth of geometry itself. It is true that I have come upon much which
by most people would be held to constitute a proof; but in my eyes it proves as
good as nothing. For example, if we could show that a rectilinear triangle whose
area would be greater than any given area is possible, then I would be ready to
prove the whole of Euclidean geometry absolutely rigorously.
Most people would certainly let this stand as an axiom; but I, no! It would indeed
be possible that the area might always remain below a certain limit, however far
apart the three angular points of the triangle were taken.
From about 1813 on Gauss developed his new geometry. He became convinced that it was logically
consistent and rather sure that it might be applicable. His letter written in 1817 to Olbers says:
I am becoming more and more convinced that the physical necessity of our Euclidean
geometry cannot be proved, at least not by human reason nor for human reason.
Perhaps in another life we will be able to obtain insight into the nature of space,
which is now unattainable. Until then we must place geometry not in the same
class with arithmetic, which is purely a priori, but with mechanics.
Problem 7.8. (a) Find two further enlightening statements of Gauss, and comment on all four statements.
(i) Are they courageous?
(ii) Are they to the benefit of the scientific community?
(iii) Are they helpful for the person he addresses?
(iv) Are they fair toward other people?
(v) What would you have done in Gauss’ place?
(b) Choose two of Gauss’ comments. Write a letter as you imagine you would have written in
place of Gauss.
Problem 7.9. To test the applicability of Euclidean geometry and his non-Euclidean geometry, Gauss actually measured the sum of the angles of the triangle formed by three mountain peaks in middle Germany: Brocken, Hohenhagen, and Inselsberg. The sides of the triangle were 69, 85, 107 km. His measurement yielded that the angle sum exceeded $180^\circ$ by $14.85''$.
(a) Use Heron’s formula $A = \sqrt{s(s-a)(s-b)(s-c)}$, where $s$ is half the perimeter, to calculate the area of the triangle, in a very good approximation.
(b) Take $R = 6378$ km as radius of the earth. Calculate the angle excess for a spherical triangle between the three mountain peaks. You need to convert angular measurements! 1 radian equals $\frac{180^\circ}{\pi} = \frac{180\cdot 3600''}{\pi}$.
(c) Is the triangle that Gauss measured actually a spherical triangle? Why or why not?
(d) Reflect on the motives why Gauss did his measurement. Find and read some further sources.
Think of the following and further motives and possibilities. Did Gauss really just want to
(i) check accuracy?
(ii) check geometry?
(iii) It was just a theoretical thought experiment, not really performed.
Answer. (a) Heron’s formula gives the area $A = \sqrt{s(s-a)(s-b)(s-c)} = 2929.42\ \text{km}^2$.
(b) Take $R = 6378$ km as radius of the earth. The angle excess for a spherical triangle between the three mountain peaks is
$$\alpha + \beta + \gamma - \pi = \frac{A}{R^2} = 7.201\cdot 10^{-5} \tag{7.23}$$
This is the value in radian measure. Converted to degrees, we get $0.00413^\circ$ which is $14.9''$.
(c) Of course the triangle one measures is not a spherical triangle, since light rays do not follow the curvature of the earth.
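Parts (a) and (b) are easy to reproduce. The sketch below assumes the side lengths 69, 85 and 107 km, which are the values that give the stated area of 2929.42 km²; the function names are hypothetical.

```python
import math

def heron_area(a, b, c):
    """Area of a plane triangle with sides a, b, c by Heron's formula."""
    s = 0.5 * (a + b + c)          # half the perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def spherical_excess_arcsec(area, radius):
    """Angle excess area / radius^2 of a spherical triangle, in arcseconds."""
    return math.degrees(area / radius**2) * 3600.0

area = heron_area(69.0, 85.0, 107.0)            # km^2
excess = spherical_excess_arcsec(area, 6378.0)  # arcseconds
print(round(area, 2), round(excess, 2))
```

The plane area is used in place of the spherical area, which is the "very good approximation" asked for in part (a).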
7.6. Principal and Gaussian curvature of rotation surfaces Before introducing the pseudo-sphere, we need some facts about the curvature of general rotation surfaces. We take the graph of an arbitrary function $y = f(x)$, and rotate it about the $y$-axis to produce a rotation surface in three dimensional space. The first principal curvature of a rotation surface in the $xy$ plane is
$$c_1 = \frac{y''}{(1 + y'^2)^{3/2}} \tag{7.24}$$
This is just the curvature of the graph $y = f(x)$.
Recall that the perpendicular to the tangent of a curve is called the normal of the curve. The second principal curvature occurs for a section of the surface by a plane $P_2$, which intersects the $xy$ plane along the normal of the curve $y = f(x)$, and is perpendicular to the $xy$ plane. The second principal curvature is
$$c_2 = \frac{y'}{x(1 + y'^2)^{1/2}} \tag{7.25}$$
Proposition 7.4. The Gaussian curvature of the rotation surface produced by rotating the graph of $y = f(x)$ around the $y$-axis, is the product
$$K = \frac{y'y''}{x(1 + y'^2)^2} \tag{7.26}$$
The formula (7.24) for the curvature of a plane curve is standard. Finally, since $K = c_1c_2$, formulas (7.24) and (7.25) imply the claim (7.26).
Here is an argument to justify (7.25): Let $\tan\beta = y'$ be the tangent of the slope angle for $y = f(x)$, as usual. Calculate $\sin\beta$. Calculate the hypotenuse $AB$ of the right $\triangle ABC$, with vertex $A = (x, y)$ on the curve, leg $AC$ parallel to the $x$-axis, leg $BC$ on the axis of rotation, and hypotenuse $AB$ perpendicular to the curve. One can show that point $B$ is the center of the best approximating circle in the plane $P_2$. Hence $c_2 = \frac{1}{AB}$. Use this idea to get the principal curvature $c_2$.
Answer. $\tan\beta = \frac{AC}{BC} = y'$. Hence
$$\sin\beta = \frac{AC}{AB} = \frac{AC}{\sqrt{BC^2 + AC^2}} = \frac{y'}{\sqrt{1 + y'^2}}$$
$$c_2 = \frac{1}{AB} = \frac{\sin\beta}{x} = \frac{y'}{x(1 + y'^2)^{1/2}}$$
A second proof of Proposition 7.4. On the surface of rotation, we choose as parameters $u = \varphi$ the rotation angle, and $v = r$ the distance from the rotation axis. Since the $y$-axis is the axis of rotation, the surface of rotation gets the parametric representation
$$x = v\cos u, \qquad y = f(v), \qquad z = v\sin u$$
The derivatives by the parameters are
$$\begin{aligned} x_u &= -v\sin u & x_v &= \cos u\\ y_u &= 0 & y_v &= f'(v)\\ z_u &= v\cos u & z_v &= \sin u \end{aligned} \tag{7.27}$$
Figure 7.1. Curvature of a rotation surface.
From these derivatives, one gets the first fundamental form. We use the general formulas
$$\begin{aligned} E &= x_u^2 + y_u^2 + z_u^2\\ F &= x_u x_v + y_u y_v + z_u z_v\\ G &= x_v^2 + y_v^2 + z_v^2 \end{aligned} \tag{7.3}$$
valid for any surface, and specialize to the surface of rotation given above. Now calculate $E, F, G$ from (7.3) and (7.27), to get the first fundamental form (7.2). Next get the root of the determinant $H = \sqrt{EG - F^2}$. Finally calculate the Gaussian curvature from the characteristic equation (7.5).
Problem 7.10. Use the approach as indicated to confirm formula (7.26).
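Answer. One gets $E = v^2$, $F = 0$ and $G = 1 + f'^2$ and
$$ds^2 = v^2\,du^2 + (1 + f'^2)\,dv^2 = r^2\,d\varphi^2 + (1 + f'(r)^2)\,dr^2$$
The root of the determinant is $H = v\sqrt{1 + f'^2}$. Because all four quantities $E, F, G, H$ depend only on $v$, the partial derivatives by $u$ all vanish. Thus Gauss’ characteristic equation (7.5) can be simplified to yield
$$\begin{aligned} K &= \frac{1}{2H}\frac{\partial}{\partial u}\left[\frac{F}{EH}\frac{\partial E}{\partial v} - \frac{1}{H}\frac{\partial G}{\partial u}\right] + \frac{1}{2H}\frac{\partial}{\partial v}\left[\frac{2}{H}\frac{\partial F}{\partial u} - \frac{1}{H}\frac{\partial E}{\partial v} - \frac{F}{EH}\frac{\partial E}{\partial u}\right]\\ &= \frac{1}{2H}\frac{\partial}{\partial v}\left[-\frac{1}{H}\frac{\partial E}{\partial v}\right] = -\frac{1}{2H}\frac{d}{dv}\left[\frac{2v}{H}\right] \end{aligned}$$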
Now use $H = v(1 + f'^2)^{1/2}$ and go on. We arrive at
$$\begin{aligned} K &= -\frac{1}{2H}\frac{d}{dv}\left[\frac{2v}{H}\right] = -\frac{1}{v\sqrt{1 + f'^2}}\,\frac{d(1 + f'^2)^{-1/2}}{dv}\\ &= \frac{1}{2v\sqrt{1 + f'^2}}\left[(1 + f'^2)^{-3/2}\,2f'f''\right] = \frac{f'f''}{v(1 + f'^2)^2} \end{aligned}$$
This result is equivalent to formula (7.26), since $x = v = r$ is the distance from the axis of rotation and $y = f(x)$.
Problem 7.11. Calculate the Gaussian curvature of a sphere of radius $a$ in three dimensional space.
Answer. The sphere is produced by rotating the graph of $x^2 + y^2 = a^2$ about the $y$-axis. Implicit differentiation yields $2x + 2yy' = 0$ and hence
$$y' = -\frac{x}{y}$$
$$y'' = -\frac{1\cdot y - xy'}{y^2} = \frac{xy' - y}{y^2} = \frac{-x^2/y - y}{y^2} = -\frac{a^2}{y^3}$$
$$K = \frac{y'y''}{x(1 + y'^2)^2} = \frac{a^2}{y^4\left(1 + x^2/y^2\right)^2} = \frac{a^2}{(x^2 + y^2)^2} = \frac{1}{a^2}$$
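Formula (7.26) is easy to evaluate numerically once $y'$ and $y''$ are known. The following sketch does this for the sphere profile, using the derivatives $y' = -x/y$ and $y'' = -a^2/y^3$ on the upper half $y = \sqrt{a^2 - x^2}$; the function names and the radius $a = 2$ are assumptions for illustration.

```python
import math

def rotation_surface_curvature(x, yp, ypp):
    """Gaussian curvature (7.26) of the surface made by rotating y = f(x) about the y-axis."""
    return yp * ypp / (x * (1.0 + yp**2)**2)

def sphere_profile_derivatives(x, a):
    """First and second derivative of y = sqrt(a^2 - x^2), for 0 < x < a."""
    y = math.sqrt(a * a - x * x)
    return -x / y, -a * a / y**3

a = 2.0
for x in (0.5, 1.0, 1.5):
    yp, ypp = sphere_profile_derivatives(x, a)
    print(x, rotation_surface_curvature(x, yp, ypp), 1.0 / a**2)
```

The curvature comes out the same at every point of the profile, as it must for a sphere.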

7.7. The pseudo-sphere The issue is now to find a rotation surface of constant negative Gaussian curvature $K = -a^{-2}$. Such a surface is called a pseudo-sphere.
Problem 7.12. Use the formula (7.26) for the Gaussian curvature, and get a differential equation of first order for the function $u := y'^2$. You may begin by getting the derivative $\frac{du}{dx}$.
Answer. The derivative of the function $u := y'^2$ is $\frac{du}{dx} = 2y'y''$. Next, I put the requirement $K = -a^{-2}$ into the formula (7.26). One gets
$$\frac{y'y''}{x(1 + y'^2)^2} = -\frac{1}{a^2} \tag{7.28}$$
$$2y'y'' = -\frac{2x(1 + y'^2)^2}{a^2} \tag{7.29}$$
$$\frac{du}{dx} = -\frac{2x(1 + u)^2}{a^2} \tag{7.30}$$
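Problem 7.13. Solve the differential equation
\[
\frac{du}{dx} = -\frac{2x(1 + u)^2}{a^2} \tag{7.30}
\]
by separation of variables. For simplicity, we use the initial data $u(a) = 0$, and get a curve through the point $x = a$, $u = 0$.
Answer.
\[
\begin{aligned}
\frac{du}{dx} &= -\frac{2x(1 + u)^2}{a^2}\\
\frac{du}{(1 + u)^2} &= -\frac{2x\,dx}{a^2}\\
\int_0^u \frac{du}{(1 + u)^2} &= -\int_a^x \frac{2x\,dx}{a^2}\\
\left[-\frac{1}{1 + u}\right]_0^u &= -\left[\frac{x^2}{a^2}\right]_a^x\\
-\frac{1}{1 + u} + 1 &= -\frac{x^2}{a^2} + 1\\
1 + u &= \frac{a^2}{x^2}\\
u &= \frac{a^2 - x^2}{x^2}
\end{aligned}
\]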
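As a sanity check of this solution, one can integrate equation (7.30) numerically and compare with the closed form. The sketch below uses the classical Runge-Kutta scheme, which is my choice for illustration and not a method from the text; the function names and the step count are assumptions.

```python
def rhs(x, u, a):
    """Right-hand side of (7.30): du/dx = -2 x (1 + u)^2 / a^2."""
    return -2.0 * x * (1.0 + u) ** 2 / a ** 2

def integrate_u(a, x_end, steps=2000):
    """Classical Runge-Kutta from the initial data x = a, u = 0 to x = x_end."""
    h = (x_end - a) / steps      # negative when integrating towards the axis
    x, u = a, 0.0
    for _ in range(steps):
        k1 = rhs(x, u, a)
        k2 = rhs(x + h / 2, u + h * k1 / 2, a)
        k3 = rhs(x + h / 2, u + h * k2 / 2, a)
        k4 = rhs(x + h, u + h * k3, a)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return u

def exact_u(a, x):
    """Closed-form solution u = (a^2 - x^2) / x^2 obtained by separation."""
    return (a * a - x * x) / (x * x)
```

For example, `integrate_u(1.0, 0.5)` should be close to `exact_u(1.0, 0.5)`, which equals 3.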
Problem 7.14. Check that the differential equation
\[
y' = -\frac{\sqrt{a^2 - x^2}}{x}
\]
with $|x| \le a$, has the solution
\[
y = a\ln\left(\frac{a + \sqrt{a^2 - x^2}}{x}\right) - \sqrt{a^2 - x^2} + C \tag{7.31}
\]
Also, find the general solution of the equation
\[
y' = +\frac{\sqrt{a^2 - x^2}}{x}
\]
Answer.
\[
y = a\ln\left(\frac{a - \sqrt{a^2 - x^2}}{x}\right) + \sqrt{a^2 - x^2} + C
\]
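Both solutions can be checked numerically by comparing a central difference quotient with the right-hand side of the differential equation. This is only a sketch with $C = 0$; the helper names and the step size `h` are assumptions made for illustration.

```python
import math

def y_minus(x, a):
    """Solution (7.31) of y' = -sqrt(a^2 - x^2) / x, taken with C = 0."""
    s = math.sqrt(a * a - x * x)
    return a * math.log((a + s) / x) - s

def y_plus(x, a):
    """Solution of y' = +sqrt(a^2 - x^2) / x, taken with C = 0."""
    s = math.sqrt(a * a - x * x)
    return a * math.log((a - s) / x) + s

def central_difference(f, x, a, h=1e-6):
    """Approximate the derivative f'(x) by a symmetric difference quotient."""
    return (f(x + h, a) - f(x - h, a)) / (2 * h)
```

With $a = 2$ and $x = 1.2$ one has $\sqrt{a^2 - x^2} = 1.6$, so the two difference quotients should be close to $-1.6/1.2$ and $+1.6/1.2$.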
Definition 7.1 (Pseudo-sphere). The rotation surface with constant negative Gaussian curvature is called a pseudo-sphere. With curvature $K = -a^{-2}$ and the y-axis as axis of rotation, its equation is
\[
y = a\ln\left(\frac{a + \sqrt{a^2 - x^2 - z^2}}{\sqrt{x^2 + z^2}}\right) - \sqrt{a^2 - x^2 - z^2}
\]
Problem 7.15. Check, once more, that the Gaussian curvature of the specified surface is $K = -a^{-2}$.
Answer.
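One possible way to carry out this check is sketched below. It assumes that formula (7.26) has the form $K = y'y''/(x(1 + y'^2)^2)$, as it is used in Problems 7.11 and 7.12, and it starts from the tractrix slope of Problem 7.14.

```latex
\begin{align*}
y' &= -\frac{\sqrt{a^2 - x^2}}{x}, &
1 + y'^2 &= \frac{a^2}{x^2},\\
y'' &= \frac{a^2}{x^2\sqrt{a^2 - x^2}}, &
y'y'' &= -\frac{a^2}{x^3},\\
K &= \frac{y'y''}{x\,(1 + y'^2)^2}
   = -\frac{a^2}{x^3}\cdot\frac{x^3}{a^4}
   = -\frac{1}{a^2}.
\end{align*}
```

The result does not depend on $x$, as expected for a surface of constant curvature.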
Problem 7.16. Check the following fact: The segment on the tangent to the curve (7.31), between
the touching point T , and the intersection S of the tangent with the y-axis has always the same
length a. For that reason, the curve (7.31) is called tractrix.
Figure 7.2. The tractrix has a segment on its tangent of constant length.
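Answer. Take the right triangle $\triangle TSC$, formed by the segment $TS$ on the tangent at point $T$, and the perpendicular $TC$ from $T$ onto the y-axis.¹ Here $TC = x$ is the distance of $T$ from the y-axis. We know that
\[
|y'| = \tan\alpha = \frac{SC}{TC}
\]
\[
TS^2 = TC^2 + SC^2 = x^2(1 + y'^2) = a^2
\]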
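The constant tangent length can also be observed numerically, without using the differential equation. The sketch below takes the curve (7.31) with $C = 0$, approximates the slope by a difference quotient, and intersects the tangent line with the y-axis; all names and the step size are illustrative assumptions.

```python
import math

def tractrix(x, a):
    """Curve (7.31) with C = 0."""
    s = math.sqrt(a * a - x * x)
    return a * math.log((a + s) / x) - s

def tangent_segment_length(x, a, h=1e-6):
    """Length |TS| from T = (x, y) to the point S where the tangent meets the y-axis."""
    slope = (tractrix(x + h, a) - tractrix(x - h, a)) / (2 * h)  # numerical y'
    y_t = tractrix(x, a)
    y_s = y_t - x * slope        # tangent line evaluated at x = 0
    return math.hypot(x, y_t - y_s)
```

For several values of $x$ in $(0, a)$ the returned length should stay close to $a$.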
Problem 7.17. The surface area of a rotation surface, made by rotating $y = f(x)$ about the y-axis, for $x_1 \le x \le x_2$ is
\[
S = \int_{x_1}^{x_2} 2\pi x\sqrt{1 + y'^2}\,dx
\]
Calculate the surface area of the pseudo-sphere for bounds $0 < x \le a$.
Answer. Because of $1 + y'^2 = 1 + u = \frac{a^2}{x^2}$, we get
\[
S = \int_0^a 2\pi x\,\frac{a}{x}\,dx = 2\pi a^2
\]
¹ This triangle is different from triangle $\triangle ABC$ in Figure 7.1.
We now introduce $(\varphi, r)$ as two convenient coordinates on the pseudo-sphere. As first coordinate, we choose the angle of rotation $\varphi$ about the y-axis. The second coordinate is the radius $r = \sqrt{x^2 + z^2}$ measured from the axis of rotation. Up to now it was called $x$, but now I choose to name it $r$. The three parameters $r, \varphi, y$ are cylindrical coordinates of three-dimensional space. The first two of them are convenient parameters on the pseudo-sphere.
Proposition 7.5 (Riemann Metric for the Pseudo Sphere). The infinitesimal distance $ds$ of points with coordinates $(\varphi, r)$ and $(\varphi + d\varphi, r + dr)$ is
\[
ds^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2 \tag{7.32}
\]
Proof. The distance on the pseudo-sphere is calculated from the usual Euclidean distance for points of the three-dimensional space into which the surface is embedded. At first, I convert the distance from Cartesian to cylindrical coordinates. Because the y-axis is the rotation axis, its coordinate stays, but the pair $(x, z)$ is converted to polar coordinates. Hence one gets
\[
ds^2 = dx^2 + dy^2 + dz^2 = dr^2 + r^2\,d\varphi^2 + dy^2 \tag{7.33}
\]
We restrict to points on the pseudo-sphere. Hence the coordinates $r$ and $y$ are related in the same way as $x$ and $y$ before. Thus $y' = -\frac{\sqrt{a^2 - x^2}}{x}$ becomes
\[
\frac{dy}{dr} = -\frac{\sqrt{a^2 - r^2}}{r} \tag{7.34}
\]
Now we use (7.34) to eliminate $dy$ from (7.33) and get
\[
\begin{aligned}
ds^2 &= dr^2 + r^2\,d\varphi^2 + dy^2 = dr^2 + r^2\,d\varphi^2 + \left(\frac{dy}{dr}\right)^2 dr^2\\
&= dr^2 + r^2\,d\varphi^2 + \frac{a^2 - r^2}{r^2}\,dr^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2
\end{aligned}
\]
as was to be shown. As an alternative, we can use the first fundamental form calculated above. Since $1 + f'(r)^2 = 1 + \frac{a^2 - r^2}{r^2} = \frac{a^2}{r^2}$, one gets again
\[
ds^2 = r^2\,d\varphi^2 + (1 + f'(r)^2)\,dr^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2
\]
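The proof can be illustrated numerically: for two nearby points on the embedded surface, the Euclidean chord in three-dimensional space should agree with the length predicted by (7.32) up to higher-order terms. The sketch assumes the profile (7.31) with $C = 0$; the function names are illustrative.

```python
import math

def tractrix_height(r, a):
    """Profile y(r) of the pseudo-sphere, formula (7.31) with C = 0."""
    s = math.sqrt(a * a - r * r)
    return a * math.log((a + s) / r) - s

def embed(phi, r, a):
    """Cartesian point (x, y, z), with the y-axis as axis of rotation."""
    return (r * math.cos(phi), tractrix_height(r, a), r * math.sin(phi))

def chord(p, q):
    """Euclidean distance between two points of three-dimensional space."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def metric_length(r, dphi, dr, a):
    """ds from (7.32): ds^2 = r^2 dphi^2 + (a^2 / r^2) dr^2."""
    return math.sqrt(r * r * dphi * dphi + (a * a / (r * r)) * dr * dr)
```

For small steps such as `dphi = 1e-5` and `dr = 2e-5`, the relative difference between `chord(...)` and `metric_length(...)` should be of the order of the step size.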

7.8. Poincaré half-plane and Poincaré disk Throughout, we denote the upper open half-plane by $H = \{(u, v) : v > 0\}$. Its boundary is just the real axis $\partial H = \{(u, v) : v = 0\}$. The open unit disk is denoted by $D = \{z = x + iy : x^2 + y^2 < 1\}$, and its boundary is $\partial D = \{z = x + iy : x^2 + y^2 = 1\}$.
The following isometric mapping of the half-plane to the disk is used in this section. It differs from the one used in the previous section by a rotation of the disk by a right angle. We repeat for convenience.
Proposition 7.6 (Isometric Mapping of the Half-plane to the Disk). The linear fractional function
\[
z = \frac{iw + 1}{w + i} \tag{7.35}
\]
is a conformal mapping and a bijection from $\mathbb{C} \cup \{\infty\}$ to $\mathbb{C} \cup \{\infty\}$. The inverse mapping is
\[
w = \frac{1 - iz}{z - i} \tag{7.36}
\]
These mappings preserve angles, the cross ratio, and the orientation, and they map generalized circles to generalized circles.
The upper half-plane $H = \{w = u + iv : v > 0\}$ is mapped onto the unit disk $D = \{z = x + iy : x^2 + y^2 < 1\}$. In particular,
\[
w = 0 \mapsto z = -i, \quad w = 1 \mapsto z = 1, \quad w = \infty \mapsto z = i, \quad w = i \mapsto z = 0
\]
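The special values listed in the proposition are easy to verify with Python's built-in complex numbers. The following sketch implements (7.35) and (7.36) directly; the function names are illustrative and not taken from the text.

```python
def half_plane_to_disk(w):
    """Mapping (7.35): z = (i w + 1) / (w + i)."""
    return (1j * w + 1) / (w + 1j)

def disk_to_half_plane(z):
    """Inverse mapping (7.36): w = (1 - i z) / (z - i)."""
    return (1 - 1j * z) / (z - 1j)
```

Evaluating `half_plane_to_disk` at `0`, `1` and `1j` should reproduce the values $-i$, $1$ and $0$, and composing the two functions should return the starting point up to rounding.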
Proposition 7.7 (Riemann Metric for Poincaré's half-plane). In the Poincaré half-plane, the infinitesimal hyperbolic distance $ds$ of points with coordinates $(u, v)$ and $(u + du, v + dv)$ is
\[
(ds_H)^2 = \frac{du^2 + dv^2}{v^2} \tag{7.37}
\]
The mapping (7.35) provides an isometry between the half-plane and the disk:
\[
ds_D = ds_H \tag{4.5}
\]
Proof. The metric of the half-plane is calculated from the known metric
\[
(ds_D)^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} \tag{7.14}
\]
of the Poincaré disk model. The mapping
\[
z = \frac{iw + 1}{w + i} \tag{7.35}
\]
provides an isometry from the half-plane to the disk. The denominator is
\[
1 - |z|^2 = \frac{(w + i)(\bar{w} - i) - (iw + 1)(-i\bar{w} + 1)}{|w + i|^2} = \frac{2i\bar{w} - 2iw}{|w + i|^2} = \frac{4v}{|w + i|^2}
\]
The derivative of the mapping (7.35) is
\[
\frac{dz}{dw} = -\frac{2}{(w + i)^2}
\]
Putting the last two formulas into (7.14) yields
\[
\begin{aligned}
ds^2 &= \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2} = \frac{4|dz|^2}{(1 - |z|^2)^2}\\
&= 4\left|\frac{dz}{dw}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2 = 4\left|\frac{2}{(w + i)^2}\right|^2 |dw|^2 \left(\frac{|w + i|^2}{4v}\right)^2\\
&= \frac{|dw|^2}{v^2} = \frac{du^2 + dv^2}{v^2}
\end{aligned}
\]
Hence formula (7.37) arises from the isometry (7.35) between the half-plane and the disk.
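The isometry can be tested numerically by pushing a short step in the half-plane through the mapping (7.35) and measuring both lengths. This is a finite-difference sketch under the assumption that the step `dw` is small; the names are illustrative.

```python
def half_plane_to_disk(w):
    """Mapping (7.35): z = (i w + 1) / (w + i)."""
    return (1j * w + 1) / (w + 1j)

def ds_half_plane(w, dw):
    """Hyperbolic length of a short step dw at w = u + i v, from (7.37)."""
    return abs(dw) / w.imag

def ds_disk(z, dz):
    """Hyperbolic length of a short step dz at z, from (7.14)."""
    return 2 * abs(dz) / (1 - abs(z) ** 2)

def compare(w, dw):
    """Return the half-plane length of dw and the disk length of its image."""
    z = half_plane_to_disk(w)
    dz = half_plane_to_disk(w + dw) - z   # image step by finite difference
    return ds_half_plane(w, dw), ds_disk(z, dz)
```

The two numbers returned by `compare(0.7 + 1.3j, 1e-6 * (1 + 2j))` should agree up to a relative error of the order of the step size.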
7.9. Embedding the pseudo-sphere into Poincaré’s half-plane
Proposition 7.8. The mapping
\[
w = \varphi + i\,\frac{a}{r} \tag{7.38}
\]
transforms the line element $ds_{PS}$ of the pseudo-sphere to the line element $ds_H$ of the half-plane such that
\[
ds_{PS} = a\,(ds_H) \tag{7.39}
\]
For $a = 1$, we get an isometry. This is just the case with Gaussian curvature $K = -a^{-2} = -1$. Because an isometry conserves the Gaussian curvature, this shows that the Poincaré half-plane has Gaussian curvature $-1$.
Proof. We separate equation (7.38) into its real and imaginary parts to get
\[
u = \varphi, \quad v = \frac{a}{r} \tag{7.40}
\]
Using the differentials $du = d\varphi$ and $dv = -ar^{-2}\,dr$, we plug into
\[
ds_H^2 = \frac{du^2 + dv^2}{v^2} \tag{7.37}
\]
One gets
\[
ds_H^2 = \frac{d\varphi^2 + \left(-ar^{-2}\right)^2 dr^2}{a^2 r^{-2}} = a^{-2}\left(r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2\right)
\]
Now comparing with
\[
ds_{PS}^2 = r^2\,d\varphi^2 + \frac{a^2}{r^2}\,dr^2 \tag{7.32}
\]
one concludes
\[
ds_H^2 = a^{-2}(ds_{PS})^2
\]
and hence equation (7.39) holds.
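The scaling relation (7.39) can be checked for a concrete short step. The sketch below evaluates (7.32) directly and measures the image of the step under (7.38) with the half-plane metric (7.37), using a finite difference instead of the derivative; the names and sample values are assumptions for illustration.

```python
import math

def ds_pseudo_sphere(r, dphi, dr, a):
    """Line element (7.32) on the pseudo-sphere."""
    return math.sqrt(r * r * dphi ** 2 + (a / r) ** 2 * dr ** 2)

def ds_half_plane_image(phi, r, dphi, dr, a):
    """Half-plane length (7.37) of the image step under w = phi + i a / r."""
    w0 = complex(phi, a / r)
    w1 = complex(phi + dphi, a / (r + dr))
    return abs(w1 - w0) / w0.imag
```

For small `dphi` and `dr`, `ds_pseudo_sphere(...)` should be close to `a * ds_half_plane_image(...)`.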
7.10. Embedding the pseudo-sphere into Poincaré's disk The next goal is to construct an isometric mapping from the pseudo-sphere to the Poincaré disk. It is convenient to get this mapping as composition of a mapping from the pseudo-sphere to the half-plane, and the conformal mapping from the half-plane to the disk.
Proposition 7.9. We take a pseudo-sphere with $a = 1$. This normalizes the Gaussian curvature to be $K = -1$. The mapping
\[
z = \frac{r - 1 + ir\varphi}{r\varphi + i(r + 1)} \tag{7.41}
\]
maps the pseudo-sphere isometrically into the Poincaré disk.
Proof. The mapping (7.41) is constructed as a composition of two mappings $PS \mapsto H \mapsto D$. Take the mapping $PS \mapsto H$ given by equation (7.42), and the mapping $H \mapsto D$ given by equation (7.35). The composition of the mapping
\[
w = \varphi + \frac{i}{r} \tag{7.42}
\]
from the pseudo-sphere to the Poincaré half-plane, with the mapping
\[
z = \frac{iw + 1}{w + i} \tag{7.35}
\]
from the Poincaré half-plane to the Poincaré disk is the required mapping
\[
z = \frac{i\left(\varphi + \frac{i}{r}\right) + 1}{\varphi + \frac{i}{r} + i} = \frac{r - 1 + ir\varphi}{r\varphi + i(r + 1)}
\]
from the pseudo-sphere to the Poincaré disk. Both mappings (7.42) and (7.35) are isometries, as stated by formula (7.39) with $a = 1$ and by Proposition 7.7. Hence their composition (7.41) conserves the line element: $ds_{PS} = ds_H = ds_D$.
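The algebra of the composition can be confirmed numerically. The sketch below compares the two-step mapping with the closed form (7.41) at a few sample points; the function names are illustrative.

```python
def to_half_plane(phi, r):
    """Mapping (7.42): w = phi + i / r, for the pseudo-sphere with a = 1."""
    return complex(phi, 1.0 / r)

def to_disk(w):
    """Mapping (7.35): z = (i w + 1) / (w + i)."""
    return (1j * w + 1) / (w + 1j)

def direct(phi, r):
    """Closed form (7.41): z = (r - 1 + i r phi) / (r phi + i (r + 1))."""
    return (r - 1 + 1j * r * phi) / (r * phi + 1j * (r + 1))
```

For any admissible pair $(\varphi, r)$ with $0 < r \le 1$, `to_disk(to_half_plane(phi, r))` and `direct(phi, r)` should coincide up to rounding, and the image should lie inside the unit disk.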
7.11. About circle-like curves We now go back to the Poincaré disk model. At first, here are
a few remarks about circle-like curves. In hyperbolic geometry, there exist three different types of
circle-like curves. I define as a circle-like curve a curve which appears to be a circle in the Poincaré
model.
Recall that ∂D is the boundary circle of the Poincaré disk. Take any second circle C. I call
its Euclidean center M the quasi-center. The meaning of C for the hyperbolic geometry of the
Poincaré disk depends on the nature of the intersection of the two circles C and ∂D. There are three
important cases:
(i) The circle C lies totally in the interior D. In that case, it is a circle for hyperbolic geometry.
This circle has a center A in hyperbolic geometry. Note that the quasi-center M is different from
the center A of C as an object of hyperbolic geometry.
(ii) The circle C touches the boundary ∂D from inside, say at endpoint E. In that case, it is a
horocycle for hyperbolic geometry. A horocycle has no hyperbolic center, instead it contains an
ideal point E. Hence it is unbounded. The hyperbolic circumference of a horocycle is infinite, as
follows from part (c) below.
(iii) The circle C intersects the boundary ∂D at two endpoints E and F. In that case, the
circular arc inside the disk D is either an equidistance line or a geodesic for hyperbolic geometry.
A geodesic intersects ∂D perpendicularly. In the case of a non-perpendicular intersection of ∂D and C, one gets an equidistance line. Indeed, all points of that equidistance line have the same distance from the hyperbolic straight line with ends E and F.
Problem 7.18. Take points $Y_+ = (0, 1)$ and $O = (0, 0)$. Find the analytic equation for a horocycle $H$ with apparent diameter $OY_+$.
Answer. In complex notation, point $Y_+$ is $i$. The quasi-center is $M = \frac{i}{2}$, and the apparent radius is $\frac{1}{2}$. Hence one gets the equation
\[
\begin{aligned}
\left|z - \frac{i}{2}\right|^2 &= \frac{1}{4}\\
x^2 + \left(y - \frac{1}{2}\right)^2 - \frac{1}{4} &= 0\\
x^2 + y(y - 1) &= 0
\end{aligned}
\]
We need another more convenient parametric equation for the horocycle $H$. Let $Z = (x, y)$ be any point on $H$ and define the circumference angle $d = \angle OY_+Z$. Calculate $\tan d$ in terms of $(x, y)$. Then express $x$ and $y$ in terms of the central angle $2d = \angle OMZ$. Use double angle formulas, and finally express $x$ and $y$ in terms of $\tan d$.
Answer.
\[
\tan d = \frac{x}{1 - y} = \frac{y}{x}
\]
\[
x = \frac{\sin 2d}{2} = \sin d\cos d = \frac{\tan d}{1 + \tan^2 d} \tag{7.43}
\]
\[
y = \frac{1 - \cos 2d}{2} = \sin^2 d = \frac{\tan^2 d}{1 + \tan^2 d}
\]
Problem 7.19. Confirm that the hyperbolic arc length of the arc OZ on the horocycle H is just
s = 2 tan d.
Figure 7.3. Measuring an arc of a horocycle.
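Answer. Let $t = \tan d$. Differentiation yields
\[
\begin{aligned}
x &= \frac{t}{1 + t^2}, & \frac{dx}{dt} &= \frac{1 - t^2}{(1 + t^2)^2}\\
y &= \frac{t^2}{1 + t^2}, & \frac{dy}{dt} &= \frac{2t}{(1 + t^2)^2}\\
1 - x^2 - y^2 &= 1 - y = \frac{1}{1 + t^2}, & \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 &= \frac{1}{(1 + t^2)^2}
\end{aligned}
\]
Hence the hyperbolic metric (7.14) implies
\[
\left(\frac{ds}{dt}\right)^2 = 4(1 - x^2 - y^2)^{-2}\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right] = 4
\]
Hence by elementary integration $s = 2t = 2\tan d$.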
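The result $s = 2\tan d$ can be reproduced by a direct numerical quadrature of the disk metric (7.14) along the horocycle. The sketch sums short chords weighted by the conformal factor at the chord midpoint; this discretization and the names are assumptions chosen for illustration.

```python
import math

def horocycle_point(t):
    """Point on the horocycle x^2 + y (y - 1) = 0, parametrized by t = tan d."""
    return t / (1 + t * t), t * t / (1 + t * t)

def hyperbolic_arc_length(t_end, steps=20000):
    """Approximate the hyperbolic length of the arc from t = 0 to t = t_end."""
    total = 0.0
    x0, y0 = horocycle_point(0.0)
    for k in range(1, steps + 1):
        x1, y1 = horocycle_point(t_end * k / steps)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2       # midpoint of the chord
        # Euclidean chord length times the factor 2 / (1 - x^2 - y^2)
        total += 2 * math.hypot(x1 - x0, y1 - y0) / (1 - xm * xm - ym * ym)
        x0, y0 = x1, y1
    return total
```

With `t_end = 1.5`, the returned value should be close to $2 \cdot 1.5 = 3$.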
Problem 7.20. Give the representation of the horocycle H with this arc length as parameter.
Explain in a drawing, how to measure the arc length on this horocycle.
Answer. We get the parametrization
\[
x = \frac{2s}{4 + s^2}, \qquad y = \frac{s^2}{4 + s^2} \tag{7.44}
\]
The hyperbolic arc length of $OZ$ on the horocycle $H$ is the Euclidean length $|Y_- Z'|$ since $s = 2\tan d = |Y_- Z'|$.
Figure 7.4. Isometry of the sliced pseudo-sphere to a half infinity strip.
7.12. Mapping the boundaries There cannot exist an isometry between the pseudo-sphere and the half-plane, since they have different topologies.
A corresponding problem already arises in Euclidean geometry, for the construction of an isometry between the cylinder and the plane. At least, there exists a non-invertible homomorphism from the plane onto the cylinder. This homomorphism can be restricted to an isomorphism between a strip of the plane and the sliced cylinder.
We return to the hyperbolic case. By slicing the pseudo-sphere, we get an isomorphism of the sliced pseudo-sphere into a strip of the half-plane, and furthermore into part of the disk, too.
The pseudo-sphere is sliced along the geodesic in the negative (x, y)-plane, restricting the rotation angle to the half-open interval $-\pi \le \varphi < \pi$. The mapping
\[
w = \varphi + \frac{i}{r} \tag{7.42}
\]
maps the sliced pseudo-sphere onto a half-open rectangular domain $PS_H$ in the upper half-plane. The boundary of $PS_H$ consists of a segment $AB$ with the endpoints $A = -\pi + i$, $B = \pi + i$, and two unbounded rays $\overrightarrow{A\infty}$ and $\overrightarrow{B\infty}$ with vertices $A$ and $B$ parallel to the positive v-axis. Furthermore, we map the sliced pseudo-sphere to the Poincaré disk via the isometry (7.41). The image $PS_D$ of the pseudo-sphere is a part of the interior of the horocycle $H$ with apparent diameter from $0$ to $i$.
Problem 7.21. On the pseudo-sphere, we use as parameters the cylindrical coordinates r and φ.
The boundary of the sliced pseudo-sphere is given by r = 1, and −π < φ < π. To which curve in
the disk D is the boundary mapped by the isometry (7.41)?
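Answer. The boundary $\partial PS_D$ of the image $PS_D$ consists of three circular arcs: a segment $AB$ of a horocycle $H$ with endpoints
\[
A = \frac{i\pi}{\pi - 2i}, \qquad B = \frac{i\pi}{\pi + 2i}
\]
which are the images of $\varphi = -\pi$ and $\varphi = \pi$ at $r = 1$, as well as two geodesic rays $\overrightarrow{AY_+}$ and $\overrightarrow{BY_+}$ with vertices $A$ and $B$ pointing to the ideal endpoint $Y_+ = i$.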
Figure 7.5. Isometric image of the sliced pseudo-sphere in the Poincaré disk.
Problem 7.22. Give a parametric equation for the boundary, with parameter φ, at first in complex
notation for z = x + iy. Then separate into real and imaginary parts to get equations for x and y.
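Answer. Simply put $r = 1$ into equation (7.41). One gets
\[
z = \frac{i\varphi}{\varphi + 2i}
\]
To separate real and imaginary parts, one needs to make the denominator real:
\begin{align}
z &= \frac{i\varphi}{\varphi + 2i} = \frac{i\varphi(\varphi - 2i)}{(\varphi + 2i)(\varphi - 2i)} \tag{7.45}\\
x + iy &= \frac{i\varphi^2 + 2\varphi}{\varphi^2 + 4} \tag{7.46}\\
x &= \frac{2\varphi}{\varphi^2 + 4} \tag{7.47}\\
y &= \frac{\varphi^2}{\varphi^2 + 4} \tag{7.48}
\end{align}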
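Problem 7.23. Check that your parametric equation is a circle with center $\frac{i}{2}$.
Answer. This is a parametric equation of a circle with center $\frac{i}{2}$ because
\[
z - \frac{i}{2} = \frac{i\varphi}{\varphi + 2i} - \frac{i}{2} = \frac{i(\varphi - 2i)}{2(\varphi + 2i)}
\]
and, since $|\varphi - 2i| = |\varphi + 2i|$ for real $\varphi$,
\[
\left|z - \frac{i}{2}\right| = \frac{1}{2}
\]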
(d) Compare (7.46) with the result
\[
x = \frac{\tan d}{1 + \tan^2 d}, \qquad y = \frac{\tan^2 d}{1 + \tan^2 d}
\]
and check that $\varphi = 2\tan d$. Hence, since $s = 2\tan d$ was shown, one concludes that $\varphi = s = 2\tan d$. Of course $\varphi = s$ follows directly from the isometries $PS \mapsto H \mapsto D$.
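Both claims, that the boundary lies on the circle with center $\frac{i}{2}$ and radius $\frac{1}{2}$, and that $\varphi = 2\tan d$, can be spot-checked numerically. The sketch below is illustrative only; the function names are not from the text.

```python
import math

def boundary_point(phi):
    """Image z = i phi / (phi + 2i) of the rim r = 1 under the mapping (7.41)."""
    return 1j * phi / (phi + 2j)

def horocycle_point(d):
    """Point on the horocycle for circumference angle d, using t = tan d."""
    t = math.tan(d)
    return complex(t / (1 + t * t), t * t / (1 + t * t))
```

For a sample of angles $\varphi$ in $(-\pi, \pi)$, the distance of `boundary_point(phi)` from $\frac{i}{2}$ should be $\frac{1}{2}$, and `boundary_point(phi)` should coincide with `horocycle_point(math.atan(phi / 2))`.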
Problem 7.24. Draw sketches of the pseudo-sphere as it appears in the domains $PS \mapsto H \mapsto D$. Use different colors for the different vertices and edges of the boundary, but the same colors for corresponding objects in all three domains $PS \mapsto H \mapsto D$, as they are mapped by our isometries from above.
Problem 7.25. The Poincaré disk can be tiled with congruent triangles. Indeed, there exist
infinitely many different types of such tilings. I choose a tiling with congruent equilateral triangles,
such that at each vertex seven triangles meet. Use Gauss’ remarkable theorem to calculate the
hyperbolic area of one such triangle.
Answer. I measure angles in radian measure. Since seven triangles meet at each vertex, the angles of one triangle are all
\[
\alpha = \beta = \gamma = \frac{2\pi}{7}
\]
and hence the excess of the angle sum is
\[
\alpha + \beta + \gamma - \pi = \frac{6\pi}{7} - \pi = -\frac{\pi}{7}
\]
Since the Gaussian curvature of Poincaré's model is $K = -1$, the area is just the negative of the excess (this is also called the defect), and is $\frac{\pi}{7}$.
Problem 7.26. Overlay two drawings: one of the tiling by equilateral triangles, and a second one of the image $PS_D$ of the pseudo-sphere and its boundary $\partial PS_D$.
How many of those triangles of the tiling fit entirely into $PS_D$?
How many triangles make up (with bits and pieces!) the total area of $PS_D$?
Answer. In Problem 7.17, the total area of the pseudo-sphere was calculated to be $2\pi a^2$, which is $2\pi$ for $a = 1$. This equals the area of 14 triangles, since each triangle has area $\frac{\pi}{7}$.
List of Figures
2.1 Aberration.
2.2 Parallax.
2.3 Aberration of the rain—and the light.
2.4 The one-dimensional Doppler effect.
2.5 The principal setup of Compton's experiment.
2.6 The kinematics of the collision of a photon with an electron, initially at rest.
7.1 Curvature of a rotation surface.
7.2 The tractrix has a segment on its tangent of constant length.
7.3 Measuring an arc of a horocycle.
7.4 Isometry of the sliced pseudo-sphere to a half infinity strip.
7.5 Isometric image of the sliced pseudo-sphere in the Poincaré disk.


Franz Rothe, Department of Mathematics and Statistics
