Advanced Quantum Mechanics
— Incomplete Notes —
DAVID G ROSS
Institute for Theoretical Physics
University of Cologne
May 5, 2024
Contents
Contents 1
1 Multi-partite quantum systems 5

1.1 Mixed states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Visualizing mixed states: The Bloch ball . . . . . . . . . . . . . 6
1.1.2 Time evolution of density operators . . . . . . . . . . . . . . . . 8
1.1.3 Dynamics of a noisy spin . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Multi-partite Hilbert space . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.1 Tensor product Hilbert spaces . . . . . . . . . . . . . . . . . . . 11
1.2.2 The partial trace . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Dynamics of coupled systems . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.1 The measurement and the classicality problem . . . . . . . . . . 14
1.3.2 A quantum model for measurements . . . . . . . . . . . . . . . . 16
1.4 Quantum many-body systems as computers . . . . . . . . . . . . . . . . 19
1.4.1 Grover’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Bell inequalities and their implications . . . . . . . . . . . . . . . . . . . 26
1.5.1 The CHSH scenario . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5.2 Operational consequences of Bell inequality violations . . . . . . 30
1.5.3 Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.6 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2 Indistinguishable particles 36
2.1 Bosonic and Fermionic Hilbert spaces . . . . . . . . . . . . . . . . . . . 36
2.1.1 Permutations and occupation numbers . . . . . . . . . . . . . . . 37
2.1.2 Single-particle operators . . . . . . . . . . . . . . . . . . . . . . 40
2.1.3 The exchange interaction . . . . . . . . . . . . . . . . . . . . . . 41
2.2 Second quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2.1 Fock space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2.2 Creation and annihilation operators . . . . . . . . . . . . . . . . 44
2.2.3 Single- and two-particle operators . . . . . . . . . . . . . . . . . 47
2.3 Quasiparticles and collective excitations . . . . . . . . . . . . . . . . . . 51
2.3.1 Phonons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 Global phase gauge symmetry and particle number conservation . 53
2.4 Bose gas: Take 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4.1 Approximate solution part 1 . . . . . . . . . . . . . . . . . . . . 55
2.5 Detour: Spontaneous symmetry breaking . . . . . . . . . . . . . . . . . 55
2.5.1 Ferromagnetism . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5.2 SSB and Bose-Einstein condensation . . . . . . . . . . . . . . . 57
2.6 Bose gas: Take 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

2.7 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3 Field quantization and quantum theory of light 63

3.1 Phonon continuum limit . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Quantization of the EM field . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3 States of the EM field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3.1 Number states . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3.2 Coherent states . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.4 Light-matter interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4.1 Spontaneous emission . . . . . . . . . . . . . . . . . . . . . . . 70
3.5 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4 Relativistic quantum mechanics 73

4.1 Special relativity recap . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1.1 Space-time symmetries . . . . . . . . . . . . . . . . . . . . . . . 74
4.1.2 Minkowski geometry . . . . . . . . . . . . . . . . . . . . . . . . 76
4.1.3 Transformation behavior of vectors and fields . . . . . . . . . . . 77
4.1.4 Ricci calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 The Klein-Gordon Equation . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.1 Superluminal solutions to the Klein-Gordon equation . . . . . . . 80
4.2.2 The Klein-Gordon field . . . . . . . . . . . . . . . . . . . . . . . 82
4.2.3 Microcausality . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2.4 Anti-particles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 The Dirac equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3.1 Momentum representation of the Dirac equation . . . . . . . . . 88
4.3.2 Lorentz invariance of the Dirac equation . . . . . . . . . . . . . . 91
4.3.3 Spin representations of the Lorentz group . . . . . . . . . . . . . 92
4.4 Non-relativistic limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5 Symmetries in Quantum Mechanics 97
A Quantum mechanics recap 98

A.1 Linear algebra of Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . 98
A.1.1 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.1.2 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.1.3 Dirac notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.1.4 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A.1.5 The adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
A.1.6 Spectral decomposition (discrete case) . . . . . . . . . . . . . . . 104
A.1.7 Spectral decomposition (continuous case) . . . . . . . . . . . . . 105
A.1.8 More on delta distributions . . . . . . . . . . . . . . . . . . . . . 108
A.1.9 More on Fourier transforms . . . . . . . . . . . . . . . . . . . . 109
A.1.10 Functions of operators . . . . . . . . . . . . . . . . . . . . . . . 112
A.1.11 Unitary operators . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.1.12 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.1.13 The trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.1.14 Commuting operators . . . . . . . . . . . . . . . . . . . . . . . 114
A.2 Some concrete systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.2.1 A single harmonic oscillator . . . . . . . . . . . . . . . . . . . . 115
A.2.2 Normal modes . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

A.2.3 Central potentials . . . . . . . . . . . . . . . . . . . . . . . . . . 118
A.2.4 Fermionic oscillator . . . . . . . . . . . . . . . . . . . . . . . . 119
A.3 Perturbation theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
A.3.1 Fermi’s golden rule . . . . . . . . . . . . . . . . . . . . . . . . . 119
B Miscellaneous Integrals 122

B.1 Gaussian and Fresnel integrals . . . . . . . . . . . . . . . . . . . . . . . 122
B.2 Some Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . 123
C Function spaces and distributions 124

C.1 Square-integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . 124
C.1.1 Why go beyond L2 ? . . . . . . . . . . . . . . . . . . . . . . . . 125
C.2 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
C.2.1 Schwartz space . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
C.2.2 Tempered distributions . . . . . . . . . . . . . . . . . . . . . . . 127
C.2.3 Operations on distributions . . . . . . . . . . . . . . . . . . . . . 129
C.3 Topological aspects, more pedantry, and generalizations . . . . . . . . . . 133
D Green’s functions 136

D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
D.2 Green’s functions from Fourier transforms . . . . . . . . . . . . . . . . . 137
D.2.1 Direct integration . . . . . . . . . . . . . . . . . . . . . . . . . . 138
D.2.2 Complex integration . . . . . . . . . . . . . . . . . . . . . . . . 139
D.2.3 Using the resolvent . . . . . . . . . . . . . . . . . . . . . . . . . 140
D.2.4 Using the principal value . . . . . . . . . . . . . . . . . . . . . . 141
This symbol indicates that you may skip forward without missing much.
Warm-up
To warm up, let’s recall the most basic notions from undergraduate quantum mechanics.
For more details, consult Sec. A of the Appendix.
• With every quantum system, one associates a Hilbert space H.

• Each state corresponds to a normalized vector |ψ⟩ ∈ H.
• Observable quantities are associated with Hermitian operators on H. If A = A† is
Hermitian, it has an eigendecomposition
$$ A = \sum_i \lambda_i |\phi_i\rangle\langle\phi_i|, $$
where the {|ϕi ⟩}i are an ortho-normal basis for H and the λi ∈ R the eigenvalues
of A. The possible numerical outcomes of a measurement process are then the λi ,
the i-th one occurring with probability
$$ \Pr_\psi[i] = |\langle\phi_i|\psi\rangle|^2 = \operatorname{tr}\big(|\psi\rangle\langle\psi|\,|\phi_i\rangle\langle\phi_i|\big) $$
if the system is in a state described by |ψ⟩. If one repeats the measurement many
times, the average of the observed outcomes will then tend to the expectation value
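$$ \langle A\rangle_\psi = \sum_i \lambda_i \Pr_\psi[i] = \sum_i \lambda_i |\langle\phi_i|\psi\rangle|^2 = \langle\psi|A|\psi\rangle = \operatorname{tr}(|\psi\rangle\langle\psi| A). $$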
• Also, every system is associated with a distinguished Hermitian operator, the Hamil-
tonian H. It serves two roles:
– It is the observable describing energy measurements.
– It determines the time evolution of the system via Schrödinger’s equation
iℏ∂t |ψ(t)⟩ = H|ψ(t)⟩.
• We usually choose a preferred basis for every Hilbert space H, ideally with a clear
physical interpretation. If the Hamiltonian is non-degenerate, the eigenbasis of H is
a natural choice. In this case, saying that the system is in an eigenstate with given
energy completely specifies the state vector.
– Example: The harmonic oscillator, with |n⟩ defined by
$$ H|n\rangle = \hbar\omega\left(n + \tfrac{1}{2}\right)|n\rangle. $$
If the Hamiltonian is degenerate, it is natural to add additional observables commut-
ing with H until their common eigenbasis is unique.
– Example: The bound states of the hydrogen atom, for which |n, l, m⟩ is defined
by
$$ H|n,l,m\rangle = E_n|n,l,m\rangle, $$
$$ L^2|n,l,m\rangle = \hbar^2 l(l+1)|n,l,m\rangle, $$
$$ L_z|n,l,m\rangle = \hbar m|n,l,m\rangle. $$
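The measurement rule in the list above can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated Hermitian matrix `A` and state `psi` (both hypothetical, chosen only for illustration). It compares the three expressions for the expectation value.

```python
import numpy as np

# Hypothetical example: a random Hermitian observable on a 3-dimensional Hilbert space.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (M + M.conj().T) / 2                      # Hermitian by construction

# Normalized state vector |psi>.
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)

# Eigendecomposition A = sum_i lambda_i |phi_i><phi_i|; the columns of phi are the |phi_i>.
lam, phi = np.linalg.eigh(A)

# Born rule: Pr[i] = |<phi_i|psi>|^2.
probs = np.abs(phi.conj().T @ psi) ** 2

# Three equivalent expressions for the expectation value.
exp_from_probs = np.sum(lam * probs)
exp_sandwich = np.vdot(psi, A @ psi).real
exp_trace = np.trace(np.outer(psi, psi.conj()) @ A).real
```

The probabilities sum to one because the eigenvectors form an orthonormal basis, and the three expectation values agree up to rounding.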
Chapter 1
Multi-partite quantum systems
1.1 Mixed states

Goals
In undergraduate QM, “state” means “Hilbert space vector”. To describe noisy
systems (not terribly deep, but practically important) or entangled systems (much
deeper and increasingly important!) one needs to widen the concept of “state” to
include mixed states represented by density operators.
Imagine a process that prepares the state |ψj ⟩ with probability qj . The probabilities
could e.g. reflect fluctuations of control fields, see below. The collection of states |ψj ⟩ and
probabilities qj is called an ensemble. We do not require that the states |ψj ⟩ be orthogonal
to each other.
If we measure an observable A on this ensemble, the expected value will be
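$$ \sum_j q_j \operatorname{tr}\big(|\psi_j\rangle\langle\psi_j| A\big) = \operatorname{tr}\Big(\Big(\sum_j q_j |\psi_j\rangle\langle\psi_j|\Big) A\Big). $$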
Thus, the statistics of the experiment are described by replacing the projection |ψ⟩⟨ψ| with
the more general density operator
$$ \rho := \sum_j q_j |\psi_j\rangle\langle\psi_j| \tag{1.1} $$
so that ⟨A⟩ = tr(ρA). The density operator ρ has the following properties:
1. It is Hermitian ρ† = ρ,
2. Its eigenvalues form a probability distribution (which is equal to the qj if and only
if the states |ψj ⟩ are orthogonal).
Conversely, every operator with these two properties can be realized by an ensemble as in
(1.1).
Here’s the proof. Equation (1.1) implies the normalization property

$$ \operatorname{tr}\rho = \sum_j q_j \operatorname{tr}|\psi_j\rangle\langle\psi_j| = \sum_j q_j = 1 \tag{1.2} $$
and the positivity property

$$ \langle\phi|\rho|\phi\rangle = \sum_j q_j \langle\phi|\psi_j\rangle\langle\psi_j|\phi\rangle = \sum_j q_j |\langle\phi|\psi_j\rangle|^2 \ge 0. \tag{1.3} $$
The density operator ρ is Hermitian because every summand in (1.1) is. It thus has
an eigendecomposition $\rho = \sum_j \lambda_j |\phi_j\rangle\langle\phi_j|$, and the above implies
$$ \sum_j \lambda_j = \operatorname{tr}\rho = 1, \qquad \lambda_j = \langle\phi_j|\rho|\phi_j\rangle \ge 0, $$
which shows the first claim. Conversely, if $\rho = \sum_j p_j |\phi_j\rangle\langle\phi_j|$ with $p_j$ a distribution,
then the eigendecomposition already forms an ensemble realization.
If ρ is a density operator with only one non-zero eigenvalue, then ρ = |ψ⟩⟨ψ|. In this
case, we say that ρ describes a pure state. Otherwise, the state is mixed.
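To make the two defining properties concrete, here is a small sketch assuming NumPy. The ensemble (the states |0⟩ and |+⟩ with equal weights) and the helper name `density_operator` are hypothetical choices for illustration. Because the two states are not orthogonal, the eigenvalues of ρ differ from the weights q_j.

```python
import numpy as np

def density_operator(states, probs):
    """rho = sum_j q_j |psi_j><psi_j| for normalized state vectors psi_j."""
    return sum(q * np.outer(psi, psi.conj()) for psi, q in zip(states, probs))

# Hypothetical ensemble: |0> and |+> with equal weights (not orthogonal).
ket0 = np.array([1.0, 0.0], dtype=complex)
ket_plus = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
q = [0.5, 0.5]

rho = density_operator([ket0, ket_plus], q)

# Eigenvalues in ascending order; they are non-negative and sum to one,
# but they are not equal to the weights (0.5, 0.5).
eigenvalues = np.linalg.eigvalsh(rho)
```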
Example: Canonical ensemble. Consider a classical system where the i-th microstate
has energy Ei . Then, in the Gibbs ensemble, we expect to find the i-th state with probabil-
ity
$$ p_i = \frac{1}{Z} e^{-E_i/(kT)}, \qquad Z = \sum_i e^{-E_i/(kT)}. $$
Here, k is the Boltzmann constant, T the temperature, and Z the partition function. Now
let $H = \sum_i E_i |E_i\rangle\langle E_i|$ be a quantum-mechanical Hamiltonian. The quantum Gibbs
ensemble is, by definition, the one described by the density operator
$$ \rho = \sum_i p_i |E_i\rangle\langle E_i| = \frac{1}{Z} e^{-H/(kT)}, \qquad Z = \operatorname{tr} e^{-H/(kT)}. $$
Thus, ρ is the operator that is diagonal in the eigenbasis of the Hamiltonian and has the
classical canonical probabilities as eigenvalues. Convince yourself: ρ is pure if and only if
T = 0 and there is a unique ground state.
von Neumann entropy. Density matrices allow us to define a quantum-mechanical notion
of entropy. Indeed, recall that with a classical probability distribution p, one associates
the Shannon entropy
$$ S(p) = -\sum_i p_i \log p_i, \qquad \text{with convention } 0\log 0 = 0. $$
If ρ is a density operator, then the von Neumann entropy S(ρ) is defined as the Shannon
entropy of its eigenvalues. In addition to its central role in statistical physics, von Neumann
entropy can also be used to quantify entanglement, as we will see later.
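As an illustration of the Gibbs state and the von Neumann entropy, consider the following sketch. It assumes NumPy, natural logarithms, and a hypothetical two-level Hamiltonian with unit gap. The function names `gibbs_state` and `von_neumann_entropy` are made up for this example.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Shannon entropy (natural log) of the eigenvalues of rho, with 0 log 0 = 0."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                 # drop (numerically) zero eigenvalues
    return float(-np.sum(p * np.log(p)))

def gibbs_state(H, kT):
    """rho = exp(-H/kT) / Z for a Hermitian matrix H, built in the eigenbasis of H."""
    E, V = np.linalg.eigh(H)
    w = np.exp(-(E - E.min()) / kT)  # shift by E.min() avoids overflow; it cancels in w / Z
    p = w / w.sum()
    return (V * p) @ V.conj().T      # sum_i p_i |E_i><E_i|

# Hypothetical two-level Hamiltonian, energies measured in units of the gap.
H = np.diag([0.0, 1.0])
S_cold = von_neumann_entropy(gibbs_state(H, kT=0.01))    # close to 0: nearly pure
S_hot = von_neumann_entropy(gibbs_state(H, kT=1000.0))   # close to log 2: nearly maximally mixed
```

With a unique ground state the entropy tends to 0 as T → 0, in line with the "convince yourself" remark above.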
1.1.1 Visualizing mixed states: The Bloch ball

For spin-1/2 degrees of freedom, one can easily visualize the set of density operators.
Indeed, any 2 × 2 matrix is of the form
$$ A = \frac{1}{2}\begin{pmatrix} a_0 + a_3 & a_1 - i a_2 \\ a_1 + i a_2 & a_0 - a_3 \end{pmatrix} = \frac{1}{2}\sum_{i=0}^{3} a_i \sigma_i, \qquad a_i = \operatorname{tr}\sigma_i A \tag{1.4} $$
(this is just saying that the Pauli matrices form a basis of the linear space of matrices).
Figure 1.1: Up to a factor of ℏ/2, the i-th component $a_i = \operatorname{tr}\rho\sigma_i$ of the Bloch vector
is the expectation value of the angular momentum along the $e_i$-axis. The length of the
Bloch vector encodes the “purity” of the state. Take an ensemble decomposition
$\rho = \sum_j q_j |\psi_j\rangle\langle\psi_j|$ of a density operator ρ. If $\mathbf{a}^{(j)}$ is the Bloch vector of the j-th state, then
the Bloch representation of ρ is the convex combination $\mathbf{a} = \sum_j q_j \mathbf{a}^{(j)}$.
One directly sees that the matrix is Hermitian iff the $a_i$ are real and has trace equal to one iff
$a_0 = 1$. Thus density operators are of the form
$$ \rho = \frac{1}{2}(\mathbb{1} + \mathbf{a}\cdot\boldsymbol{\sigma}), \tag{1.5} $$
where $\mathbf{a} \in \mathbb{R}^3$ is the Bloch vector. The eigenvalues of ρ are non-negative iff the Bloch
vector lies in the unit ball of $\mathbb{R}^3$; it lies on the unit sphere exactly if ρ is pure.
To see this, use (1.4) to compute $\det\rho = \frac{1}{4}(1 - \|\mathbf{a}\|^2)$. Because tr ρ = 1, the eigenvalues
are of the form λ, (1 − λ). The determinant is the product of the eigenvalues,
so that
$$ \lambda(1-\lambda) = \frac{1}{4}(1 - \|\mathbf{a}\|^2) \quad\Leftrightarrow\quad \lambda = \frac{1}{2}(1 \pm \|\mathbf{a}\|). $$
The maximally mixed state. The center point of the Bloch ball seems special. From
(1.5), it corresponds to $\rho = \frac{1}{2}\mathbb{1}$. For a d-dimensional Hilbert space, $\rho = \frac{1}{d}\mathbb{1}$ is called the
maximally mixed state. It has eigenvalues (1/d, . . . , 1/d) and thus entropy log d, which
is the highest one can get in d dimensions. In statistical physics language, the maximally
mixed state is thus the Gibbs state for T → ∞.
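The correspondence between Bloch vectors and density operators is easy to try out. The sketch below assumes NumPy. The helper names `rho_from_bloch` and `bloch_from_rho` and the sample vector are hypothetical. It checks that the eigenvalues are (1 ± ‖a‖)/2.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3.
PAULI = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def rho_from_bloch(a):
    """rho = (1 + a . sigma) / 2, as in Eq. (1.5)."""
    return (np.eye(2) + np.einsum("i,ijk->jk", a, PAULI)) / 2

def bloch_from_rho(rho):
    """Components a_i = tr(rho sigma_i)."""
    return np.real([np.trace(rho @ s) for s in PAULI])

a = np.array([0.3, -0.2, 0.5])          # hypothetical Bloch vector with norm < 1
rho = rho_from_bloch(a)
eigenvalues = np.linalg.eigvalsh(rho)   # ascending order
norm_a = np.linalg.norm(a)
```

For a unit vector such as `np.array([0.0, 0.0, 1.0])`, one eigenvalue vanishes and the state is pure.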
Non-uniqueness of ensemble decompositions. From Fig. 1.1, it is geometrically obvious
that there are many different ensembles that realize any given mixed state. In particular,
the maximally mixed state in d dimensions can be expressed as
$$ \frac{1}{d}\mathbb{1} = \frac{1}{d}\sum_{j=1}^{d} |\psi_j\rangle\langle\psi_j| $$
for any ONB $\{|\psi_j\rangle\}_j$. (This is just the completeness relation for the basis.) What seems
like a geometric curiosity at this point is in fact fundamental for a number of uniquely
quantum phenomena, in particular to quantum steering. We’ll come back to this point
later.
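A quick numerical illustration of this non-uniqueness, assuming NumPy: the equal-weight ensembles built from the z-basis and from the x-basis of a spin-1/2 give the same density operator. The helper name `ensemble_state` is hypothetical.

```python
import numpy as np

def ensemble_state(states, probs):
    """rho = sum_j q_j |psi_j><psi_j|."""
    return sum(q * np.outer(psi, psi.conj()) for psi, q in zip(states, probs))

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
plus = (up + down) / np.sqrt(2)
minus = (up - down) / np.sqrt(2)

# Two different ensembles ...
rho_z = ensemble_state([up, down], [0.5, 0.5])
rho_x = ensemble_state([plus, minus], [0.5, 0.5])
# ... that describe the same maximally mixed state 1/2.
same_state = np.allclose(rho_z, rho_x)
```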
1.1.2 Time evolution of density operators

The noise-free dynamics of pure states is described by the Schrödinger equation. What is
the analogue for density matrices?
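Applying the formal solution $|\psi(t)\rangle = e^{\frac{t}{i\hbar}H}|\psi(0)\rangle$ of the Schrödinger equation to an
ensemble, we get
$$ \rho(t) = \sum_i q_i |\psi_i(t)\rangle\langle\psi_i(t)| = \sum_i q_i\, e^{\frac{t}{i\hbar}H}|\psi_i(0)\rangle\langle\psi_i(0)|\, e^{-\frac{t}{i\hbar}H} = e^{\frac{t}{i\hbar}H}\rho(0)\, e^{-\frac{t}{i\hbar}H}. $$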
Differentiating with respect to t:

$$ \partial_t\rho = \partial_t\Big(e^{\frac{t}{i\hbar}H}\rho(0)\, e^{-\frac{t}{i\hbar}H}\Big) = \frac{1}{i\hbar}(H\rho - \rho H) $$
which gives the quantum Liouville equation:
iℏ∂t ρ = [H, ρ]. (1.6)
It is called so, because it is the quantum analogue of the classical Liouville equation ∂t ρ =
{H, ρ}, which governs the time evolution of a probability density ρ on phase space. Up to
a sign, the quantum Liouville equation is the same as the Heisenberg picture time evolution
for observables (why?):
$$ A(t) = e^{-\frac{t}{i\hbar}H} A\, e^{\frac{t}{i\hbar}H}, \qquad i\hbar\,\partial_t A = [A, H]. $$
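The quantum Liouville equation can be verified numerically with a finite-difference derivative. This is a sketch under stated assumptions: NumPy, units with ℏ = 1, and a hypothetical qubit Hamiltonian and initial state. The propagator is built from the eigendecomposition of H rather than from a library matrix exponential.

```python
import numpy as np

hbar = 1.0  # units with hbar = 1 (an assumption of this sketch)

def propagator(H, t):
    """U(t) = exp(t H / (i hbar)), built from the eigendecomposition of H."""
    E, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * E * t / hbar)) @ V.conj().T

def evolve(rho0, H, t):
    """rho(t) = U(t) rho(0) U(t)^dagger."""
    U = propagator(H, t)
    return U @ rho0 @ U.conj().T

# Hypothetical Hermitian Hamiltonian and mixed initial state on a qubit.
H = np.array([[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -0.3]])
rho0 = np.array([[0.7, 0.2j], [-0.2j, 0.3]])

# Central finite difference for the time derivative of rho at time t.
t, dt = 0.8, 1e-5
drho_dt = (evolve(rho0, H, t + dt) - evolve(rho0, H, t - dt)) / (2 * dt)
rho_t = evolve(rho0, H, t)

lhs = 1j * hbar * drho_dt          # i hbar d/dt rho
rhs = H @ rho_t - rho_t @ H        # [H, rho]
```

The two sides agree up to the discretization error of the finite difference.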
1.1.3 Dynamics of a noisy spin

Larmor precession
Recall the noise-free time evolution of a spin in a magnetic field. Plugging the Hamiltonian
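$$ H = -\frac{\gamma\hbar}{2}\mathbf{B}\cdot\boldsymbol{\sigma} $$
and the Bloch ball description of the state $\rho = \frac{1}{2}(\mathbb{1} + \mathbf{a}\cdot\boldsymbol{\sigma})$ into the Liouville equation
gives
$$ i\hbar\,\frac{1}{2}\,\partial_t\mathbf{a}(t)\cdot\boldsymbol{\sigma} = [H,\rho] = -\frac{\gamma\hbar}{4}\sum_{i,j=1}^{3}[B_i\sigma_i, a_j\sigma_j] = -i\frac{\gamma\hbar}{2}\sum_{ijk}\epsilon_{ijk}B_i a_j\sigma_k, $$
or $\partial_t\mathbf{a} = \gamma\,\mathbf{a}\times\mathbf{B}$. In particular, for $\mathbf{B} = B\mathbf{e}_z$, with ω := γB the Larmor frequency,
$$ \mathbf{a}(t) = \begin{pmatrix} a_x\cos(\omega t) + a_y\sin(\omega t) \\ -a_x\sin(\omega t) + a_y\cos(\omega t) \\ a_z \end{pmatrix} \quad\Rightarrow\quad \rho(t) = \frac{1}{2}\begin{pmatrix} 1 + a_z & (a_x - i a_y)e^{i\omega t} \\ (a_x + i a_y)e^{-i\omega t} & 1 - a_z \end{pmatrix}. $$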
Thus, the main diagonal of the density matrix (corresponding to the spin component parallel
to the field) remains constant, while the off-diagonal (corresponding to the spin components
orthogonal to the field) picks up a complex phase factor oscillating with the Larmor
frequency.
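The precession formula can be cross-checked against the unitary time evolution. The sketch below assumes NumPy, units with ℏ = 1, and hypothetical values for ω, t, and the initial Bloch vector.

```python
import numpy as np

SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def bloch_vector(rho):
    """Components a_i = tr(rho sigma_i)."""
    return np.real([np.trace(rho @ s) for s in SIGMA])

# Hypothetical values; omega = gamma * B, in units with hbar = 1.
omega, t = 2.0, 0.7
a0 = np.array([0.4, 0.3, 0.5])

rho0 = (np.eye(2) + np.einsum("i,ijk->jk", a0, SIGMA)) / 2

# H = -(omega / 2) sigma_z, so U = exp(-i H t) = diag(e^{i omega t / 2}, e^{-i omega t / 2}).
U = np.diag([np.exp(1j * omega * t / 2), np.exp(-1j * omega * t / 2)])
rho_t = U @ rho0 @ U.conj().T

a_numeric = bloch_vector(rho_t)
a_formula = np.array([
    a0[0] * np.cos(omega * t) + a0[1] * np.sin(omega * t),
    -a0[0] * np.sin(omega * t) + a0[1] * np.cos(omega * t),
    a0[2],
])
```

Both routes give the same Bloch vector, and the upper-right element of `rho_t` equals the initial one times e^{iωt}.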
Dephasing of a spin
So far, we have just re-packaged undergrad calculations in new language. Let’s go further,
by treating a noisy time evolution of a spin-1/2 system in a magnetic field.
Assume that during the time period t ∈ [0, T ], the field strength is not B, but B + ∆B.
Then the Larmor frequency adapts accordingly, so that the phase factor picked up by the
upper-right term of the density matrix during the time interval changes as
$$ e^{iT\omega} \mapsto e^{iT\omega} e^{i\phi}, \qquad \phi = \gamma T\Delta B. $$
We say the system experiences a phase kick by $e^{i\phi}$.

Now imagine the changes ∆B and thus the phases ϕ fluctuate probabilistically. For
concreteness assume that the ϕ follow a Gaussian distribution
$$ p(\phi) = \frac{1}{\sqrt{4\pi\Lambda}}\, e^{-\phi^2/(4\Lambda)} $$
with mean 0 and variance 2Λ for some Λ > 0. Then the expected value of the phase factor
is
$$ \mathbb{E}[e^{i\phi}] = \frac{1}{\sqrt{4\pi\Lambda}}\int_{-\infty}^{\infty} e^{i\phi}\, e^{-\phi^2/(4\Lambda)}\, d\phi = e^{-\Lambda} $$
(cf. Eq. (B.1)). We see that random phase kicks cause the off-diagonal terms of the density
operator to attenuate:
$$ \rho(T) = \frac{1}{2}\begin{pmatrix} 1 + a_z & (a_x - i a_y)e^{i\omega T}e^{-\Lambda} \\ (a_x + i a_y)e^{-i\omega T}e^{-\Lambda} & 1 - a_z \end{pmatrix}. $$
Now assume that during the following time periods t ∈ [(n − 1)T, nT ], the system experiences
independent phase kicks. Taking expectations again gives rise to additional factors
of $e^{-\Lambda}$, so that, up to the overall prefactor 1/2, the upper-right matrix element at time nT
reads $(a_x - i a_y)e^{i\omega nT}e^{-n\Lambda}$.
Under reasonable independence assumptions on the distribution of the fluctuations, it is
then justified to interpolate to arbitrary times, so that, with λ = Λ/T, the system evolves
according to
$$ \rho(t) = \frac{1}{2}\begin{pmatrix} 1 + a_z & (a_x - i a_y)e^{i\omega t}e^{-\lambda t} \\ (a_x + i a_y)e^{-i\omega t}e^{-\lambda t} & 1 - a_z \end{pmatrix} $$
(see Fig. 1.2).
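The Gaussian average behind the attenuation factor can be estimated by sampling. This is a rough Monte Carlo sketch, assuming NumPy and a hypothetical noise strength Λ = 0.3. The agreement is only up to statistical error.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
Lam = 0.3                       # hypothetical noise strength Lambda
n_samples = 200_000

# Random phase kicks with mean 0 and variance 2 * Lambda.
phi = rng.normal(loc=0.0, scale=np.sqrt(2 * Lam), size=n_samples)

# Sample average of the phase factor versus the analytic value exp(-Lambda).
mean_phase_factor = np.mean(np.exp(1j * phi))
expected = np.exp(-Lam)
```

The imaginary part of the sample average is close to zero because the distribution of ϕ is symmetric around 0.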
We have seen that unavoidable fluctuations in the control fields cause the off-diagonal
elements of the density matrix to tend to zero exponentially fast. The characteristic time
scale 1/λ is called the $T_2$ relaxation time. (As the name suggests, there’s also a $T_1$ time,
which is the time scale over which the diagonal elements of ρ tend to their thermal
equilibrium values.) The limiting density matrix

$$ \rho(t\to\infty) = \frac{1}{2}\begin{pmatrix} 1 + a_z & 0 \\ 0 & 1 - a_z \end{pmatrix} \tag{1.7} $$
is a probabilistic mixture of energy eigenstates. The lack of superposition terms means
that (1.7) can be interpreted as a “classical state”.
Quantum computers rely on interference effects. Therefore, a system can serve as

a qubit only if its T2 -time is long enough that the computation can conclude before
phase coherence is lost (else, costly quantum error correction procedures become
necessary).
Figure 1.2: Left panel: Decoherence time measurement on a spin qubit operated at the
Research Center Jülich and RWTH Aachen with support of the project Matter and Light
for Quantum Computing. From [Struck et al., Low-frequency spin qubit energy splitting
noise in highly purified ²⁸Si/SiGe, npj Quantum Information (2020)]. Right panel: The
trajectory of a dephasing spin in the Bloch ball.
It is instructive to work out how the Liouville equation has to be modified to take
dephasing into account. Noting that one can write the projection of the density
matrix onto its off-diagonal as
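$$ \frac{1}{2}\big(\rho - \sigma_z\rho\sigma_z^\dagger\big), $$
it is easy to verify that ρ(t) satisfies the differential equation
$$ \partial_t\rho(t) = -\frac{i}{\hbar}[H,\rho] + \frac{\lambda}{2}\big(\sigma_z\rho\sigma_z^\dagger - \rho\big). $$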
Such differential equations that describe the time evolution of noisy quantum sys-
tems are called quantum master equations.
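The "easy to verify" step can also be done numerically. The sketch below assumes NumPy, units with ℏ = 1, and hypothetical values for ω, λ, and the initial Bloch vector. It compares a finite-difference derivative of the dephasing solution with the right-hand side of the master equation.

```python
import numpy as np

hbar = 1.0                      # units with hbar = 1 (an assumption of this sketch)
omega, lam = 2.0, 0.4           # hypothetical Larmor frequency and dephasing rate
ax, ay, az = 0.4, 0.3, 0.5      # hypothetical initial Bloch vector

sz = np.diag([1.0, -1.0]).astype(complex)
H = -(hbar * omega / 2) * sz

def rho(t):
    """Dephasing solution: off-diagonals rotate at omega and decay at rate lam."""
    off = (ax - 1j * ay) * np.exp(1j * omega * t) * np.exp(-lam * t)
    return 0.5 * np.array([[1 + az, off], [np.conj(off), 1 - az]])

# Left-hand side: central finite difference for d rho / dt.
t, dt = 0.9, 1e-5
lhs = (rho(t + dt) - rho(t - dt)) / (2 * dt)

# Right-hand side: Hamiltonian part plus dephasing term.
r = rho(t)
rhs = -1j / hbar * (H @ r - r @ H) + lam / 2 * (sz @ r @ sz.conj().T - r)
```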
Summary
• General states are described by density operators, Hermitian operators

whose eigenvalues form a probability distribution.
• Reversible time evolution of density operators is determined by the quantum
Liouville equation iℏ∂t ρ = [H, ρ].
• In d = 2, density operators can be described using their Bloch vector a as
$\rho = \frac{1}{2}(\mathbb{1} + \mathbf{a}\cdot\boldsymbol{\sigma})$.
• Phase noise attenuates off-diagonal coefficients of density operators.
1.2 Multi-partite Hilbert space
Goals
We will introduce tensor product Hilbert spaces, and argue why this is the right
space for multiple distinguishable particles. We’ll have to spend a lot of time on
notation (boring, but necessary) and have a first look at entanglement.
1.2.1 Tensor product Hilbert spaces

Two particles are distinguishable if one can construct measurement devices that are sensi-
tive to one of the particles, but are not influenced by the other. (In contrast, try to build a
detector that will be triggered only by one specific electron!)
More precisely, let H1 , H2 be the Hilbert spaces of two particles. We say they are
distinguishable if:
For any state $|\alpha\rangle \in \mathcal{H}_1$ and observable $A = \sum_i a_i |e_i\rangle\langle e_i|$ on $\mathcal{H}_1$,
and any state $|\beta\rangle \in \mathcal{H}_2$ and observable $B = \sum_j b_j |f_j\rangle\langle f_j|$ on $\mathcal{H}_2$,
it makes physical sense to prepare the first particle in the state |α⟩, the second one in the
state |β⟩, and perform the measurements A and B. We also demand that in this case, the
outcome probabilities are independent:
$$ \Pr[a_i \text{ and } b_j] = |\langle\alpha|e_i\rangle|^2\, |\langle\beta|f_j\rangle|^2. \tag{1.8} $$
We now construct the Hilbert space H12 associated with the combined system. The
above implies that H12 contains vectors associated with the outcomes ai , bj . Let’s call
them |ei , fj ⟩. Because they correspond to different outcomes of an observable, they have to
be orthogonal. The Hilbert space must also contain a vector associated with the preparation
procedure, let’s call it |α, β⟩. The independence condition (1.8) is fulfilled if, for
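$$ |\alpha\rangle = \sum_i \alpha_i |e_i\rangle, \qquad |\beta\rangle = \sum_j \beta_j |f_j\rangle, $$
we define
$$ |\alpha,\beta\rangle = \sum_{ij} \alpha_i\beta_j |e_i, f_j\rangle \tag{1.9} $$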
(One can show that this is essentially the only way to satisfy independence.) The resulting
Hilbert space
$$ \mathcal{H}_{12} = \Big\{ \sum_{ij}\psi_{ij}|e_i, f_j\rangle \;\Big|\; \psi_{ij}\in\mathbb{C} \Big\}, $$
together with the rule (1.9), is called the tensor product space H1 ⊗ H2 .
States that describe independent preparations of the particles, i.e. those of the form
given in (1.9), are called product states. Alternative notations:
|α, β⟩ = |αβ⟩ = |α⟩|β⟩ = |α⟩ ⊗ |β⟩,

and, if the bases referenced are (hopefully) clear from context:
|ei , fj ⟩ = |i, j⟩.

For general elements $|\psi\rangle = \sum_{ij}\psi_{ij}|ij\rangle \in \mathcal{H}_{12}$, the coefficients $\psi_{ij}$ need not factorize as
in (1.9). Such states are called entangled, and we’ll have more to say about them.
The observables A, B associated with the individual particles act on product vectors in
the obvious way:
A|α, β⟩ = (A|α⟩)|β⟩, B|α, β⟩ = |α⟩(B|β⟩).
This defines A, B on all of H12 , because the product vectors |ei , fj ⟩ form a basis.
Notation and conventions. If not clear from context, the system on which an operator
acts is explicitly specified:
$$ C^{(1)}|\alpha,\beta\rangle = (C|\alpha\rangle)|\beta\rangle, \qquad C^{(2)}|\alpha,\beta\rangle = |\alpha\rangle(C|\beta\rangle). $$
There’s also the “tensor product of operators” notation (sometimes called the Kronecker
product, in particular in computer algebra systems):
$$ C^{(1)}D^{(2)} = C\otimes D, \qquad C^{(1)} = C\otimes\mathbb{1}, \qquad C^{(2)} = \mathbb{1}\otimes C. $$
This implies that “tensor products of outer products” equal “outer products of tensor prod-
ucts” (yeah, I know... you’ll get used to it):
|α⟩⟨γ| ⊗ |β⟩⟨δ| = |αβ⟩ ⟨γδ|. (1.10)
Example: The singlet state. In the theory of the addition of angular momentum (every
student’s favorite topic!), one comes across the singlet state
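$$ |\Psi^-\rangle = \frac{1}{\sqrt{2}}\big(|{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle\big) $$
in $\mathcal{H}_1\otimes\mathcal{H}_2$, where the $\mathcal{H}_i$ are two-dimensional with basis {| ↑⟩, | ↓⟩}.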
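In numerical work, the tensor product is the Kronecker product. The sketch below assumes NumPy. The helper name `ket` is hypothetical. It checks Eq. (1.10) for one choice of basis vectors and uses a standard linear-algebra fact that is not derived in these notes: a state is a product state exactly if its coefficient matrix ψ_ij has rank 1.

```python
import numpy as np

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

def ket(*factors):
    """Product vector |a, b, ...> as a Kronecker product of the factors."""
    out = np.array([1], dtype=complex)
    for f in factors:
        out = np.kron(out, f)
    return out

# Eq. (1.10): |a><c| (x) |b><d| equals |ab><cd|, here with a, b, c, d = up, down, down, up.
lhs = np.kron(np.outer(up, down.conj()), np.outer(down, up.conj()))
rhs = np.outer(ket(up, down), ket(down, up).conj())

# Singlet state; reshaping to (2, 2) gives the coefficient matrix psi_ij.
singlet = (ket(up, down) - ket(down, up)) / np.sqrt(2)
rank_singlet = np.linalg.matrix_rank(singlet.reshape(2, 2))         # 2: entangled
rank_product = np.linalg.matrix_rank(ket(up, down).reshape(2, 2))   # 1: product state
```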
1.2.2 The partial trace

Let’s recall the classical notion of a marginal distribution. The statistics of a pair $X_1, X_2$
of random variables is described by their joint distribution $p^{(2)}$:
$$ p^{(2)}(x_1, x_2) = \Pr[X_1 = x_1 \text{ and } X_2 = x_2]. $$
If one has access only to the first variable, one can obtain its distribution from the joint one
by summing over the irrelevant outcomes
$$ p^{(1)}(x_1) = \Pr[X_1 = x_1] = \sum_{x_2} p^{(2)}(x_1, x_2). \tag{1.11} $$
The result $p^{(1)}$ is called the marginal distribution associated with $X_1$.

Let’s work out the quantum analogue of the “partial sum” in Eq. (1.11). Assume we
are given a joint state of two particles, described by a density operator $\rho^{(12)}$ on the tensor
product Hilbert space $\mathcal{H}_1\otimes\mathcal{H}_2$. We would like to compute an effective state $\rho^{(1)}$ that
describes measurements performed on the first particle alone. More precisely, for every
observable A on $\mathcal{H}_1$, we demand
$$ \operatorname{tr}\big(\rho^{(1)}A\big) = \operatorname{tr}\big(\rho^{(12)}(A\otimes\mathbb{1})\big). \tag{1.12} $$
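To solve this problem, define the partial trace $\operatorname{tr}_2$ of a product operator by computing
the usual trace of the second factor only:
$$ \operatorname{tr}_2(C\otimes D) = C\operatorname{tr}(D). $$
Note that the partial trace maps an operator on the tensor product Hilbert space to an
operator on the first system alone. Next, because any operator M on $\mathcal{H}_1\otimes\mathcal{H}_2$ can be
expanded in terms of product operators
$$ M = \sum_{ijkl} M_{ijkl}|ij\rangle\langle kl| = \sum_{ijkl} M_{ijkl}|i\rangle\langle k|\otimes|j\rangle\langle l|, $$
one can extend $\operatorname{tr}_2$ linearly to all operators:
$$ \operatorname{tr}_2 M = \sum_{ijkl} M_{ijkl}|i\rangle\langle k|\otimes\operatorname{tr}(|j\rangle\langle l|) = \sum_{ik}\Big(\sum_j M_{ijkj}\Big)|i\rangle\langle k|. $$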
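Using this expression:
$$ \operatorname{tr}\big(\rho^{(12)}(A\otimes\mathbb{1})\big) = \sum_{ijkl}\underbrace{\langle ij|\rho^{(12)}|kl\rangle}_{\rho^{(12)}_{ijkl}}\;\underbrace{\langle kl|A\otimes\mathbb{1}|ij\rangle}_{\langle k|A|i\rangle\delta_{lj}} = \sum_{ik}\langle k|A|i\rangle\sum_j \rho^{(12)}_{ijkj} = \operatorname{tr}\big(A\operatorname{tr}_2\rho^{(12)}\big). $$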
We have found that
$$ \rho^{(1)} = \operatorname{tr}_2\rho^{(12)} $$
solves Eq. (1.12). In this sense, the partial trace is the quantum analogue of the “partial
sum” of Eq. (1.11). The density matrix $\rho^{(1)}$ is called the marginal state or the reduced
density matrix.
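The index formula for the partial trace translates directly into code. Here is a minimal sketch assuming NumPy. The function name `partial_trace_2` and the random test state are hypothetical. It checks the defining property (1.12) for subsystem dimensions 2 and 3.

```python
import numpy as np

def partial_trace_2(M, d1, d2):
    """tr_2 M with entries sum_j M_{ij,kj}, for M acting on H1 (x) H2 with dimensions d1, d2."""
    return np.einsum("ijkj->ik", M.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(2)
d1, d2 = 2, 3

# Hypothetical random density operator on H1 (x) H2: G G^dagger is positive; then normalize.
G = rng.normal(size=(d1 * d2, d1 * d2)) + 1j * rng.normal(size=(d1 * d2, d1 * d2))
rho12 = G @ G.conj().T
rho12 /= np.trace(rho12)

# Hypothetical Hermitian observable A on H1.
B = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))
A = B + B.conj().T

rho1 = partial_trace_2(rho12, d1, d2)
lhs = np.trace(rho1 @ A)                          # tr(rho^(1) A)
rhs = np.trace(rho12 @ np.kron(A, np.eye(d2)))    # tr(rho^(12) (A (x) 1))
```

The reshape relies on the ordering used by `np.kron`, in which the combined index of |i, j⟩ is i·d2 + j.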
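Pure product states. For pure product states, we find
$$ \rho^{(12)} = |\alpha\beta\rangle\langle\alpha\beta| = |\alpha\rangle\langle\alpha|\otimes|\beta\rangle\langle\beta| \quad\Rightarrow\quad \rho^{(1)} = |\alpha\rangle\langle\alpha|\operatorname{tr}(|\beta\rangle\langle\beta|) = |\alpha\rangle\langle\alpha|. $$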
Physically, this says that measurements on the first particle are sensitive only to the prepa-
ration of the first particle, which reflects the independence property (1.8) we have required
in the distinguishable case.
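The singlet state. The partial trace of the singlet state is much more interesting:
$$ \operatorname{tr}_2|\Psi^-\rangle\langle\Psi^-| = \frac{1}{2}\operatorname{tr}_2\big(|{\uparrow\downarrow}\rangle\langle{\uparrow\downarrow}| - |{\uparrow\downarrow}\rangle\langle{\downarrow\uparrow}| - |{\downarrow\uparrow}\rangle\langle{\uparrow\downarrow}| + |{\downarrow\uparrow}\rangle\langle{\downarrow\uparrow}|\big) = \frac{1}{2}|{\downarrow}\rangle\langle{\downarrow}| + \frac{1}{2}|{\uparrow}\rangle\langle{\uparrow}| = \frac{1}{2}\mathbb{1}. $$
While the global state $\rho = |\Psi^-\rangle\langle\Psi^-|$ was pure, the partial trace $\operatorname{tr}_2\rho = \frac{1}{2}\mathbb{1}$ is maximally
mixed! If this were a thermodynamic equilibrium state, the total system would be
at temperature 0, while the first subsystem would be at temperature ∞. In classical physics, this is
impossible. This example shows that mixed states can occur in QM even in the absence
of any form of classical randomness. We’ll explore the conceptual implication in the next
section.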
Entropy of entanglement. Let $|\Psi\rangle \in \mathcal{H}_1\otimes\mathcal{H}_2$. If |Ψ⟩ = |αβ⟩ is a product state, then
the reduced density matrix $\rho^{(1)} = \operatorname{tr}_2|\Psi\rangle\langle\Psi| = |\alpha\rangle\langle\alpha|$ is pure and thus has vanishing
von Neumann entropy $S(\rho^{(1)}) = 0$. Since we have defined “entanglement” to be the
property of not-being-a-product-state, it is natural to define $S(\operatorname{tr}_2|\Psi\rangle\langle\Psi|)$ as a quantitative
measure of entanglement. For the singlet state, this entropy of entanglement is 1 bit,
the highest value realizable in two dimensions. For this reason, the singlet is called a
maximally entangled state.
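The entropy of entanglement is straightforward to compute for small systems. The sketch below assumes NumPy and logarithms to base 2, so that the result is in bits. The function name `entanglement_entropy_bits` is hypothetical.

```python
import numpy as np

def entanglement_entropy_bits(psi, d1, d2):
    """Entropy (base 2) of rho1 = tr_2 |psi><psi| for a normalized vector psi in H1 (x) H2."""
    C = psi.reshape(d1, d2)             # coefficient matrix psi_ij
    rho1 = C @ C.conj().T               # (tr_2 |psi><psi|)_{ik} = sum_j psi_ij conj(psi_kj)
    p = np.linalg.eigvalsh(rho1)
    p = p[p > 1e-12]                    # convention 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)
product = np.kron(up, down)

S_singlet = entanglement_entropy_bits(singlet, 2, 2)   # 1 bit
S_product = entanglement_entropy_bits(product, 2, 2)   # 0 bits
```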
Summary
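• The global Hilbert space of particles with individual Hilbert spaces $\mathcal{H}_1, \mathcal{H}_2$
with bases $\{|e_i\rangle\}$, $\{|f_j\rangle\}$ is the tensor product space
$$ \mathcal{H}_{12} = \Big\{ \sum_{ij}\psi_{ij}|e_i f_j\rangle \;\Big|\; \psi_{ij}\in\mathbb{C} \Big\}. $$
• The restriction of a global density operator $\rho^{(12)}$ to the first subsystem is given
by the partial trace $\rho^{(1)} = \operatorname{tr}_2\rho^{(12)}$.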
• Globally pure states can look locally mixed. This is a sign of entanglement.
The rest of this chapter presents some topics in multi-partite quantum systems,
which are, I think, conceptually highly interesting. But we won’t build on them
in the remainder. So it’s fine to skip ahead to Chapter 2.
1.3 Dynamics of coupled systems
Goals
Things will get much more interesting! Our direct objective in this section is to
work out a model in which entangled states arise naturally. Even though the model
is extremely simple, we will, as a by-product, be able to make progress on issues
that seem to pose conceptual problems to QM: The measurement problem and the
question of why the world looks classical, even though it seems to be fundamen-
tally governed by QM.
1.3.1 The measurement and the classicality problem

The measurement problem
Elementary QM provides two very different rules for time evolution:
• Hamiltonian time evolution: $|\psi\rangle \mapsto e^{\frac{1}{i\hbar}\Delta t H}|\psi\rangle$. Change is continuous in time,
reversible, deterministic, and linear in the state vector.
• Projective measurements: $|\psi\rangle \mapsto P_j|\psi\rangle/\sqrt{p_j}$ with probability $p_j = \langle\psi|P_j|\psi\rangle$.
Change is discontinuous in time, irreversible, non-deterministic, non-linear in the
wave function.
Figure 1.3: The formalism of QM divides the universe into degrees of freedom that are
modeled quantum-mechanically and those that are classical. The boundary between these
two regimes is the Heisenberg cut. For a Stern-Gerlach experiment, the quantum side could
include just the spin (1), but also the motional degrees of freedom (2) of the silver atom, or
the measurement device (3) that records its final position, or even the experimentalist (4)
observing the outcome.
Given that these are completely different, quantum physicists take great care to very care-
fully explain when to use the one and when to use the other. ... Huh huh, just kidding.
Check out your introductory textbook and try to find a definition of which properties ex-
actly a physical process has to fulfill in order to qualify as a “measurement”. I wish you
good luck!
The standard presentation of quantum mechanics divides the world into a “quantum
part” and a “classical part”. The measurement rules connect the two. But it is not clear
which degrees of freedom belong to which side of this cut.
Example: In the standard treatment of the Stern-Gerlach experiment, the spin is mod-
eled quantum mechanically, but the spatial position of the atom classically. The spin-
dependent movement of the atom is treated as a measurement. But it also seems reasonable
to put the atom’s position to the quantum side of the cut (Fig. 1.3). The interaction between
spin and spatial coordinates is then described by a coherent Hamiltonian time evolution.
A measurement only takes place once an observer records the atom’s position.
We can now state two aspects of the measurement problem of quantum mechanics:
• The pragmatic problem: Why can physicists get away with being so vague about
the notion of “measurement”? Why don’t different modeling decisions produce
different predictions? (We’ll be able to answer this).
• The philosophical problem: Given that quantum mechanics is supposedly more fun-
damental than classical theories, how do we deal with the fact that its predictions
are stated with respect to a classical world? Who’s measuring the wave function of
the universe? (We won’t make progress here. In fact, there’s no agreement what’s
the best solution to this issue. Or whether there is a solution. Or whether there was
a problem in the first place. It’s a mess.)
Why is the macroscopic world classical?

The gravitational potential describing planetary motion and the Coulomb potential binding
an electron to a nucleus are mathematically equivalent. Why then is it the case that we
describe states of bound electrons in terms of delocalized orbitals, whereas Venus seems
to occupy a pretty definite spot in the night sky (however, see Fig. 1.4)?
Figure 1.4: Why do planets and electrons behave differently? An unconventional take.
Source: xkcd.com.
Likewise, why do marbles seem to be in one place at any one time, while from the
perspective of elementary QM, it would be much more natural to assign a momentum
eigenstate to them (which diagonalizes the free Hamiltonian)? Due to their macroscopic
mass, it is compatible with Heisenberg’s uncertainty relation that a marble can be
in a state in which both position and momentum are very precisely determined – but it is by
no means necessary that such a state be adopted. So why then does this seem to happen?
More generally: Which process breaks the unitary invariance of quantum state space
and selects the basis in which we encounter physical objects?
1.3.2 A quantum model for measurements

With these fundamental questions at the back of our heads, let’s start with the Hamiltonian
for a particle with spin interacting with an external magnetic field:
\[ H = \frac{P^2}{2m} - \frac{\gamma\hbar}{2}\, \mathbf{B}\cdot\boldsymbol{\sigma} \]
Assume that B = B z e_z. Then only the z-coordinate participates in the interaction, so
nothing is lost by only treating the spin and the spatial z-coordinate explicitly. The time
evolution is best calculated in the interaction picture. Decompose the Hamiltonian as
\[ H = H_0 + H_I, \qquad H_0 = \frac{P_z^2}{2m}, \qquad H_I = -\frac{\gamma\hbar B}{2}\, z\,\sigma_z . \]
Then the Schrödinger and the interaction–picture wave functions are
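\[ |\psi_S(t)\rangle = e^{\frac{1}{i\hbar} t H} |\psi_S(0)\rangle, \qquad |\psi_I(t)\rangle = e^{-\frac{1}{i\hbar} t H_0} |\psi_S(t)\rangle = e^{\frac{1}{i\hbar} t H_I} |\psi_S(0)\rangle, \]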
where |ψI (t)⟩ describes the change of dynamics caused by an interaction term HI .
First treat the case where the particle is initially in a momentum-0 eigenstate:
|ψS (t = 0)⟩ = (α|↑⟩ + β|↓⟩)|k = 0⟩.

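Then, with δ := ℏγB/2,
\[
\begin{aligned}
|\psi_I(t)\rangle &= e^{\frac{i\gamma}{2} t B z \sigma_z} |\psi_S(t=0)\rangle \\
&= \alpha\, e^{\frac{i\gamma}{2} t B z} |{\uparrow}\rangle |k=0\rangle + \beta\, e^{-\frac{i\gamma}{2} t B z} |{\downarrow}\rangle |k=0\rangle \\
&= \alpha |{\uparrow}\rangle |k=\delta t\rangle + \beta |{\downarrow}\rangle |k=-\delta t\rangle .
\end{aligned}
\tag{1.13}
\]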
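The last step uses that a phase linear in z shifts the momentum. A short sketch of why, assuming that k labels the momentum eigenvalue and that ⟨z|k⟩ ∝ e^{ikz/ℏ} (a normalization convention that the text above does not fix):

```latex
\[
\langle z |\, e^{\pm\frac{i\gamma}{2} t B z} \,| k \rangle
  \;\propto\; e^{\pm\frac{i\gamma}{2} t B z}\; e^{\frac{i}{\hbar} k z}
  \;=\; e^{\frac{i}{\hbar}\left(k \pm \frac{\hbar\gamma B}{2}\, t\right) z}
  \;\propto\; \langle z | k \pm \delta t \rangle ,
\qquad \delta = \frac{\hbar\gamma B}{2} .
\]
```

So the spin-up component picks up momentum +δt and the spin-down component −δt, starting from k = 0.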
This is an entangled state! A measurement of spin and momentum gives correlated out-
comes:
\[ \Pr[s, k + dk] = \begin{cases} |\alpha|^2 & (s, k) = (\uparrow, +\delta t) \\ |\beta|^2 & (s, k) = (\downarrow, -\delta t) \end{cases} . \]
The marginal distribution for the spin variable alone is
\[ \Pr[s] = \begin{cases} |\alpha|^2 & s = \uparrow \\ |\beta|^2 & s = \downarrow \end{cases} . \]
This is exactly what we would have obtained by treating just the spin quantum mechan-
ically! Thus: Using a quantum model for the spatial z-component does not change the
prediction about the measured spin state. All it does is to entangle the measured and the
measuring degree of freedom so that the global state becomes a superposition of consistent
configurations. Indeed, we could have included further degrees of freedom – e.g. the ex-
perimentalist observing the particle momentum. If we model them – simplifying slightly
– as a two-dimensional system with (mental) states |☺⟩ when seeing an upwards moving
atom, and |☹⟩ when encountering one moving downwards, an analogous calculation
would have resulted in
|ψI (t)⟩ = α|↑⟩|δt⟩|☺⟩ + β|↓⟩| − δt⟩|☹⟩, (1.14)
with a similar interpretation if now the experimentalist’s state gets measured.

Instead of a momentum eigenstate, let’s use a more realistic Gaussian initial state.
Write |ψk0 ⟩ for a Gaussian wave packet centered around k0 in momentum space:
\[ \langle k|\psi_{k_0}\rangle = (2\pi)^{-1/4}\, e^{-\frac{(k-k_0)^2}{4}} . \]
Then
\[ e^{\frac{i\gamma}{2} t B z} |\psi_{k_0}\rangle = |\psi_{k_0+\delta t}\rangle \]
so that, if we take |ψS (0)⟩ = |ψ0 ⟩,
|ψI (t)⟩ = α|↑⟩|ψδt ⟩ + β|↓⟩|ψ−δt ⟩.
The correlations between spin and position now build up over time. Indeed:
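\[ \Pr[s, k + dk] = \frac{1}{\sqrt{2\pi}} \begin{cases} |\alpha|^2\, e^{-\frac{(k-\delta t)^2}{2}}\, dk & s = \uparrow \\ |\beta|^2\, e^{-\frac{(k+\delta t)^2}{2}}\, dk & s = \downarrow \end{cases} . \]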
At t = 0, the momentum distribution is independent of the spin state. For times t ≃ 1/δ,
the two spin-dependent Gaussian distributions become distinct, but overlap significantly.
Only for t ≫ 1/δ does the sign of a measured momentum value identify the spin state
with certainty.
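To put numbers on the three regimes, here is a minimal sketch. It assumes the inference rule "guess ↑ if the measured k is positive, ↓ otherwise" (this rule and the function name are illustrative, not part of the notes). Because both Gaussians above have unit variance, the probability of guessing correctly is the normal CDF evaluated at δt, independently of α and β.

```python
import math

def prob_sign_identifies_spin(delta_t):
    """Probability that the sign of k matches the spin state.

    For spin up, k is Gaussian with mean +delta_t and unit variance, so
    P[k > 0] = Phi(delta_t); the spin-down case is symmetric.
    """
    return 0.5 * (1.0 + math.erf(delta_t / math.sqrt(2.0)))

for dt in (0.0, 1.0, 3.0, 5.0):
    p = prob_sign_identifies_spin(dt)
    print(f"delta*t = {dt:3.1f}  ->  P(correct guess) = {p:.6f}")
```

At δt = 0 the guess is a coin flip; at δt = 1 the two distributions are distinct but still overlap noticeably; for δt of a few units the error probability is negligible.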
Let’s summarize: The coupling term (zσz ) caused the spin and the positional degree
of freedom to become entangled over time. A measurement on the entangled state in
the eigenbases of the two factors z, σz leads to correlated outcomes. Asymptotically, the
correlations are perfect, and a direct measurement of one observable on the initial state
is equivalent to a measurement of the other observable on the final state. One can then
define a measurement to be any process to which the above analysis applies. In this case,
the measuring degree of freedom (called a pointer in this context) can be treated either
classically or quantum-mechanically.
The same framework can be used to identify the basis in which objects present. To see
how, compute the reduced density matrix for the spin. From
|ψI (t)⟩⟨ψI (t)| = |α|2 |↑⟩⟨↑| ⊗ |ψδt ⟩⟨ψδt | + αβ ∗ |↑⟩⟨↓| ⊗ |ψδt ⟩⟨ψ−δt | + . . .
and
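\[
\begin{aligned}
\operatorname{tr} |\psi_{\delta t}\rangle\langle\psi_{-\delta t}| = \langle\psi_{-\delta t}|\psi_{\delta t}\rangle
&= \frac{1}{\sqrt{2\pi}} \int e^{-\frac{(-\delta t-k)^2+(\delta t-k)^2}{4}}\, dk \\
&= \frac{1}{\sqrt{2\pi}} \int e^{-\frac{k^2+(\delta t)^2}{2}}\, dk = e^{-(\delta t)^2/2},
\end{aligned}
\]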
we can read off the reduced density matrix in the {|↑⟩, |↓⟩}-basis:
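\[ \rho_{\text{spin}}(t) = \operatorname{tr}_{\text{space}} |\psi_I(t)\rangle\langle\psi_I(t)| = \begin{pmatrix} |\alpha|^2 & \alpha\beta^*\, e^{-(\delta t)^2/2} \\ \alpha^*\beta\, e^{-(\delta t)^2/2} & |\beta|^2 \end{pmatrix} . \]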
Thus, the state of the spin part alone dephases from a pure state at t = 0 to a probabilistic
mixture of |↑⟩ and |↓⟩ for times t ≫ 1/δ. The entropy (of entanglement) gradually builds
up from S(t = 0) = 0 to
S(t → ∞) = −|α|2 log |α|2 − |β|2 log |β|2 .
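The build-up of entropy can be traced numerically. The following is a sketch under two stated assumptions: the state is normalized, |α|² + |β|² = 1, and the off-diagonal elements are suppressed by the overlap ⟨ψ−δt|ψδt⟩ = e^{−(δt)²/2} computed above. The function name is illustrative. For a 2×2 density matrix with unit trace, the eigenvalues are (1 ± gap)/2, where gap² = (|α|² − |β|²)² + 4|ρ↑↓|².

```python
import math

def spin_entropy(alpha, beta, delta_t):
    """Von Neumann entropy (natural log) of the reduced spin state."""
    p_up, p_down = abs(alpha) ** 2, abs(beta) ** 2
    # Magnitude of the off-diagonal element, damped by the Gaussian overlap.
    coherence = abs(alpha) * abs(beta) * math.exp(-delta_t ** 2 / 2.0)
    gap = math.sqrt((p_up - p_down) ** 2 + 4.0 * coherence ** 2)
    eigenvalues = [(1.0 + gap) / 2.0, (1.0 - gap) / 2.0]
    # Skip (numerically) vanishing eigenvalues, for which lam*log(lam) -> 0.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-15)

s = 1.0 / math.sqrt(2.0)
for dt in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"delta*t = {dt:3.1f}  ->  S = {spin_entropy(s, s, dt):.4f}")
```

For α = β = 1/√2 the entropy rises from 0 towards log 2 ≈ 0.6931, matching the limiting formula above.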
Let’s again interpret this calculation from a broader perspective. After the dephasing
time, an unrelated observer will find the spin in a σz -eigenstate and will not encounter
superpositions. Recall what distinguishes the z-axis: It is the one in which the interac-
tion takes place! The bases which we perceive as “classical” are the ones in which the
interaction terms are diagonal, and the emergence of probabilistic mixtures is a result of
entanglement building up. Interactions are local, which is why quantum systems usually
appear to be well-localized in space. However, some interactions select for different bases:
e.g. electrons bound in an atom couple to the environment via the electromagnetic field.
This interaction is sensitive to atomic energy scales and angular momentum – but the
wavelengths of the involved photons are too large for the position of the electron within the atom
to make a meaningful difference. Therefore, the semi-classical description of electrons
in terms of atomic quantum numbers (“n, l, m”) makes sense in this case. In contrast,
whether or not a photon is scattered off the surface of Venus depends on the planet’s
position within its orbit, not on its internal energy or angular momentum.
Further conceptual points:
• Q.: Are measurements discontinuous in time?
A.: Nope! Correlations between the measured system and the environment are built
up at a time scale proportional to the inverse coupling strength. The instantaneous
process postulated in introductory QM can be understood as an effective description
valid for times much larger than that.
• Q.: Are these processes irreversible?

A.: The dynamics on the quantum side of the cut is reversible in theory – the final
measurement still isn’t. This doesn’t lead to practical contradictions, though. As-
sume we put the whole universe, except for ourselves, on the quantum side. What
would it take to reverse the measurement after a blob of silver (in the Stern-Gerlach
case) has been deposited on a screen, but before we have looked at it? The deposit
will have interacted with an enormous number of degrees of freedom: phonons in the
screen, the cosmic background radiation, thermal photons that have since zoomed
off into the sky at the speed of light. Clearly, for all practical purposes (“FAP”),
it is impossible to reverse those interactions. Thus, once a macroscopic record of
an event exists, the irreversibility introduced by QM’s measurement postulate does
not change anything FAP. Philosophically, it might still be a thorny issue though!
This is all good news if you like to compute things (no immediate contradiction).
It’s bad news if you like to understand foundational questions, because there seems
little empirical guidance on offer for how to handle this conceptual inconsistency.
• Q.: In thermodynamics, there’s tension between the fact that entropy increases,
while microscopic dynamics is reversible. The buildup of entanglement seems like
an elegant solution: Local randomness is created from globally reversible dynamics.
Maybe “all entropy is entanglement entropy”! Is that a good way to think about the
apparent increase of entropy?
A.: You betcha!
Concrete estimates for decoherence rates can be found in Table 2 of Tegmark, Apparent
wave function collapse caused by scattering.
1.4 Quantum many-body systems as computers
Goals
Quantum computing is all the rage! We’ll introduce the basic concepts here and
discuss one very cool and comparatively simple application: Grover’s algorithm,
which can search through N items in √N time. You heard me right.
One can iterate the construction of the two-particle Hilbert space to find the space for
n > 2 systems. Assume, for simplicity, that every single-system Hilbert space Hi has
dimension d and basis {|1⟩, . . . , |d⟩}. Then a general state vector in the joint Hilbert space
H = H1 ⊗ · · · ⊗ Hn is of the form
d
X
|ψ⟩ = ψi1 ,...,in |i1 , . . . , in ⟩.
i1 ,i2 ,...,in =1
You should immediately notice that the sum is over d^n terms, i.e. the dimension of the
joint space is exponentially large in the number of constituents! For a collection of
spin-1/2s arranged on a cube with side length only 10, this gives a
way-bigger-than-merely-astronomical 2^1000. This is:
• Bad news if you work in computational physics. It is absolutely out of the question
even to just store the coefficients ψi1 ,...,in in memory. Fortunately, one can some-
times use clever tricks to make statements about large-n systems without having to
work with explicit representations. More on this: See rest of these notes.
• Potentially good news if you can carefully control large quantum systems. Because
Nature seems to be able to track quantum states that our classical computers can’t, it
stands to reason that quantum systems could be used to solve otherwise intractable
computational problems. More on this: See Sec. 1.4.
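To get a feeling for the numbers, here is a back-of-the-envelope sketch. It assumes each amplitude is stored as a double-precision complex number (16 bytes); that storage format and the function name are illustrative choices.

```python
def bytes_for_state(n_sites, local_dim=2, bytes_per_amplitude=16):
    """Memory needed to store all local_dim**n_sites amplitudes."""
    return local_dim ** n_sites * bytes_per_amplitude

for n in (10, 30, 50):
    print(f"n = {n:2d} spins: {bytes_for_state(n):.3e} bytes")

# The 10 x 10 x 10 cube of spins from the text: count the decimal digits of 2**1000.
print("2**1000 has", len(str(2 ** 1000)), "decimal digits")
```

Thirty spins already need about 17 GB under this assumption, and fifty spins about 18 petabytes. The number of amplitudes for the cube has more than 300 decimal digits.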
Many-body Hamiltonians can usually be efficiently represented, though. The reason
is that physical interactions involve only few particles at a time. A Hamiltonian with only
single- and two-body terms is of the form
\[ H = \sum_k h^{(k)} + \frac{1}{2} \sum_{k \neq l} h^{(k,l)} \]
where h^{(k,l)} acts non-trivially only on the k-th and l-th Hilbert spaces and can therefore be
specified as a d² × d²-matrix (or, if d = ∞, will typically be a simple function of position
and momentum operators).
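The contrast between the size of the Hamiltonian's description and the size of the matrix it defines can be made explicit. This is a rough count, assuming that every single-site term is stored as a full d × d matrix and every unordered pair as a full d² × d² matrix, ignoring Hermiticity and any symmetries (the function names are illustrative):

```python
def local_description_entries(n, d):
    """Matrix entries needed to specify all one- and two-body terms."""
    pairs = n * (n - 1) // 2
    return n * d ** 2 + pairs * d ** 4

def full_matrix_entries(n, d):
    """Entries of H written out as a d**n x d**n matrix."""
    return d ** (2 * n)

for n in (3, 10, 1000):
    local = local_description_entries(n, 2)
    full_digits = len(str(full_matrix_entries(n, 2)))
    print(f"n = {n:4d}: {local} local entries vs. a number with {full_digits} digits")
```

The local description grows only quadratically in n, while the explicit matrix grows exponentially.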
Given the Hamiltonian, typical questions of interest are:
1. Obtain information about the eigenvalues of H, e.g. the energies of the ground states
and of low-lying excitations.
2. Compute thermodynamical potentials, e.g. the free energy
\[ \log Z = \log \operatorname{tr} e^{-\beta H} . \]
3. Compute the expectation value ⟨ψ(t)|A^{(i)}|ψ(t)⟩ of a local observable. Here,
|ψ(t)⟩ = e^{\frac{t}{iℏ} H} |ψ(0)⟩ is the time evolution of a state that started out in a simple form, say
|ψ(0)⟩ = |i1 , . . . , in ⟩.
In general, finding answers to these questions is intractable. The task of quantum many-
body theory is to find special cases or approximations where progress can be made.
Quantum algorithms
It is not obvious that simulating the time evolution of quantum many-body systems actu-
ally is classically intractable. Sure, we have argued above that storing a many-body wave
function in memory is impossible. But we have also seen that any physical time evolution
can be described using only a small number of parameters (the local terms of the Hamilto-
nian, a simple initial state). So it is conceivable that there exists a smart universal way of
keeping track of |ψ(t)⟩ that does not involve working with the full state vector.
Today, there is strong evidence that such a universal strategy does not exist.¹
One piece of evidence is given by the existence of quantum algorithms. These are
methods that allow one to solve a difficult classical computational problem efficiently by
outsourcing parts of the calculation to a quantum device.
¹ There is no rigorous proof of this, though! The issue is that it has so far been beyond the wit of humankind
to prove any reasonable problem to be computationally hard. For example, the infamous “P vs NP” problem asks
for a proof that finding solutions to problems is generally harder than verifying that a proposed solution indeed
works. An imprecise analogue would be: Appreciating classical music is easier than becoming the next Mozart.
Of course this is true – so the fact that there is no mathematical proof that “P ≠ NP” is not generally taken to
be indicative of there being serious doubts about the statement, but rather as testament to the limitations of the
human mind. Sadly, a detailed account is beyond the scope of this lecture.
davidg@repos:˜$ sudo grep davidg /etc/shadow

davidg:$6$2kjoNwafEvibRYry$BzfBTGfk0wY2nk3c05OAucPXQPIP8
dWMFGbCNoAs7B1dacUNNUn5fDMyOorDu4QSaxOWZskpObEz3dlBbfI3f
/:19425:0:99999:7:::
davidg@repos:˜$
Figure 1.5: Stored SHA-512 hash of my actual university login. If you find a pre-image,
you can read my emails and adjust your grades. Knock yourself out! [If you do succeed,
you could also answer my emails. Come to think of it, maybe I should just post my
password...].
1.4.1 Grover’s algorithm

Here, we will look at one example: Grover’s algorithm. As we’ll see, it is simultaneously:
(i) comparatively easy to understand, (ii) highly surprising in what it achieves, and (iii)
probably of limited practical value even if big quantum computers can be constructed.
Overview
There are some computational puzzles for which the best-known approach is to just try
every possible input to see whether it is a solution.
The most clear-cut cases are used in cryptography. For example, your computer does
not actually know your password! Instead, it stores an n-bit image y ⋆ = h(x⋆ ) of the
password x⋆ under a cryptographic hash function h. It is designed such that computing
y = h(x) for an input x is easy, but the best-known way of finding a pre-image
x ∈ h⁻¹({y}) given y is to try ≃ 2^n random inputs (Fig. 1.5). To authenticate a user who
claims their password is x, the computer compares y = h(x) to the hash y⋆ on file. The
advantage of such an indirect procedure is that not much harm is done if the stored hashes
fall into the wrong hands: A typical value of n is 512 and 2^512 ≫ (hadrons in universe),
so recovering the passwords x⋆ is impractical. (Unless, of course, the user chooses a
password that can be guessed with reasonable effort. No hash magic makes “birthday-of-
romantic-partner123lol” a secure choice.)
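The authentication procedure can be sketched in a few lines with Python's standard `hashlib`. This is a deliberately simplified illustration: the function names and the example password are made up, and real systems additionally mix a random salt into the hash and iterate it many times, both of which are omitted here.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the SHA-512 image y = h(x) of a password, as a hex string."""
    return hashlib.sha512(password.encode("utf-8")).hexdigest()

def authenticate(candidate: str, stored_hash: str) -> bool:
    """Compare h(candidate) to the stored hash; the password itself is never stored."""
    return hmac.compare_digest(hash_password(candidate), stored_hash)

# Hypothetical example: only the hash is kept on file.
stored = hash_password("correct horse battery staple")
print(authenticate("correct horse battery staple", stored))  # the right password
print(authenticate("birthday123lol", stored))                # a wrong guess
```

The stored value is 512 bits (128 hex digits) long, which is the n = 512 referred to above.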
Finding an inverse by trying random inputs does not require that we understand any-
thing about the inner workings of h. All we need is the ability to compute h(x) given x.
Methods that interact with h only in this way are called black box (or oracle) algorithms.
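A black box search is easy to write down. The sketch below uses a toy hash (the first 12 bits of SHA-256, an illustrative stand-in so that the search finishes quickly) and walks through the candidates in a fixed order rather than at random. The search routine touches `oracle` only by calling it.

```python
import hashlib
from itertools import product

N_BITS = 12  # toy size; real hashes use hundreds of bits

def h(bits):
    """Toy hash: the first N_BITS bits of the SHA-256 digest of the bit string."""
    digest = hashlib.sha256("".join(map(str, bits)).encode("ascii")).digest()
    return int.from_bytes(digest[:2], "big") >> (16 - N_BITS)

def brute_force_preimage(target, oracle=h, n_bits=N_BITS):
    """Try candidates one by one; return the first pre-image and the query count."""
    queries = 0
    for candidate in product((0, 1), repeat=n_bits):
        queries += 1
        if oracle(candidate) == target:
            return candidate, queries
    return None, queries

secret = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0)
found, n_queries = brute_force_preimage(h(secret))
print("pre-image found after", n_queries, "of", 2 ** N_BITS, "possible queries")
```

The pre-image that is found need not equal `secret`: with as many inputs as outputs, collisions in the toy hash are likely. Any pre-image would pass the authentication check, though.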
Are black box algorithms really the best way to invert a hash function? Don’t take my
word for it! The vast wealth stored in “crypto currencies” is secure only to the degree that
this assumption is true. BitCoin is effectively a multi-billion dollar bounty on an improved
algorithm. It hasn’t been claimed as of 2023 (Fig. 1.6).
In light of this, it is truly remarkable that in 1996, Lov Grover showed that a quantum
computer can find x from y in roughly √(2^n) = 2^{n/2} time steps. In fact, this square root
speedup is possible for any puzzle for which a solution can be efficiently recognized!
Here’s a high-level overview: We model “puzzle for which a solution can be recog-
nized” by a function f that maps n-bit strings x (“candidates”) to 0 (“no solution”) or 1
(“solution!”). In the above example: f (x) = 1 if h(x) = y ⋆ and 0 else. Assume you
have a piece of code that evaluates f on a classical computer in Tn time steps. Then
Grover’s recipe turns that code into a time-dependent two-body Hamiltonian H(t) such
that if |ψ(0)⟩ = |0, . . . , 0⟩, then
\[ \left|\psi\!\left(t = c\, T_n \sqrt{2^n}\right)\right\rangle \simeq |x^\star_1, \dots, x^\star_n\rangle, \]
where x⋆ is such that f (x⋆ ) = 1 and c a (reasonably small) constant. A measurement will
then reveal the bits of x⋆ with high probability.
Figure 1.6: Left: Cryptocurrency mines consist of racks of computers that try random
inputs hoping to find a solution to a mathematical puzzle. Right: If you can do better,
there’s 300b dollars on the table (as of early 2023). Credit: Wikipedia, Statista.
Grover’s algorithm is also “black box” in the sense that no understanding of “the inner
workings” of f beyond the ability to compute it is required. So how is it possible to find a
solution in drastically less time than it would take to consider a fixed fraction of all inputs?
The answer is that Grover constructs a quantum black box U_f : |x, 0⟩ ↦ |x, f (x)⟩ that
can be run on a superposition of inputs
\[ U_f \sum_x c_x |x, 0\rangle = \sum_x c_x |x, f(x)\rangle . \]
Thus, just a single invocation of the quantum black box results in a wave function that
carries information about all possible inputs. The tricky part is then to read this information
out. Grover’s contribution was to find a clever trick for getting the amplitudes for all
|x, f (x)⟩ with f (x) = 0 to interfere destructively, so that only the solutions survive.
We’ll work our way through the details next.
The gate model

The connection between classical computer code and quantum Hamiltonians goes via the
gate model of computation.
A classical computer operates on bits, physical systems that can be in one of two
states. Traditionally, these are labeled 0 and 1. A logic gate (or just gate) is a process
that changes the state of a small number of bits in a defined way, see Fig. 1.7. It is known
in classical computer science that any function that can be computed at all can also be
computed by a circuit formed by concatenating reversible logic gates.
We now consider a quantum generalization. To this end, replace each bit by a two-level
quantum system and fix some basis with states labeled |0⟩, |1⟩. Voilà, a quantum bit (or
qubit). We assume that we have detailed control over the dynamics: By adjusting classical
control parameters (external fields, position of the qubits, etc.), we are able to switch off
the time evolution H = 0, or to realize single-qubit H = h^{(i)} or two-qubit H = h^{(ij)}
Hamiltonians.
Let’s look at the single-qubit case first. If we set, e.g., H = h^{(i)} = (ℏ/2) σ_x^{(i)} for a duration
δt = π, then the time evolution U (δt) = e^{−i(π/2) σ_x^{(i)}} = −iσ_x^{(i)} acts on the i-th qubit like the
classical NOT gate (ignoring a global phase factor of −i):
\[ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} : \quad |0\rangle \mapsto |1\rangle, \quad |1\rangle \mapsto |0\rangle \quad \text{or} \quad |x_i\rangle \mapsto |\mathrm{NOT}(x_i)\rangle . \]
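The identity e^{−i(π/2)σx} = −iσx is easy to check numerically. The sketch below sums the power series of the matrix exponential by hand so that no external library is needed. The helper names `matmul` and `expm` are ad hoc, and truncating at 40 terms is an arbitrary choice that is ample here.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(m, terms=40):
    """Matrix exponential via the truncated power series sum_k m^k / k!."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

sigma_x = [[0, 1], [1, 0]]
# Exponent -i * (pi/2) * sigma_x, i.e. H = (hbar/2) sigma_x applied for a duration pi.
exponent = [[-1j * (math.pi / 2) * entry for entry in row] for row in sigma_x]
U = expm(exponent)
for row in U:
    print([f"{z.real:+.3f}{z.imag:+.3f}j" for z in row])
```

Up to rounding, the diagonal entries vanish and the off-diagonal entries equal −i, so U swaps |0⟩ and |1⟩ up to the global phase.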
Figure 1.7: Classical gates and circuits. (i) The NOT gate inverts the state of a single
bit. (ii) The XOR gate computes the exclusive or x ⊕ y of its inputs. (iii) The CNOT
(or controlled not) gate toggles the state of the second bit if and only if the first bit is
in the 1-state. Note that the CNOT and the NOT gate are reversible: I.e. the input can
be reconstructed given the output. (iv) A reversible circuit. It turns out that anything a
classical computer can do can be represented in this way.
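Likewise, the matrix that is represented in the {|00⟩, |01⟩, |10⟩, |11⟩}-basis by
\[
\mathrm{CNOT} = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 0 & 1 \\ & & 1 & 0 \end{pmatrix} :
\qquad
\begin{array}{c|c}
|x_i x_j\rangle & \mathrm{CNOT}\,|x_i x_j\rangle \\ \hline
|00\rangle & |00\rangle \\
|01\rangle & |01\rangle \\
|10\rangle & |11\rangle \\
|11\rangle & |10\rangle
\end{array}
\]
acts like the CNOT gate, but on qubits. Because this matrix is unitary, it can be implemented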
by a suitable two-qubit Hamiltonian (homework). This construction generalizes to all
reversible logic gates and, as per our previous comment, any classical computation can
thus be realized by a time-dependent Hamiltonian on qubits.
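As a quick sanity check, the sketch below applies the CNOT matrix to the four basis states, using the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ from above. The helper names are illustrative. It confirms that the matrix reproduces the classical truth table and is its own inverse.

```python
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def apply(matrix, state):
    """Matrix-vector product for nested-list matrices."""
    return [sum(matrix[r][c] * state[c] for c in range(len(state)))
            for r in range(len(matrix))]

def basis_state(x_i, x_j):
    """Column vector for |x_i x_j> in the ordering |00>, |01>, |10>, |11>."""
    state = [0, 0, 0, 0]
    state[2 * x_i + x_j] = 1
    return state

for x_i in (0, 1):
    for x_j in (0, 1):
        out = apply(CNOT, basis_state(x_i, x_j))
        # Classical CNOT: the second bit is toggled iff the first bit is 1.
        expected = basis_state(x_i, x_j ^ x_i)
        print(f"|{x_i}{x_j}> -> matches classical CNOT: {out == expected}")
```

Being a permutation matrix, CNOT is unitary, and applying it twice returns every basis state to itself.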
But of course, most unitaries are not permutations! A quantum gate is any unitary that
acts on a small number of qubits. Prominent examples with no classical counterpart are

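\[ Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{Z-gate,} \tag{1.15} \]
\[ P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \quad \text{phase gate,} \tag{1.16} \]
\[ H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{Hadamard gate.} \tag{1.17} \]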
The Hadamard gate, e.g., turns basis states into uniform superpositions:
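\[
\begin{aligned}
H|0\rangle &= 2^{-1/2} \left(|0\rangle + |1\rangle\right), \\
(H \otimes H)|0\rangle|0\rangle &= (H|0\rangle)(H|0\rangle) = 2^{-1} \left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right), \\
&\;\;\vdots \\
H^{\otimes n} |0, \dots, 0\rangle &= 2^{-n/2} \sum_{x_1, \dots, x_n \in \{0,1\}} |x_1, \dots, x_n\rangle .
\end{aligned}
\]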
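On a classical computer, H^{⊗n} can be applied to a state vector of length 2^n one qubit at a time. The following sketch uses the standard "butterfly" loop of the fast Walsh–Hadamard transform; the function name and the storage of amplitudes in a plain list are illustrative choices.

```python
import math

def hadamard_all(state):
    """Apply a Hadamard gate to every qubit of a state vector of length 2**n."""
    state = list(state)
    size = len(state)
    step = 1
    while step < size:
        # Mix every pair of amplitudes whose indices differ in one fixed bit.
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                a, b = state[i], state[i + step]
                state[i] = (a + b) / math.sqrt(2)
                state[i + step] = (a - b) / math.sqrt(2)
        step *= 2
    return state

n = 3
zero_state = [1.0] + [0.0] * (2 ** n - 1)   # |0,...,0>
uniform = hadamard_all(zero_state)
print(uniform)                               # every amplitude is close to 2**(-n/2)
```

Since H² = 1, applying `hadamard_all` a second time returns the original state.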
Grover iterations
Let f : {0, 1}×n → {0, 1} be a classical function as introduced in Sec. 1.4.1. To represent
f in a quantum computer, we have to consider a reversible version. The common choice
is this:
\[ (x, y) \mapsto (x, y \oplus f(x)), \qquad x \in \{0,1\}^{\times n},\ y \in \{0,1\}. \]
As is the case for any reversible function, it can be expressed as a circuit consisting of
reversible classical gates. Re-interpreting these as quantum gates, we arrive at the
(n + 1)-qubit unitary
\[ U_f : |x, y\rangle \mapsto |x, y \oplus f(x)\rangle \]
which can indeed be realized by a time-dependent Hamiltonian running in time
proportional to T_n.
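That (x, y) ↦ (x, y ⊕ f(x)) is reversible for any f, even though f itself usually is not, can be checked by brute force. In the sketch below, the marked string (1, 0, 1) and the helper name `make_reversible` are arbitrary illustrative choices.

```python
from itertools import product

def make_reversible(f):
    """Turn f: bit strings -> {0, 1} into the reversible map (x, y) -> (x, y XOR f(x))."""
    def u_f(x, y):
        return x, y ^ f(x)
    return u_f

def f(x):
    """Example puzzle with a single solution."""
    return int(x == (1, 0, 1))

u_f = make_reversible(f)
inputs = [(x, y) for x in product((0, 1), repeat=3) for y in (0, 1)]
outputs = [u_f(x, y) for x, y in inputs]

print("permutation of the inputs:", sorted(outputs) == sorted(inputs))
print("self-inverse:", all(u_f(*u_f(x, y)) == (x, y) for x, y in inputs))
```

The map merely permutes the 2^{n+1} classical configurations, which is why its quantum version U_f is a permutation matrix and hence unitary.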
At this point, we have constructed the quantum black box, and have seen how to create
superposition states using Hadamard gates. Let’s combine these two steps. For simplicity,
assume that there is a unique x⋆ such that f (x⋆ ) = 1. Then:
\[ U_f (H^{\otimes n} \otimes \mathbb{1})\, |0, \dots, 0\rangle = 2^{-n/2} \sum_x |x, f(x)\rangle = 2^{-n/2} \sum_{x \neq x^\star} |x, 0\rangle + 2^{-n/2}\, |x^\star, 1\rangle . \]
That’s promising, because a single invocation of the quantum black box did indeed leave
information about x⋆ in the output. But it’s not yet useful, because the coefficient in
front of |x⋆ , 1⟩ is exponentially small. Performing a measurement will reveal it only with
probability 2^{−n}, exactly the same as a classical random guess would give.
Grover found a way to amplify the coefficient in front of the solution. His construction
involves the following elements, whose relevance will become clear soon:
1. Instead of U_f, which indicates whether a solution has been found by flipping an
auxiliary qubit, use
\[ V_f : |x\rangle \mapsto (-1)^{f(x)} |x\rangle \]
which changes the sign of the coefficient for the solution. One can construct V_f
from U_f by throwing in an extra Hadamard gate. Verifying this is homework.
2. Introduce a second unitary
\[ V_\delta : |x\rangle \mapsto (-1)^{\delta(x)} |x\rangle, \]
where f is replaced by the “Kronecker delta for bit-strings”, i.e. V_δ flips the sign of
the coefficient for x = 0.
3. Define the Grover operator to be
\[ G = (-H^{\otimes n} V_\delta H^{\otimes n})\, V_f . \]
The big claim now is that starting from H^{⊗n}|0⟩, every application of the Grover operator
G will rotate the state vector closer to |x⋆⟩, hitting the target after ≃ (π/4)√(2^n) iterations.
Proof: Define
\[ |\text{☹}\rangle = \frac{1}{\sqrt{2^n - 1}} \sum_{x \neq x^\star} |x\rangle \]
to be the uniform superposition of all non-solutions. Then {|☹⟩, |☺⟩ := |x⋆ ⟩} form an
ONB for a two-dimensional subspace. Remarkably, the state vector will regularly end up
in this 2-dimensional space, so we can track the progress of the algorithm solely by con-
sidering the dynamics in this small space. (This makes Grover comparatively easy to
analyze. Don’t get your hopes up, though. This never happens again). Indeed, the state
|+⟩ = H^{⊗n}|0⟩ can be expanded as
\[ |+\rangle = \sqrt{\frac{2^n - 1}{2^n}}\; |\text{☹}\rangle + \frac{1}{\sqrt{2^n}}\; |\text{☺}\rangle . \]
As you can see, the initial superposition is almost parallel to the non-solutions |☹⟩. The
angle they enclose is
\[ \theta := \angle(|\psi_0\rangle, |\text{☹}\rangle) = \arcsin \frac{1}{\sqrt{2^n}} \simeq \frac{1}{\sqrt{2^n}} \]
(an excellent approximation for reasonably large n). Now the application of V_f changes
the sign of the coefficient in front of |☺⟩. Geometrically, this corresponds to a reflection
about the plane orthogonal to |☺⟩. Likewise,
−H ⊗n Vδ H ⊗n = H ⊗n (−1 + 2|0⟩⟨0|)H ⊗n = −1 + 2|+⟩⟨+|
is a reflection about |+⟩. The combination of two reflections is a rotation, and a simple
geometric analysis in the |☹⟩–|☺⟩ plane (Fig. 1.8) shows it is by an angle of 2θ toward
the solution vector |☺⟩. It is reached after k iterations of G, for
\[ \theta + k\, 2\theta = \frac{\pi}{2} \quad \Leftrightarrow \quad k = \frac{\pi}{4\theta} - \frac{1}{2} \simeq \frac{\pi}{4} \sqrt{2^n} \tag{1.18} \]
as claimed.
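The whole procedure can be simulated classically for small n. This sketch stores all 2^n amplitudes, so it takes exponential memory and gains no speedup; it only illustrates the dynamics. The sign flip implements V_f, and the reflection −1 + 2|+⟩⟨+| maps each amplitude a to 2·mean − a, where mean is the average of all amplitudes. The function name and the marked index are illustrative.

```python
import math

def grover_search(n_qubits, marked):
    """Simulate Grover iterations; return (number of iterations, success probability)."""
    dim = 2 ** n_qubits
    state = [1.0 / math.sqrt(dim)] * dim          # |+> = H^{(x)n} |0>
    theta = math.asin(1.0 / math.sqrt(dim))
    iterations = round(math.pi / (4.0 * theta) - 0.5)
    for _ in range(iterations):
        state[marked] = -state[marked]            # V_f: flip the sign of the solution
        mean = sum(state) / dim
        state = [2.0 * mean - amp for amp in state]   # -1 + 2|+><+|
    return iterations, state[marked] ** 2

iterations, probability = grover_search(n_qubits=10, marked=123)
print(f"{iterations} iterations, success probability {probability:.4f}")
```

For n = 10 the search space has 1024 entries, and the rounded value of k from (1.18) is 25, after which nearly all probability sits on the marked entry.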
Remarks:
• Don’t run Grover for too long! Otherwise, you’ll rotate past the solution |☺⟩.
• Don’t worry if (1.18) has no integer solution. If ⟨ψ|x⋆ ⟩ = 1 − ϵ, you’ll get a
wrong solution x ̸= x⋆ with probability ≃ 2ϵ. But by assumption, we can check
the solution efficiently by computing f (x). If f (x) = 0, just rerun the quantum
algorithm.
• While impressive in its generality, the practical utility of Grover’s algorithm is lim-
ited. The “square root speedup” isn’t as large as the exponential speedup some
quantum algorithms promise. What is more, quantum computers are much harder
to build than classical ones and might require a substantial overhead to compensate
for errors. On top of all that, Grover, unlike an exhaustive classical search, cannot
be parallelized. Thus the square root advantage might only materialize for n’s such
that not only 2^n, but already √(2^n) is astronomical.
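The first remark can be quantified with the geometric picture from the proof. After k iterations the state encloses the angle (2k + 1)θ with |☹⟩, so the success probability is sin²((2k + 1)θ). A small sketch evaluating this closed form (the function name is illustrative):

```python
import math

def success_probability(n_qubits, k):
    """Probability of measuring the solution after k Grover iterations."""
    theta = math.asin(2.0 ** (-n_qubits / 2.0))
    return math.sin((2 * k + 1) * theta) ** 2

for k in (0, 10, 25, 40, 50):
    print(f"k = {k:2d}: P(success) = {success_probability(10, k):.4f}")
```

For n = 10 the probability peaks near k = 25 and has dropped back to almost zero by k = 50: running twice as long as prescribed is about as bad as not running at all.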
Figure 1.8: Time evolution of the Grover algorithm. Left panel: A Grover iteration
performs two reflections that combine to a rotation by 2θ toward the target state |☺⟩ = |x⋆ ⟩.
Angles not to typical scale! Right panel: The effect of consecutive Grover rotations.
Summary
• The exponential size of the many-body Hilbert space can potentially be put
to use to solve classically hard computational problems.
• Time evolutions of few qubits are described by small unitaries, which are
called quantum gates and generalize classical logic gates.
• Classical computations can be made reversible and reversible gates
re-interpreted as unitaries. This way, classical subroutines can be
evaluated on superpositions of inputs. The resulting state carries information
about their global behavior. Putting this information into a form that can be
read out may require non-trivial efforts (e.g. Grover iterations).
1.5 Bell inequalities and their implications

Classical mechanics tells you what is happening. Quantum mechanics only tells you what
you will observe when you measure. It does not assign values to unmeasured physical
properties.
From the early days of the theory, some scientists – famously Albert Einstein (Fig. 1.9)
– saw this as a sign that quantum mechanics was incomplete, and should be supplanted
by a more detailed description of Nature that does track the time evolution of all physical
properties, measured or not.
In what I feel is one of the most profound findings of modern physics, this program
has since been proven to be impossible: The hypothesis
“Physical properties exist independently of measurements” (1.19)
has been experimentally falsified as a general property of Nature! On top of the surprising
conclusion, this is remarkable because (1.19) feels like a philosophical statement that is
too vague to have testable implications. Yet here we are.
In the following derivation, we have to keep in mind that we want to reason about
theories different from quantum mechanics. This means that we cannot use any concept
that has a meaning only in the context of QM. “Hilbert space”, “entanglement”,
“commutators”, even “photon”... all these terms are verboten until further notice.²
Figure 1.9: Left panel: 1935 New York Times headline reporting on Einstein-Podolsky-
Rosen paper arguing that quantum mechanics was incomplete. I wonder how Podolsky
and Rosen felt about the framing. Right panel: 2015 New York Times headline reporting
on Einstein being wrong.
Goals
The goals of this section? You got to be kidding me! Understand that, of course.
This has got to be one of the coolest things physics has to offer.
1.5.1 The CHSH scenario

Our challenge now is to come up with a setting in which the extremely vague statement
(1.19) leads to quantitative predictions that can be compared to experiments. The most im-
portant case is the so-called CHSH scenario (Fig. 1.10). While not difficult to understand,
it does contain quite a number of elements that seem ill-motivated at this point. Please
bear with me for a moment.
The scenario contains two observers, Alice and Bob, located at different ends of a
laboratory. There’s a box in the middle. Every ten seconds, it emits two systems, one
flying to Alice and one to Bob. Each observer has two measurement devices, labeled 1
and 2. The devices work like this: They have an entry port and when one of the systems
coming from the central box enters a device, one of two lights will flash. The lights are
labeled +1 and −1 respectively. Every time a pair of systems leaves the central box, Alice
and Bob choose one of their measurement devices at random, put it in the path, and record
the observed outcomes.
OK, some Q&A’s:
• Q.: So what’s up with the talk of “systems”? What are these? Photons? Spins?
A.: Unspecified. For now, these could be puffs of hot air and the measurement
devices random number generators. Our analysis does not depend on assumptions
about their nature. (Also, what’s a photon?)
² Physicists talking about Bell inequalities have a tendency of emphasizing entanglement, or the singlet state
and how the fact that it’s spin-0 means that angular momentum measurements are anti-correlated, and some such
things. These are not wrong and even mildly helpful for the design of experiments that lead to the falsification
we are after. All this is also completely secondary to the main point; a case of people sticking to their comfort
zone.
Figure 1.10: The ingredients of the CHSH scenario (for Clauser, Horne, Shimony and
Holt). Two experimentalists are located at different ends of a laboratory. Each can perform
one of two measurements on systems emanating from a box in the middle. Surprisingly, the
analysis of the set of correlations that are compatible with this extremely vaguely defined
scenario offers profound insights!
• Q.: Are Alice’s devices 1 and 2 different? Is Alice’s device 1 different from Bob’s
device 1?
A.: We do not need to make any assumptions about this.
• Q.: Why are the outcomes labeled ±1?
A.: That’s not really essential. This particular choice will work well with our analy-
sis, though.
• Q.: Can Alice rig her boxes together such that she can perform both measurements
on the same incoming system?
A.: For all we know at this point... maybe?
• Q.: Look man. You are clearly just avoiding my questions. Why don’t you study your
system first, and come back once you can give specific answers?!
A.: You got it backwards! The fewer assumptions I need to make, the more generally
applicable my conclusions will be.³
• Q.: How in the world does one come up with this?
A.: Well, it took physics a few decades. Also, literal Einstein missed it.
With the setup established, let’s look at the lab book produced by A&B. Here’s a
possible snapshot:
Alice Bob
i A1 A2 B1 B2
1 + −
2 + +
3 − −
4 + +
.. .. .. .. ..
. . . . .
³ I once had a long discussion with a colleague who refused to concede this point, despite me applying all the
logic, persuasion, and appeals to authority I could muster. Very frustrating.
Obviously, in each round i, both Alice and Bob can fill out only the column corresponding
to the measurement they chose to make.
We will now argue that Assumption (1.19) puts quantitative constraints on the type
of data that can appear in this setting. Later, we will see that there are experiments that
violate these constraints—thereby disproving the general validity of (1.19). (Also, QM
predicts the violations correctly. That’s also interesting, but less relevant, because at this
point, we’re open to the idea that QM could be mistaken).
Concretely, if physical properties exist independently of observations, then there exists
a complete table, say
        Alice        Bob
 i    A1    A2    B1    B2
 1    +     −     −     −
 2    +     −     +     +
 3    −     −     +     −
 4    +     +     +     −
 ⋮    ⋮     ⋮     ⋮     ⋮
and in each round, A&B just decide which of the pre-existing values to uncover.
In what may feel like an unmotivated move even by the standards of the present dis-
cussion, associate the expression
C = A1 B 1 + A1 B 2 + A2 B 1 − A2 B 2
with each complete row. There’s an elegant geometric construction that leads to this
particular formula (the keyword is Bell polytope) – but it takes some time to develop, so
let’s just work with it regardless of where it comes from. In our example:
        Alice        Bob
 i    A1    A2    B1    B2     C
 1    +     −     −     −     −2
 2    +     −     +     +      2
 3    −     −     +     −     −2
 4    +     +     +     −      2
 ⋮    ⋮     ⋮     ⋮     ⋮      ⋮
Despite being the sum of four terms each valued ±1, the expression (in fact: its absolute
value) is upper-bounded by 2: Factoring out Alice’s variables and applying the triangle
inequality,
|C| = |A1 (B1 + B2 ) + A2 (B1 − B2 )| ≤ |B1 + B2 | + |B1 − B2 | = 2.
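Since there are only 2⁴ = 16 possible complete rows, the bound can also be confirmed by exhaustive enumeration. A minimal sketch:

```python
from itertools import product

# Evaluate C = A1 B1 + A1 B2 + A2 B1 - A2 B2 for every assignment of +/-1 values.
c_values = {
    a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2
    for a1, a2, b1, b2 in product((+1, -1), repeat=4)
}
print(sorted(c_values))
```

In each row, one of B1 + B2 and B1 − B2 vanishes while the other equals ±2, so the only values C takes are −2 and +2.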
It may seem that we can’t extract observable predictions out of this discussion, because
the expression C involves all four variables, and by assumption, we only have access to
two of them in every round. But there’s a nice trick to get around this! Indeed, if |C| ≤ 2 in
every run, then the same is true of the average
\[ \langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C^{(i)} \]
over N runs. But averages are linear, and therefore ⟨C⟩ equals
⟨A1 B1 + A1 B2 + A2 B1 − A2 B2 ⟩ = ⟨A1 B1 ⟩ + ⟨A1 B2 ⟩ + ⟨A2 B1 ⟩ − ⟨A2 B2 ⟩.
Each of the four terms ⟨Ai Bj ⟩ can be estimated by A&B! If they choose their settings
at random, then by the law of large numbers (or, quantitatively, by the Chernoff bound),
their observed mean will converge to the true expected value in the limit of large N . Thus,
Assumption (1.19) implies that the linear combination of these four experimentally acces-
sible numbers be no larger than 2, up to statistical fluctuations that vanish in the large-N
limit. Such a test of (1.19) is called a Bell inequality.
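The estimation procedure can be simulated. The sketch below assumes a made-up source of complete rows (`sample_row` is hypothetical and not a model of any real experiment): in each round, only one randomly chosen pair Ai, Bj is "uncovered", yet the combination of the four estimated correlators agrees with the hidden average of C up to statistical fluctuations.

```python
import random

def chsh(a1, a2, b1, b2):
    return a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2

def sample_row(rng):
    """Hypothetical source of pre-existing values (A1, A2, B1, B2)."""
    if rng.random() < 0.75:
        s = rng.choice((+1, -1))
        return (s, s, s, s)  # rows of this type have C = +2
    # Uniformly random rows: C averages to 0 over these.
    return tuple(rng.choice((+1, -1)) for _ in range(4))

def run_experiment(n_rounds, seed=0):
    rng = random.Random(seed)
    sums = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    hidden_total = 0
    for _ in range(n_rounds):
        row = sample_row(rng)
        hidden_total += chsh(*row)  # not accessible to Alice and Bob
        i, j = rng.randrange(2), rng.randrange(2)  # random settings
        sums[i, j] += row[i] * row[2 + j]          # only A_i and B_j are uncovered
        counts[i, j] += 1
    corr = {key: sums[key] / counts[key] for key in sums}
    estimate = corr[0, 0] + corr[0, 1] + corr[1, 0] - corr[1, 1]
    return estimate, hidden_total / n_rounds

estimate, hidden_average = run_experiment(40_000)
print(estimate, hidden_average)
```

For this particular source the true value of ⟨C⟩ is 0.75 · 2 = 1.5, and no choice of `sample_row` can push it above 2.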
Following up on pioneering works that led to the 2022 Nobel Prize, it is today fairly
routine to perform experiments that are compatible with the CHSH setup and yield a value
of ⟨C⟩ ≃ 2.7.
Thus, Assumption (1.19) must be rejected as a general feature of Nature.
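For comparison, here is a sketch of the quantum-mechanical prediction. The choice of state (the singlet) and of measurement angles is an assumption made for this illustration; the text above does not fix them. Spin observables in the x-z plane are written as real 2×2 matrices, and the four correlators are combined as in the definition of C.

```python
import math

def spin_observable(theta):
    """cos(theta) * sigma_z + sin(theta) * sigma_x as a real 2x2 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def correlation(theta_a, theta_b, psi):
    """<psi| A (x) B |psi> for a real two-qubit vector psi = (00, 01, 10, 11)."""
    A, B = spin_observable(theta_a), spin_observable(theta_b)
    total = 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    total += psi[2 * i + j] * A[i][k] * B[j][l] * psi[2 * k + l]
    return total

# Singlet state (|01> - |10>)/sqrt(2); angles chosen (assumption) to maximize C.
singlet = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
a1, a2 = 0.0, math.pi / 2
b1, b2 = -3 * math.pi / 4, 3 * math.pi / 4

chsh_value = (correlation(a1, b1, singlet) + correlation(a1, b2, singlet)
              + correlation(a2, b1, singlet) - correlation(a2, b2, singlet))
print(chsh_value)
```

With these settings the singlet correlations are −cos(a − b), and the four terms add up to 2√2 ≈ 2.83, above the bound of 2 and a little above the experimental value quoted here.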
1.5.2 Operational consequences of Bell inequality violations

The existence of Bell inequality violations implies some interesting “no-go theorems”, i.e.
statements showing that certain processes are impossible (similar to how the second law of
thermodynamics rules out the existence of perpetual motion machines). In the literature,
these results are usually derived relying on the quantum mechanical formalism. But it’s
both easier and more fundamental to conclude them purely from empirically observed
violations of (1.19).
Further reading: The exposition in this section is, unfortunately, not commonly found
in textbooks. It is based on this paper, which should be better known!
Joint measurements
Recall the Heisenberg uncertainty principle Var_ψ[X] Var_ψ[P] ≥ ℏ²/4. It is often verbally
summarized as stating that “position and momentum can’t be measured simultaneously.”
But the relation says no such thing. (Rather, it says that there’s no state |ψ⟩ that would
cause both position and momentum measurements to produce arbitrarily sharply concen-
trated outcomes.)
It is still true, however, that position and momentum cannot be measured simultane-
ously. What is more, this is true for any pair of observables that Alice can use in an
experiment that violates the CHSH inequality. Even better: This no-go statement does not
assume the validity of QM, but is an empirical fact about the universe we live in.
To state the result, we first have to say what we mean by “joint measurement”, again
without using quantum-mechanical concepts. Let’s say two measurement devices are
equivalent if they give the same probability distribution over outcomes for every possible input
(Fig 1.11). Now consider two measurements 1, 2, say with two outcomes each. A joint
measurement machine for 1, 2 is a device J with two pairs of outcomes (Fig. 1.12). It
must be such that if one only considers the first pair, one obtains a measurement proce-
dure equivalent to 1; and if one only considers the second pair, one obtains a measurement
procedure equivalent to 2. The two original machines are said to be jointly measurable if
there exists a joint measurement machine for them.
Now assume that the two properties probed by Alice in the CHSH scenario are jointly
measurable and that the same is true for the two properties measured by Bob. They could
then use joint measurement machines to produce a complete table, with all properties
A1 , A2 , B1 , B2 provided in every round. The definition of a joint measurement machine
and of equivalent measurements implies that for each pair i, j, the marginal distributions for
Ai Bj that arise this way are identical to the ones that the original measurement devices
realize. In particular, the CHSH value ⟨C⟩ must be the same in both cases. But, as
proven above, a complete table forces |⟨C⟩| ≤ 2.
Figure 1.11: Top panel: Each physical property can be measured in many equivalent ways.
Bottom panel: Formalization of this observation for probabilistic theories. Two measurement
devices 1, 1′ are equivalent if for every preparation procedure P, measuring 1 or 1′ leads
to identical probability distributions over outcomes.
Figure 1.12: (i) Two two-outcome measurement devices, 1 and 2, like the ones held by
Alice in the CHSH scenario. They are jointly measurable if there exists a measurement
device J that produces two pairs of outcomes such that: (ii) The first pair (cyan) alone
defines a measurement that is equivalent to 1, and (iii) The second pair (pink) alone defines
a measurement that is equivalent to 2.
The contrapositive: In a universe where the CHSH inequality can be violated (such as
ours), there must be pairs of physical properties that cannot, as a matter of principle, be
jointly measured. This is a remarkably far-reaching statement to follow from empirical
observations alone!
• Q.: Wait. In our earlier Q&A, you said that as far as you knew, Alice could measure
her two properties jointly.
A.: And that was the right answer at that point in the analysis! We didn’t have to
assume incompatibility. We derived it. Like the cool kids.
No cloning
Define a universal cloning machine to be a process that takes one physical system as input
and outputs two systems such that: Applying any measurement device to the first or to
the second output is equivalent to applying it to the input. It is clear that the existence of
a universal cloner implies the existence of a joint measurement machine for any pair of
properties (Fig. 1.13). Again, we conclude that in a universe where CHSH violations are
observed, cloning is impossible.
Figure 1.13: Top: A universal cloning machine (i) is a device that takes one physical
system as input and outputs two physical systems, where each of the outputs is indistin-
guishable from the input under any measurement (ii), (iii). Bottom: A cloner can be used
to construct a joint measurement machine.
There’s a famous paper (cited in an academic publication about once every day!) that
derives the no-cloning theorem from quantum mechanics. Here’s their proof: If U is an
operator that “clones two orthogonal states” in that
U |0⟩ = |00⟩, U |1⟩ = |11⟩,
then by linearity,
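\[ U \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big) = \tfrac{1}{\sqrt{2}}\big(|00\rangle + |11\rangle\big) \neq \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big) \otimes \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big), \]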
so it necessarily fails to clone superpositions of the two states. That’s cool and all, but note
that it assumes the validity of quantum mechanics, whereas our argument doesn’t!
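The linearity argument is easy to reproduce numerically. In this minimal sketch, `apply_cloner` is a hypothetical helper that implements the linear extension of U|0⟩ = |00⟩, U|1⟩ = |11⟩; the overlap between its output on (|0⟩ + |1⟩)/√2 and the desired product state comes out strictly smaller than 1.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists."""
    return [x * y for x in u for y in v]

def apply_cloner(state):
    """Linear map with U|0> = |00> and U|1> = |11>, applied to state = (c0, c1)."""
    c0, c1 = state
    return [c0, 0.0, 0.0, c1]

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
cloned = apply_cloner(plus)   # (|00> + |11>)/sqrt(2)
wanted = kron(plus, plus)     # the product state a true cloner would have to output
overlap = sum(x * y for x, y in zip(cloned, wanted))
print(overlap)
```

The overlap equals 1/√2, so the two vectors are different states, while the basis states themselves are cloned correctly.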
True randomness
Assume I put a die in a cup, shake it vigorously, and put the cup upside down on a table.
Nobody will have any idea how many pips the die shows, so one might well model the
situation by ascribing a probability of 1/6 to any of the possible outcomes. But note that
this description only reflects my ignorance about the true state of the die. There is no
doubt that some side is facing up even before I lift the cup. In fact, it is conceivable
in principle that a computer coupled to a camera that captured my motions might solve
Newton’s equations and predict the state of the die accurately. Let’s refer to a variable as
pseudo-random if such a prediction is possible in principle, and as truly random otherwise.
A priori, it is unclear whether true randomness exists at all. (Laplace’s demon refers to
a thought experiment that suggests that none does.)
But CHSH violations are only possible if the outcomes of Alice and Bob are truly
random. For if some process could predict the outcomes, it could do so independently of
which property they choose to measure. It could therefore predict the full table, and we
are back at the proof by contradiction outlined above.
The fact that no outside observer can predict the outcomes of Alice and Bob means
that they are, in this sense, “private” to them. This observation is the basis of provably
secure quantum key distribution protocols.
1.5.3 Interpretations
We have presented a negative argument that rules out the classical model of a world that
evolves independently from observations. This argument is widely accepted today. However,
there is no positive agreement on what, if anything, should replace it. Below are some
common reactions as I see them.
The orthodox position is to say that the purpose of science is to make empirically
testable predictions. QM excels at this task. Counterfactual questions about “what would
have happened had you measured something else” just amount to storytelling and lie out-
side the remit of science. So Bell is interesting for its operational consequences (Sec. 1.5.2),
but philosophically, there’s not much to be done other than to shrug and move on.
Problems with this position: (1) It is rather unambitious. Theoretical physics has his-
torically offered more than just the ability to predict detector click patterns. To just dis-
allow hypotheticals feels like giving up too early. (2) The elements of reality critique
explained next.
The Bohmians point out that sometimes, one can predict the outcome of a measure-
ment with 100% certainty. (E.g., for a system in the singlet state, when Alice measured
spin along one axis and obtained ↑, Bob will definitely obtain ↓ w.r.t. the same axis). They
argue that in such situations, reality doesn’t change if the now somewhat redundant
measurement is performed, so that if we consider outcomes to be real, there must already
have been some element of reality representing them before the measurement. Speaking
in terms of the lab books we analyzed above, they therefore posit that there always is a full
table representing the true state of all elements of reality at any time, measured or not.
By Bell’s argument, the table can’t be independent of the measurements made. A
more detailed analysis shows that one can accommodate CHSH violations only if Bob’s
variables change as a result of Alice interacting with her side of the joint system (and / or
vice versa). There is a simple model developed by David Bohm showing that QM can in
principle be interpreted in such a realistic (i.e. properties have values whether measured
or not) but non-local (i.e. the unmeasured parameters change due to actions far away)
way. In Bohm’s model, the change of unmeasured parameters happens in a subtle way
that is strong enough to enable CHSH violations, but too weak to allow for the exchange
of faster-than-light signals between far-away parties.
Therefore, the Bohmians argue, such a description is both necessary (by the elements-of-reality
argument) and possible (by Bohm’s model). There is thus no paradox, and we
should concentrate on working out the details.
The problem with this position is that you get into tension with special relativity even in
the absence of superluminal signals. Recall that if A&B’s actions are space-like separated,
their order in time is observer-dependent. So how can I think about Alice’s actions causing
change at Bob’s end, when in some reference frames, Bob acted first?
The loopholers maintain that there are further implicit assumptions in the analysis,
some of which have to be rejected. After improved experimental techniques in the past
few years, the (unfortunately-named) free will loophole is the last major one standing.
Recall that we have assumed that A&B choose their settings randomly. More precisely,
the empirical means for Ai Bj only converge to the expected values ⟨Ai Bj ⟩ if the probabil-
ity of choosing a setting is independent of its value. (Think of an election pollster calling
random citizens on their landlines during work hours, to ask about their voting intentions.
Retired people are more likely to answer the phone—potentially skewing the result, as their
voting preferences are different from the population as a whole). But A&B are physical
systems, too! They share a common history with the central box. It is therefore unjustified,
it is argued, to assume that they can make independent choices.
Problems: (i) The position “proves too much”. It seems like it can be used as a general
argument against all of empirical science (“apples mostly fall up, but we only look when
they fall down”). (ii) One can design the choice function of A&B in such a way that it
would take one sophisticated cosmic conspiracy to still produce a CHSH value of 2.7.
People have performed Bell experiments where the settings were driven by fluctuations in
the cosmic background radiation measured at different sections of the night sky, XOR’ed
against the input of internet users participating in an online action game.
The many-worlders contend that QM anyway has a philosophical problem (the one
we didn’t address in Sec. 1.3.1), so let’s fix all issues in one fell swoop. They then throw
out the measurement postulate and posit that there exists a “wave function of the universe”
that evolves under a global Hamiltonian. The reality we experience is an emergent feature
of this wave function – not a pre-existing concept like in standard QM.
Without a measurement postulate that will probabilistically pick one “branch of a su-
perposition”, all of them have an equal right to being considered as “real”. For example,
if |☺⟩ is the state of all of my elementary particles that correlates with me feeling happy,
then summands in a superposition state like α|↑⟩|δt⟩|☺⟩ + β|↓⟩|−δt⟩|☹⟩ (encountered
in Eq. (1.14)) should be interpreted as different co-existing “worlds” in which my feelings
are correlated with other degrees of freedom of the universe. In particular, in a CHSH
experiment, all possible outcomes are simultaneously realized in different branches of the
wave function. Any philosophical problem tied to the assumption that only one branch
actually happens is thus spurious.
The problem here is that the measurement postulate, clunky as it may be, is what
connects the formalism to reality! If you claim it’s unnecessary, it’s on you to re-derive
the empirical content of the theory in this reduced framework. One important touchstone
is the Born rule, which says in this language that “if my wave function splits into two
branches with amplitudes α and β, I experience these with probabilities |α|² and |β|², respectively”.
Researchers working on many-worlds formulations therefore spend a lot of time
thinking about probabilities and their interpretation (but, to my personal taste, haven’t
cracked this nut yet).
1.6 Further reading

To repeat the basics:
• Quantum Mechanics by Leslie Ballentine is a nice presentation that’s somewhat
more careful than many textbooks without being too mathematical.
• Quantum Mechanics 1 & 2 by Cohen-Tannoudji and friends contains an enormous
amount of optional material for each chapter. It can thus both be used as an intro-
ductory textbook and as a reference.
• Modern Quantum Mechanics and Advanced Quantum Mechanics by Sakurai will
also be used for later parts of this course.
The quantum model of the measurement process is described in Chapter 12 of Quantum
Theory: Concepts and Methods by Asher Peres. A classic volume on decoherence theory is Decoherence and the
Appearance of a Classical World in Quantum Theory by Joos, Zeh, Kiefer (of Cologne),
Giulini, Kupsch, and Stamatescu. A standard introduction to quantum computing is Quantum
Computation and Quantum Information by Nielsen and Chuang. The operational consequences
of Bell violations follow Quantum Information Theory: An Invitation by Reinhard Werner.
Chapter 2
Indistinguishable particles
2.1 Bosonic and Fermionic Hilbert spaces

The tensor product construction (Sec. 1.2.1) of the Hilbert space of two distinguishable
particles was guided by the need to represent observables for properties of the first or
second particle alone. (For example, “What is the expected position of the first particle?”,
or “Does the second particle’s spin point up?”, which turned out to be represented by
observables of the form A ⊗ 1 and 1 ⊗ B respectively). However, electrons, say, seem
to be indistinguishable in the sense that any experiment that is sensitive to one electron
will be equally sensitive to any other. It thus makes sense to search for a joint Hilbert
space that only supports observables like “What is the expected position averaged over
all particles?” or “How many particles have their spins pointing up?” that do not involve
unphysical references to specific particles.
The same issue already arises in classical mechanics, where e.g. the two configurations
(q1, q2) and (q2, q1) of point particles describe the same physics. In mechanics,
this redundancy does not usually seem to lead to wrong predictions (unlike the QM
case, as we will see shortly). There’s some indication that things are amiss, though.
The Gibbs paradox says that the classical thermodynamical treatment of a gas of
identical particles does give wrong results unless indistinguishable configurations
are counted only once. However, this doesn’t quite falsify the redundant formu-
lation of classical mechanics, as the connection between this microscopic theory
and thermodynamics depends on unproven assumptions (e.g. the maximum entropy
principle), and so the problem could lie somewhere else.
There’s a simple construction that seems to account for all fundamental particles. Let
H(1) be a single-particle Hilbert space with basis {|i⟩}. If the particles were distinguish-
able, a general element of the n-body joint Hilbert space would be
\[ |\psi\rangle = \sum_{i_1,\dots,i_n} \psi_{i_1,\dots,i_n}\, |i_1,\dots,i_n\rangle \in (\mathcal{H}^{(1)})^{\otimes n}. \]
Let’s look for subspaces of (H(1) )⊗n that make sense for indistinguishable particles. Let
τkl be the operator that exchanges the k-th and the l-th factor:
τkl (| . . . , ik , . . . , il , . . . ⟩) = | . . . , il , . . . , ik , . . . ⟩.
If the particles are indistinguishable, then |ψ⟩ and τkl |ψ⟩ should describe the same physics.
This is certainly true if they differ at most by a phase factor. Because τkl² = 1, such a
phase must be ±1. The totally symmetric or Bosonic subspace Sym^n(H^(1)) consists of all
vectors such that
τkl |ψ⟩ = |ψ⟩ ∀k, l.
The totally anti-symmetric or Fermionic subspace ∧n (H(1) ) (“wedge-n”) consists of all

vectors such that
τkl |ψ⟩ = −|ψ⟩ ∀k, l.
At this point, many texts “prove” that the constructions leading to Fermions and
Bosons are the only conceivable ways of building a quantum theory of indistinguishable
particles. I find all these arguments inconsistent and unhelpful to the
degree that I’m prepared to claim the world would be better if they were all just
forgotten. Ask me about it, or maybe don’t.
2.1.1 Permutations and occupation numbers

The operators τkl (called transpositions) generate the group Sn of all permutations of
the n factors. Recall that a permutation is a way of re-arranging the symbols 1, 2, . . . , n
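(Fig. 2.1). The sign of a permutation π is
\[ \operatorname{sgn}(\pi) = \begin{cases} +1 & \text{if } \pi \text{ is a product of an even number of transpositions,} \\ -1 & \text{if } \pi \text{ is a product of an odd number of transpositions.} \end{cases} \]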
The Bosonic and Fermionic Hilbert spaces can therefore also be defined as the sets of
vectors such that
π|ψ⟩ = |ψ⟩, Bosons

π|ψ⟩ = sgn(π)|ψ⟩ Fermions
for all permutations π ∈ Sn .
You have encountered this concept before, in the definition of the determinant of an
(n × n)-matrix:
\[ \det M = \sum_{\pi \in S_n} \operatorname{sgn}(\pi) \prod_{i=1}^{n} M_{i,\pi(i)}. \tag{2.1} \]
Figure 2.1: (a) A permutation can be represented as a graph, where each position indicates a letter,
and where the arrows point to where each letter is mapped. (b) One can multiply permutations π1
and π2 by performing one after the other, which gives π2π1.
How many permutations of n letters are there? There are n ways of choosing a new
place for the first symbol, then n − 1 ways for the second symbol (as we can’t repeat the
first one), etc, for a total of
|Sn | = n(n − 1) · · · 2 · 1 = n! .
This explains the various “factorials” that will appear in formulas below.
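The sign, the counting |Sn| = n!, and the determinant formula (2.1) can all be tried out on small examples. The following is a sketch: permutations are represented as tuples of 0-based images, and the sign is computed from the number of inversions, which has the same parity as the number of transpositions. The helper names and the matrix `M` are illustrative.

```python
from itertools import permutations
from math import factorial, prod

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images (inversion count)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(M):
    """Determinant via the sum over all permutations, as in Eq. (2.1)."""
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
number_of_perms = len(list(permutations(range(4))))
print(det_leibniz(M), number_of_perms)
```

The sum has n! terms, so this is only practical for small matrices; it is meant to illustrate the formula, not to compete with elimination methods.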
We can now find bases for the Bosonic / Fermionic subspaces. Indeed, if
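\[ |\psi\rangle = \sum_{i_1,\dots,i_n} \psi_{i_1,\dots,i_n}\, |i_1,\dots,i_n\rangle \]
is Bosonic, then |ψ⟩ = π|ψ⟩ for all π and thus
\[ |\psi\rangle = \frac{1}{n!} \sum_{\pi} \pi|\psi\rangle = \sum_{i_1,\dots,i_n} \psi_{i_1,\dots,i_n} \left( \frac{1}{n!} \sum_{\pi} \pi |i_1,\dots,i_n\rangle \right). \tag{2.2} \]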
The vector in parentheses only depends on the number of times nk each single-particle
basis element |k⟩ appears in the product |i1⟩ . . . |in⟩. This motivates the definition of the
occupation number basis. For ni ∈ {0, 1, 2, . . . } such that Σ_i ni = n, set
\[ |n_1, n_2, \dots\rangle := \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n} \pi\, |\underbrace{1,\dots,1}_{n_1 \times}, \underbrace{2,\dots,2}_{n_2 \times}, \dots\rangle
 = \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n} \pi\big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \cdots\big). \tag{2.3} \]
The funky factorial factor makes the vector normalized (check it!). By (2.2), any Bosonic
state vector can be expanded in the occupation number basis.
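The normalization can be checked by brute force for small occupation numbers. In this sketch, a vector in the n-fold tensor product is stored as a dictionary from index tuples to amplitudes (a representation chosen for illustration), and the sum over Sn is carried out explicitly.

```python
from itertools import permutations
from math import factorial, prod, sqrt

def boson_occupation_state(occupations):
    """Vector (2.3) as a dict {tuple of single-particle indices: amplitude}."""
    # Reference product state: index k repeated n_k times.
    base = [k for k, n_k in enumerate(occupations) for _ in range(n_k)]
    n = len(base)
    norm = sqrt(factorial(n) * prod(factorial(n_k) for n_k in occupations))
    state = {}
    for perm in permutations(range(n)):
        key = tuple(base[perm[pos]] for pos in range(n))
        state[key] = state.get(key, 0.0) + 1.0 / norm
    return state

def norm_squared(state):
    return sum(amp ** 2 for amp in state.values())

state_21 = boson_occupation_state((2, 1, 0))
print(len(state_21), norm_squared(state_21))
```

Each distinct arrangement of the indices is hit by exactly Π nk! permutations, and there are n!/Π nk! distinct arrangements; this is where the factorial factor comes from.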
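Recall the triplet states of two spin-1/2 particles
\[ |{\uparrow\uparrow}\rangle, \qquad \tfrac{1}{\sqrt{2}}\big(|{\uparrow\downarrow}\rangle + |{\downarrow\uparrow}\rangle\big), \qquad |{\downarrow\downarrow}\rangle. \]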
They are clearly invariant under permutations of the particles. In occupation number
notation with respect to the |↑⟩, |↓⟩-basis, the triplet states are
|2, 0⟩, |1, 1⟩, |0, 2⟩.
If |ψ⟩ is Fermionic, then arguing as above gives
\[ |\psi\rangle = \sum_{i_1,\dots,i_n} \psi_{i_1,\dots,i_n} \left( \frac{1}{n!} \sum_{\pi} \operatorname{sgn}(\pi)\, \pi |i_1,\dots,i_n\rangle \right). \tag{2.4} \]
Anti-symmetry makes things a bit more exciting, though: Again look at the vector in
parentheses for some choice i1 , . . . , in of single-particle states. If one state occurs twice
(say ik = il ), then
| . . . , ik , . . . , il , . . . ⟩ + sgn(τkl ) τkl | . . . , ik , . . . , il , . . . ⟩ = 0
which implies that the sum is 0. Therefore, in the Fermionic occupation number basis
\[ |n_1, n_2, \dots\rangle := \frac{1}{\sqrt{n!}} \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, \pi\big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \cdots\big), \tag{2.5} \]
nk must be either 0 or 1. This explains the Pauli principle! Beware that in the Fermi
case, the sign of the occupation number basis elements (2.5) depends on an ordering of
single-particle basis vectors.
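Both statements (vanishing for repeated states, and the dependence of the sign on the ordering) can be seen in a small computation. The sketch below anti-symmetrizes a product of basis states by explicit summation over Sn; the dictionary representation and the name `wedge` are choices made for this illustration.

```python
from itertools import permutations
from math import factorial, sqrt

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images (inversion count)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def wedge(indices):
    """(1/sqrt(n!)) sum_pi sgn(pi) pi |i_1,...,i_n> as a dict {index tuple: amplitude}."""
    n = len(indices)
    state = {}
    for perm in permutations(range(n)):
        key = tuple(indices[perm[pos]] for pos in range(n))
        state[key] = state.get(key, 0.0) + sign(perm) / sqrt(factorial(n))
    # Drop entries that cancelled to zero.
    return {key: amp for key, amp in state.items() if abs(amp) > 1e-12}

print(wedge((0, 1)))      # two terms with opposite signs
print(wedge((0, 0, 1)))   # repeated state: everything cancels
print(wedge((1, 0)))      # same state as wedge((0, 1)) up to an overall sign
```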
For the anti-symmetrization of general single-particle vectors |α1 ⟩, . . . , |αn ⟩, one also
uses the wedge product notation
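\[ |\alpha_1\rangle \wedge \dots \wedge |\alpha_n\rangle := \frac{1}{\sqrt{n!}} \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, \pi\big(|\alpha_1\rangle \otimes \dots \otimes |\alpha_n\rangle\big), \]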
pronounced “alpha one, wedge alpha two, ...”. Wedge products are also called Slater de-
terminants. That’s because one can express the wedge product as a “formal determinant”:
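\[ |\alpha_1\rangle \wedge \dots \wedge |\alpha_n\rangle = \frac{1}{\sqrt{n!}} \det \begin{pmatrix}
|\alpha_1\rangle^{(1)} & |\alpha_2\rangle^{(1)} & \dots & |\alpha_n\rangle^{(1)} \\
|\alpha_1\rangle^{(2)} & |\alpha_2\rangle^{(2)} & \dots & |\alpha_n\rangle^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
|\alpha_1\rangle^{(n)} & |\alpha_2\rangle^{(n)} & \dots & |\alpha_n\rangle^{(n)}
\end{pmatrix}. \]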
Here, the super-scripts indicate which tensor factor the vector belongs to.
The singlet state (1/√2)(|↑↓⟩ − |↓↑⟩) = |↑⟩ ∧ |↓⟩ is clearly anti-symmetric. In occupation
number notation with respect to the |↑⟩, |↓⟩-basis, it is given by |1, 1⟩.
Assume dim H(1) = d < ∞. In both the Bose and the Fermi case, the occupation
number bases give us a combinatorial way to compute the dimension of the Hilbert
spaces.
Fermions: Basis elements are labeled by subsets S ⊂ {1, . . . , d} of size |S| = n.
Thus
\[ \dim \wedge^n \mathbb{C}^d = \binom{d}{n}. \]
Bosons: Basis elements are labeled by a partition n = Σ_{i=1}^d ni of n into d non-negative
parts. There’s a cute combinatorial argument for computing the number of
such partitions. The answer is
\[ \dim \operatorname{Sym}^n \mathbb{C}^d = \binom{n+d-1}{n}. \tag{2.6} \]
Can you find it? (Spoiler: Search for “stars and bars”.)
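If you’d rather check the formulas than derive them, the occupation number tuples can simply be enumerated. A minimal sketch (the function name is illustrative):

```python
from itertools import product
from math import comb

def occupation_tuples(d, n, fermionic):
    """All (n_1, ..., n_d) with sum n; entries restricted to {0, 1} for Fermions."""
    max_occ = 1 if fermionic else n
    return [occ for occ in product(range(max_occ + 1), repeat=d) if sum(occ) == n]

d, n = 4, 3
print(len(occupation_tuples(d, n, fermionic=True)), comb(d, n))
print(len(occupation_tuples(d, n, fermionic=False)), comb(n + d - 1, n))
```

The enumeration is exponential in d and only meant for small cases; the binomial formulas are what one uses in practice.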
The occupation number basis adds another possible meaning to the heavily overloaded
notation of “a list of numbers in a ket”. In particular, in
\[ |n_1, n_2, \dots\rangle = \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi} \pi |1,\dots,1,2,\dots,2,\dots\rangle \]
the numbers in the ket on the l.h.s. count occupations, while the numbers in the ket on the
r.h.s. are indices of some single-particle basis. Which of these definitions is meant, and which
single-particle basis it
is relative to, and whether the occupation numbers are for Fermions or for Bosons,
or whether the numbers have nothing to do with these many-body concepts and are
more general “quantum numbers” (like the labels |n, l, m⟩ of the atomic basis) has
to be inferred from context. There’s no general, reliable rule.
Look. If I were the emperor of physics, I’d outlaw this mess. But I’m not and
everybody is using it. Once you get used to it, you’ll find that this convention
causes surprisingly few catastrophic misunderstandings.
Summary
Let H(1) be a single-body Hilbert space with basis {|i⟩}. Then a general state of n
indistinguishable particles can be expressed in the occupation number basis as
\[ |\psi\rangle = \sum_{n_1, n_2, \dots} c_{n_1, n_2, \dots}\, |n_1, n_2, \dots\rangle, \]
where the sum is over ni ∈ {0, 1, 2, . . . } for Bosons and ni ∈ {0, 1} for Fermions,
and where Σ_i ni = n. The occupation number basis is defined as
\[ |n_1, n_2, \dots\rangle = \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n} (\operatorname{sgn} \pi)^{\zeta}\, \pi\big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \cdots\big), \tag{2.7} \]
where ζ = 0 for Bosons and ζ = 1 for Fermions.
2.1.2 Single-particle operators

We started this section remarking that a full tensor product Hilbert space supports more
observables than are physically meaningful for indistinguishable particles. Let A(i) be an
operator acting on the i-th particle. Then (why?)
\[ \pi A^{(i)} \pi^{-1} = A^{(\pi(i))}. \]
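Thus, if |ψ⟩ is Bosonic or Fermionic,
\[ \operatorname{tr} A^{(i)} |\psi\rangle\langle\psi|
 = \frac{1}{n!} \sum_{\pi \in S_n} \operatorname{tr} A^{(i)}\, \pi |\psi\rangle\langle\psi| \pi^{-1}
 = \frac{1}{n!} \sum_{\pi \in S_n} \operatorname{tr} \pi^{-1} A^{(i)} \pi\, |\psi\rangle\langle\psi|
 = \frac{1}{n} \sum_{j=1}^{n} \operatorname{tr} A^{(j)} |\psi\rangle\langle\psi|. \tag{2.8} \]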
A measurement on any one particle is thus equal to the average over all of them – the
formalism no longer allows us to pick out the properties of individual particles.
Now assume A has an eigendecomposition
\[ A = \sum_i \lambda_i\, |i\rangle\langle i|. \]
Then for an element of the occupation number basis with respect to the eigenbasis {|i⟩} of
A, one computes from (2.7)
\[ \sum_{j=1}^{n} A^{(j)}\, |n_1, n_2, \dots\rangle = \Big( \sum_i \lambda_i n_i \Big) |n_1, n_2, \dots\rangle. \tag{2.9} \]
In particular, single-body operators are diagonal in the occupation number basis. If the
single-body eigenvalues are sorted λ0 ≤ λ1 ≤ . . . , then the lowest n-body eigenvalue in
the Bosonic case is nλ0 and in the Fermionic case λ0 + · · · + λn−1 . For Fermions, if the
λi ’s describe energies, then λn−1 , the largest energy still occupied in the ground state, is
called the Fermi energy.
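By (2.9), the n-body spectrum is obtained by listing all allowed occupation numbers and summing λi ni. Here is a sketch with made-up single-body levels (the values in `levels` are purely illustrative):

```python
from itertools import product

def many_body_spectrum(levels, n, fermionic):
    """Sorted eigenvalues sum_i lambda_i * n_i over all allowed occupations with sum n."""
    max_occ = 1 if fermionic else n
    return sorted(
        sum(lam * n_i for lam, n_i in zip(levels, occ))
        for occ in product(range(max_occ + 1), repeat=len(levels))
        if sum(occ) == n
    )

levels = [0.5, 1.5, 2.5, 3.5]   # hypothetical sorted single-body eigenvalues
n = 3
boson_ground = many_body_spectrum(levels, n, fermionic=False)[0]
fermion_ground = many_body_spectrum(levels, n, fermionic=True)[0]
fermi_energy = levels[n - 1]
print(boson_ground, fermion_ground, fermi_energy)
```

The Bosonic ground state puts all particles into the lowest level, while the Fermionic one fills the n lowest levels once each.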
2.1.3 The exchange interaction

Goals
The Coulomb repulsion term between two electrons, h^(1,2) ∝ ∥x1 − x2∥^(−1), does
not depend on spin. However, when combined with the anti-symmetrization postulate
for Fermions, an effective coupling between electron spins arises. It is important,
e.g., in magnetism and atomic physics. We’ll look at a simple case: the electrons
of the Helium atom in first-order perturbation theory.
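Treating the nucleus as fixed, the Hamiltonian for the Helium atom is
\[ H = H_0 + h^{(1,2)}, \qquad H_0 = h^{(1)} + h^{(2)}, \]
\[ h^{(i)} = \frac{P_i^2}{2m} - \frac{2e^2}{4\pi\epsilon_0} \frac{1}{\|x_i\|}, \qquad h^{(1,2)} = \frac{e^2}{4\pi\epsilon_0} \frac{1}{\|x_1 - x_2\|}. \]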
The eigenfunctions of the single-body Hamiltonian are the same as those for hydrogen
(with Bohr radius halved on account of the higher charge), and with arbitrary spin:
\[ |\phi_{n,l,m}\rangle |s\rangle, \qquad n \geq 1, \quad 0 \leq l \leq n-1, \quad -l \leq m \leq l, \quad s \in \{-\tfrac{1}{2}, +\tfrac{1}{2}\}. \]
By Sec. 2.1.2, their Slater determinants diagonalize the non-interacting part H0 .
Warm up: The ground state

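Write |1⟩ := |ϕ1,0,0⟩ for short. The ground state of H0 is given by
\[ |1{\uparrow}\rangle \wedge |1{\downarrow}\rangle = \frac{1}{\sqrt{2}} \big( |1{\uparrow}\rangle |1{\downarrow}\rangle - |1{\downarrow}\rangle |1{\uparrow}\rangle \big). \]
That is: both electrons are in the single-body ground state (spectroscopic notation: 1s²),
with anti-parallel spins. The ground state vector becomes a lot clearer when we group the
spatial and the spin degrees of freedom together:
\[ |1\rangle|{\uparrow}\rangle \wedge |1\rangle|{\downarrow}\rangle = \frac{1}{\sqrt{2}} \big( |1\rangle|1\rangle|{\uparrow}\rangle|{\downarrow}\rangle - |1\rangle|1\rangle|{\downarrow}\rangle|{\uparrow}\rangle \big) = |1\rangle|1\rangle\, \frac{1}{\sqrt{2}} \big( |{\uparrow}\rangle|{\downarrow}\rangle - |{\downarrow}\rangle|{\uparrow}\rangle \big). \tag{2.10} \]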
Let’s analyze this. The permutation τ exchanges all degrees of freedom of the electrons:

τ (|ϕ1 ⟩|s1 ⟩)(|ϕ2 ⟩|s2 ⟩) = (|ϕ2 ⟩|s2 ⟩)(|ϕ1 ⟩|s1 ⟩)
We could also define operators τ (space) and τ (spin) that only act on one of them:

τ (space) (|ϕ1 ⟩|s1 ⟩)(|ϕ2 ⟩|s2 ⟩) = (|ϕ2 ⟩|s1 ⟩)(|ϕ1 ⟩|s2 ⟩),

τ (spin) (|ϕ1 ⟩|s1 ⟩)(|ϕ2 ⟩|s2 ⟩) = (|ϕ1 ⟩|s2 ⟩)(|ϕ2 ⟩|s1 ⟩)
so that τ = τ (space) τ (spin) . The Hamiltonian H commutes not only with τ , but (in this
case) with τ (space) and τ (spin) individually. We can therefore find a common eigenbasis, i.e.
energy eigenvectors that also have well-defined parity with respect to the exchange of each
of the spatial and the spin parts. To get anti-symmetry under τ , exactly one of these two
parts has to be anti-symmetric. That’s what happened in (2.10).
The energy correction induced by the interaction in first-order perturbation theory is
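\[ \langle 1|\langle 1|\langle \Psi^-|\, h^{(1,2)}\, |1\rangle|1\rangle|\Psi^-\rangle = \langle 1|\langle 1|\, h^{(1,2)}\, |1\rangle|1\rangle
 = \frac{e^2}{4\pi\epsilon_0} \int \frac{|\langle x_1|1\rangle|^2\, |\langle x_2|1\rangle|^2}{\|x_1 - x_2\|}\, d^3x_1\, d^3x_2. \tag{2.11} \]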
This expression – called the Coulomb or direct integral – equals the expected value of the
repulsion term experienced by two classical electrons that are found at x with probability
density |⟨x|1⟩|2 .
Excited states
The first excited states of H0 are the ones where one electron remains in the ground state
and one is in |ϕ2,0,0 ⟩ =: |2⟩ (spectroscopic: “1s, 2s”). Taking spin into account, the first
excited energy of the non-interacting Hamiltonian is thus four-fold degenerate:
|1⟩|s1 ⟩ ∧ |2⟩|s2 ⟩ si ∈ {↑, ↓}.
As discussed above, we can choose a basis of states that are symmetric / anti-symmetric in
the spatial and spin degrees individually:
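\[ \begin{array}{ll}
\frac{1}{\sqrt{2}} \big( |1\rangle|2\rangle + |2\rangle|1\rangle \big)\, \frac{1}{\sqrt{2}} \big( |{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle \big) & (S = 0, \text{ “singlet”}) \\[1ex]
\left. \begin{array}{l}
\frac{1}{\sqrt{2}} \big( |1\rangle|2\rangle - |2\rangle|1\rangle \big)\, \frac{1}{\sqrt{2}} \big( |{\uparrow\downarrow}\rangle + |{\downarrow\uparrow}\rangle \big) \\
\frac{1}{\sqrt{2}} \big( |1\rangle|2\rangle - |2\rangle|1\rangle \big)\, |{\uparrow\uparrow}\rangle \\
\frac{1}{\sqrt{2}} \big( |1\rangle|2\rangle - |2\rangle|1\rangle \big)\, |{\downarrow\downarrow}\rangle
\end{array} \right\} & (S = 1, \text{ “triplet”})
\end{array} \]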
Again, the energy correction only depends on the spatial part. In particular it is the same
for the last three vectors. For the first two, we get
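\[ \frac{1}{2} \big( \langle 1|\langle 2| \pm \langle 2|\langle 1| \big)\, h^{(1,2)}\, \big( |1\rangle|2\rangle \pm |2\rangle|1\rangle \big) = \langle 1|\langle 2|\, h^{(1,2)}\, |1\rangle|2\rangle \pm \operatorname{Re} \langle 1|\langle 2|\, h^{(1,2)}\, |2\rangle|1\rangle. \]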
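The first matrix element is again a “Coulomb integral”
\[ I := \langle 1|\langle 2|\, h^{(1,2)}\, |1\rangle|2\rangle = \frac{e^2}{4\pi\epsilon_0} \int \frac{|\langle x_1|1\rangle|^2\, |\langle x_2|2\rangle|^2}{\|x_1 - x_2\|}\, d^3x_1\, d^3x_2 > 0, \]
which allows for the same probabilistic interpretation as given for Eq. (2.11). The second
one is called the exchange integral
\[ J := \langle 1|\langle 2|\, h^{(1,2)}\, |2\rangle|1\rangle = \frac{e^2}{4\pi\epsilon_0} \int \frac{\langle 1|x_1\rangle \langle 2|x_2\rangle \langle x_1|2\rangle \langle x_2|1\rangle}{\|x_1 - x_2\|}\, d^3x_1\, d^3x_2. \]
The exchange integral is also positive, although that’s less obvious.
To see this, rewrite the exchange integral as
\[ J = \frac{e^2}{\epsilon_0} \int \frac{\langle 1|x_1\rangle \langle x_1|2\rangle\, \langle 2|x_2\rangle \langle x_2|1\rangle}{4\pi \|x_1 - x_2\|}\, d^3x_1\, d^3x_2. \]
Defining
\[ \phi(x) := \langle x|1\rangle \langle 2|x\rangle, \qquad A := \int \frac{1}{4\pi \|x_1 - x_2\|}\, |x_1\rangle\langle x_2|\, d^3x_1\, d^3x_2, \]
the integral is, up to the prefactor, of the form ⟨ϕ|A|ϕ⟩ with A a translation-invariant
operator. By Eq. (A.24), A is diagonal in the Fourier basis, with eigenvalues given by (2π)^(3/2) times the Fourier
transform of f(x) = 1/(4π∥x∥). From Eq. (C.18), (2π)^(3/2) f̃(k) = 1/∥k∥², so that
\[ J = \frac{e^2}{\epsilon_0} \int \frac{|\langle \phi|k\rangle|^2}{\|k\|^2}\, d^3k > 0. \]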
The effect of the interaction is thus twofold: (i) It uniformly increases the energies by
the Coulomb term I describing the expected repulsion felt by the two electrons (as one
would expect). (ii) It introduces a splitting by 2J of the energies between the symmetric
S = 1 and anti-symmetric S = 0 spin states. The physical way to think about the second
effect is that anti-symmetry in the spatial part “allows the electrons to avoid each other”,
thus decreasing the energy penalty due to electron-electron repulsion.
The Heisenberg model

We have seen that within the 1s, 2s-space, the energy depends only on the spin configura-
tion. Let’s map it to an effective 2-spin model by setting:
|s1 , s2 ⟩ := |1⟩|s1 ⟩ ∧ |2⟩|s2 ⟩.
In this two-spin Hilbert space, the effective Hamiltonian is, up to an irrelevant global shift
of energies,
Heff = −Jτ.
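We can write the transposition τ as (exercise!)
\[ \tau = \frac{1}{2} \sum_{j \in \{0, x, y, z\}} \sigma_j^{(1)} \sigma_j^{(2)} = \frac{1}{2} \big( 1 + \sigma^{(1)} \cdot \sigma^{(2)} \big), \]
where σ0 := 1, and thus, up to another shift,
\[ H_{\text{eff}} = -\frac{J}{2}\, \sigma^{(1)} \cdot \sigma^{(2)}. \tag{2.12} \]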
The exchange principle can thus be described as an effective interaction between the two
spins. Equation (2.12) is an embryonic version of the Heisenberg model of magnetism.
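The relation between the two-spin transposition and the Pauli matrices can be checked numerically. The sketch below builds the 4×4 matrices by hand (the helper names are illustrative) and compares the swap matrix with (1 + σ(1) · σ(2))/2; it also confirms that the transposition acts as −1 on the singlet and as +1 on a triplet state.

```python
def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A)) for l in range(len(B))]
            for i in range(len(A)) for k in range(len(B))]

def add(*mats):
    size = len(mats[0])
    return [[sum(M[r][c] for M in mats) for c in range(size)] for r in range(size)]

def apply(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# (1 + sigma.sigma) / 2, with the identity as the j = 0 term of the sum.
pauli_sum = add(kron(I2, I2), kron(X, X), kron(Y, Y), kron(Z, Z))
swap_from_paulis = [[0.5 * entry for entry in row] for row in pauli_sum]

# Transposition of the two spins in the basis (uu, ud, du, dd).
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
max_deviation = max(abs(swap_from_paulis[r][c] - SWAP[r][c])
                    for r in range(4) for c in range(4))

singlet = [0, 1, -1, 0]   # unnormalized |ud> - |du>
triplet = [0, 1, 1, 0]    # unnormalized |ud> + |du>
print(max_deviation, apply(SWAP, singlet), apply(SWAP, triplet))
```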
2.2 Second quantization
Goals
This section is mostly formal (definitions, generic constructions). Not too excit-
ing? Maybe. But familiarizing you with the formalism of “second quantization” is
one of the most important goals of this lecture. Much builds on it. Be alert!
So far, we have considered systems with a fixed number n of particles. We will now
treat the particle number as variable. Mathematically, this actually simplifies some cal-
culations (we won’t have to worry about combinatorial expressions like (2.6) any more).
Physically, this step is necessary e.g. for relativistic theories, where different species of
particles can be converted into each other.
2.2.1 Fock space

Start with a single-particle Hilbert space H(1) . To describe systems with an indefinite
particle number, we’ll use superpositions
\[ |\psi\rangle = \sum_{n=0}^{\infty} |\psi_n\rangle \]
with |ψn ⟩ ∈ Symn (H(1) ) (Bosons) or |ψn ⟩ ∈ ∧n (H(1) ) (Fermions). Terms corresponding
to different particle numbers are taken to be orthogonal, so that inner products are
\[ \langle\psi|\psi'\rangle = \sum_{n=0}^{\infty} \langle\psi_n|\psi'_n\rangle. \]
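The resulting Hilbert space is called the symmetric/anti-symmetric Fock space
\[ \mathcal{F}_S(\mathcal{H}^{(1)}) = \bigoplus_{n=0}^{\infty} \mathrm{Sym}^n(\mathcal{H}^{(1)}) \quad \text{(Bosons)}, \qquad \mathcal{F}_A(\mathcal{H}^{(1)}) = \bigoplus_{n=0}^{\infty} \wedge^n(\mathcal{H}^{(1)}) \quad \text{(Fermions)}. \]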
Wait, n = 0 is included? That’s right, we allow for systems with zero particles. To make
sense of that, define
\[ (\mathcal{H}^{(1)})^{\otimes 0}, \ \wedge^{0}(\mathcal{H}^{(1)}), \ \mathrm{Sym}^{0}(\mathcal{H}^{(1)}) := \mathbb{C}^1, \]
the Hilbert space of one-component vectors. Up to a phase, it only contains a single
normalized vector, which is called the vacuum and denoted as |vac⟩ or |0⟩.
This construction is very transparent in the occupation number basis, where it basically
amounts to removing the constraint Σ_i n_i = n (and all the combinatorial nastiness that
comes with it). With respect to a basis {|i⟩} of H(1) , Fock space is the Hilbert space with
basis |n1 , n2 , . . . ⟩, where ni ∈ {0, 1, 2, . . . } for Bosons and ni ∈ {0, 1} for Fermions.
The vacuum is |0, 0, . . . ⟩ = |vac⟩ = |0⟩.
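To make the occupation number picture concrete, here is a small sketch that enumerates the basis vectors |n_1, n_2, ...⟩ of a single n-particle sector for a finite number of modes. The function name `occupation_basis` and the restriction to finitely many modes are assumptions made for illustration; Fock space itself is the direct sum over all particle numbers.

```python
from itertools import product


def occupation_basis(modes, particles, fermions):
    """List all occupation tuples (n_1, ..., n_modes) with sum n_i = particles.

    Bosons allow n_i in {0, 1, 2, ...}; Fermions only n_i in {0, 1}.
    """
    max_occ = 1 if fermions else particles
    return [occ for occ in product(range(max_occ + 1), repeat=modes)
            if sum(occ) == particles]


# Sector sizes for 3 modes and 2 particles.
bosonic_sector = occupation_basis(3, 2, fermions=False)
fermionic_sector = occupation_basis(3, 2, fermions=True)

# The zero-particle sector contains only the vacuum |0, 0, ...>.
vacuum_sector = occupation_basis(3, 0, fermions=False)
```

Dropping the filter on the particle number (and collecting all sectors) yields the basis of the truncated Fock space, which is exactly the removal of the constraint described above.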
2.2.2 Creation and annihilation operators

Recall the treatment of the quantum harmonic oscillator (Appendix A.2.1). There, one
introduces the ladder operators that create/destroy excitations in the sense that
\[ a^\dagger |n\rangle = \sqrt{n+1}\, |n+1\rangle \quad \Leftrightarrow \quad a |n\rangle = \sqrt{n}\, |n-1\rangle. \tag{2.13} \]
The definition might feel a bit unmotivated at first, but it turns out to radically simplify
the analysis. Ladder operators can likewise be introduced on Fock space, and once more,
they turn out to simplify calculations with indistinguishable particles much more than one
could expect.
For any single-particle state |α⟩, the creation operator a†α is defined via its action on
n-particle states |ψn ⟩ as
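\[ a^\dagger_\alpha |\psi_n\rangle = \underbrace{\sqrt{n+1}}_{\text{scale to match (2.13)}} \ \underbrace{\frac{1}{(n+1)!} \sum_{\pi \in S_{n+1}} (\operatorname{sgn} \pi)^{\zeta}\, \pi}_{\text{(anti-)symmetrize}} \ \underbrace{\bigl( |\alpha\rangle \otimes |\psi_n\rangle \bigr)}_{\text{add particle in state } |\alpha\rangle} \tag{2.14} \]
with ζ = 0 (Bosons) and ζ = 1 (Fermions). The associated annihilation operator is the
adjoint: a_α = (a†_α)†.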
Equation (2.14) is commonly summarized as “a†α creates a particle in state |α⟩”.

This phrase should be thought of as a mnemonic, not as a definition. For one,
it omits the crucial scale factor √(n + 1). What is more, creation operators don’t
usually have a direct physical interpretation (Sec. 2.3.2). Rather, they appear as
mathematical building blocks that allow for a convenient representation of local
operators (Sec. 2.2.3).
It is slightly unfortunate that a† is a more natural starting point than a, requiring the
round-about definition of a as (a† )† . On the upside, the “dagger” symbol used by
physicists to denote the adjoint looks a bit like a “+”, so one can easily remember
that a† is the one that “adds” a particle.
We’ll usually fix a basis {|i⟩} of the single-body Hilbert space and work in the as-
sociated occupation number basis, where the ladder operators act in a transparent way.
Eq. (2.14) implies
\[ a^\dagger_i | \dots n_{i-1}, n_i, n_{i+1} \dots \rangle = \sqrt{n_i + 1}\, (-1)^{\zeta \sum_{j<i} n_j}\, | \dots n_{i-1}, n_i + 1, n_{i+1} \dots \rangle. \]
Here, we use the convention that |n1 . . . ⟩ equals 0 if one of the occupation numbers is
negative, or, in the Fermionic case, additionally if one occupation number exceeds 1. Ex-
plicitly, for Bosons:
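\[ \begin{aligned} a^\dagger_i | \dots n_{i-1}, n_i, n_{i+1} \dots \rangle &= \sqrt{n_i + 1}\, | \dots n_{i-1}, n_i + 1, n_{i+1} \dots \rangle, \\ a_i | \dots n_{i-1}, n_i, n_{i+1} \dots \rangle &= \sqrt{n_i}\, | \dots n_{i-1}, n_i - 1, n_{i+1} \dots \rangle, \end{aligned} \tag{2.15} \]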
and for Fermions
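\[ \begin{aligned} a^\dagger_i | \dots n_{i-1}, n_i, n_{i+1} \dots \rangle &= (-1)^{\sum_{j<i} n_j}\, | \dots n_{i-1}, n_i + 1, n_{i+1} \dots \rangle, \\ a_i | \dots n_{i-1}, n_i, n_{i+1} \dots \rangle &= (-1)^{\sum_{j<i} n_j}\, | \dots n_{i-1}, n_i - 1, n_{i+1} \dots \rangle. \end{aligned} \tag{2.16} \]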
Iterating, any basis element can be written using creation operators acting on the vacuum:
\[ |n_1, n_2, \dots \rangle = \frac{(a_1^\dagger)^{n_1}}{\sqrt{n_1!}}\, \frac{(a_2^\dagger)^{n_2}}{\sqrt{n_2!}} \cdots |0\rangle. \tag{2.17} \]
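The action of the ladder operators on occupation number states is easy to mirror in code. The sketch below represents a basis vector as a tuple of occupation numbers (modes indexed from 0) and returns the resulting coefficient together with the new tuple. The function names `create`, `annihilate` and `from_vacuum` are illustrative assumptions. The last function rebuilds a basis vector from the vacuum in the operator order of (2.17) and returns the accumulated coefficient, which should come out as 1.

```python
from math import sqrt, factorial


def create(i, occ, zeta):
    """Apply a_i^dagger to the basis state occ (tuple of occupation numbers).

    zeta = 0 for Bosons, zeta = 1 for Fermions.
    Returns (coefficient, new_occ), or None if the result is the zero vector.
    """
    if zeta == 1 and occ[i] == 1:
        return None  # Pauli principle: occupation would exceed 1
    sign = (-1) ** (zeta * sum(occ[:i]))
    new_occ = occ[:i] + (occ[i] + 1,) + occ[i + 1:]
    return sign * sqrt(occ[i] + 1), new_occ


def annihilate(i, occ, zeta):
    """Apply a_i to the basis state occ; same conventions as create()."""
    if occ[i] == 0:
        return None  # nothing to remove
    sign = (-1) ** (zeta * sum(occ[:i]))
    new_occ = occ[:i] + (occ[i] - 1,) + occ[i + 1:]
    return sign * sqrt(occ[i]), new_occ


def from_vacuum(target, zeta):
    """Build |target> as in (2.17); assumes target is a valid basis state.

    Operators on the right act first, so modes are filled from last to first.
    """
    occ = (0,) * len(target)
    coeff = 1.0
    for i in reversed(range(len(target))):
        for _ in range(target[i]):
            c, occ = create(i, occ, zeta)
            coeff *= c
        coeff /= sqrt(factorial(target[i]))
    return coeff, occ
```

For example, `from_vacuum((2, 1, 3), 0)` rebuilds a Bosonic state and `from_vacuum((1, 0, 1), 1)` a Fermionic one. Applying `annihilate(0, ...)` and `create(1, ...)` to `(1, 0)` in both orders with `zeta=1` gives coefficients of opposite sign, in line with the anti-commutation of Fermionic ladder operators for different modes.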
Basis expansions and field operators

Choose a single-body basis {|i⟩} and a state |α⟩ ∈ H (1) . Plugging the expansion
\[ |\alpha\rangle = \sum_i |i\rangle \langle i|\alpha\rangle \]
into (2.14) shows that “creation operators can be expanded like kets and annihilation op-
erators like bras”:
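\[ a^\dagger_\alpha = \sum_i \langle i|\alpha\rangle\, a^\dagger_i \quad \Rightarrow \quad a_\alpha = \sum_i \langle \alpha|i\rangle\, a_i. \tag{2.18} \]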
We don’t need to restrict ourselves to normalizable states. For example, if |α⟩ = |x⟩
is a delta function centered at x ∈ R3 and |i⟩ = |ϕi ⟩ for some smooth function ϕi (x) in
L2 (R3 ), then the above reads
\[ |x\rangle = \sum_i |\phi_i\rangle \langle\phi_i|x\rangle = \sum_i \bar\phi_i(x)\, |\phi_i\rangle, \]
and thus the operators “creating / destroying a particle at position x” are
\[ a^\dagger_x = \sum_i \bar\phi_i(x)\, a^\dagger_i \quad \Rightarrow \quad a_x = \sum_i \phi_i(x)\, a_i. \]
Recall that a classical field is any physical quantity that depends on points in space.
The ax are quantum operators depending on points in space, and thus a first example
of a quantum field. These annihilation field operators and their Heisenberg-picture time
evolution are commonly written as
\[ \hat\Psi(x) := a_x, \qquad \hat\Psi(t, x) := a_x(t) = e^{-\frac{t}{i\hbar} H}\, a_x\, e^{\frac{t}{i\hbar} H}. \]
Despite the similarity in notation, the field operators Ψ̂(x) should not be confused with
wave functions ψ(x) ∈ L2 (R3 )!
All the caveats that apply to delta functions (App. A.1.8) likewise apply to the
Ψ̂(x). In particular, formulas involving field operators have physical content only
when integrated against smooth functions. (In the mathematical literature, the Ψ̂(x)
are therefore referred to as operator-valued distributions, to indicate that they give
proper operators only after an integration). See the discussion around (2.23) for an
example of how this pans out.
The converse of the above construction also works. From the completeness relation for
delta functions (A.13):
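\[ |\alpha\rangle = \int \alpha(x)\, |x\rangle \, d^3x \quad \Rightarrow \quad a^\dagger_\alpha = \int \alpha(x)\, \hat\Psi^\dagger(x) \, d^3x. \tag{2.19} \]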
Commutation relations
As is the case for the treatment of the harmonic oscillators with ladder operators, their
commutation relations are important in calculations.
To treat the Bosonic and Fermionic cases in parallel, introduce the notation
[A, B]ζ := AB − (−1)ζ BA
so that
[A, B]ζ = AB − BA = [A, B] (Bosons, ζ = 0),

[A, B]ζ = AB + BA = {A, B} (Fermions, ζ = 1).
From (2.15, 2.16), one finds
[ai , a†j ]ζ = δij 1, [ai , aj ]ζ = [a†i , a†j ]ζ = 0. (2.20)
More generally, combining these basis-dependent relations with (2.18) gives
[aα , a†β ]ζ = ⟨α|β⟩ 1 (2.21)
which for field operators formally reads
[Ψ̂(x), Ψ̂† (y)]ζ = δ(x − y) 1. (2.22)

How should one interpret Eq. (2.22)? Recall the general rule that expressions in-
volving delta functions carry meaning only when integrated against smooth func-
tions. Viewed this way, (2.22) turns out to be an equivalent restatement of the un-
problematic version (2.21). Indeed, for smooth functions α(x), β(x), combining
Eq. (2.19) and Eq. (2.22) gives
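\[ \begin{aligned} [a_\alpha, a^\dagger_\beta]_\zeta &= \int\!\!\int \bar\alpha(x) \beta(y)\, [\hat\Psi(x), \hat\Psi^\dagger(y)]_\zeta \, d^3x \, d^3y \\ &= \int\!\!\int \bar\alpha(x) \beta(y)\, \delta(x - y)\, 1 \, d^3x \, d^3y \\ &= \int \bar\alpha(y) \beta(y)\, 1 \, d^3y = \langle\alpha|\beta\rangle\, 1. \end{aligned} \tag{2.23} \]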
2.2.3 Single- and two-particle operators

An n-particle Hamiltonian is typically of the form
\[ H = \sum_{k=1}^{n} h^{(k)} + \frac{1}{2} \sum_{k \neq l = 1}^{n} h^{(k,l)} \]
for a single-particle term h^{(k)} (e.g. h^{(k)} = P_k²/(2m)) and an interaction term h^{(k,l)} (e.g.
h^{(k,l)} = V(x_k − x_l)). On Fock space, we have to sum over all possible particle numbers
n, so that, e.g., the single-particle term becomes
\[ \bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)}. \]
These formulas become much cleaner when expressed in terms of creation and annihilation
operators.
Indeed, choose a single-particle basis {|i⟩} and consider the expansion
\[ h = \sum_{ij} \langle i|h|j\rangle\, |i\rangle\langle j| = \sum_{ij} h_{ij}\, |i\rangle\langle j|. \tag{2.24} \]
We claim that for both Bosons and Fermions, the following holds:
\[ \bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)} = \sum_{ij} h_{ij}\, a^\dagger_i a_j. \tag{2.25} \]
In other words: We can formally move from single-body operators to many-body operators
by replacing “kets by creation operators and bras by annihilation operators”.
This is not so surprising if we look at (2.24) in the right way. The bra ⟨j| is a linear
map from H(1) to the complex numbers, a space that we have since identified as
the “vacuum sector”. In this sense, ⟨j| maps the single-particle state |j⟩ to |vac⟩.
Dually, we can re-interpret the ket |i⟩ as a linear map C(1) → H(1) , (z) 7→ z|i⟩, or
|vac⟩ 7→ |i⟩. Thus, the familiar matrix element expansion (2.24) can be interpreted
as a superposition of processes that “destroy a particle in state |j⟩ and create one in
state |i⟩, weighted by the amplitude hij ”. From this point of view, (2.25) amounts to
the claim that the same description remains valid in higher particle number sectors.
To verify (2.25) start with the case where {|i⟩} is an eigenbasis of h. We have already
found in (2.9) that in this case, the occupation number basis diagonalizes the single-body
operator, so that
\[ \bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)}\, |n_1, n_2, \dots\rangle = \sum_i \lambda_i n_i\, |n_1, n_2, \dots\rangle = \sum_i \lambda_i\, a^\dagger_i a_i\, |n_1, n_2, \dots\rangle \]
as claimed. The general case follows from the fact that, as remarked around (2.18), “cre-
ation operators transform like kets and annihilation operators like bras”: If {|αi ⟩} is an-
other single-particle basis, then inserting completeness relations and using (2.18) gives
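\[ \begin{aligned} \sum_i \langle i|h|i\rangle\, a^\dagger_i a_i &= \sum_{ij} \langle i|h|j\rangle\, a^\dagger_i a_j \qquad \text{($h$ is diagonal in the $\{|i\rangle\}$-basis)} \\ &= \sum_{ijkl} \langle i|\alpha_k\rangle \langle\alpha_k|h|\alpha_l\rangle \langle\alpha_l|j\rangle\, a^\dagger_i a_j \\ &= \sum_{kl} \langle\alpha_k|h|\alpha_l\rangle \Bigl( \sum_i \langle i|\alpha_k\rangle\, a^\dagger_i \Bigr) \Bigl( \sum_j a_j \langle\alpha_l|j\rangle \Bigr) \\ &= \sum_{kl} \langle\alpha_k|h|\alpha_l\rangle\, a^\dagger_{\alpha_k} a_{\alpha_l}. \end{aligned} \]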
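The claim (2.25) can be tested on a small example. The sketch below computes matrix elements of Σ_ij h_ij a†_i a_j between Bosonic occupation number states. The function name and the sample matrix `h` are assumptions chosen for illustration. For two Bosons in two modes, the results can be compared by hand with the first-quantized operator h ⊗ 1 + 1 ⊗ h restricted to the symmetric subspace, using |2,0⟩ = |00⟩ and |1,1⟩ = (|01⟩ + |10⟩)/√2.

```python
from math import sqrt


def one_body_matrix_element(h, bra, ket):
    """<bra| sum_ij h[i][j] a_i^dagger a_j |ket> for Bosonic occupation tuples."""
    total = 0.0
    for i in range(len(h)):
        for j in range(len(h)):
            if ket[j] == 0:
                continue  # a_j annihilates the state
            occ = list(ket)
            coeff = sqrt(occ[j])      # a_j |..n_j..> = sqrt(n_j) |..n_j-1..>
            occ[j] -= 1
            coeff *= sqrt(occ[i] + 1)  # a_i^dagger adds a particle to mode i
            occ[i] += 1
            if tuple(occ) == tuple(bra):
                total += h[i][j] * coeff
    return total


# Hypothetical real symmetric single-particle operator on two modes.
h = [[1.0, 0.5],
     [0.5, 2.0]]

diag_20 = one_body_matrix_element(h, (2, 0), (2, 0))  # expected 2 * h[0][0]
diag_11 = one_body_matrix_element(h, (1, 1), (1, 1))  # expected h[0][0] + h[1][1]
hop = one_body_matrix_element(h, (1, 1), (2, 0))      # expected sqrt(2) * h[1][0]
```

The factor √2 in the hopping element is the same one that appears when (h ⊗ 1 + 1 ⊗ h)|00⟩ is expanded in the symmetrized basis.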
Likewise, if h is a two-particle operator on H(1) ⊗ H(1) , then the symmetrized n-body

version is
\[ \frac{1}{2} \sum_{k \neq l = 1}^{n} h^{(k,l)}, \]
where the super-script denotes the two particles on which the operator acts non-trivially.
The factor 1/2 is there to avoid double-counting of (k, l) and (l, k). As above, one can
show that
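\[ \frac{1}{2} \bigoplus_{n=1}^{\infty} \sum_{k \neq l = 1}^{n} h^{(k,l)} = \frac{1}{2} \sum_{ijrs} h_{ijrs}\, a^\dagger_i a^\dagger_j a_s a_r, \qquad h_{ijrs} = \langle ij|h|rs\rangle. \]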
Note that the indices s, r of the annihilation operators are reversed as compared to the
indices in the matrix element! This makes the sign come out right in the Fermionic case.
We omit the proof.
Some concrete operators

Let’s apply the framework developed above to some important examples, both in position
and in momentum representation.
Single-particle potential. The single-particle potential operator is

\[ U = \int U(x)\, |x\rangle\langle x| \, d^3x. \]
We can directly read off the corresponding expressions in second quantization

\[ \int \Psi^\dagger(x)\, U(x)\, \Psi(x) \, d^3x \]
(“destroy a particle at x, multiply with potential at this point, re-create it”).

Its matrix elements in the Fourier basis are
\[ \langle k'|U|k\rangle = (2\pi)^{-3} \int U(x)\, e^{i(k - k')x} \, d^3x = (2\pi)^{-3/2}\, \tilde U(k' - k), \]
leading to
\[ (2\pi)^{-3/2} \int \tilde U(k' - k)\, a^\dagger_{k'} a_k \, d^3k' \, d^3k. \tag{2.26} \]
Read that as: A potential term can change the momentum of particles. The amplitude
associated with a change of q = k′ − k is proportional to the Fourier transform Ũ (q) of
the potential.
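If one works in a box of finite volume V = L³, then (in the sense of App. A.1.9), the
integral becomes a sum over the discrete momenta compatible with the box:
\[ \frac{1}{\sqrt{V}} \sum_{k, k' \in \frac{2\pi}{L} \mathbb{Z}^3} \tilde U(k' - k)\, a^\dagger_{k'} a_k. \]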
Momentum and kinetic energy. In the Fourier basis, we directly get

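\[ \begin{aligned} P = \hbar \int k\, |k\rangle\langle k| \, d^3k \ &\mapsto \ \hbar \int k\, a^\dagger_k a_k \, d^3k, \\ \frac{P^2}{2m} = \frac{\hbar^2}{2m} \int \|k\|^2\, |k\rangle\langle k| \, d^3k \ &\mapsto \ \frac{\hbar^2}{2m} \int \|k\|^2\, a^\dagger_k a_k \, d^3k. \end{aligned} \]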
In the sense of App. A.1.8, one can also express these in position basis:
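\[ \begin{aligned} P \ &\mapsto \ -i\hbar \int \hat\Psi^\dagger(x)\, \nabla \hat\Psi(x) \, d^3x, \\ \frac{P^2}{2m} \ &\mapsto \ \frac{-\hbar^2}{2m} \int \hat\Psi^\dagger(x)\, \nabla^2 \hat\Psi(x) \, d^3x = \frac{\hbar^2}{2m} \int \bigl(\nabla \hat\Psi(x)^\dagger\bigr) \bigl(\nabla \hat\Psi(x)\bigr) \, d^3x. \end{aligned} \]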
These expressions are very suggestive, but also easy to misinterpret. Keep in mind
that Ψ̂(x) = aδx is not a complex function on R3 , but rather a field of annihilation
operators for delta functions indexed by x. If you are confused, read the explanation
in App. A.1.8. If you are not confused, then you’re probably missing something
(confusion is the natural state at this point!), so you should really read App. A.1.8!
Chemical potential. The point of Fock space is that the particle number is variable. The
problem with Fock space is that the particle number is variable. Let’s say you want to find
the ground state of a gas (as we’ll do later). There will be some mechanism (walls of a
container, pressure exerted by other gases, ...) that controls at least the average number of
particles in the gas. We could explicitly describe this mechanism (sounds complicated),
or just follow the lead of the grand canonical ensemble of stat mech and add an effective
term −µN̂ that formally adjusts the energy carried by a particle, and then vary µ until the
ground state shows the right average particle number. The operator implementing this is
just
\[ \int \hat\Psi^\dagger(x)\, (-\mu)\, \hat\Psi(x) \, d^3x = \int a^\dagger_k\, (-\mu)\, a_k \, d^3k. \]
Interaction potential. Now consider an interaction potential V (x1 , x2 ) = V (x1 − x2 )

that only depends on the relative position of two particles. The most prominent example
is, of course, the Coulomb potential. In position basis, the second quantized version is:
\[ \frac{1}{2} \int V(x_1 - x_2)\, \Psi^\dagger(x_1) \Psi^\dagger(x_2) \Psi(x_2) \Psi(x_1) \, d^3x_1 \, d^3x_2. \tag{2.27} \]
The Fourier transform that turns (2.27) into its momentum representation is already
slightly annoying to perform. To guide us, let’s first guess the structure of the momentum
representation. Recall that potentials that are invariant under a simultaneous translation of
all particles conserve total momentum. The most general two-particle process compatible
with that conservation law is one that shifts the two momenta in a symmetric way, say by
±q. We thus expect an integral over terms
f (k1 , k2 , q) a†k1 +q a†k2 −q ak2 ak1
where the amplitude f (k1 , k2 , q) remains to be found. Comparison with (2.26) suggests
that f might be related to the Fourier transform of the potential. That turns out to be true:
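\[ \begin{aligned} \langle k_1', k_2'|V|k_1, k_2\rangle &= \int \frac{d^3x_1 \, d^3x_2}{(2\pi)^3 (2\pi)^3}\, e^{i(k_1 - k_1')x_1 + i(k_2 - k_2')x_2}\, V(x_1 - x_2) \\ &= \int \frac{d^3x_1 \, d^3x_2}{(2\pi)^3 (2\pi)^3}\, e^{i(k_1 - k_1')x_1 + i(k_2 - k_2')x_2} \int \frac{d^3q}{(2\pi)^{3/2}}\, e^{iq(x_1 - x_2)}\, \tilde V(q) \\ &= \int \frac{d^3q}{(2\pi)^{3/2}}\, \tilde V(q) \int \frac{d^3x_1}{(2\pi)^3}\, e^{i(k_1 - k_1' + q)x_1} \int \frac{d^3x_2}{(2\pi)^3}\, e^{i(k_2 - k_2' - q)x_2} \\ &= \int \frac{d^3q}{(2\pi)^{3/2}}\, \tilde V(q)\, \delta(k_1 + q - k_1')\, \delta(k_2 - q - k_2'). \end{aligned} \]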
Therefore, the momentum representation of an interaction term is

\[ \frac{1}{2} \frac{1}{(2\pi)^{3/2}} \int \tilde V(q)\, a^\dagger_{k_1 + q} a^\dagger_{k_2 - q} a_{k_2} a_{k_1} \, d^3k_1 \, d^3k_2 \, d^3q. \]
Summary
• Action of ladder operators on occupation number basis:

\[ a^\dagger_i | \dots, n_i, \dots \rangle = \sqrt{n_i + 1}\, (-1)^{\zeta \sum_{j<i} n_j}\, | \dots, n_i + 1, \dots \rangle. \]
• Commutation relations
[ai , a†j ]ζ = δij 1, [ai , aj ]ζ = [a†i , a†j ]ζ = 0.
• “Creation operators can be expanded like kets”:
\[ a^\dagger_\alpha = \sum_i \langle i|\alpha\rangle\, a^\dagger_i \]
• Annihilation field operators in position basis (not wave functions!):

\[ \hat\Psi(t, x) := e^{-\frac{t}{i\hbar} H}\, a_x\, e^{\frac{t}{i\hbar} H}. \]
• In second quantization, kets 7→ creation ops and bras 7→ annihilation ops.
2.3 Quasiparticles and collective excitations

The Bosonic Fock space ladder operators act on the occupation number basis in exactly
the same way as the ladder operators associated with quantum harmonic oscillators act
on their energy eigenbasis (compare e.g. Eq. (2.17) to Eq. (A.27)). In fact, other than the
way they have been constructed, there is no systematic way of distinguishing between
an n-dimensional harmonic oscillator and non-interacting Bosons with an n-dimensional
single-particle Hilbert space. Because any Hamiltonian that is quadratic in position and
momentum operators is equivalent to a collection of uncoupled harmonic oscillators when
expressed in normal modes (Sec. A.2.2), such models are widely applicable. Excitations
arising this way are called quasiparticles or collective excitations.
Formally, one says that the two systems are unitarily equivalent. Define a linear map
U from L²(Rⁿ) to F_S(Cⁿ) by requiring that it sends an element |n_1, ...⟩_{L²(Rⁿ)}
of the eigenbasis of n harmonic oscillators as constructed in (A.27) to the element
|n_1, ...⟩_{F_S(Cⁿ)} of the occupation number basis as constructed in (2.17). Then
U, mapping an ONB to an ONB, is unitary and one immediately verifies that
\[ U\, a_i^{L^2(\mathbb{R}^n)}\, U^\dagger = a_i^{\mathcal{F}_S(\mathbb{C}^n)}. \]
The most elementary example is provided by lattice vibrations, or phonons. Let’s have a look.
2.3.1 Phonons
Goals
The phonon Hamiltonian is conceptually easy to solve (by undergrad mechan-
ics tools), but has much to teach us! Here, phonons will serve as an example of
how Fock space describes collective excitations, rather than arising from a single-
particle space. We’ll also have the opportunity to recall normal mode expansions.
A continuum limit will later motivate rules for field quantization.
We consider N particles in one dimension whose interaction potential has a minimum

at distance a and goes to 0 for large distances. There is therefore an equilibrium configu-
ration where the particles are arranged in a linear chain, with the k-th particle at position
ka. Let Xk be the position of the k-th particle, measured relative to its equilibrium value.
Expanding the potential around the minimum to second order gives
\[ H = \sum_{r=1}^{N} \Bigl( \frac{P_r^2}{2m} + \frac{\kappa}{2} (X_r - X_{r+1})^2 \Bigr). \tag{2.28} \]
We have to specify boundary conditions. If the chain is longer than the length scale of any
phenomenon we’ll be studying, boundary effects shouldn’t matter much (c.f. App. A.1.9).
We therefore opt for the mathematically simplest case: cyclic boundary conditions, i.e. we
assume that the indices of the operators in (2.28) only depend on r modulo N .
The chain Hamiltonian is quadratic in positions and momenta and can therefore be
diagonalized using canonical transformations (App. A.2.2). Working out the details is an
excellent exercise, so we only present the final result here.
For n = 1 ... N and k = n · 2π/L with L = Na the total length, define
\[ \phi_k = \sqrt{\frac{1}{N}} \sum_{r=1}^{N} e^{-ikra} X_r, \qquad \pi_k = \sqrt{\frac{1}{N}} \sum_{r=1}^{N} e^{ikra} P_r. \]
In the sense of App. A.2.2, the ϕk , πk correspond to complex normal coordinates associ-
ated with standing waves with quasi-momentum k. Then
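\[ a_k = \frac{1}{\sqrt{2}} \Bigl( \sqrt{\frac{m\omega_k}{\hbar}}\, \phi_k + i \sqrt{\frac{1}{m\hbar\omega_k}}\, \pi_{-k} \Bigr), \qquad \omega_k = 2 \sqrt{\frac{\kappa}{m}}\, |\sin(ka/2)| \]
define annihilation operators ([a_k, a†_{k′}] = δ_{k,k′}) that diagonalize the Hamiltonian
\[ H = \sum_k \Bigl( \frac{1}{2m}\, \pi_k \pi_{-k} + 2\kappa \sin^2(ka/2)\, \phi_k \phi_{-k} \Bigr) = \sum_k \hbar\omega_k \Bigl( a^\dagger_k a_k + \frac{1}{2} \Bigr). \]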
The (Heisenberg picture) equations of motion iℏ∂t ak (t) = [ak , H] are then solved by
ak (t) = e−iωk t ak (0). For the original observables this means
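\[ \begin{aligned} X_r(t) &= \sqrt{\frac{\hbar}{Nm}} \sum_k \frac{1}{\sqrt{2\omega_k}} \bigl( a_k e^{-i\omega_k t + ikar} + a^\dagger_k e^{i\omega_k t - ikar} \bigr), \\ P_r(t) &= -i \sqrt{\frac{m\hbar}{N}} \sum_k \sqrt{\frac{\omega_k}{2}} \bigl( a_k e^{-i\omega_k t + ikar} - a^\dagger_k e^{i\omega_k t - ikar} \bigr). \end{aligned} \tag{2.29} \]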
k
In these expressions, we’ve grouped adjoint terms together, to emphasize that Xr is Her-
mitian. Sometimes it’s more advantageous to group terms by complex normal modes
instead:
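\[ \begin{aligned} X_r(t) &= \sqrt{\frac{\hbar}{Nm}} \sum_k \frac{1}{\sqrt{2\omega_k}} \bigl( a_k(t) + a^\dagger_{-k}(t) \bigr) e^{ikar}, \\ P_r(t) &= -i \sqrt{\frac{m\hbar}{N}} \sum_k \sqrt{\frac{\omega_k}{2}} \bigl( a_k(t) - a^\dagger_{-k}(t) \bigr) e^{ikar}. \end{aligned} \tag{2.30} \]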
Finally, note that every formula in this section equally applies to the classical case,
with the only exception that the Hamilton function reads (c.f. App. A.2.2):
\[ H = \sum_k \hbar\omega_k\, |a_k|^2. \]
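Since the formulas also hold classically, the dispersion relation can be checked against the classical equations of motion of the chain. The sketch below is an illustration under assumed unit parameters (a = κ = m = 1 by default); the function names are hypothetical. It verifies that the plane wave x_r = e^{ikra} with k = n · 2π/(Na) is an eigenvector of the force matrix obtained from (2.28), with eigenvalue m ω_k².

```python
import cmath
import math


def omega(k, a, kappa, m):
    """Dispersion relation omega_k = 2 sqrt(kappa/m) |sin(k a / 2)|."""
    return 2.0 * math.sqrt(kappa / m) * abs(math.sin(k * a / 2.0))


def mode_residual(n, N, a=1.0, kappa=1.0, m=1.0):
    """Largest deviation |(D x)_r - m omega_k^2 x_r| for the plane wave x.

    D is the force matrix of the cyclic chain: the derivative of
    (kappa/2) sum_r (X_r - X_{r+1})^2 with respect to X_r is
    kappa (2 X_r - X_{r-1} - X_{r+1}), with indices taken modulo N.
    """
    k = n * 2.0 * math.pi / (N * a)
    x = [cmath.exp(1j * k * r * a) for r in range(N)]
    w2 = omega(k, a, kappa, m) ** 2
    residual = 0.0
    for r in range(N):
        dx = kappa * (2 * x[r] - x[(r - 1) % N] - x[(r + 1) % N])
        residual = max(residual, abs(dx - m * w2 * x[r]))
    return residual


# Residuals for all allowed quasi-momenta of a chain with N = 8 sites.
residuals = [mode_residual(n, 8) for n in range(1, 9)]
```

All entries of `residuals` should vanish up to rounding errors. For small k the dispersion is approximately linear, ω_k ≈ a √(κ/m) |k|, which is the sound-wave regime.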
2.3.2 Global phase gauge symmetry and particle number conservation

A gauge symmetry is a mathematical redundancy in the description of physical objects.
In quantum mechanics, a global phase change |ψ⟩ 7→ eiϕ |ψ⟩ is such a redundancy. It is
implemented by the 1 × 1 unitary “matrix” (eiϕ ) ∈ U (1), and therefore called global U (1)
gauge symmetry.
When we constructed Fock space, we inadvertently got in tension with that symmetry!
That’s because multiplying all single-particle vectors by e^{iϕ} means that a tensor product
of n such vectors changes by e^{inϕ}. In other words, if
\[ \hat N = \sum_i \hat n_i = \sum_i a^\dagger_i a_i \]
is the total particle number operator, then U(1) acts on Fock space as e^{iϕN̂}. Thus, U(1)-
transformations induce relative phases between subspaces of different particle numbers.
These will change the expectation values of observables that do not commute with N̂ .
So can we observe global phase changes of single-particle states when working with
many-body systems?
For non-relativistic massive particles (i.e. the kind of systems treated in undergraduate
QM courses), the answer is “no”. Loosely speaking, we expect that in a “non-relativistic
theory” deserving of that name, massive particles cannot be created or destroyed. We should
then require that all physical observables commute with total particle number. The re-
quirement that all physical observables obey an extra symmetry (i.e. [A, N̂ ] = 0) is called
a superselection rule. In particular, because
\[ a^\dagger_{e^{i\phi}\psi} = e^{i\phi}\, a^\dagger_\psi, \qquad a_{e^{i\phi}\psi} = e^{-i\phi}\, a_\psi \tag{2.31} \]
linear expressions in ladder operators are not directly observable in the presence of this
superselection rule.
The Fock space for phonons was not constructed starting from a single-particle Hilbert
space of a non-relativistic massive particle, so the argument does not apply in this case.
And indeed, the observable (2.29) corresponding to the displacement of the r-th particle
(clearly a measurable quantity, at least in principle) is a linear combination of ladder oper-
ators. Also, as we’ll see next, when the particle number tends to infinity, the physical and
mathematical definition of N̂ becomes iffy, which may lead non-relativistic systems to
behave as if particle number conservation is violated.
Figure 2.2: The motion in a Newton cradle is determined by energy and momentum
conservation alone. (Figure adapted from Wikipedia.)
2.4 Bose gas: Take 1

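Indistinguishable particles with interaction potential V are described by the Hamiltonian
\[ H = \int \hat\Psi^\dagger(x) \Bigl( \frac{-\hbar^2 \nabla^2}{2m} - \mu \Bigr) \hat\Psi(x) \, d^3x + \frac{1}{2} \int \hat\Psi^\dagger(x) \hat\Psi^\dagger(y)\, V(x - y)\, \hat\Psi(x) \hat\Psi(y) \, d^3x \, d^3y. \]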
Simplify: Restrict the gas to a box of finite volume V (App. A.1.9); choose a “hard core
interaction potential” V(x − y) = U δ(x − y), with Fourier transform Ṽ(q) = V⁻¹U;
switch to momentum representation; suppress vector notation; set ϵ_k = ℏ²k²/(2m). Then
\[ H = \sum_k (\epsilon_k - \mu)\, a^\dagger_k a_k + \frac{U}{2V} \sum_{k, k', q} a^\dagger_{k+q} a^\dagger_{k'-q} a_{k'} a_k. \tag{2.32} \]
This still is difficult to treat, so let’s get some intuition first, to guide our analysis.
Superfluidity
At very low temperature, Helium becomes superfluid: A particle slowly passing through
it does not experience friction. Here’s a way to think about that: Recall the Newton cradle
(Fig. 2.2), where one can uniquely determine the number of balls being excited merely
from energy and momentum conservation. Likewise, one may model the interaction be-
tween the particle and the gas as a scattering process, where the particle transfers energy
and momentum to the gas. Now imagine that the energy-momentum relations of the par-
ticle and the excitations of the gas are “out of tune” in the sense that there is no process
that would respect both conservation laws. In this case, no scattering is possible and one
would expect the particle to pass through the gas uninhibited.
With this model in mind, we set it as our goal to work out the energy-momentum
relation of the low-lying excitations of H.
Bose-Einstein condensation
Recall that for non-interacting Bosons (i.e. when V = 0), the ground state is achieved
when all particles are in the lowest-energy state of the single-particle term. It is plausible
(though a very difficult question to treat rigorously) that remnants of this behavior per-
sist for non-zero interaction V and for low-lying states. We will thus treat H under the
assumption that there is a finite density
\[ \rho = \frac{n_0}{V} = \frac{1}{V} \langle a^\dagger_0 a_0 \rangle \tag{2.33} \]
of particles occupying the k = 0 mode. To achieve this, we add a “chemical potential
term” −µN̂ to the Hamiltonian and will later adjust µ to achieve (2.33).
Figure 2.3: SSB. TBD.
2.4.1 Approximate solution part 1

Because for low-lying many-body states |ψ⟩, we expect the occupation number ⟨ψ|a†0 a0 |ψ⟩
of the single-particle ground state to be much larger than the ones for other modes, we ne-
glect all terms that are of third order or higher in creation/annihilation operators for k ̸= 0.
A lengthy but uneventful calculation leads to
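\[ \begin{aligned} H = {} & \sum_k (\epsilon_k - \mu)\, a^\dagger_k a_k + \frac{U}{2V}\, a^\dagger_0 a_0 a^\dagger_0 a_0 + \frac{2U}{V} \sum_{k \neq 0} a^\dagger_0 a_0\, a^\dagger_k a_k \\ & + \frac{U}{2V} \sum_{k \neq 0} \bigl( a^\dagger_0 a^\dagger_0\, a_k a_{-k} + a_0 a_0\, a^\dagger_k a^\dagger_{-k} \bigr) + O(a_k^3). \end{aligned} \tag{2.34} \]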
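The combinatorial part of this calculation can be cross-checked by brute force. The sketch below assumes one-dimensional integer momenta with a cutoff `kmax` (an illustration only, in the spirit of “suppress vector notation”) and a hypothetical function name. It enumerates all terms a†_{k+q} a†_{k′−q} a_{k′} a_k of the interaction in (2.32) in which exactly two of the four operators belong to the k = 0 mode, and sorts them by type.

```python
from collections import Counter


def classify_quadratic_terms(kmax):
    """Count interaction terms that are quadratic in the k != 0 operators.

    Returns a Counter keyed by (type, momentum), where type is one of
    'density' (a_0^dagger a_0 a_p^dagger a_p up to operator ordering),
    'pair creation' (a_p^dagger a_{-p}^dagger a_0 a_0) and
    'pair annihilation' (a_0^dagger a_0^dagger a_p a_{-p}).
    """
    momenta = range(-kmax, kmax + 1)
    counts = Counter()
    for k in momenta:
        for kp in momenta:
            for q in momenta:
                out1, out2 = k + q, kp - q  # momenta of the created particles
                if abs(out1) > kmax or abs(out2) > kmax:
                    continue
                if [out1, out2, kp, k].count(0) != 2:
                    continue
                if k == 0 and kp == 0:
                    counts[("pair creation", out1)] += 1
                elif out1 == 0 and out2 == 0:
                    counts[("pair annihilation", k)] += 1
                else:
                    # one non-zero momentum is destroyed and re-created
                    p = k if k != 0 else kp
                    counts[("density", p)] += 1
    return counts


counts = classify_quadratic_terms(3)
```

Each non-zero momentum p appears in four density-type terms, which together with the prefactor U/(2V) accounts for the coefficient 2U/V, while pair creation and pair annihilation each appear once per momentum, with coefficient U/(2V).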
To make further progress, we employ Bogoliubov’s c-number substitution: Replace the
operator a_0/√V with a complex number √ρ e^{iθ}.
Wait, we do what? Why would that be justified? The minimal story goes like this:
In the limit V → ∞, the number of Bosons n0 in the k = 0-mode is expected to be
macroscopic n_0 = ρV → ∞. Because we cannot physically resolve the number of Bosons
in the mode, a_0|n_0⟩ = √n_0 |n_0 − 1⟩ “behaves just like” √n_0 |n_0⟩ with respect to any
measurement we can actually implement. So switching to a mathematical model where a_0
is not a ladder operator at all, but rather equal to √(ρV) 1 should give similar results.
Well, OK. That always seemed at most mildly convincing to me. To get a better feeling
for why this is a justified way of arguing, let’s take a detour and introduce a broader
framework for such phenomena, which are connected to spontaneous symmetry breaking.
If you are already fully convinced, or if “mildly convincing” is anyway all you
aim for at this moment, you can skip ahead to Sec. 2.6.
2.5 Detour: Spontaneous symmetry breaking

Broadly interpreted, the concept of spontaneous symmetry breaking (SSB) refers to any
situation where the solutions of a problem are less symmetric than the problem itself.
There are banal ways in which this can manifest (Fig. 2.3), but there are also deep ones. In
the examples we’ll look at, the technical origin of the effect may be traced back to the
(vague, for now) principle
“One cannot implement operators that act on macroscopically many particles.” (2.35)
2.5.1 Ferromagnetism
The guiding phenomenological example is ferromagnetism. If cooled below its Curie temperature,
a ferromagnet develops a magnetic moment M ≠ 0. In the absence of external
fields, the moment M is equally likely to point into any direction. Thus, statistically, the
behavior is rotationally invariant. But every time the magnet is cooled down, it “sponta-
neously” singles out one direction in space, thereby “breaking the symmetry”.
The simplest case of a model exhibiting ferromagnetic behavior is the Ising model. It
involves N spin-1/2 particles – and in fact, we can learn a lot by looking at their Hilbert
space in the limit N → ∞, even before introducing the Hamiltonian.
Indeed, consider the two states (depending on the relative phase)
\[ |\psi^N_\pm\rangle = \frac{1}{\sqrt{2}} \bigl( |\uparrow\rangle^{\otimes N} \pm |\downarrow\rangle^{\otimes N} \bigr). \]
The |ψ^N_±⟩ are eigenvectors of σ_x^{⊗N} with eigenvalue +1 and −1 respectively. Despite them
being orthogonal, I claim that as N gets macroscopic, the two states become effectively
indistinguishable.
To justify this outrageous claim, assume that just one of the macroscopically many par-
ticles is lost (as will always, realistically, be the case). Then any measurement effectively
takes place on the reduced density matrix
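\[ \operatorname{tr}_1 |\psi^N_\pm\rangle\langle\psi^N_\pm| = \frac{1}{2} \bigl( |\uparrow\rangle\langle\uparrow| \bigr)^{\otimes (N-1)} + \frac{1}{2} \bigl( |\downarrow\rangle\langle\downarrow| \bigr)^{\otimes (N-1)}, \]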
2 2
which is a uniform mixture of |↑ . . . ⟩, |↓ . . . ⟩, and independent of the relative phase. In this
sense: The operator σx⊗N does not actually describe a physically realizable measurement
in the limit N → ∞.
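The loss of the relative phase is easy to reproduce numerically for small N. The following sketch uses plain Python lists; the function names and the bit convention (0 = ↑, 1 = ↓, first spin as most significant bit) are assumptions for illustration. It builds both states, traces out the first spin, and lets one compare the two reduced density matrices.

```python
from math import sqrt


def cat_state(N, sign):
    """Amplitudes of (|up...up> + sign * |down...down>) / sqrt(2)."""
    psi = [0.0] * (2 ** N)
    psi[0] = 1.0 / sqrt(2)    # all spins up   (index 00...0)
    psi[-1] = sign / sqrt(2)  # all spins down (index 11...1)
    return psi


def trace_out_first(psi, N):
    """Reduced density matrix of spins 2..N for a state with real amplitudes."""
    d = 2 ** (N - 1)
    rho = [[0.0] * d for _ in range(d)]
    for s in (0, 1):  # sum over the basis states of the discarded spin
        for i in range(d):
            for j in range(d):
                # real amplitudes, so no complex conjugation is needed
                rho[i][j] += psi[s * d + i] * psi[s * d + j]
    return rho


rho_plus = trace_out_first(cat_state(3, +1), 3)
rho_minus = trace_out_first(cat_state(3, -1), 3)
```

Both reduced matrices have weight 1/2 on |↑↑⟩ and |↓↓⟩ and no off-diagonal entries, so they coincide: the sign of the superposition has left no trace.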
Sometimes, it is beneficial to keep idealized mathematical objects around (like δ func-
tions) even if they are not directly physical. In this case, however, it turns out that we’ll
attain a cleaner understanding of ferromagnetism, superfluidity, and many other impor-
tant quantum many-body phenomena, if we commit to the principle (2.35) and declare
operators like σx⊗N to be unphysical in the limit N → ∞.
Let’s explore this further. Define H↑ to be the space of states that can be reached by
physical operations starting from |↑⟩⊗N and define H↓ analogously. For microscopic N ,
the two spaces are identical, but as N → ∞, they become orthogonal. A good way to see
this is to consider the average magnetization. For a state |ψ⟩, it is defined as
\[ m := \frac{1}{N} \sum_{k=1}^{N} \langle\psi|\sigma_z^{(k)}|\psi\rangle. \tag{2.36} \]
The average magnetization is +1 on |↑⟩^{⊗N} and −1 on |↓⟩^{⊗N}. If A is any physical operator,

then by (2.35), A|ψ⟩ differs from |ψ⟩ only on a microscopic number of spins. For N → ∞,
this doesn’t affect the average in (2.36), and we conclude that no quantum-mechanical
process can change m in that limit.
One consequence is that within each of these two physically separated Hilbert spaces,
the average magnetization operator (1/N) Σ_{k=1}^{N} σ_z^{(k)} can be replaced by a number, namely by
±1 respectively. (Spoiler alert: That’s the mechanism that will allow us to replace a_0/√V by
a complex number for Bose-Einstein condensates.)
We can still mathematically write down superpositions |ψ⟩ = √p |ψ↑⟩ + e^{iϕ} √(1 − p) |ψ↓⟩
between vectors |ψ↑⟩ ∈ H↑, |ψ↓⟩ ∈ H↓ of the two disjoint spaces. But because for every
physical operation A, the matrix elements between them vanish, ⟨ψ↑|A|ψ↓⟩ = 0, these coherent
superpositions cannot be experimentally distinguished from the incoherent mixture
\[ \rho = p\, |\psi_\uparrow\rangle\langle\psi_\uparrow| + (1 - p)\, |\psi_\downarrow\rangle\langle\psi_\downarrow|. \]
That’s a generalization of the example we started with.

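It’s time to have a look at the Hamiltonian of the Ising model:
\[ H = -J \sum_{i,j} \sigma_z^{(i)} \sigma_z^{(j)}, \qquad J > 0, \]
where the sum is over nearest neighbors. The summands give
\[ -J \sigma_z^{(i)} \sigma_z^{(j)}\, |s_i s_j\rangle = \begin{cases} -J\, |s_i s_j\rangle & \text{for } |s_i s_j\rangle = |\uparrow\uparrow\rangle, |\downarrow\downarrow\rangle, \\ +J\, |s_i s_j\rangle & \text{for } |s_i s_j\rangle = |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle. \end{cases} \]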
We immediately see that the Hamiltonian is invariant under a simultaneous flip of all spins,
realized by the operator σx⊗N . Also, the ground state energy is −J times the number of
neighboring pairs. It is attained on the subspace with basis |↑⟩^{⊗N}, |↓⟩^{⊗N}, or, equivalently,
with basis the |ψ^N_±⟩.
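For a small system, the statement about the ground space can be confirmed by exhaustive search. The sketch below assumes a ring of N ≥ 3 spins as the lattice (the notes do not fix the geometry), encodes spins as ±1, and uses hypothetical function names. Because the Hamiltonian is diagonal in the σ_z product basis, it suffices to enumerate classical configurations.

```python
from itertools import product


def ising_energy(spins, J=1.0):
    """Energy -J * sum of s_i * s_{i+1} over the bonds of a ring (N >= 3)."""
    N = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % N] for i in range(N))


def ground_states(N, J=1.0):
    """Return the minimal energy and all configurations attaining it."""
    configs = list(product((+1, -1), repeat=N))
    e_min = min(ising_energy(s, J) for s in configs)
    return e_min, [s for s in configs if ising_energy(s, J) == e_min]


e_min, minimizers = ground_states(6)
```

For the ring with six sites there are six bonds, so one expects `e_min` to equal −6J, attained only by the all-up and all-down configurations.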
Given the discussion above, it is now easy to see what happens. The spin flip symmetry
of the Hamiltonian is implemented by σ_x^{⊗N}, which “breaks” in the sense that it becomes
unphysical for N → ∞. For microscopic N, the |ψ^N_±⟩ are pure ground states that are
invariant under the spin flip symmetry (up to phase). As N → ∞, they remain invariant,
but they become effectively mixed. In fact, any ground state α|↑⟩⊗N + β|↓⟩⊗N becomes
a mixture of the two non-symmetric ones |↑⟩⊗N , |↓⟩⊗N . We can now connect back to the
loose definition of “symmetry breaking” in the very beginning: The restriction on physical
observables in macroscopic systems means that there is no longer a pure ground state that
shares the symmetry of the Hamiltonian.
Further comments (not needed in the sequel)

• Because σx σz σx† = −σz , the average magnetization vanishes for every state (pos-
sibly mixed) that is spin flip invariant. An observable that “witnesses the lack of
symmetry” in this way is called an order parameter.
• In reality, there are always at least some tiny external fields around, so we should modify the
Hamiltonian to read H_λ = H + λ Σ_k σ_z^{(k)}, where λ corresponds to the net external
field. The sign of λ lifts the degeneracy of the ground space. SSB is then witnessed
by the fact that when taking limits lim_{λ→0} lim_{N→∞} H_λ (in that order!), the resulting
ground state depends on whether λ approaches 0 from above or from below.
• In elementary QM, one often associates different Hilbert spaces with the same quan-
tum system, as a matter of convenience. For example, a single harmonic oscillator
can be described by the Hilbert space L2 (R) of square-integrable functions, or by
the Fock space F(C). These choices are equivalent: Every basis vector |n⟩ ∈ F(C)
can be mapped to a wave function (in terms of Gaussians and Hermite polynomials)
and this way, every set of expectation values realizable on one of the Hilbert spaces
can be reproduced on the other. In contrast, the fact that the average magnetization
takes on different values in the two Hilbert spaces constructed above, shows that the
representation of the observables of the Ising model on them are inequivalent.
2.5.2 SSB and Bose-Einstein condensation

We are now ready to argue that Bose-Einstein condensation of a macroscopic number of
particles leads to spontaneous symmetry breaking, this time of a continuous symmetry.
Consider a Bose gas contained in a box of volume V . Recall from (2.33) that we are
interested in states |ψ⟩ that have a fixed density ρ = n0 /V of particles in the k = 0-mode:
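\[ \frac{1}{V} \langle\psi|a^\dagger_0 a_0|\psi\rangle = \langle\psi| \frac{1}{\sqrt{V}} a^\dagger_0\, \frac{1}{\sqrt{V}} a_0 |\psi\rangle = \rho. \tag{2.37} \]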
In the limit V → ∞, measuring the precise occupation number
⟨ψ|a†0 a0 |ψ⟩ = V ρ → ∞
would require us to count a macroscopic number of particles. Consistent with the principle
(2.35), we reject this as unphysical. The density, however, should be measurable. Hence
we posit that an observable is physical only if it can be expressed in terms of the re-scaled
ladder operators
\[ \frac{1}{\sqrt{V}}\, a_0, \qquad \frac{1}{\sqrt{V}}\, a^\dagger_0 \tag{2.38} \]
as well as the ak , a†k for k ̸= 0 (with coefficients that do not depend on V , of course).
This seemingly minor restriction has dramatic effects in the limit V → ∞. Indeed,
\[ \Bigl[ \frac{1}{\sqrt{V}}\, a_0, \frac{1}{\sqrt{V}}\, a^\dagger_0 \Bigr] = \frac{1}{V} \to 0, \tag{2.39} \]
so that in the thermodynamic limit, the operators (2.38) commute! But then all physical
observables commute with a_0/√V (why?). This operator therefore plays the same role as
the average magnetization in the Ising model: Its eigenspaces are physically separated
in the sense that relative phases between them are not observable and no vector can be
mapped from one eigenspace to another. If (a_0/√V)|ψ⟩ = λ|ψ⟩ then (2.37) implies that
λ = √ρ e^{iθ} for some θ ∈ [0, 2π).
Thus, we may always assume that the dynamics takes place in one of the eigenspaces
\[ \mathcal{H}_\theta = \Bigl\{ |\psi\rangle \ \Big| \ \frac{a_0}{\sqrt{V}} |\psi\rangle = \sqrt{\rho}\, e^{i\theta} |\psi\rangle \Bigr\}, \]
where a_0/√V acts like √ρ e^{iθ}. This is what we set out to justify.
U (1) symmetry breaking

Just like spin flip symmetry before, there are unphysical operations that do connect different
eigenspaces. This role is played by the U(1) symmetry $e^{i\phi\hat N}$ (Sec. 2.3.2). It “breaks”
in the V → ∞ limit, because it involves the diverging total particle number operator $\hat N$.
Mathematically, however, it holds that
\[
e^{i\phi\hat N}\,\mathcal{H}_\theta = \mathcal{H}_{\theta+\phi}.
\]
That’s because annihilation operators transform as
\[
e^{-i\phi\hat N}\,a_\alpha\,e^{i\phi\hat N} = e^{i\phi}a_\alpha \tag{2.40}
\]
(the positive phase gets applied to one more particle than the negative one). Some conse-
quences:
By (2.40), the expectation value $\langle \frac{a_0}{\sqrt{V}} \rangle$ vanishes in any state that is U(1) invariant. The
operator $\frac{a_0}{\sqrt{V}}$ therefore constitutes an order parameter. Now comes a big difference to the
Ising example. On Fock space for massive non-relativistic particles, we have a second
condition for an observable to be physical: In addition to fulfilling (2.35), observables
also have to be gauge invariant (Sec. 2.3.2). Hence in this case, the order parameter is
not measurable (unlike the average magnetization, which is the central physical quantity
associated with the Ising magnet). It also means that the physical behavior of the Bose gas
can only depend on ρ, not on θ, so we are free to restrict to the case θ = 0 below.
We found that a state |ψ⟩ is pure with respect to the physical observables only if it is
contained in one of the $\mathcal{H}_\theta$ spaces. But then, it isn’t U(1)-invariant. Only the mixed state
\[
\rho = \int_0^{2\pi} \frac{d\phi}{2\pi}\; e^{i\phi\hat N}|\psi\rangle\langle\psi|e^{-i\phi\hat N}
\]
is. We’re again encountering the dichotomy that states are symmetric or pure, but not both.
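The transformation rule (2.40) and the relation $e^{i\phi\hat N}\mathcal{H}_\theta = \mathcal{H}_{\theta+\phi}$ can be checked numerically in a toy setting. The sketch below is only an illustration under stated assumptions: it uses a single mode truncated to `dim` number states (no V → ∞ limit is taken), and it models an element of $\mathcal{H}_\theta$ by a coherent state with eigenvalue $\sqrt{\rho}e^{i\theta}$. All function names are hypothetical helpers, not notation from these notes.

```python
import cmath
import math

def coherent_coeffs(alpha, dim):
    """Number-basis coefficients of |alpha>, truncated to n < dim and renormalized."""
    coeffs = [1.0 + 0.0j]
    for n in range(1, dim):
        coeffs.append(coeffs[-1] * alpha / math.sqrt(n))
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    return [c / norm for c in coeffs]

def rotate(phi, coeffs):
    """Apply exp(i*phi*N): the n-particle component picks up the phase exp(i*phi*n)."""
    return [cmath.exp(1j * phi * n) * c for n, c in enumerate(coeffs)]

def annihilate(coeffs):
    """Apply the truncated annihilation operator: (a c)_n = sqrt(n+1) * c_{n+1}."""
    return [math.sqrt(n + 1) * coeffs[n + 1] for n in range(len(coeffs) - 1)] + [0.0j]

# Assumed toy parameters (not from the notes)
dim, rho, theta, phi = 60, 2.0, 0.3, 0.7
psi = coherent_coeffs(math.sqrt(rho) * cmath.exp(1j * theta), dim)

# (2.40) applied to psi: exp(-i phi N) a exp(i phi N) psi  versus  exp(i phi) a psi
lhs = rotate(-phi, annihilate(rotate(phi, psi)))
rhs = [cmath.exp(1j * phi) * c for c in annihilate(psi)]
covariance_error = max(abs(x - y) for x, y in zip(lhs, rhs))

# exp(i phi N) maps the eigenvalue sqrt(rho) e^{i theta} to sqrt(rho) e^{i (theta + phi)}
rotated = rotate(phi, psi)
shifted_eigenvalue = math.sqrt(rho) * cmath.exp(1j * (theta + phi))
eigen_error = max(abs(r - shifted_eigenvalue * c)
                  for r, c in zip(annihilate(rotated), rotated))

print(covariance_error, eigen_error)
```

Both errors should be at the level of floating-point rounding; the only systematic deviation comes from the highest retained number state, whose weight is negligible for the chosen `dim`.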
2.6 Bose gas: Take 2

Back to the Bose gas. We (reasonably) assume that each low-lying state has a non-zero
density ρ = n0 /V of particles occupying the single-body ground state. The value of ρ will
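be determined momentarily. For now, following the discussion on SSB, we just make the
substitution $\frac{a_0}{\sqrt{V}} \mapsto \sqrt{\rho}$. Then (2.34) becomes
\[
H \simeq V\Big(-\mu\rho + \frac{U}{2}\rho^2\Big) + \sum_{k\neq 0}\big(\epsilon_k - \mu + 2U\rho\big)\,a_k^\dagger a_k + \frac{U\rho}{2}\sum_{k\neq 0}\big(a_k a_{-k} + a_k^\dagger a_{-k}^\dagger\big).
\]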
The effect of ρ on the energy will, in the limit V → ∞, be dominated by the first
term, which is the only one proportional to V . Thus, low-lying states will have a density ρ
minimizing that term. Setting its derivative to zero gives the relation ρ = µ/U . We keep ρ
and eliminate µ, to get
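\[
H = \text{const.} + \sum_{k\neq 0}\big(\epsilon_k + U\rho\big)\,a_k^\dagger a_k + \frac{U\rho}{2}\sum_{k\neq 0}\big(a_k a_{-k} + a_k^\dagger a_{-k}^\dagger\big). \tag{2.41}
\]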
This is a quadratic expression in ladder operators, so we know from general principles

(Sec. A.2.2) that it can be diagonalized using a canonical transformation. Let’s find it in
two (and a half) easy steps!
The result of the following “2½ easy steps” is summarized in (2.46). In principle,
one can just check directly that the form of H given there is indeed equal to
(2.41). Below, we only describe a somewhat natural thought process that leads
to (2.46). If you’re in a hurry, skip ahead.
Step 1: Decouple. Start with the rightmost term. It creates / destroys pairs of particles
of opposite momentum. This suggests switching the basis of the single-particle space to
one that consists of superpositions of states moving in opposite directions. Remembering
that | ± k⟩ are represented in position space by complex exponentials that are each other’s
conjugates, the cosine / sine basis
\[
\frac{1}{\sqrt{2}}\big(|k\rangle + |-k\rangle\big), \qquad \frac{-i}{\sqrt{2}}\big(|k\rangle - |-k\rangle\big) \tag{2.42}
\]
seems like a natural candidate. Let’s agree that a vector k is positive if its first non-zero
component is. Then there is exactly one positive wave vector in every pair +k, −k. For
k > 0, define the annihilation operators
\[
\begin{aligned}
b_k &= \frac{1}{\sqrt{2}}(a_k + a_{-k}) && \text{(“positive $k$ label the cosines”)}\\
b_{-k} &= \frac{i}{\sqrt{2}}(a_k - a_{-k}) && \text{(“negative $k$ label the sines”)}
\end{aligned}
\]
associated with the new basis. Inverting,
\[
a_k = \frac{1}{\sqrt{2}}(b_k - i b_{-k}), \qquad a_{-k} = \frac{1}{\sqrt{2}}(b_k + i b_{-k}) \qquad (k > 0). \tag{2.43}
\]
Plugging in, the pair term decouples, as hoped:
\[
H = \text{const.} + \sum_{k\neq 0}\Big[\big(\epsilon_k + U\rho\big)\,b_k^\dagger b_k + \frac{U\rho}{2}\big(b_k b_k + b_k^\dagger b_k^\dagger\big)\Big].
\]
Step 2: Solve harmonic oscillator. It turns out that each summand represents a har-
monic oscillator and that a simple re-scaling of position and momentum coordinates will
put it into standard form. To see how this works, we switch to Hermitian operators for the
moment:
\[
X = \frac{1}{\sqrt{2}}(b_k + b_k^\dagger), \qquad P = \frac{-i}{\sqrt{2}}(b_k - b_k^\dagger).
\]
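Abbreviating A = ϵk + Uρ, B = Uρ, one directly finds (up to an additive constant, which we absorb into const.)
\[
A\,b_k^\dagger b_k + \frac{B}{2}\big(b_k b_k + b_k^\dagger b_k^\dagger\big) = \frac{A-B}{2}P^2 + \frac{A+B}{2}X^2. \tag{2.44}
\]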
Well, we know how to solve these using undergrad methods (App. A.2.1)! The transfor-
mation
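\[
\tilde X = \sqrt[4]{\frac{A+B}{A-B}}\,X, \qquad \tilde P = \sqrt[4]{\frac{A-B}{A+B}}\,P, \qquad E_k = \sqrt{A^2 - B^2}
\]
is obviously canonical, $[\tilde X, \tilde P] = [X, P]$, and puts the oscillator into standard form:
\[
\frac{1}{2}E_k(\tilde X^2 + \tilde P^2) = \frac{1}{2}\sqrt{(A+B)(A-B)}\,\Big(\sqrt{\frac{A-B}{A+B}}\,P^2 + \sqrt{\frac{A+B}{A-B}}\,X^2\Big) = (2.44).
\]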
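Therefore, setting $\tilde b_k = \frac{1}{\sqrt{2}}(\tilde X + i\tilde P)$, we have diagonalized H (that wasn’t too hard):
\[
H = \text{const.} + \sum_{k\neq 0} E_k\,\tilde b_k^\dagger \tilde b_k, \qquad E_k = \sqrt{\epsilon_k^2 + \epsilon_k 2U\rho}. \tag{2.45}
\]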
Step 2.5: Cleanup. Because E−k = Ek , the Hamiltonian is degenerate and any
unitary transformation within the ±k-subspaces will leave its form invariant. Choosing
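\[
c_k := \frac{1}{\sqrt{2}}(\tilde b_k - i\tilde b_{-k}), \qquad c_{-k} := \frac{1}{\sqrt{2}}(\tilde b_k + i\tilde b_{-k}) \qquad \text{for } k > 0
\]
(the same combinations as in (2.43))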
2 2
turns out to lead to the cleanest theory. Plugging in all the nested definitions gives
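\[
\begin{gathered}
c_k = u_k a_k - v_k a_{-k}^\dagger, \qquad u_k = \frac{1}{2}\Big(\sqrt{\frac{\epsilon_k}{E_k}} + \sqrt{\frac{E_k}{\epsilon_k}}\Big), \qquad v_k = \frac{1}{2}\Big(\sqrt{\frac{\epsilon_k}{E_k}} - \sqrt{\frac{E_k}{\epsilon_k}}\Big),\\
E_k = \sqrt{\epsilon_k^2 + \epsilon_k 2U\rho}, \qquad H = \text{const.} + \sum_{k\neq 0} E_k\,c_k^\dagger c_k.
\end{gathered} \tag{2.46}
\]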
The coefficients lie on the unit hyperbola:

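\[
u_k^2 - v_k^2 = \frac{1}{4}\Big(\frac{\epsilon_k}{E_k} + 2 + \frac{E_k}{\epsilon_k}\Big) - \frac{1}{4}\Big(\frac{\epsilon_k}{E_k} - 2 + \frac{E_k}{\epsilon_k}\Big) = 1,
\]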
which implies (exercise) that the inverse transformation is
ak = uk ck + vk c†−k . (2.47)
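As suggested in the box above, one can also check the result directly instead of following the derivation. The sketch below is a hedged numerical check, not part of the notes: inserting (2.47) into the (+k, −k) part of (2.41) and collecting terms gives a coefficient `diagonal` in front of $c_k^\dagger c_k + c_{-k}^\dagger c_{-k}$ and a coefficient `pairing` in front of $c_k c_{-k} + c_k^\dagger c_{-k}^\dagger$. These two expressions were worked out for this illustration, with A = ϵk + Uρ and B = Uρ as in Step 2. The transformation diagonalizes H exactly when `pairing` vanishes and `diagonal` equals $E_k$.

```python
import math

def bogoliubov_coefficients(eps, u_rho):
    """Return (E, u, v) from (2.46) for single-particle energy eps > 0 and U*rho > 0."""
    energy = math.sqrt(eps ** 2 + 2.0 * u_rho * eps)
    u = 0.5 * (math.sqrt(eps / energy) + math.sqrt(energy / eps))
    v = 0.5 * (math.sqrt(eps / energy) - math.sqrt(energy / eps))  # note: v <= 0
    return energy, u, v

def transformed_coefficients(eps, u_rho):
    """Coefficients of H for one (+k, -k) pair after substituting a = u c + v c^dagger.

    diagonal multiplies c_k^† c_k + c_{-k}^† c_{-k};
    pairing multiplies c_k c_{-k} + c_k^† c_{-k}^† (derived for this sketch).
    """
    a_coef, b_coef = eps + u_rho, u_rho
    energy, u, v = bogoliubov_coefficients(eps, u_rho)
    diagonal = a_coef * (u * u + v * v) + 2.0 * b_coef * u * v
    pairing = 2.0 * a_coef * u * v + b_coef * (u * u + v * v)
    return energy, u, v, diagonal, pairing

# Assumed sample values of eps in units where U*rho = 1
results = [transformed_coefficients(eps, 1.0) for eps in (0.01, 0.5, 3.0, 50.0)]
for energy, u, v, diagonal, pairing in results:
    print(f"E={energy:.6f}  u^2-v^2={u * u - v * v:.12f}  "
          f"diagonal={diagonal:.6f}  pairing={pairing:.2e}")
```

For every sample value, `diagonal` should coincide with `E` and `pairing` should vanish up to rounding, which is the content of (2.46).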
Discussion
We have found that the elementary excitations of the Bose gas are given by quasi-particles
created by the c†k . The ground state is the quasi-particle vacuum characterized by
ck |0⟩(q) = 0 ∀k.
It is not to be confused with the particle vacuum |0⟩(p) characterized by ak |0⟩(p) = 0! For
example, using (2.47), the expected number of particles with momentum k in the quasi-
particle vacuum is
⟨0|(q) a†k ak |0⟩(q) = ⟨0|(q) (uk c†k + vk c−k )(uk ck + vk c†−k )|0⟩(q) = vk2 .
While the above shows that quasi-particle occupation number states | . . . nk . . . ⟩(q) do
not have definite particle numbers, it turns out that they do have definite momentum! In
the exercise, you will show that c†k creates quasi-particles with momentum ℏk. Thus, E(k)
found in (2.45) describes their energy-momentum (or dispersion) relation. Compared to a
free particle, E(k) involves the additional term ϵk 2U ρ. It dominates if
\[
\epsilon_k = \frac{\hbar^2\|k\|^2}{2m} \ll 2U\rho \quad\Leftrightarrow\quad \frac{\|\hbar k\|}{m} \ll \sqrt{\frac{U\rho}{m}} =: c,
\]
i.e. for velocities much smaller than c. In this regime, we have Ek ≃ c∥ℏk∥, that is, energy
scales linearly with momentum. Beyond that, Ek is convex (“bends upwards”, Fig. 2.4),
so that Ek ≥ c∥ℏk∥ holds in general.
As alluded to in the very beginning, this means that a particle moving through the Bose
gas at low velocity cannot slow down by transferring energy and momentum to a quasi-
particle. Quantitatively: Let M be the mass of the test particle and p its initial momentum.
Assume it excites a quasi-particle of momentum q. Then energy conservation demands
\[
0 = \frac{\|p\|^2}{2M} - \frac{\|p-q\|^2}{2M} - E_q = \frac{p\cdot q}{M} - \frac{\|q\|^2}{2M} - E_q \le \frac{\|p\|\,\|q\|}{M} - c\|q\| = \Big(\frac{\|p\|}{M} - c\Big)\|q\|,
\]
which has a solution only if the test particle has initial velocity ∥p∥/M at least c.
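The two properties of the dispersion relation used in this argument, linearity at small momentum and the bound $E_k \ge c\|\hbar k\|$, are easy to confirm numerically. The following sketch is an illustration only. It assumes the dimensionless variables of Fig. 2.4 (momentum in units of mc, energy in units of mc², so that Uρ = mc² by the definition of c); the grid and function names are hypothetical.

```python
import math

def dispersion(kappa):
    """E_k / (m c^2) as a function of kappa = hbar*|k| / (m c), using U*rho = m c^2."""
    eps = 0.5 * kappa ** 2              # free-particle energy in units of m c^2
    return math.sqrt(eps ** 2 + 2.0 * eps)

def min_excitation_velocity(kappas):
    """Smallest value of E_q / |q| on the grid, in units of c."""
    return min(dispersion(kappa) / kappa for kappa in kappas)

grid = [0.01 * n for n in range(1, 1001)]          # kappa from 0.01 to 10
critical_velocity = min_excitation_velocity(grid)
lower_bound_holds = all(dispersion(kappa) >= kappa for kappa in grid)

print(critical_velocity, lower_bound_holds)
```

On this grid the minimum of $E_q/\|q\|$ is attained at the smallest momentum and is very close to 1, i.e. to c, consistent with the statement that a test particle slower than c cannot create a quasi-particle.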
Figure 2.4: Blue line: Dispersion relation E(∥k∥) for the Bose gas. Orange line: E =
cℏ∥k∥ is a good approximation for small ∥k∥, and a lower bound for all k. The x-axis is
in units of mc/ℏ, y-axis in units of mc2 .
2.7 Further reading

A good presentation of many-body theory is Advanced Quantum Mechanics by Schw-
abl, which also covers the Bose gas. Spontaneous symmetry breaking is a complex phe-
nomenon that can be approached from many points of view that might feel quite different.
I enjoy the presentations by Strocchi (Elements of Quantum Mechanics in Infinite Systems
and Symmetry Breaking), but they might be a little too mathematical for the average taste.
A more phenomenological approach in the language of path integrals is in Chapter 6 of
Condensed Matter Field Theory by Alexander Altland (of Cologne) and Ben Simons.
Chapter 3
Field quantization and quantum theory

of light
Our goal is to construct a quantum theory for the EM field. Since quantum mechanics
is more fundamental than classical physics, one cannot hope to derive a quantum theory
from its classical limit. “Quantization” thus always involves educated guesses.
To educate ourselves, we’ll first have another look at lattice vibrations (Sec. 2.3.1). For
both their classical and their quantum model, one can easily construct a continuum limit.
The result is a classical and a quantum field theory. Their relation will serve as a template
for quantizing other fields.
3.1 Phonon continuum limit

Recall our treatment of N coupled particles arranged in a line of length L (Sec. 2.3.1). For
phenomena that have length scales much larger than the equilibrium spacing a = L/N ,
the behavior of the model should not depend on the precise value of a. (Try to infer the
lattice spacing from listening to the sound of a string instrument...). More precisely, the
family of models with parameters
\[
N(\lambda) = \lambda N, \qquad m(\lambda) = \frac{1}{\lambda}m, \qquad a(\lambda) = \frac{1}{\lambda}a, \qquad \kappa(\lambda) = \lambda\kappa,
\]
for λ ∈ N should all behave similarly (Fig. ??). It thus makes sense to investigate the limit
λ → ∞.
Quantities that do not depend on λ include the total length L = N a, the mass density
ρ = m/a, and the velocity $c := \sqrt{\kappa a/\rho}$. Asymptotically, also the dispersion relation
becomes independent:
\[
\omega_k^{(\lambda)} = 2\sqrt{\frac{\kappa}{m}\lambda^2}\;|\sin(ka\lambda^{-1}/2)| \;\to\; 2\sqrt{\frac{\kappa}{m}}\,\lambda\lambda^{-1}|ka/2| = c|k|.
\]
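The convergence of the rescaled dispersion relation can be made concrete with a few numbers. The sketch below is an illustration under assumed parameter values (κ = m = a = 1 and an arbitrary wave number k); the function names are hypothetical.

```python
import math

def omega(k, lam, kappa=1.0, m=1.0, a=1.0):
    """Lattice dispersion of the rescaled model: kappa -> lam*kappa, m -> m/lam, a -> a/lam."""
    kappa_l, m_l, a_l = lam * kappa, m / lam, a / lam
    return 2.0 * math.sqrt(kappa_l / m_l) * abs(math.sin(k * a_l / 2.0))

def sound_velocity(kappa=1.0, m=1.0, a=1.0):
    """c = sqrt(kappa * a / rho) with rho = m / a; this does not depend on lam."""
    return math.sqrt(kappa * a / (m / a))

k = 0.8  # assumed sample wave number
errors = [abs(omega(k, lam) - sound_velocity() * abs(k)) for lam in (1, 10, 100, 1000)]
print(errors)
```

The deviation from c|k| shrinks roughly like 1/λ², as expected from the next term in the expansion of the sine.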
Recall the formula (2.29) for the displacement of the r-th particle in terms of the nor-
mal coordinates
\[
X_r(t) = \sqrt{\frac{\hbar}{Nm}}\sum_k \frac{1}{\sqrt{2\omega_k}}\big(a_k e^{-i\omega_k t + ikar} + a_k^\dagger e^{i\omega_k t - ikar}\big).
\]
Let’s rewrite it in a form suitable for our limit. The product N m is just the total mass,
invariantly expressed as Lρ. Also, it makes sense to label the particles not by their index
r = 1, . . . N , but by their equilibrium position x = ra ∈ [0, L]. With these substitutions,
we obtain the “displacement field”
\[
\phi(t, x) = \sqrt{\frac{\hbar}{L\rho}}\sum_k \frac{1}{\sqrt{2\omega_k}}\big(a_k e^{-i\omega_k t + ikx} + a_k^\dagger e^{i\omega_k t - ikx}\big). \tag{3.1}
\]
The continuum model is now defined as an infinite collection of harmonic oscillators
indexed by $k \in \frac{2\pi}{L}\mathbb{Z}$ with Hamiltonian
\[
H = \sum_k \hbar c|k|\,a_k^\dagger a_k + \text{const.} \tag{3.2}
\]
and associated displacement field ϕ(t, x) given by (3.1).
There’s some trouble brewing in (3.2): The “constant” is $\sum_k \frac{1}{2}\hbar c|k|$, which diverges.
This is the first of the many infinities of quantum field theory. This one is
easy to deal with: For finite N, the sum over the ground state energies of the harmonic
oscillators is finite. Subtracting this constant from the total energy does not
alter physical predictions, as long as we do not dynamically change the ground
state energy (e.g. by putting stress on the material in a way that affects the equilibrium
separation) or get into the realm of general relativity. Thus, the renormalization
$\sum_k \hbar c|k|\,\big(a_k^\dagger a_k + \frac{1}{2}\big) \mapsto \sum_k \hbar c|k|\,a_k^\dagger a_k$, while maybe not very principled,
does not affect predictions and makes the continuum limit converge. So let’s adopt
this convention. (We’ll encounter more troubling infinities later).
As in Sec. 2.3.1 and App. A.2.1, the definitions so far make sense equally in classical
and in quantum mechanics. In QM, the ak ’s are annihilation operators that are taken to
act on Fock space with occupation number basis | . . . nk . . . ⟩. Classically, the ak ’s are
complex numbers and (3.1) is the most general real-valued solution of the wave equation
\[
\Big(\frac{1}{c^2}\partial_t^2 - \partial_x^2\Big)\phi(t, x) = 0 \tag{3.3}
\]
under cyclic boundary conditions.
We went through this exercise in order to find a strategy for quantizing Maxwell’s
equations. The relation between the classical and the quantum continuum model found
here suggests the following recipe for quantizing classical wave equations:
Summary
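• Consider a classical wave equation whose solutions are of the form
\[
\phi(t, x) = \mathcal{N}\sum_k \frac{1}{\sqrt{2\omega_k}}\big(a_k e^{-i\omega_k t} f_k(x) + a_k^\dagger e^{i\omega_k t} f_k(x)^\dagger\big)
\]
for some set of modes $\{f_k(x)\}_k$, a constant $\mathcal{N}$ and $a_k \in \mathbb{C}$.
• Choose normalization such that $H = \sum_k \hbar\omega_k a_k^\dagger a_k$ is the energy of the field.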
• The quantized field is obtained by associating an oscillator with every mode

and replacing the complex coefficients ak by annihilation operators acting
on a Bosonic or Fermionic Fock space.
Fields for which this program can be implemented are called free. We’ll only work
with free fields in this course. General, interacting fields are treated in the QFT courses.
How to decide whether to use Fermionic or Bosonic Fock spaces will be a major topic in
Chap. 4.
Further comments
It is also of interest to write down a momentum field π(x) which describes the continuum
limit of the Pr . Because the mass of the individual particles goes to 0 for λ → ∞, only
the momentum density defines an interesting quantity in the limit. Thus, starting from
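\[
\frac{1}{a}P_r = -i\,\frac{1}{a}\sqrt{\frac{\hbar m}{N}}\sum_k \sqrt{\frac{\omega_k}{2}}\big(a_k e^{ikar} - a_k^\dagger e^{-ikar}\big),
\]
and arguing as above, we get for the momentum density field
\[
\pi(x) = -i\sqrt{\frac{\hbar\rho}{L}}\sum_k \sqrt{\frac{\omega_k}{2}}\big(a_k e^{ikx} - a_k^\dagger e^{-ikx}\big).
\]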
In the continuum limit, the commutation relation (or iℏ times the Poisson bracket) between
the displacement and the momentum density fields is
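\[
[\phi(x), \pi(y)] = \frac{-i\hbar}{2L}\sum_{k,k'}\big[(a_k e^{ikx} + a_k^\dagger e^{-ikx}),\ (a_{k'} e^{ik'y} - a_{k'}^\dagger e^{-ik'y})\big]
\]
\[
= \frac{i\hbar}{L}\sum_{k,k'} e^{i(kx - k'y)}\,[a_k, a_{k'}^\dagger] = \frac{i\hbar}{L}\sum_{k\in\frac{2\pi}{L}\mathbb{Z}} e^{ik(x-y)} = i\hbar\,\delta(x-y).
\]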
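The last step uses that $\frac{1}{L}\sum_k e^{ik(x-y)}$ acts as a delta function on L-periodic functions. A rough numerical illustration, under assumptions not taken from the notes (a box of length L = 2, a specific smooth periodic test profile, and a sum cut off at |n| ≤ n_max), is to integrate the truncated kernel against the test profile and compare with the value of the profile at x:

```python
import cmath
import math

BOX_LENGTH = 2.0  # assumed value of L

def smooth_profile(y):
    """A smooth L-periodic test function (arbitrary choice for this sketch)."""
    return math.exp(math.cos(2.0 * math.pi * y / BOX_LENGTH))

def smeared_kernel(x, n_max, samples=400):
    """Integrate (1/L) * sum_{|n| <= n_max} exp(i k_n (x - y)) against smooth_profile(y)."""
    dy = BOX_LENGTH / samples
    total = 0.0 + 0.0j
    for n in range(-n_max, n_max + 1):
        k = 2.0 * math.pi * n / BOX_LENGTH
        for j in range(samples):
            y = j * dy
            total += cmath.exp(1j * k * (x - y)) * smooth_profile(y) * dy / BOX_LENGTH
    return total

x_eval = 0.37
mean_only = smeared_kernel(x_eval, 0)        # only k = 0: returns the average of the profile
reconstructed = smeared_kernel(x_eval, 12)   # many modes: approaches smooth_profile(x_eval)
print(mean_only.real, reconstructed.real, smooth_profile(x_eval))
```

With only the k = 0 term the result is the mean of the profile; with more modes it converges to the value at x, which is what the distributional identity expresses.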
3.2 Quantization of the EM field

Classical electrodynamics can be described either in terms of E- and B-fields, or in terms
of scalar and vector potential Φ, A such that
B = ∇ × A, E = −∇Φ − ∂t A. (3.4)
The classical Hamilton function

\[
H = \frac{1}{2m}(P - qA)^2 + q\Phi
\]
for a charged particle is expressed in terms of the potential. This suggests that Φ, A, rather
than E, B, are the right fields to base a quantum theory on.
However, this immediately leads to a problem: Φ, A are determined by the physical
state of the EM field only up to gauge transformations
A 7→ A + ∇χ, Φ 7→ Φ − ∂t χ
with an arbitrary function χ. Here, we get rid of the ambiguity by adopting the Coulomb
gauge, fixed by the gauge condition
∇ · A(t, x) = 0. (3.5)
Further, we restrict to the free-space version of Maxwell’s equations, i.e. we assume that
there are no charges or currents, ρ = j = 0. In this case, the Maxwell equations become
\[
\Phi(t, x) = 0, \qquad \Big(\frac{1}{c^2}\partial_t^2 - \partial_x^2 - \partial_y^2 - \partial_z^2\Big)A(t, x) = 0. \tag{3.6}
\]
In a box with side length L and cyclic boundary conditions, the space of complex
solutions to Eq. (3.6) is spanned by plane waves of the form
\[
A_k e^{\pm i\omega_k t + ikx}, \qquad A_k \in \mathbb{C}^3, \qquad k \in \frac{2\pi}{L}\mathbb{Z}^3, \qquad \omega_k := c\|k\|.
\]
The gauge condition (3.5) requires the coefficients Ak to be “transversal” to the wave
vector k:
0 = ∇ · Ak e±iωk t+ikx = ik · Ak e±iωk t+ikx ⇔ k · Ak = 0.

We can take this into account by choosing, for each k, an ortho-normal basis (the polar-
ization vectors)
\[
\{e_1(k), e_2(k)\} \subset \{k\}^\perp \subset \mathbb{R}^3 \qquad\text{with}\qquad e_\lambda(-k) = e_\lambda(k)
\]
for the space orthogonal to k (Fig. ??). Then a general real-valued solution to the Maxwell
equations in Coulomb gauge is
\[
A(t, x) = \sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_{k,\lambda}\frac{1}{\sqrt{2\omega_k}}\,e_\lambda(k)\big(a_{k\lambda}e^{-i\omega_k t + ikx} + a_{k\lambda}^\dagger e^{+i\omega_k t - ikx}\big), \tag{3.7}
\]
where the sum is over wave vectors $k \in \frac{2\pi}{L}\mathbb{Z}^3$ and polarization directions λ ∈ {1, 2}. As
discussed before for phonons (Eq. (2.30)), it is often convenient to re-arrange the sum in
(3.7) so that terms corresponding to the same complex mode are grouped together:
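\[
A(t, x) = \sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_{k,\lambda}\frac{1}{\sqrt{2\omega_k}}\,e_\lambda(k)\big(a_{k\lambda}(t) + a_{-k\lambda}^\dagger(t)\big)e^{ikx}. \tag{3.8}
\]
The time evolution of the E and B-fields follows by applying (3.4). Setting $\kappa = \frac{k}{\|k\|}$,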
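\[
E(t, x) = i\sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_{k,\lambda}\sqrt{\frac{\omega_k}{2}}\,e_\lambda(k)\big(a_{k\lambda}e^{-i\omega_k t + ikx} - a_{k\lambda}^\dagger e^{+i\omega_k t - ikx}\big) \tag{3.9}
\]
\[
\phantom{E(t, x)} = i\sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_{k,\lambda}\sqrt{\frac{\omega_k}{2}}\,e_\lambda(k)\big(a_{k\lambda}(t) - a_{-k\lambda}^\dagger(t)\big)e^{ikx}, \tag{3.10}
\]
\[
B(t, x) = i\sqrt{\frac{\hbar}{\epsilon_0 L^3 c^2}}\sum_{k,\lambda}\sqrt{\frac{\omega_k}{2}}\,\kappa\times e_\lambda(k)\big(a_{k\lambda}e^{-i\omega_k t + ikx} - a_{k\lambda}^\dagger e^{+i\omega_k t - ikx}\big) \tag{3.11}
\]
\[
\phantom{B(t, x)} = i\sqrt{\frac{\hbar}{\epsilon_0 L^3 c^2}}\sum_{k,\lambda}\sqrt{\frac{\omega_k}{2}}\,\kappa\times e_\lambda(k)\big(a_{k\lambda}(t) + a_{-k\lambda}^\dagger(t)\big)e^{ikx}. \tag{3.12}
\]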
Plugging these expressions into the formula

\[
H_{em} = \frac{\epsilon_0}{2}\int \big(E^2(t, x) + c^2 B^2(t, x)\big)\,d^3x
\]
for the energy of the EM field, one finds after some calculations
\[
H_{em} = \sum_{k,\lambda}\hbar\omega_k\,|a_{k\lambda}|^2.
\]
The A-field is thus of the form discussed in Sec. 3.1 so that one can perform a free-field
quantization. From now on, we will thus treat the akλ ’s as annihilation operators for a
collection of harmonic oscillators acting on the Fock space Hem .
Notation
For increased legibility, we’ll now write k for (k, λ), with the convention that −k corre-
sponds to (−k, λ). Also, for an element | . . . nk . . . ⟩ of the occupation number basis of
the harmonic oscillators, write |{n}⟩.
3.3 States of the EM field

3.3.1 Number states
Elements |{n}⟩ of the occupation number basis – i.e. states with a definite number of
photons in each mode – are called number states or Fock states. The expected electric field
strength in any number state is
\[
\langle\{n\}|E(x)|\{n\}\rangle \propto \sum_k \langle\{n\}|\big(a_k e^{ikx} - a_k^\dagger e^{-ikx}\big)|\{n\}\rangle = 0.
\]
Zero on average does not imply zero with probability one. Indeed, compute the variance:
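\[
\langle\{n\}|E(x)\cdot E(x)|\{n\}\rangle
\]
\[
= \frac{-\hbar}{\epsilon_0 L^3}\sum_{k,k'}\frac{\sqrt{\omega_k\omega_{k'}}}{2}\,e_k\cdot e_{k'}\,\langle\{n\}|\big(a_k e^{ikx} - a_k^\dagger e^{-ikx}\big)\big(a_{k'}e^{ik'x} - a_{k'}^\dagger e^{-ik'x}\big)|\{n\}\rangle
\]
\[
= \frac{\hbar}{2\epsilon_0 L^3}\sum_k \omega_k\,\langle\{n\}|a_k e^{ikx}a_k^\dagger e^{-ikx} + a_k^\dagger e^{-ikx}a_k e^{ikx}|\{n\}\rangle = \sum_k \frac{\hbar\omega_k}{\epsilon_0 L^3}\,(n_k + 1/2),
\]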
which diverges. Another infinity!

The infinity encountered in (3.2) was easy to dismiss, as it related to an unobservable
choice of energy zero point. This one is a somewhat tougher nut to crack, because electric
field strength (proportional to the force exerted on a charged body) has direct physical
consequences. One can argue as follows: Any test particle used to measure the field
strength will have finite extent, so it cannot be concentrated on just one point x in space.
If we replace the point-sized probe by one with a charge density ρ(x), then one can show
(exercise!) that the spatially averaged force
\[
F = \int \rho(x)E(x)\,d^3x
\]
has finite fluctuations, if ρ is sufficiently spread out. This is physically plausible. The
sum diverges because there are infinitely many summands with increasingly large wave
vector k. But these correspond to fields that oscillate rapidly, so that cancellations over
any finite region cause the net force to be small. Mathematically speaking, we found again
(c.f. Sec. 2.2.2) that field operators should be thought of as distributions that have to be
integrated against smooth functions to be meaningful.
Figure 3.1: Net force is zero, so QFT would presumably be OK with it. (Scene from the
Caucasian Chalk Circle, as depicted in this poster).
Is this a satisfactory solution?
Yes, in that it gives a good reason for why extended bodies don’t regularly get acceler-
ated into orbit due to vacuum fluctuations. No, because it paints quite the violent picture
of the microscopic world, where, supposedly, unbounded forces constantly tear at objects
and only cancellations prevent mayhem (Fig. 3.1). It sure feels like an indication that our
current theories of light and matter become invalid at very short length scales.
3.3.2 Coherent states

Number states have zero expected field strength. Since we expect classical electrodynamics
to emerge as a limiting case, there should be states for which the expectation values
⟨E(x, t)⟩ resemble the classical behavior.
To construct these, recall the coherent states of a single harmonic oscillator. For
α ∈ C, define
\[
|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|n\rangle.
\]
Coherent states are eigenvectors of the annihilation operator

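\[
a|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=1}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\sqrt{n}\,|n-1\rangle = e^{-|\alpha|^2/2}\sum_{n'=0}^{\infty}\frac{\alpha^{n'+1}}{\sqrt{(n'+1)!}}\sqrt{n'+1}\,|n'\rangle = \alpha\,|\alpha\rangle.
\]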
A coherent state |{α}⟩ of the entire EM field is one where each mode k ≡ (k, λ) is in
a coherent state |αk ⟩. Let’s compute the expectation value of the E-field:
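\[
\langle\{\alpha\}|\hat E(x, t)|\{\alpha\}\rangle = -i\sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_k \sqrt{\frac{\omega_k}{2}}\,e_k\,\langle\{\alpha\}|\big(a_k^\dagger e^{-ikx + i\omega_k t} - a_k e^{ikx - i\omega_k t}\big)|\{\alpha\}\rangle
\]
\[
= -i\sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_k \sqrt{\frac{\omega_k}{2}}\,e_k\,\big(\alpha_k^* e^{-ikx + i\omega_k t} - \alpha_k e^{ikx - i\omega_k t}\big),
\]
which is indeed the classical value (3.9), with the complex amplitudes $\alpha_k$ in place of the $a_k$.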

3.4 Light-matter interaction

The Hamiltonian of a single spinless particle with charge q, position and momentum oper-
ators X, P , subject to a field in Coulomb gauge is
\[
H = \frac{\big(P - qA(X)\big)^2}{2m} + U(X) + \sum_k \hbar\omega_k\,a_k^\dagger a_k. \tag{3.13}
\]
It acts on a total Hilbert space H = Hpar ⊗ Hem that is the tensor product between the
spaces of the particle Hpar = L2 (R3 ) and of the field Hem . Here, A(X) is defined by
(3.7), where the ladder operators ak , a†k act on Hem , but the parameter x is evaluated on
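the position of the particle. In other words
\[
A(X)\big(|x\rangle|\psi\rangle\big) = |x\rangle\big(A(x)|\psi\rangle\big), \qquad x \in \mathbb{R}^3,\ |\psi\rangle \in \mathcal{H}_{em}. \tag{3.14}
\]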
We will now go through a sequence of simplifications and transformations. Start with
\[
\frac{1}{2m}\big(P - qA(X)\big)^2 = \frac{P^2}{2m} - \frac{q}{2m}\big(P\cdot A(X) + A(X)\cdot P\big) + \frac{q^2}{2m}A(X)^2.
\]
As a first step, we will neglect the square A(X)2 , which describes two-photon processes.
Next, verify that in Coulomb gauge, momentum and the vector potential commute:

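\[
P\cdot A(X)|\phi\rangle = -i\hbar\nabla\cdot\big(A(X)|\phi\rangle\big) = -i\hbar\big(\nabla\cdot A(X)\big)|\phi\rangle + A(X)\cdot P|\phi\rangle = A(X)\cdot P|\phi\rangle,
\]
so that we can write H ≃ Hpar + Hem + HI, with
\[
H_{par} = \frac{P^2}{2m} + U(X), \qquad H_{em} = \sum_k \hbar\omega_k\,a_k^\dagger a_k, \qquad H_I = -\frac{q}{m}P\cdot A(X).
\]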
So far, we have worked in a “mixed picture”, where the EM field was expressed in
second quantization, but only a single particle in first quantization was present. We now
also pass to the second-quantized picture for the particle. To this end, let {|ϕi ⟩}i be an
eigenbasis of Hpart and denote the corresponding creation operators as b†i , so that
\[
H_{part} = \sum_i E_i\,b_i^\dagger b_i.
\]
It remains to treat the interaction Hamiltonian. Even without doing any calculations, we
can see from (3.8) that HI will be of the form
\[
\sum_{ijk} g_{ijk}\,(a_k + a_{-k}^\dagger)\,b_i^\dagger b_j.
\]
It thus describes a superposition of processes where a photon is removed from or added to

the field, while the state of the particle gets switched. Let’s calculate the amplitudes:
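\[
\sum_{ij}\langle\phi_i|H_I|\phi_j\rangle\,b_i^\dagger b_j = -\frac{q}{m}\sum_{ij}\langle\phi_i|P\cdot A|\phi_j\rangle\,b_i^\dagger b_j
\]
\[
= -\frac{q}{m}\sqrt{\frac{\hbar}{\epsilon_0 L^3}}\sum_{ijk}\frac{1}{\sqrt{2\omega_k}}\Big(\int \phi_i^\dagger(x)\,e^{ikx}\,e_k\cdot P\,\phi_j(x)\,d^3x\Big)(a_k + a_{-k}^\dagger)\,b_i^\dagger b_j \tag{3.15}
\]
so that
\[
g_{ijk} = -\frac{q}{m}\sqrt{\frac{\hbar}{\epsilon_0 L^3\,2\omega_k}}\int \phi_i^\dagger(x)\,e^{ikx}\,e_k\cdot P\,\phi_j(x)\,d^3x.
\]
The wave lengths associated with atomic transitions are much longer than the length scales
of the atoms themselves. This justifies the dipole approximation, in which the dependence
of the EM field on position is neglected by substituting $e^{ikx} \simeq 1$. Then
\[
g_{ijk} \simeq -\frac{q}{m}\sqrt{\frac{\hbar}{\epsilon_0 L^3\,2\omega_k}}\int \phi_i^\dagger(x)\,e_k\cdot P\,\phi_j(x)\,d^3x.
\]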
In the expression, the momentum operator acts on energy eigenfunctions in position representation.
One can eliminate momentum using
\[
[X, H_{part}] = \frac{i\hbar}{m}P \quad\Rightarrow\quad P = \frac{m}{i\hbar}[X, H_{part}],
\]
so that the coupling constants become
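\[
g_{ijk} = \frac{-q}{m}\sqrt{\frac{\hbar}{\epsilon_0 L^3\,2\omega_k}}\;\langle\phi_i|(e_k\cdot P)|\phi_j\rangle = iq\sqrt{\frac{1}{\epsilon_0 L^3\,2\hbar\omega_k}}\;(E_j - E_i)\,\langle\phi_i|(e_k\cdot X)|\phi_j\rangle. \tag{3.16}
\]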
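Because this expression is symmetric under inversion of k, the minus sign in the index of $a_{-k}^\dagger$ in (3.15)
can be dropped (relabel k → −k in the corresponding part of the sum), so that
\[
\sum_{ij}\langle\phi_i|H_I|\phi_j\rangle\,b_i^\dagger b_j = \sum_{ijk} g_{ijk}\,(a_k + a_k^\dagger)\,b_i^\dagger b_j. \tag{3.17}
\]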
3.4.1 Spontaneous emission

The goal is to compute the life time of the first excited state n = 2 of a hydrogen atom. We
will employ first-order time-dependent perturbation theory in the form of Fermi’s Golden
Rule (Sec. A.3.1), which says that HI will cause an initial state |i⟩ to decay at a rate
\[
\Gamma \simeq \frac{2\pi}{\hbar}\int |\langle f|H_I|i\rangle|^2\,\delta(E_i - E_f)\,\rho(f)\,df. \tag{3.18}
\]
Here, |i⟩ = |ϕ2,l,m ⟩|0⟩ (we’ll choose l and m later). The delta function ensures that total
energy is conserved. Because the EM field is already in its lowest-energy state, only final
states where the atom has transitioned into its ground state and has emitted photons are
permitted. Because, by Eq. (3.17), HI is linear in ladder operators, the coupling matrix
element is non-zero only for final states that contain a single photon: |f ⟩ = |ϕ1,0,0 ⟩|k⟩,
where k = (k, λ) labels the state of the emitted photon. (This is an artifact of the approxi-
mations we have made – multiple-photon processes are, in principle, possible).
The energy difference between the two lowest levels (the Lyman-α line) is (Sec. A.2.3)
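\[
E_{1,2} := \Big(1 - \frac{1}{4}\Big)E_I = \frac{3}{4}E_I = \frac{3\alpha^2}{8}mc^2.
\]
The photon energy is ℏωk = ℏc∥k∥ and energy conservation is thus equivalent to $\|k\| = \frac{E_{1,2}}{\hbar c}$.
It follows that the integral in (3.18) is over states labeled by f = (k, λ), where k lies
on a sphere of radius $\frac{E_{1,2}}{\hbar c}$. For fixed λ, the density of states in k-space is
$\rho(k)\,d^3k = \big(\frac{L}{2\pi}\big)^3 d^3k$. Switching to spherical coordinates,
\[
\rho(k)\,d^3k = \Big(\frac{L}{2\pi}\Big)^3 d^3k = \Big(\frac{L}{2\pi}\Big)^3 r^2\,dr\,\sin\theta\,d\phi\,d\theta = \Big(\frac{L}{2\pi}\Big)^3\frac{E^2}{\hbar^3c^3}\,dE\,\sin\theta\,d\phi\,d\theta.
\]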
Using (3.16, 3.17), the coupling constant for ℏωk = E1,2 is
\[
|\langle\phi_{2,l,m}|\langle 0|H_I|\phi_{1,0,0}\rangle|k\rangle|^2 = \frac{e^2E_{1,2}}{2\epsilon_0L^3}\,\big|\langle\phi_{2,l,m}|(e_k\cdot X)|\phi_{1,0,0}\rangle\big|^2.
\]
To evaluate the matrix element, we need to borrow some results on atomic eigenstates.
Four facts: (F1) The dipole matrix elements ⟨ϕ2,l,m |e · X|ϕ1,0,0 ⟩ are non-zero only
if l = 1. (F2) ⟨ϕ2,l,0 |x|ϕ1,0,0 ⟩ = ⟨ϕ2,l,0 |y|ϕ1,0,0 ⟩ = 0. (F3) States that differ only
in the magnetic quantum number m can be mapped onto each other by a rotation.
(F4) Using the explicit form of the functions ϕn,l,m (x), a tedious integral gives
\[
|\langle\phi_{2,1,0}|z|\phi_{1,0,0}\rangle|^2 = \frac{2^{15}}{3^{10}}\,a_0^2 = \frac{\hbar^2}{m^2c^2}\,\frac{2^{15}}{3^{10}\alpha^2}.
\]
Fact (F1) implies that in first-order perturbation theory, the states |ϕ2,l,0 ⟩ have infinite
life time unless l = 1, i.e. only the 2p → 1s transition can be computed in this approx-
imation. By (F3), m can be changed by rotating the atom. But the life time of a level is
independent of the atom’s orientation and hence of m. We trust that our approximations
reproduce this rotational invariance (they do), and compute Γ only for m = 0:
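\[
\Gamma = \frac{2\pi}{\hbar}\,\frac{e^2}{2\epsilon_0L^3}\Big(\frac{L}{2\pi}\,\frac{E_{1,2}}{\hbar c}\Big)^3\sum_\lambda\int \big|\langle\phi_{2,1,0}|(e_k\cdot X)|\phi_{1,0,0}\rangle\big|^2\,\sin\theta\,d\phi\,d\theta.
\]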
Then by (F2, F4), only the z-component of ek · X = eλ (k) · X gives a non-zero contri-
bution, namely
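\[
\sum_\lambda \big|\langle\phi_{2,1,0}|(e_\lambda(k)\cdot X)|\phi_{1,0,0}\rangle\big|^2 = \frac{2^{15}}{3^{10}}\,a_0^2\sum_\lambda (e_\lambda(k))_z^2.
\]
To evaluate the sum, note that with $e_0(k) := \frac{k}{\|k\|}$, the set $\{e_\lambda(k)\}_{\lambda=0}^{2}$ forms an orthonormal
basis. Expressing the length-squared of $e_z$ in that basis gets us
\[
1 = \sum_{\lambda=0}^{2}|e_\lambda(k)\cdot e_z|^2 = \cos^2\theta + \sum_{\lambda=1}^{2}(e_\lambda(k))_z^2 \quad\Rightarrow\quad \sum_{\lambda=1}^{2}(e_\lambda(k))_z^2 = \sin^2\theta.
\]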
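Using the identity sin³θ dθ = −sin²θ d(cos θ) = (z² − 1) dz with z = cos θ, which runs from 1 to −1
as θ runs from 0 to π, the integration results in
\[
\int_{\phi=0}^{2\pi}\int_{\theta=0}^{\pi}\sin^3\theta\,d\theta\,d\phi = 2\pi\int_{1}^{-1}(z^2 - 1)\,dz = 2\pi\int_{-1}^{1}(1 - z^2)\,dz = 2\pi\,\frac{4}{3}.
\]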
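Both ingredients of the angular integration can be checked numerically. The sketch below is an illustration only: it picks one possible pair of polarization vectors for the direction (θ, ϕ) (this particular choice is an assumption, any orthonormal pair orthogonal to k works), confirms that their z-components satisfy the sin²θ sum rule, and evaluates the integral with a simple midpoint rule.

```python
import math

def k_hat(theta, phi):
    """Unit vector in the direction of k."""
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))

def polarization_vectors(theta, phi):
    """One possible orthonormal pair orthogonal to k_hat (assumed choice for this sketch)."""
    e1 = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    e2 = (-math.sin(phi), math.cos(phi), 0.0)
    return e1, e2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def z_weight(theta, phi):
    """Sum over the two polarizations of the squared z-component."""
    e1, e2 = polarization_vectors(theta, phi)
    return e1[2] ** 2 + e2[2] ** 2

def angular_integral(n_theta=2000):
    """Midpoint rule for the integral of z_weight * sin(theta) over the sphere."""
    d_theta = math.pi / n_theta
    total = 0.0
    for j in range(n_theta):
        theta = (j + 0.5) * d_theta
        total += z_weight(theta, 0.0) * math.sin(theta) * d_theta
    return 2.0 * math.pi * total  # the integrand does not depend on phi

theta0, phi0 = 1.1, 2.3
e1, e2 = polarization_vectors(theta0, phi0)
transversality = max(abs(dot(e1, k_hat(theta0, phi0))), abs(dot(e2, k_hat(theta0, phi0))))
sum_rule_error = abs(z_weight(theta0, phi0) - math.sin(theta0) ** 2)
integral = angular_integral()
print(transversality, sum_rule_error, integral, 8.0 * math.pi / 3.0)
```

The numerical integral should agree with 2π · 4/3 = 8π/3 to high accuracy.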
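To express all quantities in relativistic units, eliminate e² in favor of the fine structure
constant $\alpha = \frac{e^2}{4\pi\epsilon_0\hbar c}$. Now brew some coffee, close the door, and plug in:
\[
\Gamma = \frac{2\pi}{\hbar}\,\frac{\alpha 4\pi\epsilon_0\hbar c}{2\epsilon_0L^3}\,\frac{\hbar^2}{m^2c^2}\,\frac{2^{15}}{3^{10}\alpha^2}\Big(\frac{L}{2\pi}\,\frac{3\alpha^2mc^2}{8\hbar c}\Big)^3\,2\pi\frac{4}{3} \qquad\text{(don’t think, just copy)}
\]
\[
= 2^{17}\,8^{-3}\,3^{-8}\,\pi^0L^0\epsilon_0^0\,\alpha^5\hbar^{-1}m^1c^2 \qquad\text{(sort by units)}
\]
\[
= \Big(\frac{2}{3}\Big)^8\alpha^5\,\frac{mc^2}{\hbar} = 6.27\times10^8\ \text{Hz} = 1/(1.6\ \text{ns}) \qquad\text{(yeah, go ahead and click).}
\]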
3 ℏ
Amazingly, given the number of approximations made, this is the accepted value [Radzig,
Smirnov, Reference Data on Atoms, Molecules, and Ions, Table 7.4].
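For readers who prefer to let a machine do the copying, here is a small sketch that evaluates the rate twice: once factor by factor as in the first line of the calculation, and once from the closed form $(2/3)^8\alpha^5 mc^2/\hbar$. The numerical constants are CODATA 2018 values entered by hand, and the box length is an arbitrary assumption that must drop out of the result.

```python
import math

# Physical constants in SI units (CODATA 2018 values, entered by hand)
ALPHA = 7.2973525693e-3      # fine structure constant
HBAR = 1.054571817e-34       # J s
M_E = 9.1093837015e-31       # kg
C = 299792458.0              # m / s
EPS0 = 8.8541878128e-12      # F / m

def rate_closed_form():
    """Gamma = (2/3)^8 * alpha^5 * m c^2 / hbar."""
    return (2.0 / 3.0) ** 8 * ALPHA ** 5 * M_E * C ** 2 / HBAR

def rate_factor_by_factor(box_length):
    """Multiply the factors of the 'don't think, just copy' line one by one."""
    golden_rule = 2.0 * math.pi / HBAR
    e_squared = ALPHA * 4.0 * math.pi * EPS0 * HBAR * C
    coupling = e_squared / (2.0 * EPS0 * box_length ** 3)
    dipole_sq = (HBAR / (M_E * C)) ** 2 * 2.0 ** 15 / (3.0 ** 10 * ALPHA ** 2)
    transition_energy = 3.0 * ALPHA ** 2 * M_E * C ** 2 / 8.0
    state_density = (box_length / (2.0 * math.pi) * transition_energy / (HBAR * C)) ** 3
    angular = 2.0 * math.pi * 4.0 / 3.0
    return golden_rule * coupling * dipole_sq * state_density * angular

gamma = rate_closed_form()
gamma_box_1m = rate_factor_by_factor(1.0)
gamma_box_1cm = rate_factor_by_factor(0.01)
print(f"Gamma = {gamma:.3e} Hz, lifetime = {1e9 / gamma:.2f} ns")
```

The two evaluations should agree up to rounding for any box length, which confirms that L and ε0 cancel as claimed in the “sort by units” step.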
3.5 Further reading

For field quantization, see Photons and Atoms by Cohen-Tannoudji, Dupont-Roc, and
Grynberg. Matter-light interaction follows Quantum Optics by Walls and Milburn and
Advanced Quantum Mechanics by Sakurai (who uses Heaviside-Lorentz units instead of
SI units employed by the other authors – consult Wikipedia to convert).
Chapter 4
Relativistic quantum mechanics
In this chapter, we assume the use of coordinates for which ℏ = c = 1.

The Schrödinger equation
\[
\Big(i\partial_t + \frac{1}{2m}\sum_{i=1}^{3}\partial_i^2 - V(x)\Big)\psi(t, x) = 0
\]
treats time and space very differently. It is indeed not relativistically invariant. Therefore,
in the late 1920s, it became a popular pastime to come up with new wave equations with
the goal of finding a relativistic quantum theory of single particles.
The results of this program proved to be important, but not for the reasons their creators
intended. It turns out that the very idea of constructing a quantum theory of a single
relativistic particle leads to conceptual problems. These include the difficulty to define
a position operator that doesn’t lead to super-luminal signaling, and a Hamiltonian that
doesn’t exhibit states of unbounded negative energies. (See also Fig. ?? for a heuristic
argument that suggests that we shouldn’t be surprised by these issues).
Let’s take a closer look at the first problem. If we accept that there is no reasonable
position operator associated with a relativistic particle, we immediately face the next chal-
lenge: To avoid faster-than-light influences, physical interactions are local. For example,
in Eq. (3.14), the coupling term between a particle and the EM field was given by the
field A evaluated at the position of the particle (represented quantum-mechanically by its
position operator X). So, having thrown out the position operator, how does one couple
relativistic theories?
Fortunately, we already know one relativistic quantum theory: electro-magnetism. So
we can just check how things work there and copy them. And indeed, in Chapter 3, we
did not associate a notion of position with a photon. Instead, locality was incorporated
by specifying a collection of fields (E, B, . . . ) that represent the properties of the system
that are measurable at any given point (t, x) in space-time. Citing [Quantum Field Theory
Lectures of Sidney Coleman] “If we know where the observations are, we don’t have to
know where the particles are.”
This (radical!) shift of perspective also works for relativistic particles. We forget about
“electron-the-particle” as a fundamental object, and instead look for an “electron field”
that, as for electro-magnetism, generates at every point in space-time the observables for
those “electron-properties” that are locally measurable. Just like phonons and photons
before, particles get re-introduced as excitations of the field Hamiltonian.
Now that we decided that we actually want to construct quantum field theories, it would
sure be helpful to have a few relativistic wave equations lying around to which we could
apply the free-field quantization procedure that served us well before. It is for this purpose
that the single-particle equations we previously tossed out as unsatisfactory get a second
lease on life! Re-interpreted as “classical” fields, their quantized versions give satisfactory
theories of relativistic particles.
One more conceptual point before we’ll look at the details: Just like the phonon field
(3.1) and the fields associated with photons (3.9 – 3.12), field operators for massive parti-
cles will also be linear combinations of creation and annihilation operators. The fact that
the theory now contains processes that change the number of massive particles is to be
expected in the relativistic regime. That is, unless the particles are charged! Unlike parti-
cle number N , the total charge Q seems to be conserved by all physical interactions. One
would get around this problem if every time a particle with charge q is created, another
one that is identical except for having charge −q got destroyed.
It may feel like we’re just making things up at this point. Not so! Nature has indeed
solved the problem of charge conservation in quantum fields by introducing such anti-
particles. And remember the negative energies we complained about before? It turns out
that upon free-field quantization, the wave functions that would have negative energies if
interpreted as quantum states become positive-energy modes of the quantum field, and are
associated with anti-particles.
Now for the details.
4.1 Special relativity recap

Here, we quickly recall some notions from special relativity. We won’t have time to explain
much, so you might want to keep your favorite SRT textbook handy.
4.1.1 Space-time symmetries

Empirically, the laws of physics seem independent of the place where, the time when, or
the orientation in which experiments are being performed.
Let’s discuss these symmetries more precisely. Fix a time reference and an inertial
orthonormal coordinate system. Points in space-time (events) are then labeled by vectors
$x \in \mathbb{R}^4$ with components $x^\mu$ for $\mu = 0, 1, 2, 3$, where $x^0 = t$ and $(x^1, x^2, x^3) = \mathbf{x}$ are the
spatial coordinates.
There are at least two physical ways a transformation $g : x \mapsto x'$ of the coordinates
describing an event may arise. It could be that the event itself is moved by $g$ (“active”
interpretation); or that the reference points defining the coordinate system are moved by
$g^{-1}$ (“passive” interpretation). Our discussion here applies to both points of view equally.
Let’s go through the important transformations of space-time.
Translations of space and time. Any vector $a \in \mathbb{R}^4$ defines a translation
$$ T_a : x \mapsto x + a. $$
Rotations. For a point $n \in \mathbb{R}^3$ on the unit sphere and an angle $\theta$, let $R(n, \theta)$ be the
$3 \times 3$ matrix rotating points about $n$ by $\theta$. For example:
$$ R(e_z, \theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$
Figure 4.1: Lorentz boosts in 1+1 dimensions. Left: There are two light-like world-lines
through the origin (dotted lines). Lorentz boosts have to leave them invariant. This
is achieved in particular by maps that are diagonal in a basis (blue arrows) of light-like
vectors. Let’s thus consider the linear map $B(\alpha)$ that multiplies $\binom{1}{1}$ by $e^{\alpha}$ and $\binom{-1}{1}$
by $e^{-\alpha}$ (red arrows). In the $t$-$x$-basis, $B(\alpha) = \begin{pmatrix} \cosh(\alpha) & \sinh(\alpha) \\ \sinh(\alpha) & \cosh(\alpha) \end{pmatrix}$. Middle: The vertical
world-line represents a particle at rest. The slanted lines are its images under $B(\alpha)$ for
various values of $\alpha$. They represent uniform motion with velocity $\beta = \frac{\Delta x}{\Delta t} = \frac{\sinh(\alpha)}{\cosh(\alpha)} = \tanh(\alpha) \in (-1, 1)$ (in units of $c$). We can also track the image of one event (e.g. $(1, 0)$,
red dot) under $B(\alpha)$. It traces out the branch of a hyperbola with the asymptotes given by
the light-like lines. Right: In general, the orbit of a vector under all boosts $B(\alpha)$ forms the
branch of a hyperbola. Hyperbolas are thus to Lorentz boosts what circles are to rotations.
Rotations act on $\mathbb{R}^4$ leaving time invariant:
$$ \Lambda_R = \begin{pmatrix} 1 & 0^T \\ 0 & R \end{pmatrix}. $$
Translations and rotations work the same in non-relativistic and in relativistic physics.
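More interesting are boosts, which add a constant velocity. Non-relativistically, these are
implemented by Galileo boosts
$$ G_{\mathbf{v}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ v^1 & 1 & 0 & 0 \\ v^2 & 0 & 1 & 0 \\ v^3 & 0 & 0 & 1 \end{pmatrix} : \begin{pmatrix} t \\ \mathbf{x} \end{pmatrix} \mapsto \begin{pmatrix} t \\ \mathbf{x} + t\mathbf{v} \end{pmatrix}. $$
Under Galileo boosts, relative velocities add up linearly
$$ G_{\mathbf{v}} G_{\mathbf{w}} = G_{\mathbf{v}+\mathbf{w}} $$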
and can, in particular, take arbitrarily large values. Relativistic theories are not invariant
under these transformations, as they violate the central axioms of special relativity: There
is no motion faster than the speed of light, and objects traveling at the speed of light do so
in every coordinate system.
These axioms imply that a change of relative motion has to be described by Lorentz
boosts. We won’t derive them here (see e.g. [Sexl-Urbantke]), but see Fig. 4.1 for some
motivation. A Lorentz boost along the $e_x$-axis by a velocity $\beta \in (-1, 1)$ (in units of the
speed of light) is implemented by
$$ \Lambda_{\beta e_x} = \begin{pmatrix} \cosh(\alpha) & \sinh(\alpha) & 0 & 0 \\ \sinh(\alpha) & \cosh(\alpha) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad \text{with } \alpha = \tanh^{-1}(\beta) \text{ the rapidity.} $$
Boosts along other directions are defined analogously. The term $\cosh(\alpha) = \frac{1}{\sqrt{1-\beta^2}}$ on the
diagonal is the Lorentz factor $\gamma$.
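To make the contrast with Galileo boosts concrete, here is a small numerical sketch in 1+1 dimensions. It checks that composing two boosts adds the rapidities (so the resulting velocity follows the relativistic addition formula and stays below 1), and that $\cosh(\alpha)$ agrees with the Lorentz factor. The helper names `boost_2d` and `matmul` and the sample velocities are illustrative choices, not part of the notes.

```python
import math

def boost_2d(alpha):
    """Boost B(alpha) acting on (t, x), in units where c = 1."""
    c, s = math.cosh(alpha), math.sinh(alpha)
    return [[c, s], [s, c]]

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Two sample velocities, each below the speed of light (assumed values).
beta1, beta2 = 0.8, 0.9
alpha1, alpha2 = math.atanh(beta1), math.atanh(beta2)

# Composing boosts adds rapidities, not velocities.
composed = matmul(boost_2d(alpha1), boost_2d(alpha2))
direct = boost_2d(alpha1 + alpha2)

# Velocity of the composed boost: sinh / cosh, read off the first column.
beta_total = composed[1][0] / composed[0][0]
beta_formula = (beta1 + beta2) / (1 + beta1 * beta2)

# Lorentz factor computed from the velocity.
gamma1 = 1 / math.sqrt(1 - beta1 ** 2)

print("composed velocity:", beta_total)
print("addition formula: ", beta_formula)
print("cosh(alpha1), gamma:", boost_2d(alpha1)[0][0], gamma1)
```

A Galileo boost would simply give 0.8 + 0.9 = 1.7 here, which is exactly what relativity forbids.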
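Finally, we need two discrete symmetries: time reversal $T$ and the parity transformation $P$, given by
$$ T = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. $$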
Some terminology (quite a lot, unfortunately, but necessary):
• The group generated by rotations, Lorentz boosts, time reversal and parity is the
Lorentz group O(1, 3).
• A Lorentz group element $\Lambda$ is called special (or proper) if $\det \Lambda = 1$ and orthochronous if $\mathrm{sign}(\Lambda^0{}_0) > 0$. The special orthochronous Lorentz group is denoted
by $SO^+(1, 3)$ and generated by rotations and Lorentz boosts.
• The group generated by the Lorentz group and space-time translations is the Poin-
caré group.
• Non-relativistic case: The analogue of the Poincaré group with Galileo boosts in-
stead of Lorentz boosts is the Galileo group.
4.1.2 Minkowski geometry

Let’s phrase the analysis of Fig. 4.1 more geometrically and in 3+1 dimensions. Any
straight line in $\mathbb{R}^4$ is of the form
$$ x^\mu(s) = \begin{pmatrix} s \\ s\mathbf{v} + \mathbf{a} \end{pmatrix}, \quad s \in \mathbb{R}, \quad \text{with tangent vectors } v^\mu = \partial_s x^\mu(s) = \begin{pmatrix} 1 \\ \mathbf{v} \end{pmatrix}. $$
Thus $x^\mu(s)$ represents a world-line traversing space-time slower than / at / faster than the
speed of light depending on whether $1^2 - \|\mathbf{v}\|^2$ is positive / zero / negative. To express this
geometrically, define the Minkowski inner product as
$$ \langle u, v \rangle = u^T \eta v, \qquad \eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. $$
Then $1^2 - \|\mathbf{v}\|^2 = v^T \eta v$, i.e. the character of the curve $x^\mu(s)$ is given by the “squared
Minkowski length” of its tangents.
A $4 \times 4$-matrix $\Lambda$ leaves the Minkowski form invariant if
$$ (\Lambda u)^T \eta (\Lambda v) = u^T \eta v \quad \forall\, u, v \in \mathbb{R}^4, \quad \text{which is equivalent to} \quad \Lambda^T \eta \Lambda = \eta. \tag{4.1} $$
One can check directly that the elements of the Lorentz group leave the Minkowski form
invariant. The converse is also true: only Lorentz transformations have this property. This
explains the notation O(1, 3) for the Lorentz group: It is the group of isometries of the
symmetric bilinear form with 1 positive and 3 negative elements on the main diagonal,
generalizing the notion of O(n), the symmetry group of the standard Euclidean form.
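The claim that Lorentz group elements satisfy (4.1) is easy to test numerically. The following sketch checks $\Lambda^T \eta \Lambda = \eta$ for one sample boost, one rotation, time reversal and parity, and shows that a Galileo boost fails the test. The function name `preserves_minkowski` and the sample parameters are assumptions made for illustration.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def preserves_minkowski(lam, tol=1e-12):
    """True if lam^T eta lam equals eta up to rounding, i.e. condition (4.1)."""
    lhs = matmul(transpose(lam), matmul(ETA, lam))
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))

alpha, theta, v = 0.7, 0.3, 0.5   # arbitrary sample rapidity, angle, velocity
ch, sh = math.cosh(alpha), math.sinh(alpha)
c, s = math.cos(theta), math.sin(theta)

candidates = {
    "boost along x": [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    "rotation about z": [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]],
    "time reversal": [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    "parity": [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]],
    "Galileo boost along x": [[1, 0, 0, 0], [v, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
}

results = {name: preserves_minkowski(lam) for name, lam in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the Galileo boost should come out as not preserving the form, since it produces off-diagonal terms proportional to $v$.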
4.1.3 Transformation behavior of vectors and fields

We have introduced space-time symmetries via their action $g : x \mapsto x' = \Lambda x + a$ on
events. There are more general objects that can be displaced.
Tangent vectors $u = \partial_s x(s)$ transform only under the linear part of the Poincaré
group:
$$ u \mapsto u' = \partial_s x'(s) = \partial_s \big( \Lambda x(s) + a \big) = \Lambda\, \partial_s x(s) = \Lambda u. \tag{4.2} $$
This is called the contravariant transformation law. We will always use superscripts to
label the components $u^\mu$ of contravariant vectors. Geometrically, contravariant vectors
describe directions in space-time.
As an (important) example, take $x(\tau)$ to be the world-line of a point particle with rest
mass $m$, parameterized by its proper time $\tau$. Then $p = m\,\partial_\tau x(\tau)$ is its four momentum
$p^\mu = (E, \mathbf{p})$, which, being proportional to a tangent vector, is contravariant. The
squared Minkowski length of the four momentum is equal to the squared rest mass:
$$ \langle p, p \rangle = E^2 - \|\mathbf{p}\|^2 = m^2. \tag{4.3} $$
Scalar functions: The action of a space-time symmetry $g$ on a scalar function $\phi : \mathbb{R}^4 \to \mathbb{R}$ is $\phi(\cdot) \mapsto \phi'(\cdot) = \phi\big(g^{-1}(\cdot)\big)$ (Fig ??).
Linear functionals: As an example of a scalar function, consider an affine function
$\phi(x) = k^T x + c$, which describes e.g. the phase of a plane wave. Then
$$ \phi'(x) = k^T \Lambda^{-1}(x - a) + c = (\Lambda^{-T} k)^T (x - a) + c, $$
where $\Lambda^{-T}$ is the transpose-inverse of $\Lambda$. The vector describing the linear part behaves as
$$ k \mapsto k' = \Lambda^{-T} k. \tag{4.4} $$
This is the covariant transformation law. We will always use subscripts for the components $k_\mu$ of covariant vectors. Geometrically, covariant vectors describe linear functionals.
By construction, the inner product between a co- and a contravariant vector is invariant:
$$ (\Lambda^{-T} k)^T \Lambda u = k^T \Lambda^{-1} \Lambda u = k^T u. $$
Gradient fields: Associated with every scalar function $\phi(x)$ is its gradient field $u(x)$
with components $u_\mu(x) = \partial_\mu \phi(x)$. A symmetry $g$ will adjust the argument as for a scalar
field, and the components as for a covariant vector:
$$ u'(x) = \Lambda^{-T} u\big(g^{-1}(x)\big). $$

Wait. Didn’t we say that covariant vectors describe functionals? What do gradients have to
do with functionals? It turns out that geometrically, the right way to think about a gradient
$\partial_\mu \phi$ is as a linear functional that maps a direction $v^\mu$ to the directional derivative $v^\mu \partial_\mu \phi$.
Contravariant vector fields assign a direction to every point in space-time. An example is the electro-magnetic potential $A^\mu(x) = \big(\phi(x), \mathbf{A}(x)\big)$, for which one can show the
transformation law
$$ A'(x) = \Lambda A\big(g^{-1}(x)\big). $$

4.1.4 Ricci calculus

For calculations, we’ll use the Ricci calculus. It’s based on the following conventions:
Write $u^\mu$ to refer either to (depending on context):
• the $\mu$-th component of a contravariant vector $u \in \mathbb{R}^4$ (here, we think of $\mu$ as a
specific number from {0, 1, 2, 3}); or
• the entire vector $u \in \mathbb{R}^4$ (thinking of $\mu$ as a variable).
More generally, we’ll use this dictionary:
    contravariant vector            $u$            $u^\mu$
    covariant vector                $k$            $k_\mu$
    in particular: the gradient     $\nabla$       $\partial_\mu$
    linear map (matrix)             $M$            $M^\mu{}_\nu$
    metric                          $\eta$         $\eta_{\mu\nu}$
    inverse metric                  $\eta^{-1}$    $\eta^{\mu\nu}$
where $M^\mu{}_\nu$, $\eta_{\mu\nu}$, and $\eta^{\mu\nu}$ are the matrix elements of $M$, $\eta$, and $\eta^{-1}$ respectively. (In our
case, $\eta = \eta^{-1}$, but we keep the distinction, as the calculus also works with more general
metrics).
We’ll employ the Einstein summation convention: indices that appear repeatedly are
summed over (or contracted). Thus:
    evaluation of functional        $k(u) = k^T u$                       $k_\mu u^\mu$
    Minkowski inner product         $\langle u, v \rangle = u^T \eta v$    $u^\mu \eta_{\mu\nu} v^\nu$
    matrix-vector multiplication    $u = M v$                            $u^\mu = M^\mu{}_\nu v^\nu$
Recall how covariant vectors describe linear functionals? That’s how you can remember /
derive the placement of indices. The inner product consumes two vectors and generates a
number – it thus has two covariant indices. A linear map takes one vector (one covariant
index) and outputs another (one contravariant index).
In QM, we associated with every (“ket”) vector $|\psi\rangle$ a linear functional that projects
onto it – the “bra” $\langle\psi| : |\phi\rangle \mapsto \langle\psi|\phi\rangle$. There’s an analogous construction in Minkowski
space. With every vector $u$, associate the “Minkowski projection” $v \mapsto \langle u, v \rangle$. In coordinates: $v^\nu \mapsto u^\mu \eta_{\mu\nu} v^\nu$, i.e. the Minkowski projection is described by the covariant vector
$u_\nu := u^\mu \eta_{\mu\nu}$. For graphically obvious reasons, the procedure that sends a vector to the
projection onto it is called lowering the index. The inverse map is achieved by contracting with the inverse metric, $u^\mu = u_\nu \eta^{\nu\mu}$, thereby raising the index. (Mathematicians, for
what’s passing as humor, sometimes use the symbols “♭” and “♯” in this context, alluding
to their original meaning of, uh, lowering or raising the pitch of a musical note).
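The lowering / raising procedure can be applied not just to vectors. As an important
example, consider a Lorentz transformation $\Lambda^\mu{}_\nu$. Then $\Lambda_\mu{}^\nu$ are the components of $\eta \Lambda \eta^{-1}$.
Re-arranging (4.1) we find
$$ \Lambda^T \eta \Lambda = \eta \;\Rightarrow\; \eta \Lambda \eta^{-1} = \Lambda^{-T}, $$
the matrix under which covariant vectors transform! Thus:
    Minkowski projection / lowering index    $\flat : u \mapsto \langle u, \cdot \rangle$      $u_\nu = u^\mu \eta_{\mu\nu}$
    raising index                            $\sharp : \langle u, \cdot \rangle \mapsto u$     $u^\mu = u_\nu \eta^{\nu\mu}$
    Minkowski inner product revisited        $\langle u, v \rangle = u^T \eta v$               $u^\mu \eta_{\mu\nu} v^\nu = u^\mu v_\mu = u_\mu v^\mu$
    inverse-transpose Lorentz trans.         $\Lambda^{-T} = \eta \Lambda \eta^{-1}$           $\eta_{\mu\alpha} \Lambda^\alpha{}_\beta \eta^{\beta\nu} = \Lambda_\mu{}^\nu$
    contravariant transformation             $\tilde{u} = \Lambda u$                           $\tilde{u}^\mu = \Lambda^\mu{}_\nu u^\nu$
    covariant transformation                 $\tilde{k} = \Lambda^{-T} k$                      $\tilde{k}_\mu = \Lambda_\mu{}^\nu k_\nu$
Occasionally, we will use Roman letters as indices, as in $x^i$. It is understood that
Roman letters take values in 1, 2, 3, whereas Greek ones take values in 0, 1, 2, 3.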
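If the index gymnastics feels abstract, it may help to see it as plain sums over components. The sketch below lowers an index with $\eta$, builds $\Lambda_\mu{}^\nu = \eta \Lambda \eta^{-1}$ for a sample boost, compares it with $\Lambda^{-T}$, and checks that contractions $k_\mu u^\mu$ and inner products $\langle u, v \rangle$ are unchanged by the transformation. All names (`lower`, `contract`, `boost_x`) and the sample vectors are illustrative assumptions; the inverse boost is obtained by flipping the sign of the rapidity.

```python
import math

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]  # equals its own inverse

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, u):
    return [sum(a[i][k] * u[k] for k in range(len(u))) for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def boost_x(alpha):
    ch, sh = math.cosh(alpha), math.sinh(alpha)
    return [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def lower(u):
    """Lower the index: u_nu = u^mu eta_{mu nu}."""
    return [sum(u[mu] * ETA[mu][nu] for mu in range(4)) for nu in range(4)]

def contract(k_lower, u_upper):
    """Einstein sum k_mu u^mu over the repeated index."""
    return sum(k_lower[mu] * u_upper[mu] for mu in range(4))

alpha = 0.4
lam = boost_x(alpha)                         # Lambda^mu_nu
lam_cov = matmul(ETA, matmul(lam, ETA))      # Lambda_mu^nu = eta Lambda eta^{-1}
lam_inv_T = transpose(boost_x(-alpha))       # Lambda^{-T}

u = [2.0, 0.3, -1.0, 0.5]                    # sample contravariant vectors
v = [1.0, -0.2, 0.4, 0.7]
k = lower(v)                                 # a covariant vector

inner_before = contract(lower(u), v)         # <u, v>
inner_after = contract(lower(matvec(lam, u)), matvec(lam, v))
pairing_before = contract(k, u)
pairing_after = contract(matvec(lam_cov, k), matvec(lam, u))

print(inner_before, inner_after)
print(pairing_before, pairing_after)
```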
4.2 The Klein-Gordon Equation

The first attempt at guessing a relativistic version of Schrödinger’s equation led to the
Klein-Gordon equation. As you’ll see, Klein and Gordon didn’t win any originality points
for it: It’s really the first thing anyone would try. While the KG equation does describe
some physics (pions, for one), we will use it here only as a warm-up for the real prize, the
much more sophisticated Dirac equation.
The treatment will be heuristic – we’ll be making informed guesses just as people did
in the 1920s. For a more systematic approach, see Sec. 5.
The energy and momentum of a free non-relativistic point particle are related by
$$ E = \frac{1}{2m} \sum_{i=1}^{3} p_i^2. \tag{4.5} $$
In QM, the momentum $p_i$ is associated with the operator $-i\partial_i$. Energy $E$ is associated
with the Hamiltonian $H$, but by virtue of the Schrödinger equation $i\partial_t \psi = H\psi$, we can
also take $E = i\partial_t$. Thus, we arrive at the correspondence principle (justified in Sec. ??)
$$ p^\mu = (E, \mathbf{p}) \to (i\partial_t, -i\nabla) = i\partial^\mu. $$
Applying this substitution to (4.5) and letting it act on a wave function $\psi$ gives
$$ \Big( i\partial_t + \frac{1}{2m} \sum_{i=1}^{3} \partial_{x_i}^2 \Big) \psi = 0, $$
the Schrödinger equation for a free particle! Nice.

So let’s apply the same recipe to the relativistic energy-momentum relation (c.f. (4.3))
$$ E = \sqrt{m^2 + \|\mathbf{p}\|^2}. $$
We could apply the correspondence principle to this expression and take the square root
of the resulting operator (as in Sec. A.1.10). However, the result would no longer be a
differential operator (i.e. a polynomial in $x_i$’s and $\partial_{x_i}$’s). To maintain a close analogy to
the Schrödinger equation, we’d prefer to look for a differential equation, which we can
achieve by first squaring the relation
$$ E^2 - \sum_{i=1}^{3} p_i^2 - m^2 = 0 $$
and then using the correspondence principle, to get
$$ \Big( -\partial_t^2 + \sum_i \partial_{x_i}^2 - m^2 \Big) \psi = 0 \;\Leftrightarrow\; \big( \partial^\mu \partial_\mu + m^2 \big) \psi = 0. \tag{4.6} $$
This is the Klein-Gordon equation.

The KG equation is invariant under the Poincaré group in the sense that if $\psi(x)$ solves
(4.6) then so does $\psi'(x) = \psi(g^{-1}(x))$. That’s because the gradient $\partial_\mu$ transforms covariantly and thus the contraction $\partial^\mu \partial_\mu$ is invariant (c.f. Sec. 4.1.3).
The space of solutions is spanned by plane waves
$$ \psi_p^\pm(x^\mu) = e^{-i p_\mu x^\mu} \quad \text{with } p^\mu = (\pm E_p, \mathbf{p}), \quad E_p = \sqrt{m^2 + \|\mathbf{p}\|^2}, \tag{4.7} $$
indexed by momenta $\mathbf{p} \in \mathbb{R}^3$.
Warning: As a consequence of the energy operator being $E = i\partial_t$, positive energies
are associated with a negative sign in the time-dependent phase: $\psi_p^+(x^\mu) = e^{-iE_p t + i\mathbf{x}\cdot\mathbf{p}}$.
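As a sanity check on (4.6) and (4.7), one can plug a plane wave into the KG operator numerically. The sketch below does this in 1+1 dimensions with central finite differences, for both signs of the energy, and also reads off the eigenvalue of $i\partial_t$. The values of `m`, `p`, the step size `h` and the function names are arbitrary illustrative choices.

```python
import cmath
import math

m, p = 1.0, 0.75                  # sample mass and momentum (assumed values)
E = math.sqrt(m**2 + p**2)        # E_p as in (4.7)

def psi(t, x, sign):
    """Plane wave exp(-i (sign*E*t - p*x)); sign = +1 / -1 selects psi^+ / psi^-."""
    return cmath.exp(-1j * (sign * E * t - p * x))

def kg_residual(t, x, sign, h=1e-3):
    """(-d_t^2 + d_x^2 - m^2) psi, with derivatives from central differences."""
    d2t = (psi(t + h, x, sign) - 2 * psi(t, x, sign) + psi(t - h, x, sign)) / h**2
    d2x = (psi(t, x + h, sign) - 2 * psi(t, x, sign) + psi(t, x - h, sign)) / h**2
    return -d2t + d2x - m**2 * psi(t, x, sign)

def energy(t, x, sign, h=1e-3):
    """Eigenvalue of i d_t on the plane wave, again via a central difference."""
    dt = (psi(t + h, x, sign) - psi(t - h, x, sign)) / (2 * h)
    return (1j * dt / psi(t, x, sign)).real

for sign in (+1, -1):
    print(sign, abs(kg_residual(0.3, -1.2, sign)), energy(0.3, -1.2, sign))
```

The residual is not exactly zero because of the finite step size, but it should be small compared to $m^2|\psi| = 1$, and the energies should come out close to $\pm E_p$.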
There are two problems with the KG-equation:

• The solutions $\psi_p^\pm$ in (4.7) can have both positive and negative energy
$$ i\partial_t \psi_p^\pm = \pm E_p \psi_p^\pm $$
of arbitrary magnitude. This means particles could emit an unbounded amount of
energy by transitioning into negative energy states. This, uhum, is not observed.
• If we try to interpret $|\psi(t, \mathbf{x})|^2$ as the probability density of finding the particle at
space-time point $(t, \mathbf{x})$, then the KG equation has solutions that propagate faster
than the speed of light. That is unacceptable for a relativistic theory.
4.2.1 Superluminal solutions to the Klein-Gordon equation

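Here’s a proof that the KG equation exhibits faster-than-light propagation of signals.¹
We consider a particle that is initially localized at the origin, $\psi(0, \mathbf{x}) = \delta(\mathbf{x})$, and is a
superposition only of positive-energy solutions. By (4.7),
$$ \psi(t, \mathbf{x}) = (2\pi)^{-3} \int \psi_p^+(t, \mathbf{x})\, d^3p = (2\pi)^{-3} \int e^{-itE_p + i\mathbf{x}\cdot\mathbf{p}}\, d^3p \tag{4.8} $$
is a solution of the KG equation. It fulfills our chosen initial condition:
$$ \psi(0, \mathbf{x}) = (2\pi)^{-3} \int e^{i\mathbf{x}\cdot\mathbf{p}}\, d^3p = \delta(\mathbf{x}). $$
The energy depends non-linearly on $\mathbf{p}$, but at least it does so in a rotationally-invariant way,
so it makes sense to rewrite the integral in spherical coordinates. With $x = \|\mathbf{x}\|$, $p = \|\mathbf{p}\|$,
$$ \begin{aligned}
\psi(t, \mathbf{x}) &= (2\pi)^{-3} \int_0^\infty dp\, e^{-itE_p} p^2 \int_0^\pi d\theta\, \sin\theta \int_0^{2\pi} d\phi\; e^{ipx\cos\theta} \\
&= (2\pi)^{-2} \int_0^\infty dp\, e^{-itE_p} p^2 \left[ \frac{-e^{ipx\cos\theta}}{ipx} \right]_{\theta=0}^{\theta=\pi} \\
&= (2\pi)^{-2} \int_0^\infty dp\, e^{-itE_p}\, \frac{p}{ix} \left( e^{ipx} - e^{-ipx} \right) \\
&= \frac{-i}{(2\pi)^2 x} \int_{-\infty}^{\infty} dp\; p\, e^{-it\sqrt{m^2+p^2} + ipx}. \qquad (4.9)
\end{aligned} $$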
¹ The argument follows the near-identical presentations in the quantum field theory books by Coleman and
by Lancaster-Blundell. Unfortunately, as far as I can tell, their presentation contains a mistake. Using the
notation below: They claim that the integral over the “large arcs” gives no contribution without introducing the ϵ-regularization that we need. I can’t see how this holds. In particular, Lancaster-Blundell’s invocation of Jordan’s
Lemma does not seem to give a finite bound. If anyone can explain to me why their argument is actually correct,
I’m game.
Mathematica can’t solve that integral – never a good sign. The integrand is oscillating
and the non-linear dependency of the phase on the integration variable p means that one
cannot tell by inspection whether cancellations will cause the integral to vanish for x > t,
as needed to avoid superluminal signaling.
The matter can be decided by an exercise in contour integration. Let’s recap some
basics. Introduce polar coordinates $z = \rho e^{i\phi}$, $\phi \in (-\pi, \pi]$ in the complex plane. The
choice $\sqrt{z} = \sqrt{\rho}\, e^{i\phi/2}$ fixes a sign for the square root. It introduces a discontinuity on the
non-positive real axis $\mathbb{R}_{\le 0}$, where $e^{i\phi/2}$ changes from $+i$ to $-i$ as we cross from the
upper to the lower half-plane. This is called a branch cut of the square root. (While we’re
free to change its position, it is easy to see that a discontinuity cannot be avoided).
[Figure: The branch cut on the non-positive real axis is visible in the imaginary part of the principal square root $\sqrt{z}$.]
On $\mathbb{C} \setminus \mathbb{R}_{\le 0}$, the square root is complex-differentiable, i.e. an analytic function.

Now consider the function
$$ I_\epsilon(p) := p\, e^{-itE_p + ipx}\, e^{-\epsilon E_p} $$
for $\epsilon > 0$ and with $p = u + iv$ now a complex argument. The integrand in (4.9) is recovered
as $\lim_{\epsilon \to 0} I_\epsilon(p)$ for real $p$. The rays $\pm i(m + \mathbb{R}_{\ge 0})$ are mapped to the branch cuts of the
square root. Excluding those values, $I_\epsilon$ is again an analytic function.
[Figure: The branch cuts of $I_\epsilon(p)$ are shown in red. The imaginary part of the root $\sqrt{m^2 + p^2}$ is positive / negative in the blue / green regions.]
We went through all the trouble because the integral of an analytic function along a
closed contour in the complex plane vanishes.
We will apply this to the contour shown in the accompanying figure. The original integral in (4.9)
is recovered as
$$ \lim_{\epsilon \to 0^+} \lim_{R \to \infty} \frac{-i}{(2\pi)^2 x} \int_{\gamma_1} I_\epsilon(p)\, dp. $$
It must be equal to the negative of the integral over the rest.
We now assume $x > t$ (i.e. we evaluate $\psi$ at space-like distance from the origin). Then
$|I_\epsilon(p)|$ vanishes exponentially on the arcs $\gamma_2, \gamma_6$ as $R \to \infty$. To see this, use that on the
arcs, $\lim_{R\to\infty} \sqrt{m^2 + p^2} = \pm p$, so that we get for the real part of the exponent
$$ \lim_{R \to \infty} \mathrm{Re}\,(-itE_p + ipx - \epsilon E_p) = -(x \mp t)\, \mathrm{Im}\, p - \epsilon\, |\mathrm{Re}\, p| \le -\min(x - t, \epsilon)\, R. $$
Thus, the integrals over $\gamma_2, \gamma_6$ do not contribute. Neither does the integral over $\gamma_4$, because
the length of $\gamma_4$ goes to 0 as $R \to \infty$, while $I_\epsilon(p)$ remains bounded along that path.
Thus the only non-zero contributions come from $\gamma_3$ and $\gamma_5$:
$$ \lim_{\epsilon \to 0^+} \lim_{R \to \infty} \int_{\gamma_3 / \gamma_5} I_\epsilon(p)\, dp = \mp \int_m^\infty iv\, e^{\pm t\sqrt{v^2 - m^2} - vx}\, d(iv). $$
Adding up these two cases, we obtain in total
$$ \psi(t, \mathbf{x}) = \frac{i}{2\pi^2 x} \int_m^\infty v\, e^{-vx} \sinh\!\big( t\sqrt{v^2 - m^2} \big)\, dv \quad \text{(valid for space-like } (t, \mathbf{x})). $$
In this formulation, the integrand is non-negative, which implies that $\psi(t, \mathbf{x}) \neq 0$ for all
$x > t$. Not what we were hoping for in a relativistic theory.
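Since Mathematica gave up on the original integral, it is reassuring that the final form is numerically harmless: for $x > t$ the integrand decays like $e^{-(x-t)v}$. The sketch below evaluates $\int_m^\infty v\, e^{-vx} \sinh(t\sqrt{v^2-m^2})\, dv$ with a composite Simpson rule and a finite cutoff. As a cross-check, in the massless case $m = 0$ the integral can be done by hand and equals $2xt/(x^2-t^2)^2$. The cutoff, the number of grid points and the sample values of $t$, $x$, $m$ are assumptions of this sketch.

```python
import math

def integrand(v, t, x, m):
    """v * exp(-v x) * sinh(t sqrt(v^2 - m^2)) for v >= m."""
    root = math.sqrt(max(v * v - m * m, 0.0))   # guard against tiny negative rounding
    return v * math.exp(-v * x) * math.sinh(t * root)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def spacelike_integral(t, x, m, cutoff=80.0):
    """Integral over v in [m, m + cutoff]; the exponentially small tail is dropped."""
    return simpson(lambda v: integrand(v, t, x, m), m, m + cutoff)

t, x = 1.0, 2.0                               # a space-like point: x > t
massless = spacelike_integral(t, x, 0.0)
closed_form = 2 * x * t / (x**2 - t**2) ** 2  # exact value for m = 0
massive = spacelike_integral(t, x, 1.0)

print("m = 0:", massless, "closed form:", closed_form)
print("m = 1:", massive)
```

Both values should come out strictly positive, in line with the conclusion that $\psi$ does not vanish outside the light cone.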
4.2.2 The Klein-Gordon field

So we ran into trouble interpreting the KG equation as the description of a single relativistic particle. As outlined in the introduction, a radical re-interpretation of its solution as
a field to be quantized leads to a satisfactory theory that does have applications in fundamental physics.
We thus perform the free field quantization procedure on the KG equation. There’s one
issue: Hilbert spaces are complex, and so we treated the KG wave functions as complex.
But we only know how to quantize real fields. We’ll return to the issue in Sec. 4.2.4. For
now, let’s just restrict to real-valued solutions
$$ \phi(t, \mathbf{x}) = N \sum_{\mathbf{k}} \frac{1}{\sqrt{2\omega_k}} \Big( a_k e^{-i\omega_k t + i\mathbf{k}\mathbf{x}} + a_k^\dagger e^{i\omega_k t - i\mathbf{k}\mathbf{x}} \Big), \qquad \omega_k = \sqrt{m^2 + \|\mathbf{k}\|^2}, $$
which become a free quantum field upon re-interpreting the $a_k$’s as annihilation operators.
Should we use Bosonic or Fermionic Fock space? Are we even free to choose? We’ll show
how to use locality considerations to settle that matter in the next section.
Let’s apply some cosmetic changes: (1) The normalization constant $N$ is only meaningful for concrete physical applications. We get the cleanest formula if we set it to
$1/\sqrt{V}$ (i.e. we’re now measuring the field in units of $\sqrt{V}/N$). (2) A “box with side
length $L$” is not Lorentz-invariant. As per (A.22), we take the limit $L \to \infty$ and replace
the sum $\frac{1}{\sqrt{V}} \sum$ by the integral $(2\pi)^{-3/2} \int$ over all of $\mathbb{R}^3$. (3) Introduce the 4-vector
$p = (E_p, \mathbf{p}) = \hbar(\omega_k, \mathbf{k}) = (\omega_k, \mathbf{k})$. This yields a more manifestly relativistic formula:
$$ \phi(x) = \int \frac{d^3p}{(2\pi)^{3/2}} \frac{1}{\sqrt{2E_p}} \Big( a_p e^{-ip_\mu x^\mu} + a_p^\dagger e^{ip_\mu x^\mu} \Big), \qquad E_p = \sqrt{m^2 + \|\mathbf{p}\|^2}. $$
As for any free field, the Hamiltonian is
$$ H = \int E_p\, a_p^\dagger a_p\, d^3p. $$
In particular, no unbounded negative energies exist in the field interpretation. We’re one
problem down!
4.2.3 Microcausality
What about the second problem, those pesky superluminal particles? A field theory doesn’t
directly make statements about “positions of particles”. But the relativistic ban on signals
propagating faster than light does have implications for fields.
To see what the right consistency condition is in this case, consider two space-like
separated regions A, B and two observables Â, B̂ that can be expressed in terms of the
field at points in A and B respectively. Relativity demands that any interaction performed
in region A must not influence measurements taking place in region B. This is the case if
and only if [Â, B̂] = 0, a condition known as microcausality.
How can we ensure this condition? Well, assume the fields satisfy
$$ \langle x - y, x - y \rangle < 0 \;\Rightarrow\; [\phi(x), \phi(y)]_\zeta = 0 \tag{4.10} $$
for either $\zeta = -$ (commutators) or $\zeta = +$ (anti-commutators). In the first case, the field
operators themselves fulfill the microcausality condition, which therefore also holds for
general observables built from them.
Let’s look at the second, trickier, case. Because two minuses make a plus, operators
Â, B̂ in space-like separated regions now commute if they are polynomials of even degree
in the fields. If we declare that only operators of this type are physically observable, then
the anti-commutator case, too, gives rise to a causal theory.
Here are some general comments that apply whenever the field operators are linear
in creation/annihilation operators – which is certainly the case when they arise from our
quantization rule:
• Unsurprisingly, the commutator version of (4.10) can only hold if the ladder opera-
tors act on a Bosonic Fock space, and the anti-commutator case only on a Fermionic
Fock space.
• It thus follows that if the field operators are directly observable, the field must be
Bosonic. This explains why we used a Bosonic space for the EM field (because the
E-field is observable).
• An even-degree polynomial in the field operators can be expanded as an even-degree
polynomial in ladder operators. A product of an even number of ladder operators
can change $N$ only by an even amount. In other words, they leave the parity $(-1)^N$
invariant. In this way, the constraint on physical Fermionic observables can be expressed as a superselection rule (in the sense of Sec. 2.3.2): Physical observables
have to commute with the Fermion parity operator.
• It seems wasteful to introduce Fermionic quantum fields, only to declare that they
are unobservable, and it is certain polynomial expressions in them that are actually
physical. But there’s precedent to the idea that seemingly superfluous constructs
make a theory easier, e.g. the vector potential in classical electrodynamics. One
can formulate the theory in such a way that the primary objects are the physical
observables rather than unobservable fields. This program goes by the name of
algebraic quantum field theory. While conceptually clean, this perspective comes
with its own difficulties and has remained somewhat niche.
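The parity argument in the list above can be checked on the smallest non-trivial example: two Fermionic modes, represented by $4 \times 4$ matrices. The sketch uses the Jordan-Wigner construction $a_1 = A \otimes 1$, $a_2 = Z \otimes A$ (a standard choice, but an assumption here, as the notes do not fix a matrix representation) and verifies that single ladder operators anti-commute with the parity $(-1)^N = Z \otimes Z$, while products of two ladder operators commute with it.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product of two nested-list matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]

def is_zero(a):
    return all(abs(entry) < 1e-12 for row in a for entry in row)

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
A = [[0, 1], [0, 0]]            # single-mode annihilator: A|1> = |0>

a1 = kron(A, I2)                # annihilator of mode 1
a2 = kron(Z, A)                 # annihilator of mode 2, with Jordan-Wigner sign
parity = kron(Z, Z)             # (-1)^N on the two-mode Fock space

# Real matrices, so the adjoint is just the transpose.
odd_ops = [a1, a2, transpose(a1), transpose(a2)]
even_ops = [matmul(a1, a2),                 # destroys two particles
            matmul(transpose(a1), a2),      # hops a particle from mode 2 to mode 1
            matmul(transpose(a1), a1)]      # number operator of mode 1

odd_anticommute = all(is_zero(anticommutator(parity, op)) for op in odd_ops)
even_commute = all(is_zero(commutator(parity, op)) for op in even_ops)
print(odd_anticommute, even_commute)
```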
Let’s now apply these general considerations to the Klein-Gordon case. To this end,
define the annihilation / creation parts of the field as
$$ \phi^+(x) = \int \frac{1}{\sqrt{2E_p}}\, a_p\, e^{-ip_\mu x^\mu} \frac{d^3p}{(2\pi)^{3/2}}, \qquad \phi^-(x) = \int \frac{1}{\sqrt{2E_p}}\, a_p^\dagger\, e^{ip_\mu x^\mu} \frac{d^3p}{(2\pi)^{3/2}}. $$
The use of the “−” sign to denote the part of the field that comes with creation operators and has a positive exponent is counter-intuitive (or, in the words of renowned
Harvard professor Sidney Coleman, “completely bananas”), but it’s established notation stemming from the fact that in the (now abandoned) single-particle wave
function interpretation, $\phi^-$ corresponds to negative energy solutions.
We don’t yet know whether the KG field describes Bosons or Fermions and will therefore treat both cases. The commutator
$$ \begin{aligned}
[\phi^+(x), \phi^-(y)]_\zeta &= \frac{1}{(2\pi)^3} \int \frac{1}{2\sqrt{E_p E_{p'}}}\, e^{-ip_\mu x^\mu + ip'_\mu y^\mu}\, [a_p, a^\dagger_{p'}]_\zeta\, d^3p\, d^3p' \\
&= \int \frac{1}{2E_p}\, e^{-ip_\mu (x^\mu - y^\mu)} \frac{d^3p}{(2\pi)^3} =: \Delta_+(x - y)
\end{aligned} $$
is important enough to have a symbol, $\Delta_+$. It is related via
$$ \partial_t \Delta_+(t, \mathbf{x}) = -\frac{i}{2} \psi(t, \mathbf{x}) $$
to the integral we just spent two painful pages to get information on. Unfortunately, we
found that the r.h.s. does not vanish for space-like arguments, so $\Delta_+$ cannot vanish either.
Microcausality now hinges on whether cancellations cause either the Bosonic
$$ [\phi(x), \phi(y)]_- = [\phi^+(x), \phi^-(y)]_- - [\phi^+(y), \phi^-(x)]_- = \Delta_+(x - y) - \Delta_+(y - x) = 2i\, \mathrm{Im}\, \Delta_+(x - y) $$
or the Fermionic
$$ [\phi(x), \phi(y)]_+ = [\phi^+(x), \phi^-(y)]_+ + [\phi^+(y), \phi^-(x)]_+ = \Delta_+(x - y) + \Delta_+(y - x) = 2\, \mathrm{Re}\, \Delta_+(x - y) $$
commutators to be zero. It turns out that this works in the Bosonic, but not in the Fermionic
case. Indeed, (4.8, 4.9) give, for space-like $x$,
$$ \partial_t \Delta_+(t, \mathbf{x}) = -\frac{i}{2} \psi(t, \mathbf{x}) = \frac{1}{(2\pi)^2 x} \int_m^\infty v\, e^{-vx} \sinh\!\big( t\sqrt{v^2 - m^2} \big)\, dv \in \mathbb{R}, $$
$$ \Delta_+(0, \mathbf{x}) = \int \frac{e^{-i\mathbf{p}\cdot\mathbf{x}}}{2(2\pi)^3 E_p}\, d^3p \in \mathbb{R} \quad \text{(because } \mathbf{p} \mapsto -\mathbf{p} \text{ conjugates the integrand).} $$
Integrating along the (space-like) path $(\tau, \mathbf{x})$, $\tau \in [0, t]$, we find that $\Delta_+$ is real for space-like arguments.
Causality thus implies that the KG field describes Bosons!
4.2.4 Anti-particles
Unlike particle number, charge seems to be conserved in every interaction. There is thus
one more superselection rule we have to accommodate: Physical observables have to commute with the charge operator
$$ Q = \sum_l q_l N^{(l)}, \qquad N^{(l)} = \int a_p^{(l)\dagger} a_p^{(l)}\, d^3p, $$
where the sum is over all fields, with $q_l$ the charge carried by particles of type $l$ and $N^{(l)}$
the total number operator for particles of this type.
What are the consequences of the charge superselection rule? Adding / removing
charged particles obviously does not conserve Q. Therefore, any expression that is lin-
ear in ladder operators does not commute with Q and thus cannot represent a physical
observable. (For example, because the E-field is observable and linear in ladder opera-
tors, a photon cannot be charged. OK, we probably knew that before).
Just like the Fermion parity superselection rule, conservation of charge therefore forces
us to build physical observables from polynomials in the fields. But this time, it is far less
obvious how to write down polynomials in the field operators that commute with Q.
Nature, fantastically, solves this by associating with each particle an anti-particle of
opposite charge so that, whenever a particle is created, an anti-particle is destroyed.
Here are the details. We start with two KG fields, $\phi^{(i)}(x)$, $i = 1, 2$. They are independent, with Hilbert space $\mathcal{H} = \mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)}$, and Hamiltonian $H = H^{(1)} + H^{(2)}$. Combine
them to form a complex field that has one of the fields as its real and one as its imaginary
part²:
$$ \phi(x) = \frac{\phi^{(1)} + i\phi^{(2)}}{\sqrt{2}}. \tag{4.11} $$
To analyze the complex field, introduce new annihilation operators
$$ a_p := \frac{a_p^{(1)} + i a_p^{(2)}}{\sqrt{2}}, \qquad b_p := \frac{a_p^{(1)} - i a_p^{(2)}}{\sqrt{2}}. $$
This corresponds to a unitary basis change in the single-particle subspace of the joint
Hilbert space, similar to the basis change from the cosine/sine’s to complex exponentials
used in (2.43). The complex field takes on the form
$$ \phi(x) = \int \frac{d^3p}{(2\pi)^{3/2}} \frac{1}{\sqrt{2E_p}} \Big( a_p e^{-ip_\mu x^\mu} + b_p^\dagger e^{ip_\mu x^\mu} \Big). \tag{4.12} $$
So far, we have just engaged in formal manipulations. Things become physically
meaningful if we assign a charge of $+q$ to the particles created by the $a_p^\dagger$ and a negative charge $-q$ to the particles created by the $b_p^\dagger$. In other words, we define the charge operator
to be
$$ Q := q \int \big( a_p^\dagger a_p - b_p^\dagger b_p \big)\, d^3p. $$
It’s customary to say that the $a_p^\dagger$’s create particles and the $b_p^\dagger$’s anti-particles (though that’s
an arbitrary assignment, as we haven’t put enough physics into the model to distinguish
one).
With this definition, the complex field (4.12) becomes a superposition of processes
destroying particles or creating anti-particles. Since both processes involve changing the
charge by $-q$, it is not surprising to find
$$ [Q, a_p] = q \int [a^\dagger_{p'} a_{p'}, a_p]\, d^3p' = -q a_p, \quad [Q, b_p^\dagger] = -q b_p^\dagger \;\Rightarrow\; [Q, \phi(x)] = -q \phi(x) $$
and, taking adjoints,
$$ [Q, \phi^\dagger(x)] = [\phi(x), Q]^\dagger = q \phi^\dagger(x). $$
² This is common terminology, though “non-hermitian field”, “hermitian part” and “anti-hermitian part”
would be more precise.
The complex field operator itself is still not compatible with charge conservation, but
the commutation relations are now simple enough that it is easy to construct expressions
in $\phi(x)$ that are. In particular, the “absolute value squared” of the field satisfies
$$ [Q, \phi(x)\phi^\dagger(x)] = [Q, \phi(x)]\phi^\dagger(x) + \phi(x)[Q, \phi^\dagger(x)] = -q\phi(x)\phi^\dagger(x) + q\phi(x)\phi^\dagger(x) = 0. $$
More generally, any expression in the complex field operators and their adjoints that is
invariant under the transformations $\phi(x) \mapsto e^{i\alpha}\phi(x)$ will commute with $Q$. These are the
physical observables associated with charged particles.
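The commutation relations with $Q$ can be checked in a toy version of the complex field: keep a single momentum mode, so that $\phi = a + b^\dagger$, and truncate each oscillator to `d` levels. Both simplifications are assumptions of this sketch, not something the notes do. The relations $[N, a] = -a$ and $[N, a^\dagger] = a^\dagger$ survive the truncation exactly, so the checks below hold up to rounding even though $[a, a^\dagger] = 1$ fails at the top level.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b, sign=1.0):
    """Entry-wise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(len(a))] for i in range(len(a))]

def scale(c, a):
    return [[c * entry for entry in row] for row in a]

def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), sign=-1.0)

def is_zero(a, tol=1e-9):
    return all(abs(entry) < tol for row in a for entry in row)

d = 4                                   # Fock-space cutoff per mode (assumed)
q = 1.5                                 # sample charge (assumed)
one = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
low = [[math.sqrt(j) if j == i + 1 else 0.0 for j in range(d)] for i in range(d)]

a = kron(low, one)                      # particle annihilator
b = kron(one, low)                      # anti-particle annihilator
a_dag, b_dag = transpose(a), transpose(b)   # real matrices: adjoint = transpose

Q = scale(q, add(matmul(a_dag, a), matmul(b_dag, b), sign=-1.0))
phi = add(a, b_dag)                     # single-mode stand-in for (4.12)
phi_dag = transpose(phi)

lowers_charge = is_zero(add(commutator(Q, phi), scale(q, phi)))           # [Q, phi] = -q phi
raises_charge = is_zero(add(commutator(Q, phi_dag), scale(-q, phi_dag)))  # [Q, phi^dag] = q phi^dag
neutral = is_zero(commutator(Q, matmul(phi, phi_dag)))                    # [Q, phi phi^dag] = 0
print(lowers_charge, raises_charge, neutral)
```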
Final comments:
• There is no rigorous argument (known to me) that would describe a sense in which
the anti-particle trick is the only natural way to implement charge conservation in a
quantum field theory. But there seems to be a general feeling that anti-particles are
essentially required for this. (C.f. Weinberg’s The Quantum Theory of Fields, Vol. I,
Chapter 5).
• Remember how we treated only the real KG field, because we didn’t know how to
quantize complex fields? Equation (4.11) suggests how to handle this case: Quantize
the real and the imaginary part of the field separately, then combine as in (4.11).
Verifying how this fits into the Lagrangian formulation is part of the homework
sheet.
4.3 The Dirac equation

Let’s put ourselves back into speculation mode, as in Sec. 4.2.
Recall that the free Schrödinger equation
$$ i\partial_t \psi = -\frac{1}{2m} \sum_{i=1}^{3} \partial_i^2 \psi $$
contains first order time derivatives and second order spatial derivatives and therefore cannot possibly be Lorentz-invariant. Klein-Gordon restored the symmetry by going second-order in both time and space. Let’s instead try to guess an analogue of $i\partial_t \psi = H\psi$ where
now $H$, too, is built out of first-order spatial derivatives. Then $H$ must be of the form
$$ H = -i \sum_{i=1}^{3} \alpha_i \partial_i + m\beta \tag{4.13} $$
for suitable coefficients $\alpha_i, \beta$. The operator $H$ should represent the energy which, by (4.3),
satisfies $E^2 = m^2 + \|\mathbf{p}\|^2$. Invoking the correspondence principle, we thus demand
$$ \begin{aligned}
H^2 &= \Big( -i \sum_{i=1}^{3} \alpha_i \partial_i + m\beta \Big)^2 \\
&= -\sum_{i=1}^{3} (\alpha_i)^2 \partial_i^2 + m^2\beta^2 - \sum_{i<j} \big( \alpha_i\alpha_j + \alpha_j\alpha_i \big) \partial_i \partial_j - im \sum_{i=1}^{3} \big( \alpha_i\beta + \beta\alpha_i \big) \partial_i \\
&\overset{!}{=} -\sum_{i=1}^{3} \partial_i^2 + m^2,
\end{aligned} $$
which is equivalent to
$$ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta + \beta\alpha_i = 0, \qquad \beta^2 = 1. \tag{4.14} $$
The diagonal case $i = j$ says that the $\alpha_i$’s square to identity, while the off-diagonal case
$i \neq j$ says that they anti-commute. There is clearly no way to satisfy these conditions
using numbers, but squaring-to-identity and anti-commuting sounds a lot like what Pauli
matrices do.
Before presenting a solution in terms of Paulis, let’s re-write the condition to look more
relativistic. Assume we could find four matrices $\gamma^0, \dots, \gamma^3$ such that
$$ [\gamma^\mu, \gamma^\nu]_+ = 2\eta^{\mu\nu} \mathbb{1}, \tag{4.15} $$
i.e. that anti-commute and square to $\pm\mathbb{1}$. Then one may check directly that
$$ \beta = \gamma^0, \qquad \alpha_i = \beta\gamma^i \tag{4.16} $$
would solve (4.14). (In mathematical terms, the relations (4.15) say that the gamma matrices are generators of the Clifford algebra associated with the Minkowski form).
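To find a solution to (4.15), start with the Pauli matrices
$$ \mathbb{1} = \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. $$
There are only three non-trivial Pauli matrices, while we need four $\gamma$ matrices. But tensor
products of two Paulis give us enough freedom. For example, one can verify directly that
a solution to (4.15) is given by
$$ \gamma^0 = Z \otimes \mathbb{1}, \qquad \gamma^1 = iY \otimes X, \qquad \gamma^2 = iY \otimes Y, \qquad \gamma^3 = iY \otimes Z. $$
Or, written out as $4 \times 4$ block matrices:
$$ \gamma^0 = \begin{pmatrix} \mathbb{1} & 0 \\ 0 & -\mathbb{1} \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix}. \tag{4.17} $$
This solution is called the Dirac representation of the $\gamma$-matrices. Other representations
can be obtained by a unitary change of basis in $\mathbb{C}^4$. For example, we’ll later have use for
the Weyl representation, which has the same $\gamma^i$, but replaces $\gamma^0$ by
$$ \gamma^0_{\mathrm{Weyl}} = X \otimes \mathbb{1} = \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}. $$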
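“One can verify directly” is an invitation to let a computer do the bookkeeping. The sketch below builds the gamma matrices from Kronecker products of Pauli matrices exactly as written above, checks the Clifford relations (4.15) in both the Dirac and the Weyl representation, and confirms that $\beta = \gamma^0$, $\alpha_i = \beta\gamma^i$ satisfy (4.14). The helper names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product of two nested-list matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]

def is_multiple_of_identity(mat, c, tol=1e-12):
    """True if mat equals c times the identity matrix."""
    n = len(mat)
    return all(abs(mat[i][j] - (c if i == j else 0)) < tol
               for i in range(n) for j in range(n))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
iY = [[1j * entry for entry in row] for row in Y]
ETA_DIAG = [1, -1, -1, -1]

gamma_dirac = [kron(Z, I2), kron(iY, X), kron(iY, Y), kron(iY, Z)]
gamma_weyl = [kron(X, I2)] + gamma_dirac[1:]

def satisfies_clifford(gammas):
    """Check [gamma^mu, gamma^nu]_+ = 2 eta^{mu nu} 1 for all index pairs."""
    return all(
        is_multiple_of_identity(anticommutator(gammas[mu], gammas[nu]),
                                2 * ETA_DIAG[mu] if mu == nu else 0)
        for mu in range(4) for nu in range(4))

def satisfies_4_14(gammas):
    """Check (4.14) for beta = gamma^0 and alpha_i = beta gamma^i."""
    beta = gammas[0]
    alphas = [matmul(beta, g) for g in gammas[1:]]
    pairs_ok = all(
        is_multiple_of_identity(anticommutator(alphas[i], alphas[j]),
                                2 if i == j else 0)
        for i in range(3) for j in range(3))
    beta_ok = all(is_multiple_of_identity(anticommutator(al, beta), 0) for al in alphas)
    return pairs_ok and beta_ok and is_multiple_of_identity(matmul(beta, beta), 1)

print(satisfies_clifford(gamma_dirac), satisfies_clifford(gamma_weyl))
print(satisfies_4_14(gamma_dirac), satisfies_4_14(gamma_weyl))
```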
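Now put things together. Start from $i\partial_t \psi = H\psi$, plug in the ansatz (4.13), multiply
with $\beta$ from the left, and use (4.16) to pass to the gamma matrices, to obtain the Dirac
equation
$$ \big( i\gamma^\mu \partial_\mu - m \big) \psi = 0. \tag{4.18} $$
The compact notation hides quite a lot of complexity. Indeed, because the gamma’s are
$4 \times 4$-matrices, the Dirac equation acts on vector-valued wave functions. Explicitly:
$$ \begin{pmatrix} (i\partial_t - m)\mathbb{1} & +i\sigma\cdot\nabla \\ -i\sigma\cdot\nabla & (-i\partial_t - m)\mathbb{1} \end{pmatrix} \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \\ \psi_4(x) \end{pmatrix} = 0 $$
in terms of the $2 \times 2$ matrices
$$ \sigma\cdot\nabla = \sum_{i=1}^{3} \sigma_i \partial_{x_i} = \begin{pmatrix} \partial_z & \partial_x - i\partial_y \\ \partial_x + i\partial_y & -\partial_z \end{pmatrix}. $$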
That doesn’t look very transparent. We’ll check whether one can make physical sense
of it in the next sections.
4.3.1 Momentum representation of the Dirac equation

Like any partial differential equation with constant coefficients, the free Dirac equation
can be solved in terms of plane waves
µ
ψ(x) = up e−ipµ x .
The ansatz is parameterized by a wave vector p = (E, p) ∈ R4 and a vector-valued coef-
ficient up ∈ C4 (this is analogous to the treatment of the Maxwell equation in Sec. 3.2).
Plugging in, the Dirac equation reduces to a 4-dimensional equation:
(E − m)1

µ −σ · p
(iγ µ ∂µ − m)up e−ipµ x = 0 ⇔ u = 0. (4.19)
+σ · p (−E − m)1 p
The simplest case is $\mathbf{p} = 0$ (a particle at rest), where the matrix is diagonal:
\[
\begin{pmatrix} (E - m)\mathbb{1} & 0 \\ 0 & (-E - m)\mathbb{1} \end{pmatrix} u_p = 0.
\]
The equation has non-trivial solutions if and only if E = ±m. Thus, solutions with
unbounded negative energies turn up once more (oh well – we’ll exorcise them later by
passing to a quantum field theory). For each sign, there is a two-dimensional space of
solutions: For positive energies, the first two components of up can be chosen freely,
while the last two must be 0. The reverse is true for negative energies.
We have argued that if $p = (E, 0, 0, 0)$, then $(\gamma^\mu p_\mu - m)u_p = 0$ has a two-dimensional
set of solutions iff $E = \pm m$, and no non-trivial solutions otherwise. The (optional) argument below shows
that the conclusion holds for arbitrary $p \in \mathbb{R}^4$.
Indeed, the Clifford algebra property (4.15) implies that $\gamma^\mu p_\mu$ has eigenvalues
$\pm\sqrt{p_\mu p^\mu} = \pm m$ with equal multiplicity. To prove this, first compute the square
\[
(\gamma^\mu p_\mu)^2 = \gamma^\mu \gamma^\nu p_\mu p_\nu
= \frac{1}{2}\big(\gamma^\mu \gamma^\nu p_\mu p_\nu + \gamma^\nu \gamma^\mu p_\nu p_\mu\big)
= \eta^{\mu\nu} p_\mu p_\nu \mathbb{1} = p_\mu p^\mu \mathbb{1},
\]
which shows that the eigenvalues of $\gamma^\mu p_\mu$ are $\pm\sqrt{p_\mu p^\mu}$. To see that the two
eigenvalues occur with equal multiplicity, it is thus sufficient to show that $\operatorname{tr} \gamma^\mu p_\mu = 0$.
But that is always the case: For $i \neq 0$, using first anti-commutativity and then
cyclicity of the trace,
\[
\operatorname{tr} \gamma^i = \operatorname{tr} \gamma^0 \gamma^0 \gamma^i
= -\operatorname{tr} \gamma^0 \gamma^i \gamma^0
= -\operatorname{tr} \gamma^i \gamma^0 \gamma^0
= -\operatorname{tr} \gamma^i
\quad\Rightarrow\quad \operatorname{tr} \gamma^i = 0.
\]
The proof that $\operatorname{tr} \gamma^0 = 0$ works similarly. Note that the argument only used (4.15)
and is therefore valid for any representation.
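The eigenvalue statement can be illustrated numerically for one on-shell momentum. This is a minimal sketch in the Dirac representation, assuming signature $(+,-,-,-)$ so that $\gamma^\mu p_\mu = \gamma^0 E - \sum_i \gamma^i p^i$; the momentum values and the helper name `slashed` are arbitrary choices for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
gamma = [np.kron(Z, I2)] + [np.kron(1j * Y, s) for s in (X, Y, Z)]

def slashed(E, p3):
    """Return gamma^mu p_mu for p^mu = (E, p3), assuming signature (+,-,-,-)."""
    return E * gamma[0] - sum(pi * gi for pi, gi in zip(p3, gamma[1:]))

m = 1.0
p3 = np.array([0.3, -0.4, 1.2])      # arbitrary 3-momentum
E = np.sqrt(m**2 + p3 @ p3)          # positive-energy mass shell
M = slashed(E, p3)

# (gamma.p)^2 = p_mu p^mu 1 = m^2 1
square_is_m2 = np.allclose(M @ M, m**2 * np.eye(4))
# eigenvalues come in pairs -m, -m, +m, +m
eigenvalues = np.sort(np.linalg.eigvals(M).real)
# dimension of the solution space of (gamma.p - m) u = 0
solution_dim = 4 - np.linalg.matrix_rank(M - m * np.eye(4), tol=1e-10)

print(square_is_m2, eigenvalues, solution_dim)
```

The matrix `M - m * np.eye(4)` is exactly the block matrix in (4.19), so `solution_dim` counts the independent spinors $u_p$ for this momentum.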
Some comments:
• We have seen that, while the Dirac equation nominally acts on four-component wave
functions, it is better to think of $\psi$ as a two-component function that has been
embedded into $\mathbb{C}^4$. Which two-dimensional subspace of $\mathbb{C}^4$ is used depends on the
sign of the energy and the value of $\mathbf{p}$. This is interesting, as “two component wave
function” might remind you of the theory of spin-1/2 degrees of freedom. We will
indeed see in a moment that the Dirac equation describes spin-1/2 particles.
• Recall how in the Klein-Gordon case, we initially wanted to implement the root
$\sqrt{m^2 + \|\mathbf{p}\|^2}$, but settled for the squared relation because we didn’t want to take the
root of differential operators? Dirac gets around this using the remarkable property
$\operatorname{eigs}(\gamma^\mu p_\mu) = \pm\sqrt{p_\mu p^\mu}$ of Clifford algebras. Note that $\gamma^\mu p_\mu$ is linear in
momentum, but still has eigenvalues that show the right square root behavior! The price to
pay for this trick is having to switch to a higher-dimensional space of wave functions.
The lecture notes are fairly complete until this point, and then again from the Ap-
pendix on. The remainder until the Appendix is under active construction, though.
4.3.2 Lorentz invariance of the Dirac equation

We have to define an action of the Lorentz group on four-component wave functions. By
analogy with the transformation behavior of vector fields (Sec. 4.1.4), we aim to associate
with every Lorentz transformation $\Lambda$ a complex $4 \times 4$ matrix $S_\Lambda$ such that $\psi$ transforms as
\[
\psi'(x) = S_\Lambda \, \psi\big(\Lambda^{-1}(x)\big). \tag{4.20}
\]
The big question is whether one can choose $\Lambda \mapsto S_\Lambda$ such that $\psi'$ is a solution of the Dirac
equation whenever $\psi$ is. Only in this case does the Dirac equation define a set of solutions
that is Lorentz-invariant. Plugging the ansatz for $\psi'$ into the Dirac equation gives
\[
i\gamma^\mu \partial_\mu \psi' = m\psi'
\;\Leftrightarrow\;
i\gamma^\mu \Lambda_\mu{}^\nu \partial_\nu S_\Lambda \psi = m S_\Lambda \psi
\;\Leftrightarrow\;
i \big(S_\Lambda^{-1} \gamma^\mu S_\Lambda\big) \Lambda_\mu{}^\nu \partial_\nu \psi = m\psi.
\]
The final expression equals the Dirac equation for $\psi$ if and only if
\[
S_\Lambda^{-1} \gamma^\mu S_\Lambda = \Lambda^\mu{}_\alpha \gamma^\alpha \tag{4.21}
\]
(because $\Lambda^\mu{}_\alpha \Lambda_\mu{}^\nu = (\Lambda^T \Lambda^{-T})_\alpha{}^\nu = \delta_\alpha{}^\nu$). To see whether this is possible, let’s recall
some basics from the quantum theory of angular momentum.
Recap: Spin-1/2 representations

A spin-1/2 degree of freedom is associated with the Hilbert space $H = \mathbb{C}^2$. Under a spatial
rotation $R$, a state $|\psi\rangle \in \mathbb{C}^2$ will be transformed to $|\psi'\rangle = U_R |\psi\rangle$. Performing $R$ after
another rotation $S$ is the same physical operation as performing $RS$ in one step. This
means that the associated actions $U_R U_S$ and $U_{RS}$ on quantum states have to give the same
results. Because state vectors are defined only up to a global phase factor, it follows that
\[
U_R U_S = e^{i\phi_{R,S}} U_{RS} \tag{4.22}
\]
for some $\phi_{R,S} \in \mathbb{R}$. A map $R \mapsto U_R$ satisfying (4.22) is called a projective representation
of the rotation group $SO(3)$.
To find more general projective representations of a group, it is easiest to analyze what
physicists call the infinitesimal generators and mathematicians call the Lie algebra of the
group.
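Expanding the exponential into a series, one sees that a rotation by an angle $\theta$ about
the z-axis can be written as
\[
R_{\theta,e_3} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = e^{-i\theta L_3},
\qquad
L_3 = i\partial_\theta\big|_0 R_{\theta,e_3} = i \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
For rotations about a general unit vector $n \in \mathbb{R}^3$, the formula is
\[
R_{\theta,n} = e^{-i\theta \sum_{i=1}^{3} n_i L_i},
\qquad
L_1 = i \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
L_2 = i \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}.
\]
To interpret these, write the matrix exponential as
\[
e^{-i\theta L_3} = \lim_{N \to \infty} \Big( \mathbb{1} + \frac{-i\theta L_3}{N} \Big)^N.
\]
Thus, we can express a rotation by a finite angle $\theta$ as a sequence of “infinitesimal
rotations” $\mathbb{1} - i\epsilon L_3$, with $\epsilon \ll 1$. This relation suggests that to understand the behavior of
general elements of a group, it is often enough to understand the infinitesimal ones. (An
observation made precise in the theory of Lie groups.)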
Let’s apply this way of thinking to a projective representation $U_{\theta,n}$ of the rotation
group. Its elements, too, can be written in terms of generators
\[
U_{\theta,e_i} = e^{-i\theta J_i}.
\]
Obvious question: What are the conditions on the generators $J_i$ such that (4.22) holds?
Using the Baker-Campbell-Hausdorff formula, one can show that (4.22) is essentially³
equivalent to
\[
[J_i, J_j] = [L_i, L_j],
\]
meaning that the $J_i$ obey the same commutation relations as the $L_i$.
In other words, commutator relations encode the group law of symmetries. This partly
explains the obsession of physicists with commutators.
In the particular case of the rotation group, recall (or check)
\[
[L_i, L_j] = i\epsilon_{ijk} L_k,
\]
which is satisfied by one half the Paulis, $J_i = \frac{1}{2}\sigma_i$. Thus,
\[
R = e^{-i\theta \sum_i n_i L_i} \;\mapsto\; e^{-i\frac{\theta}{2} \sum_i n_i \sigma_i} =: U_R
\]
defines a projective representation of the rotation group. This is the spin-1/2 representation.
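We will make use of the following covariance property of the Pauli basis:
\[
U_R^\dagger \sigma^i U_R = R^i{}_j \sigma^j. \tag{4.23}
\]
Note: Common (admittedly illogical) convention has it that the placement of indices
on Pauli matrices has no significance (i.e. $\sigma^\mu = \sigma_\mu$), while for $\gamma$-matrices, $\gamma_\mu = \eta_{\mu\nu}\gamma^\nu$.
This formula is, in fact, the reason that the Bloch representation of rotations works:
\[
(a')^i = \operatorname{tr}\, \sigma^i U_R |\phi\rangle\langle\phi| U_R^\dagger
= \operatorname{tr}\, U_R^\dagger \sigma^i U_R |\phi\rangle\langle\phi| = R^i{}_j a^j.
\]
It is sufficient to prove the first-order version:
\[
\partial_\theta\big|_0 \, e^{i\frac{\theta}{2}\sigma^i} \sigma^j e^{-i\frac{\theta}{2}\sigma^i}
= \frac{i}{2} [\sigma^i, \sigma^j] = -\epsilon_{ijk}\sigma^k,
\]
which should be compared to the transformation of a vector under rotations
\[
\partial_\theta\big|_0 \big(e^{-i\theta L_i} v\big)^j = (e_i \times v)^j = \epsilon_{jik} v^k = -\epsilon_{ijk} v^k.
\]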
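The covariance property (4.23) can also be checked at finite angles. The sketch below builds $R = e^{-i\theta\, n\cdot L}$ and $U_R = e^{-i\frac{\theta}{2} n\cdot\sigma}$ from the generators given above; the helper `exp_minus_i`, the angle and the axis are illustrative choices, and the exponential is computed by eigendecomposition, which assumes a Hermitian generator.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Generators of SO(3) as given in the text
L = [1j * np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]]),
     1j * np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]]),
     1j * np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]])]

def exp_minus_i(H, theta):
    """exp(-i theta H) for Hermitian H, via its eigendecomposition (hypothetical helper)."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * theta * w)) @ V.conj().T

theta = 0.7                                  # arbitrary rotation angle
n = np.array([1.0, 2.0, -0.5])
n /= np.linalg.norm(n)                       # arbitrary unit axis

R = exp_minus_i(sum(ni * Li for ni, Li in zip(n, L)), theta).real
U = exp_minus_i(sum(ni * si for ni, si in zip(n, sigma)) / 2, theta)

# Check U^dagger sigma^i U = R^i_j sigma^j for i = 1, 2, 3
covariant = all(
    np.allclose(U.conj().T @ sigma[i] @ U,
                sum(R[i, j] * sigma[j] for j in range(3)))
    for i in range(3)
)
print(covariant)
```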
4.3.3 Spin representations of the Lorentz group

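Let’s now find the generators of Lorentz boosts.
\[
K_1 = i\partial_\alpha\big|_0 \Lambda_{\alpha e_x}
= i\partial_\alpha\big|_0 \begin{pmatrix} \cosh\alpha & \sinh\alpha & 0 & 0 \\ \sinh\alpha & \cosh\alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
= i \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]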
3 TBD.
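And likewise:
\[
K_1 = i \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
K_2 = i \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
K_3 = i \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.
\]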
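This leads to commutation relations
\[
[L_i, L_j] = i\epsilon_{ijk} L_k, \qquad [L_i, K_j] = i\epsilon_{ijk} K_k, \qquad [K_i, K_j] = -i\epsilon_{ijk} L_k,
\]
which can be represented on $\mathbb{C}^2$ by
\[
L_i \mapsto \frac{1}{2}\sigma_i, \qquad K_i \mapsto \pm\frac{i}{2}\sigma_i.
\]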
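These commutation relations can be verified by brute force, both for the $4 \times 4$ generators and for the two $2 \times 2$ representations. In the sketch below, the rotation generators $L_i$ are embedded into the spatial $3 \times 3$ block of a $4 \times 4$ matrix (an assumption about how they act on four-vectors); the helpers `unit`, `comm` and `satisfies_lorentz_algebra` are hypothetical names.

```python
import numpy as np

def unit(a, b, d=4):
    """Matrix with a single 1 at position (a, b)."""
    M = np.zeros((d, d), dtype=complex)
    M[a, b] = 1
    return M

def comm(A, B):
    return A @ B - B @ A

# Levi-Civita symbol
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

def satisfies_lorentz_algebra(L, K):
    """Check [L,L] = i eps L, [L,K] = i eps K, [K,K] = -i eps L."""
    for i in range(3):
        for j in range(3):
            eL = sum(eps[i, j, k] * L[k] for k in range(3))
            eK = sum(eps[i, j, k] * K[k] for k in range(3))
            if not (np.allclose(comm(L[i], L[j]), 1j * eL)
                    and np.allclose(comm(L[i], K[j]), 1j * eK)
                    and np.allclose(comm(K[i], K[j]), -1j * eL)):
                return False
    return True

# 4x4 generators: boosts as in the text, rotations embedded in the spatial block
K4 = [1j * (unit(0, i) + unit(i, 0)) for i in (1, 2, 3)]
L4 = [1j * (unit(3, 2) - unit(2, 3)),
      1j * (unit(1, 3) - unit(3, 1)),
      1j * (unit(2, 1) - unit(1, 2))]

# 2x2 representations: L -> sigma/2, K -> +/- i sigma/2
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
L2 = [s / 2 for s in sigma]
K2_plus = [1j * s / 2 for s in sigma]
K2_minus = [-1j * s / 2 for s in sigma]

print(satisfies_lorentz_algebra(L4, K4),
      satisfies_lorentz_algebra(L2, K2_plus),
      satisfies_lorentz_algebra(L2, K2_minus))
```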
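This gives rise to two representations of $SO^+(1,3)$ in terms of non-unitary $2 \times 2$ matrices:
\[
\Lambda = e^{i\sum_i l_i L_i + i\sum_i k_i K_i}
\;\mapsto\;
e^{\frac{i}{2}\sum_i l_i \sigma_i \pm \frac{1}{2}\sum_i k_i \sigma_i} =: A^\pm_\Lambda.
\]
We’ll use the identities
\[
(A^+_\Lambda)^{-\dagger} = e^{+\frac{i}{2}\sum_i l_i \sigma_i - \frac{1}{2}\sum_i k_i \sigma_i} = A^-_\Lambda = A^+_{P\Lambda P} \tag{4.24}
\]
and also write $A := A^+$.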

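Like in the Bloch representation of rotations, the Pauli basis again transforms nicely
under these representations. Define $\sigma^\mu = (\sigma^0, \sigma)$ and $\bar\sigma^\mu = (\sigma^0, -\sigma)$, i.e.
\[
\bar\sigma^0 = \sigma^0, \qquad \bar\sigma^i = -\sigma^i.
\]
Then
\[
A_\Lambda^\dagger \sigma^\mu A_\Lambda = \Lambda^\mu{}_\nu \sigma^\nu, \qquad
A_\Lambda^{-1} \bar\sigma^\mu A_\Lambda^{-\dagger} = \Lambda^\mu{}_\nu \bar\sigma^\nu,
\]
where $P$ in (4.24) denotes the parity transform.
We have already verified the formula for rotations. Now let $\Lambda$ be a pure Lorentz
boost along the $e_i$-axis.
\[
\partial_\theta\big|_0 \, e^{\frac{\theta}{2}\sigma^i} \sigma^\mu e^{\frac{\theta}{2}\sigma^i}
= \frac{1}{2}[\sigma^i, \sigma^\mu]_+
= \begin{cases} \sigma^i & \mu = 0 \\ \sigma^0 & \mu = i \\ 0 & \text{else} \end{cases},
\]
which again should be compared to the action on vectors
\[
\partial_\theta\big|_0 \big(e^{i\theta K_i} v\big)^\mu
= \begin{cases} v^i & \mu = 0 \\ v^0 & \mu = i \\ 0 & \text{else} \end{cases}.
\]
The second equality follows from (4.24).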

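Weyl Spinors
Set
\[
S_\Lambda = \begin{pmatrix} A_\Lambda^{-\dagger} & 0 \\ 0 & A_\Lambda \end{pmatrix}.
\]
Then
\[
S_\Lambda^{-1} \gamma^\mu S_\Lambda
= \begin{pmatrix} A_\Lambda^\dagger & 0 \\ 0 & A_\Lambda^{-1} \end{pmatrix}
\begin{pmatrix} 0 & \sigma^\mu \\ \bar\sigma^\mu & 0 \end{pmatrix}
\begin{pmatrix} A_\Lambda^{-\dagger} & 0 \\ 0 & A_\Lambda \end{pmatrix}
= \begin{pmatrix} 0 & A_\Lambda^\dagger \sigma^\mu A_\Lambda \\ A_\Lambda^{-1} \bar\sigma^\mu A_\Lambda^{-\dagger} & 0 \end{pmatrix}
= \Lambda^\mu{}_\nu \gamma^\nu
\]
as required.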
4.4 Non-relativistic limit

Dirac equation with fields
We will treat that limit for a version of the Dirac equation which incorporates classical EM
fields. Mimicking what worked in Sec. 3.4, this is achieved using the substitution
\[
p_\mu \mapsto p_\mu - qA_\mu \quad\Rightarrow\quad i\partial_\mu \mapsto i\partial_\mu - qA_\mu,
\]
where $A^\mu = (\phi, \mathbf{A})$ is the 4-potential. This leads to
\[
\big(\gamma^\mu (i\partial_\mu - qA_\mu) - m\big)\psi = 0, \tag{4.25}
\]
the Dirac equation in the presence of fields.
It turns out that the non-relativistic limit is best treated using the Dirac representation
of the gamma-matrices. Denote the components of the Dirac spinor as ψ = (ψl , ψs ), with
ψl/s ∈ C2 . (Recall that for a particle at rest, the ψl correspond to positive energy solutions
and the ψs to negative energy solutions. Thus, it would be nice if ψs would get “small”
and the ψl relatively “large” in the non-relativistic limit. As we’ll see momentarily, this
is indeed what happens and explains the labels.) Denote the mechanical 3-momentum by
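$\Pi := \mathbf{p} - q\mathbf{A}$. Then the block matrix representation of the Dirac equation with fields is
(c.f. (4.19))
\[
\begin{pmatrix} E - q\phi - m & -\sigma\cdot\Pi \\ \sigma\cdot\Pi & -E + q\phi - m \end{pmatrix}
\begin{pmatrix} \psi_l \\ \psi_s \end{pmatrix} = 0.
\]
This can be written as two coupled equations
\[
\sigma\cdot\Pi\, \psi_s = (E - q\phi - m)\psi_l, \qquad
\sigma\cdot\Pi\, \psi_l = (E - q\phi + m)\psi_s.
\]
Equivalently, solving the second one for $\psi_s$,
\[
\psi_s = \frac{1}{E - q\phi + m}\, \sigma\cdot\Pi\, \psi_l, \tag{4.26}
\]
and plugging the result into the first one,
\[
\frac{1}{E - q\phi + m} (\sigma\cdot\Pi)^2 \psi_l = (E - q\phi - m)\psi_l. \tag{4.27}
\]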
So far, we have only re-formulated things. Eqs. (4.26, 4.27) are exactly equivalent to
(4.25). To make progress, we will now pass to the non-relativistic limit.
Non-relativistic limit
Now we assume that we are in the non-relativistic limit where all energies are small
compared to the rest mass:
\[
E - m \ll m, \qquad |qA_\mu| \ll m, \qquad \ldots
\]
Then
\[
E - q\phi + m \simeq 2m, \tag{4.28}
\]
so that from (4.26),
\[
\|\psi_s\| \simeq \frac{1}{2m} \|\sigma\cdot\Pi\, \psi_l\|
\leq \frac{1}{2m} \sum_{j=1}^{3} \big( \|\partial_j \psi_l\| + \|qA_j \psi_l\| \big) \ll \|\psi_l\|.
\]
Thus, in the non-relativistic regime, the “small spinor” $\psi_s$ has a much smaller norm than
the “large spinor” $\psi_l$ and can therefore be neglected. We thus focus on (4.27), which, using
(4.28), becomes
\[
(E - m)\psi_l = \frac{1}{2m} (\sigma\cdot\Pi)^2 \psi_l + q\phi\, \psi_l,
\]
the Pauli equation.
You may be able to see the Schrödinger equation lurking through. The factor on the
left-hand side is just the non-relativistic energy. The rightmost term is the potential energy.
The middle term requires some more work, but it is quadratic in momenta, which looks
promising!
About that middle term. A tedious calculation gives
\[
(\sigma\cdot\Pi)^2 = (\hat{P} - q\mathbf{A})^2 - q\,\sigma\cdot\mathbf{B}.
\]
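Here are some details. Start with
\[
(\sigma\cdot\Pi)^2 = \sum_{jk} \sigma_j \sigma_k \Pi_j \Pi_k
= \sum_j \Pi_j^2 + i \sum_{jkl} \epsilon_{jkl}\, \sigma_l\, \Pi_j \Pi_k.
\]
Compute the second summand for the case $l = 3$:
\[
\begin{aligned}
i \sum_{jk} \epsilon_{jk3}\, \sigma_3 \Pi_j \Pi_k
&= i\sigma_3 \big( \Pi_1 \Pi_2 - \Pi_2 \Pi_1 \big) \\
&= -q\sigma_3 \big( \partial_x A_y + A_x \partial_y - \partial_y A_x - A_y \partial_x \big) \\
&= -q\sigma_3 (\partial_x A_y) + q\sigma_3 (\partial_y A_x) = -q\sigma_3 (\nabla \times \mathbf{A})_3.
\end{aligned}
\]
To go from the second to the third line, recall that we are calculating with operators!
So in the second line, $\partial_x A_y$ is not the x-derivative of $A_y$, but the operator that will act
on a function $\psi$ first by multiplying with $A_y$ and then differentiating the product. Thus
\[
\big( \partial_x A_y - A_y \partial_x \big)\psi
= \partial_x (A_y \psi) - A_y \partial_x \psi
= (\partial_x A_y)\psi + A_y \partial_x \psi - A_y \partial_x \psi
= (\partial_x A_y)\psi.
\]
Treating the other components similarly, we arrive at
\[
(\sigma\cdot\Pi)^2 = \sum_j (-i\partial_j - qA_j)^2 - q\,\sigma\cdot(\nabla \times \mathbf{A})
= (\hat{P} - q\mathbf{A})^2 - q\,\sigma\cdot\mathbf{B}.
\]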
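The first step of this calculation only uses the Pauli algebra $\sigma_j \sigma_k = \delta_{jk}\mathbb{1} + i\epsilon_{jkl}\sigma_l$ and holds for arbitrary, possibly non-commuting, operators $\Pi_j$. The sketch below illustrates this with random Hermitian matrices standing in for the $\Pi_j$ (an assumption made purely for illustration; they are not discretized momenta), with $\sigma\cdot\Pi$ represented as $\sum_j \Pi_j \otimes \sigma_j$.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def random_hermitian(d):
    """Random Hermitian d x d matrix (stand-in for a non-commuting operator)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (A + A.conj().T) / 2

d = 4
Pi = [random_hermitian(d) for _ in range(3)]

# Levi-Civita symbol
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

# sigma . Pi acting on (operator space) x (spin space)
sigma_dot_Pi = sum(np.kron(Pi[j], sigma[j]) for j in range(3))
lhs = sigma_dot_Pi @ sigma_dot_Pi

# sum_j Pi_j^2 + i sum_{jkl} eps_{jkl} sigma_l Pi_j Pi_k
rhs = sum(np.kron(Pi[j] @ Pi[j], np.eye(2)) for j in range(3))
rhs = rhs + 1j * sum(eps[j, k, l] * np.kron(Pi[j] @ Pi[k], sigma[l])
                     for j in range(3) for k in range(3) for l in range(3))

print(np.allclose(lhs, rhs))
```

For commuting $\Pi_j$ the antisymmetric sum vanishes; the spin term $-q\,\sigma\cdot\mathbf{B}$ arises precisely because the components of $\Pi$ do not commute.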
Final cleanup! Let
\[
\mathbf{S} = \frac{1}{2}\sigma, \qquad \mu_B = \frac{q}{2m}, \qquad g = 2
\]
be the spin-1/2 angular momentum operator, the Bohr magneton, and the Landé
factor, respectively. Then we finally recover the Schrödinger equation for a particle with spin 1/2 in an
EM field:
\[
(E - m)\psi_l = \Big( \frac{1}{2m} (\hat{P} - q\mathbf{A})^2 + q\phi - g\mu_B\, \mathbf{S}\cdot\mathbf{B} \Big) \psi_l.
\]
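Why did we break up the coupling constant $q/(2m)$ into two factors, $\mu_B$ times $g$? To
explain that, assume the magnetic field is constant and of the form
\[
\mathbf{B} = \begin{pmatrix} 0 \\ 0 \\ B \end{pmatrix}.
\]
A potential giving rise to this is
\[
\mathbf{A} = \frac{1}{2}\, \mathbf{B} \times \mathbf{X} = \frac{1}{2} \begin{pmatrix} -By \\ Bx \\ 0 \end{pmatrix}.
\]
Then
\[
(-i\nabla - q\mathbf{A})^2 = -\nabla^2 + iq \big( \nabla\cdot\mathbf{A} + \mathbf{A}\cdot\nabla \big) + q^2 \mathbf{A}^2.
\]
For the middle summand, realize that $A_i$ commutes with $\partial_i$ so that it can be written as
\[
-2q\mathbf{A}\cdot(-i\nabla) = -q \big( xP_y - yP_x \big) B = -q\, \mathbf{L}\cdot\mathbf{B},
\]
with orbital angular momentum operator $\mathbf{L} = \mathbf{X} \times \mathbf{P}$. Thus the Hamiltonian is
\[
\frac{1}{2m}\hat{P}^2 - \frac{q}{2m} (\mathbf{L} + 2\mathbf{S})\cdot\mathbf{B} + q\phi + \frac{q^2}{8m} \|\mathbf{B} \times \mathbf{X}\|^2
\;\simeq\;
\frac{1}{2m}\hat{P}^2 - \frac{q}{2m} (\mathbf{L} + 2\mathbf{S})\cdot\mathbf{B} + q\phi,
\]
where the approximation holds when terms of second order in the magnetic field can be neglected.
Thus, whereas $\mathbf{L} + \mathbf{S}$ is a conserved quantity, the dynamical coupling between spin and
magnetic field is twice as strong as between orbital angular momentum and magnetic field!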
Chapter 5
Symmetries in Quantum Mechanics
Appendix A
Quantum mechanics recap
In this chapter, we recall some facts that should be familiar from linear algebra and intro-
ductory quantum mechanics courses. The textbook Quantum Mechanics by L. Ballentine
is a good source for this material.
A.1 Linear algebra of Hilbert spaces

A.1.1 Hilbert spaces
A Hilbert space H is a complex vector space with a sesquilinear inner product ⟨·|·⟩.
Sesquilinearity means that for all vectors
α, β, γ ∈ H
and complex numbers z ∈ C, we have
⟨α|β + γ⟩ = ⟨α|β⟩ + ⟨α|γ⟩, (A.1)

⟨α|zβ⟩ = z⟨α|β⟩, (A.2)
as well as
\[
\langle\alpha|\beta\rangle = \overline{\langle\beta|\alpha\rangle}. \tag{A.3}
\]
From this, it follows that
⟨α + β|γ⟩ = ⟨α|γ⟩ + ⟨β|γ⟩,

⟨zα|β⟩ = z̄⟨α|β⟩,
i.e. the inner product is anti-linear w.r.t. the first entry and linear w.r.t. the second one.
Beware that mathematicians usually employ the opposite convention, where the
sesquilinear inner product is linear in the first entry!
The norm of a vector $\alpha \in H$ is given by
\[
\|\alpha\| := \sqrt{\langle\alpha|\alpha\rangle}.
\]
Recall that inner products are required to be definite, i.e. to fulfill
\[
\|\alpha\| > 0 \quad \forall \alpha \neq 0.
\]
There are two examples of Hilbert spaces you should be acquainted with: column
vectors and square-integrable functions. Let’s look at both in turn.
The vector space $\mathbb{C}^d$ is formed by d-dimensional complex column vectors
\[
\alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_d \end{pmatrix}
\]
with sesquilinear inner product
\[
\langle\alpha|\beta\rangle = \sum_{i=1}^{d} \bar\alpha_i \beta_i. \tag{A.4}
\]
Such Hilbert spaces appear e.g. in the description of spin degrees of freedom.
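The physicists' convention (anti-linear in the first entry, linear in the second) is the one implemented by NumPy's `np.vdot`, which conjugates its first argument. A minimal sketch with arbitrarily chosen vectors:

```python
import numpy as np

alpha = np.array([1 + 2j, 3j])
beta = np.array([2 - 1j, 1 + 1j])
z = 0.5 - 2j

# <alpha|beta> = sum_i conj(alpha_i) beta_i, cf. (A.4)
ip = np.vdot(alpha, beta)

linear_in_second = np.isclose(np.vdot(alpha, z * beta), z * ip)
antilinear_in_first = np.isclose(np.vdot(z * alpha, beta), np.conj(z) * ip)
conjugate_symmetric = np.isclose(ip, np.conj(np.vdot(beta, alpha)))
norm = np.sqrt(np.vdot(alpha, alpha).real)

print(linear_in_second, antilinear_in_first, conjugate_symmetric, norm)
```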
More involved is the Hilbert space $L^2(\mathbb{R}^n)$ of square-integrable complex functions on
$\mathbb{R}^n$. Given two functions $\alpha, \beta : \mathbb{R}^n \to \mathbb{C}$, we can define a “continuous analogue” of
Eq. (A.4):
\[
\langle\alpha|\beta\rangle = \int \bar\alpha(x) \beta(x) \, d^n x. \tag{A.5}
\]
For the non-pedantic physicist, the space of all wave functions, together with (A.5) defines
a Hilbert space. It is associated with a point particle with n degrees of freedom.
There are three technical problems that one has to address to define the Hilbert space
of functions with mathematical rigor.
The first problem is that the integral is not actually defined for all functions. Set, for
example,
\[
\alpha(x) = \begin{cases} \sin(1/x) & x \neq 0, \\ 0 & x = 0. \end{cases}
\]
Then
\[
\int |\alpha(x)|^2 \, dx
\]
does not exist (in either the Riemann or the Lebesgue sense). The second problem
is that the integral may be defined, but infinite: take e.g. $\alpha(x) = 1$ and compute
$\langle\alpha|\alpha\rangle$. To get rid of both problems, we define a function $\alpha$ to be square-integrable
if
\[
\|\alpha\|^2 = \langle\alpha|\alpha\rangle = \int |\alpha(x)|^2 \, d^n x
\]
exists and is finite. If $\alpha, \beta$ are square-integrable, then the product $\bar\alpha\beta$ is integrable,
and the Cauchy-Schwarz inequality says that
\[
|\langle\alpha|\beta\rangle|^2 \leq \|\alpha\|^2 \|\beta\|^2 < \infty,
\]
so that, by restricting to square-integrable functions, we have rid ourselves of
undefined and infinite integrals!
The third problem is that the norm is no longer definite. Indeed, define a function
\[
\alpha(x) = \begin{cases} 1 & x = 0 \\ 0 & x \neq 0 \end{cases}.
\]
Then α ̸= 0, but ∥α∥2 = 0. Circumventing this problem requires some mathemati-

cal gymnastics: We say that two functions are equivalent if they differ only on a set
of measure zero. This means e.g. that the function α is equivalent to the 0-function,
as the two differ only at one point. If we define the Hilbert space L2 (Rn ) to be the
complex vector space of equivalence classes of square-integrable functions, then
one can show that (A.5) becomes a definite inner product. Problem solved.
Another technical issue with function spaces concerns physical units. Let me say
upfront that one can represent all physical quantities just by real numbers relative to
some fixed set of units, and that in this case, none of the issues below arise. (This is
what we will mainly do in this document). However, attaching a dimension to every
physical quantity has some value in that it can highlight certain inconsistencies and
guide heuristic arguments. So let’s briefly discuss how this would be done in QM.
For example, we may want $\psi(x)$ to be defined not on the set of real numbers, but
on a set representing physical positions ($[x] = L$) measured in some concrete unit
of length, say meters m. Then $[dx] = L$ as well, and for the normalization property
to work out, we can either stick with the scalar product
\[
\langle\phi|\psi\rangle = \int_{\mathbb{R}} \bar\phi(x) \psi(x) \, dx,
\]
in which case the wave function needs to have the dimension $[\psi(x)] = L^{-1/2}$, or we
retain dimensionless wave functions, in which case we have to redefine the scalar
product
\[
\langle\phi|\psi\rangle = \int_{\mathbb{R}\cdot\mathrm{m}} \bar\phi(x) \psi(x) \, d\mu(x),
\]
with respect to a dimension-free measure
\[
d\mu(x) := \frac{1}{\mathrm{m}} \, dx.
\]
When working in another continuous representation (e.g. momentum, see below),
the units will have to be adapted accordingly. Unlike functions that depend on
continuous parameters, discrete coefficients remain dimensionless (so that $[\sum_i |\alpha_i|^2] = 1$)
and thus do not carry information about their physical interpretation.
We will sometimes consider slight modifications, e.g. by choosing some region $R \subset \mathbb{R}^d$
and working with the space $L^2(R)$ of square-integrable functions on $R$, subject to
appropriate boundary conditions.
A.1.2 Linear operators

A map A between two vector spaces is linear if
A(ϕ + ψ) = A(ϕ) + A(ψ), A(λϕ) = λA(ϕ).
In QM, linear maps between Hilbert spaces are traditionally called operators.
Examples:
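• $H = \mathbb{C}^d$: In this case, operators can conveniently be specified as matrices, which act
on column vectors in the usual way. For example, we will have ample opportunity
to work with the Pauli matrices:
\[
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
• $H = L^2(\mathbb{R})$: The position operator acts on a function $\psi : \mathbb{R} \to \mathbb{C}$ by multiplying
it with its argument
\[
(X\psi)(x) = x\psi(x).
\]
The momentum operator maps a function to $-i\hbar$ times its derivative
\[
P = -i\hbar \frac{d}{dx} : \psi \mapsto P\psi = -i\hbar\psi'.
\]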
A.1.3 Dirac notation
Physicists often use notational aids to delineate vector-valued quantities from scalars. In
quantum mechanics, the suggestive Dirac notation (or “bra-ket” notation) is usually
employed. Here, a vector $\alpha \in H$ is written as $|\alpha\rangle$. This is called a ket, for reasons that will
be obvious momentarily.
Every vector $\psi \in H$ defines a linear function
\[
H \to \mathbb{C}, \qquad \phi \mapsto \langle\psi|\phi\rangle,
\]
the “projection onto $\psi$”. In quantum mechanics, we denote this function as $\langle\psi|$ and call it a bra. Then
we can write
\[
\langle\psi|(|\phi\rangle) = \langle\psi|\phi\rangle, \tag{A.6}
\]
so a “bra-ket” is a “braket”. This passes for humor around here.
Linear functions from a vector space to $\mathbb{C}$ are also called dual vectors or (linear)
functionals. In the calculus of variations (i.e. the branch of calculus that turns the
action principle into the Euler-Lagrange equation), the word “functional” is used
instead to refer to a function that takes other functions as arguments. Don’t be
confused by this ambiguity!
In math and engineering, it is common to use a star or sometimes a dagger
superscript to denote the functional associated with a vector in a Hilbert space:
\[
|\psi\rangle \leftrightarrow \psi, \qquad \langle\psi| \leftrightarrow \psi^* \text{ or } \psi^\dagger.
\]
The genius of this notation is that one doesn’t need to expend any thoughts on concepts
like “dual vectors” or “linear functionals”: the formalism almost forces one to use these
objects correctly.
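Let’s play around with this. Equation (A.6) is the inner product between $|\psi\rangle$ and $|\phi\rangle$.
One can combine two vectors also to form an outer product, namely the linear operator
$H \to H$ defined as
\[
|\beta\rangle \mapsto \big(|\phi\rangle\langle\psi|\big) |\beta\rangle := |\phi\rangle \big(\langle\psi|\beta\rangle\big). \tag{A.7}
\]
Definition (A.7) implies that composing bra’s and ket’s is associative: One can read the
expression
\[
|\phi\rangle\langle\psi|\beta\rangle
\]
as either
\[
\big(|\phi\rangle\langle\psi|\big) |\beta\rangle \qquad \text{“operator acting on vector”}
\]
or as
\[
|\phi\rangle \big(\langle\psi|\beta\rangle\big) \qquad \text{“vector times inner product”},
\]
getting the same result.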
A.1.4 Bases
Let H be a Hilbert space. A set {|ei ⟩}i ⊂ H is called ortho-normal if
⟨ei |ej ⟩ = δi,j .
If in addition, every element $|\psi\rangle \in H$ can be expressed as a linear combination
\[
|\psi\rangle = \sum_i \psi_i |e_i\rangle
\]
with suitable expansion coefficients $\psi_i \in \mathbb{C}$, then we have an ortho-normal basis (ONB).
In physics, unless stated otherwise, “basis” always means “ortho-normal basis”.
Also, one usually assumes that every Hilbert space comes with some distinguished
basis, ideally with a clear physical interpretation.
Every ONB fulfills the completeness relation
\[
\sum_i |e_i\rangle\langle e_i| = \mathbb{1}, \tag{A.8}
\]
where $\mathbb{1} : |\psi\rangle \mapsto |\psi\rangle$ is the identity map.
To prove it, calculate for an arbitrary $|\psi\rangle = \sum_j \psi_j |e_j\rangle$,
\[
\Big( \sum_i |e_i\rangle\langle e_i| \Big) \Big( \sum_j \psi_j |e_j\rangle \Big)
= \sum_{i,j} \psi_j |e_i\rangle \underbrace{\langle e_i|e_j\rangle}_{\delta_{i,j}} = |\psi\rangle.
\]
The converse is not true: There are complete sets that are not ortho-normal bases.
Using just the completeness relation, the following important properties of ONBs can
be easily verified:
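1. Expansion coefficients are given by inner products
\[
|\psi\rangle = \mathbb{1}|\psi\rangle = \sum_i |e_i\rangle\langle e_i|\psi\rangle = \sum_i \underbrace{\langle e_i|\psi\rangle}_{\psi_i} |e_i\rangle.
\]
2. Expansion coefficients of bras are the complex conjugate:
\[
\langle\psi| = \langle\psi|\mathbb{1} = \sum_i \underbrace{\langle\psi|e_i\rangle}_{\bar\psi_i} \langle e_i|.
\]
3. Inner products with respect to an arbitrary ONB:
\[
\langle\psi|\phi\rangle = \langle\psi|\mathbb{1}|\phi\rangle = \sum_i \langle\psi|e_i\rangle\langle e_i|\phi\rangle = \sum_i \bar\psi_i \phi_i.
\]
The special case where $\phi = \psi$ is sometimes called the Parseval relation:
\[
\|\psi\|^2 = \langle\psi|\psi\rangle = \sum_i |\psi_i|^2.
\]
4. Description of operators via matrix elements
\[
A = \mathbb{1} A \mathbb{1} = \sum_{i,j} |e_i\rangle\langle e_i|A|e_j\rangle\langle e_j| = \sum_{i,j} A_{i,j} |e_i\rangle\langle e_j|, \qquad A_{i,j} := \langle e_i|A|e_j\rangle, \tag{A.9}
\]
so that
\[
\langle\phi|A|\psi\rangle = \langle\phi|\mathbb{1} A \mathbb{1}|\psi\rangle = \sum_{ij} \bar\phi_i A_{ij} \psi_j.
\]
The expression (A.9) also shows that for every basis $\{|e_i\rangle\}_i$ of the Hilbert space, the
set $\{|e_i\rangle\langle e_j|\}_{ij}$ is a basis for the vector space of linear operators.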
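The four properties can be tried out on a concrete example. The sketch below draws a random ONB of $\mathbb{C}^4$ from a QR decomposition (an arbitrary way of producing an orthonormal set, chosen here for convenience) and checks completeness, the expansion coefficients, the Parseval relation and the reconstruction (A.9).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Columns of a unitary matrix form an ONB {|e_i>}
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
basis = [Q[:, i] for i in range(d)]

# Completeness relation (A.8): sum_i |e_i><e_i| = 1
completeness = sum(np.outer(e, e.conj()) for e in basis)

# Expansion coefficients psi_i = <e_i|psi> and reconstruction of |psi>
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
coeffs = np.array([np.vdot(e, psi) for e in basis])
psi_rec = sum(c * e for c, e in zip(coeffs, basis))

# Parseval relation
norm_sq_direct = np.vdot(psi, psi).real
norm_sq_coeffs = np.sum(np.abs(coeffs) ** 2)

# Matrix elements A_ij = <e_i|A|e_j> and reconstruction (A.9)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A_elems = np.array([[np.vdot(ei, A @ ej) for ej in basis] for ei in basis])
A_rec = sum(A_elems[i, j] * np.outer(basis[i], basis[j].conj())
            for i in range(d) for j in range(d))

print(np.allclose(completeness, np.eye(d)), np.allclose(psi_rec, psi),
      np.isclose(norm_sq_direct, norm_sq_coeffs), np.allclose(A_rec, A))
```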
The Dirac notation allows one to save a bit of ink when working with one fixed ONB.
Say we have agreed to work with {|ei ⟩}i . Then quantum physicists (and no-one else. . . )
commonly drop the symbol e and just put the index into the ket:
|i⟩ := |ei ⟩.
Vector and matrix representations

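Assume that $H$ is finite-dimensional and that some ONB $\{|i\rangle\}_{i=1}^{d}$ has been fixed. Then
the calculations above define a one-one relation between $H$ and the Hilbert space $\mathbb{C}^d$ of
column vectors. Concretely, take the dictionary
\[
\begin{aligned}
\text{kets} \leftrightarrow \text{column vectors} &\qquad |\psi\rangle \leftrightarrow \begin{pmatrix} \psi_1 \\ \vdots \\ \psi_d \end{pmatrix} \\
\text{bras} \leftrightarrow \text{row vectors} &\qquad \langle\psi| \leftrightarrow (\bar\psi_1, \ldots, \bar\psi_d) \\
\text{operators} \leftrightarrow \text{matrices} &\qquad A \leftrightarrow \begin{pmatrix} A_{1,1} & \ldots & A_{1,d} \\ \vdots & & \vdots \\ A_{d,1} & \ldots & A_{d,d} \end{pmatrix}
\end{aligned}
\]
with
\[
\psi_i = \langle i|\psi\rangle, \qquad A_{i,j} = \langle i|A|j\rangle.
\]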
Under this identification, the composition rules of bra’s, ket’s, and operators correspond to
the usual rules of matrix-vector multiplication. This representation is particularly useful
for computer implementations!
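For example, a spin-1/2 degree of freedom is associated with the Hilbert space
\[
H = \{\alpha|{\uparrow}\rangle + \beta|{\downarrow}\rangle \mid \alpha, \beta \in \mathbb{C}\}
\]
with basis $\{|{\uparrow}\rangle, |{\downarrow}\rangle\}$. Then one can introduce operators either abstractly or as matrices,
e.g.:
\[
\sigma_x = |{\uparrow}\rangle\langle{\downarrow}| + |{\downarrow}\rangle\langle{\uparrow}| \text{ “=” } \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
\sigma_z = |{\uparrow}\rangle\langle{\uparrow}| - |{\downarrow}\rangle\langle{\downarrow}| \text{ “=” } \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]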
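As a small illustration of the dictionary, the following sketch represents kets as $2 \times 1$ column arrays and bras as their conjugate transposes; the helper name `bra` and the test vectors are made up for this example. It rebuilds $\sigma_x$ and $\sigma_z$ from outer products and checks the associativity of bra-ket composition from (A.7).

```python
import numpy as np

# Kets as column vectors
up = np.array([[1], [0]], dtype=complex)
down = np.array([[0], [1]], dtype=complex)

def bra(ket):
    """Row vector of complex-conjugated coefficients (hypothetical helper)."""
    return ket.conj().T

# Operators built abstractly from outer products
sigma_x = up @ bra(down) + down @ bra(up)
sigma_z = up @ bra(up) - down @ bra(down)

# Associativity: (|phi><psi|)|beta> equals |phi>(<psi|beta>)
phi = np.array([[1 + 1j], [2]], dtype=complex)
psi = np.array([[0.5], [-1j]], dtype=complex)
beta = np.array([[3], [1 - 2j]], dtype=complex)

operator_on_vector = (phi @ bra(psi)) @ beta
vector_times_inner_product = phi * (bra(psi) @ beta)

print(sigma_x.real, sigma_z.real,
      np.allclose(operator_on_vector, vector_times_inner_product))
```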
A.1.5 The adjoint

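For any operator $A$ on a Hilbert space $H$, there is a unique $A^\dagger$ such that
\[
\langle\phi|A^\dagger|\psi\rangle = \overline{\langle\psi|A|\phi\rangle} \qquad \forall\, \psi, \phi \in H.
\]
The operator $A^\dagger$ (pronounced “A dagger”) is called the adjoint of $A$. If one chooses a basis
of $H$ and expands
\[
A = \sum_{ij} A_{ij} |i\rangle\langle j|,
\]
then
\[
\langle i|A^\dagger|j\rangle = \overline{\langle j|A|i\rangle} = \bar{A}_{ji}
\quad\Rightarrow\quad
A^\dagger = \sum_{ij} \bar{A}_{ji} |i\rangle\langle j|.
\]
The matrix representation of $A^\dagger$ is therefore the “conjugate transpose” of the one of $A$. An
operator $A$ is self-adjoint or Hermitian if $A = A^\dagger$. It is easy to see that
\[
\big( z|\alpha\rangle\langle\beta| \big)^\dagger = \bar{z}|\beta\rangle\langle\alpha|.
\]
One can think about the adjoint $A^\dagger$ as acting on “bras” the way that $A$ acts on “kets”.
More precisely, recall that we introduced $\langle\phi|$ as “the functional projecting onto $|\phi\rangle$”. Writing
the projection of $|\psi\rangle$ onto $A|\phi\rangle$ as
\[
\overline{\langle\psi|A|\phi\rangle} = \langle\phi|A^\dagger|\psi\rangle, \tag{A.10}
\]
we conclude that the functional that projects onto $A|\phi\rangle$ is given by $\langle\phi|A^\dagger$.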
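Examples: The Pauli matrices are self-adjoint. The momentum operator is self-adjoint:
\[
\begin{aligned}
\langle\phi|P|\psi\rangle &= \int_{-\infty}^{\infty} \bar\phi(x) (-i\hbar) \psi'(x) \, dx \\
&= -\int_{-\infty}^{\infty} (\bar\phi)'(x) (-i\hbar) \psi(x) \, dx \qquad \text{(integration by parts)} \\
&= \overline{\int_{-\infty}^{\infty} \bar\psi(x) (-i\hbar) \phi'(x) \, dx} = \overline{\langle\psi|P|\phi\rangle},
\end{aligned}
\]
where we have used that for square-integrable functions $\lim_{x \to \pm\infty} \psi(x) = 0$ so that no
boundary terms appear when integrating by parts.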
A.1.6 Spectral decomposition (discrete case)

Recall our old friend, the eigenvalue problem: Given an operator A : H → H, find all
λi , |ψi ⟩ such that
A|ψi ⟩ = λi |ψi ⟩
Of course, the λi ’s are the eigenvalues and the |ψi ⟩’s the eigenvectors of A.
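A spectral decomposition (or eigendecomposition) of $A$ is a representation of the form
\[
A = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|, \qquad \mathbb{1} = \sum_i |\psi_i\rangle\langle\psi_i|. \tag{A.11}
\]
Given a spectral decomposition, compute
\[
A|\psi_j\rangle = \sum_i \lambda_i |\psi_i\rangle \underbrace{\langle\psi_i|\psi_j\rangle}_{\delta_{ij}} = \lambda_j |\psi_j\rangle.
\]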
It follows that A has an eigendecomposition if and only if one can find an ONB comprised
of eigenvectors. In this case, one refers to it as A’s eigenbasis, and the λi ’s appearing in
the decomposition are exactly the eigenvalues of A.
Not every operator has an eigenbasis, e.g. the spin-1/2 raising operator
\[
\sigma_+ = \frac{1}{2}(\sigma_x + i\sigma_y) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
does not (why?). There’s a theorem in functional analysis that essentially says that A
has an eigendecomposition if and only if A commutes with its adjoint, i.e. AA† = A† A.
(Though the case when there is a continuum of eigenvalues needs more attention, see
section below).
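The most important class of operators for which this holds are, of course, the
self-adjoint ones $A = A^\dagger$. What is more, in this case, all eigenvalues are real. Indeed, $A|\psi\rangle = \lambda|\psi\rangle$
implies (taking $|\psi\rangle$ to be normalized without loss of generality)
\[
\lambda = \langle\psi|A|\psi\rangle = \langle\psi|A^\dagger|\psi\rangle = \overline{\langle\psi|A|\psi\rangle} = \bar\lambda.
\]
Thus the self-adjoint operators are exactly those of the form
\[
A = \sum_i \lambda_i |\phi_i\rangle\langle\phi_i|, \qquad \lambda_i \in \mathbb{R}, \quad \{|\phi_i\rangle\}_i \text{ an ONB}.
\]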
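In finite dimensions, the spectral decomposition of a self-adjoint matrix is what `np.linalg.eigh` computes. The sketch below (with a randomly generated Hermitian matrix as an arbitrary example) reconstructs $A$ from its eigenvalues and eigenprojectors, and also checks that the raising operator $\sigma_+$ fails the normality criterion $AA^\dagger = A^\dagger A$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5

# Random self-adjoint operator
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (B + B.conj().T) / 2

# Eigenvalues (real) and an ONB of eigenvectors (columns of V)
lam, V = np.linalg.eigh(A)

# A = sum_i lambda_i |psi_i><psi_i|  and  1 = sum_i |psi_i><psi_i|
A_rec = sum(lam[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(d))
identity = sum(np.outer(V[:, i], V[:, i].conj()) for i in range(d))

# The raising operator does not commute with its adjoint
sigma_plus = np.array([[0, 1], [0, 0]], dtype=complex)
sigma_plus_is_normal = np.allclose(sigma_plus @ sigma_plus.conj().T,
                                   sigma_plus.conj().T @ sigma_plus)

print(np.allclose(A_rec, A), np.allclose(identity, np.eye(d)), sigma_plus_is_normal)
```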
A.1.7 Spectral decomposition (continuous case)

When working out eigendecompositions in infinite dimensions, we can run into trouble.
Let’s see what can go wrong.
First, consider the momentum operator $P = -i\frac{d}{dx}$ (setting $\hbar = 1$). The eigenvalue equation is trivial
to solve:
\[
-i\psi' = \lambda\psi \quad\Leftrightarrow\quad \psi(x) = c\, e^{i\lambda x}.
\]
Trouble is that these solutions are not square integrable:
\[
\|\psi\|^2 = \int_{-\infty}^{\infty} |c|^2 \, dx = \infty.
\]
There are additional problems! For the position operator $(X\psi)(x) = x\psi(x)$, the
eigenvalue equation
\[
x\psi(x) = \lambda\psi(x) \quad \forall x
\]
is solved by
\[
\psi(x) = \begin{cases} c & x = \lambda \\ 0 & \text{else} \end{cases},
\]
which has norm $\|\psi\| = 0$. So it seems like there are no eigendecompositions for the two
most important operators of QM.
To get around the problem, we widen our domain of discourse by allowing for more
general objects than just square-integrable functions. Let’s first see how this formally
solves our problem. Whether we are “allowed to do this”, i.e. whether the formal con-
struction will lead to inconsistencies is something we’ll worry about later.
Delta distributions
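The distribution $\delta_y$ is a formal object whose inner product with a smooth function $\psi$ is
defined to be
\[
\langle\delta_y|\psi\rangle = \int \bar\delta_y(x) \psi(x) \, dx := \psi(y).
\]
Then the expression
\[
\int_x x\, |\delta_x\rangle\langle\delta_x| \, dx
\]
provides an eigendecomposition of the position operator $X$ in the sense that for any pair
of smooth functions $\phi, \psi$ we get the correct result
\[
\langle\phi| \Big( \int_x x\, |\delta_x\rangle\langle\delta_x| \, dx \Big) |\psi\rangle
= \int x \langle\phi|\delta_x\rangle\langle\delta_x|\psi\rangle \, dx
= \int x\, \bar\phi(x) \psi(x) \, dx = \langle\phi|X|\psi\rangle. \tag{A.12}
\]
Likewise, we have the completeness relation
\[
\int_x |\delta_x\rangle\langle\delta_x| \, dx = \mathbb{1} \tag{A.13}
\]
in the same sense, i.e.
\[
\langle\phi| \Big( \int_x |\delta_x\rangle\langle\delta_x| \, dx \Big) |\psi\rangle
= \int \langle\phi|\delta_x\rangle\langle\delta_x|\psi\rangle \, dx
= \int \bar\phi(x) \psi(x) \, dx = \langle\phi|\psi\rangle.
\]
So when integrated against smooth functions, the expressions above behave just like an
eigendecomposition should. We can work with that!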
Plane waves
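We now turn to the eigendecomposition of the momentum operator. For $k \in \mathbb{R}$, define the
non-normalizable eigenfunction
\[
\phi_k(x) = (2\pi)^{-1/2} e^{ikx}.
\]
We claim that
\[
\int_k |\phi_k\rangle\langle\phi_k| \, dk = \mathbb{1}, \qquad \int_k \hbar k\, |\phi_k\rangle\langle\phi_k| \, dk = P,
\]
in the sense that for $\psi, \phi$ vanishing at infinity
\[
\langle\phi| \Big( \int_k |\phi_k\rangle\langle\phi_k| \, dk \Big) |\psi\rangle = \langle\phi|\psi\rangle, \tag{A.14}
\]
\[
\langle\phi| \Big( \int_k \hbar k\, |\phi_k\rangle\langle\phi_k| \, dk \Big) |\psi\rangle = \langle\phi|P|\psi\rangle. \tag{A.15}
\]
To see that this is true, note that the inner product with a function $\psi$
\[
\langle\phi_k|\psi\rangle = (2\pi)^{-1/2} \int e^{-ikx} \psi(x) \, dx = \tilde\psi(k)
\]
gives the Fourier transform $\tilde\psi$ of $\psi$ evaluated at $k$. Recall that the inverse transform is
\[
(2\pi)^{-1/2} \int e^{ikx} \tilde\psi(k) \, dk = \psi(x).
\]
The completeness relation Eq. (A.14) thus follows from
\[
\langle\delta_x| \Big( \int_k |\phi_k\rangle\langle\phi_k| \, dk \Big) |\psi\rangle
= (2\pi)^{-1/2} \int_k e^{ikx} \tilde\psi(k) \, dk = \psi(x).
\]
Next, for $\psi$ vanishing at infinity, integration by parts gives
\[
\begin{aligned}
\hbar k \langle\phi_k|\psi\rangle &= (2\pi)^{-1/2} \int \hbar k\, e^{-ikx} \psi(x) \, dx \\
&= (2\pi)^{-1/2} \int \Big( i\hbar \frac{d}{dx} e^{-ikx} \Big) \psi(x) \, dx \\
&= (2\pi)^{-1/2} \int e^{-ikx} \Big( -i\hbar \frac{d}{dx} \psi(x) \Big) dx = \langle\phi_k|P|\psi\rangle,
\end{aligned}
\]
which implies Eq. (A.15):
\[
\langle\phi| \Big( \int_k \hbar k\, |\phi_k\rangle\langle\phi_k| \, dk \Big) |\psi\rangle
= \langle\phi| \Big( \int_k |\phi_k\rangle\langle\phi_k| \, dk \Big) P |\psi\rangle = \langle\phi|P|\psi\rangle.
\]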
k k
General eigendecompositions
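We can now sketch the way in which a general Hermitian operator $A$ has an eigendecomposition.
Consider all solutions to the eigenvalue equation
\[
A|\psi_\lambda\rangle = \lambda|\psi_\lambda\rangle,
\]
regardless of whether $|\psi_\lambda\rangle$ is square-integrable or not. Assume for simplicity that $A$ is
non-degenerate, i.e. that for every $\lambda \in \mathbb{R}$, there is at most one eigenfunction $|\psi_\lambda\rangle$. An
eigenvalue $\lambda$ is called discrete if it is separated from all other eigenvalues by a finite
distance. Let $D$ be the set of discrete eigenvalues. Eigenvalues that are not discrete are
called continuous. Collect them in another set $C$. Choose normalization such that
\[
\langle\psi_{\lambda'}|\psi_\lambda\rangle = \delta_{\lambda',\lambda} \quad (\lambda \in D),
\qquad
\langle\psi_{\lambda'}|\psi_\lambda\rangle = \delta(\lambda' - \lambda) \quad (\lambda \in C).
\]
Then we have the completeness relation and spectral decomposition
\[
\begin{aligned}
\mathbb{1} &= \int_C |\psi_\lambda\rangle\langle\psi_\lambda| \, d\lambda + \sum_{\lambda \in D} |\psi_\lambda\rangle\langle\psi_\lambda|, \\
A &= \int_C \lambda\, |\psi_\lambda\rangle\langle\psi_\lambda| \, d\lambda + \sum_{\lambda \in D} \lambda\, |\psi_\lambda\rangle\langle\psi_\lambda|.
\end{aligned} \tag{A.16}
\]
We can unify the treatment of the discrete and the continuous part. Define
\[
\rho = \sum_{\lambda \in D} \delta_\lambda + I_C
\quad\text{in terms of the indicator function}\quad
I_C(\lambda') = \begin{cases} 1 & \lambda' \in C \\ 0 & \text{else} \end{cases}.
\]
The delta functions allow us to incorporate the sums in (A.16) into the integral:
\[
A = \int \lambda\, |\psi_\lambda\rangle\langle\psi_\lambda|\, \rho(\lambda) \, d\lambda. \tag{A.17}
\]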
The completeness relation generalizes like this: For any subset $S \subset \mathbb{R}$,
\[
\int_S |\psi_\lambda\rangle\langle\psi_\lambda|\, \rho(\lambda) \, d\lambda = P_S, \tag{A.18}
\]
where $P_S$ projects onto the space spanned by $\{|\psi_\lambda\rangle \mid \lambda \in S\}$. This looks somewhat like
the formula
\[
\int_S \rho(\lambda) \, d\lambda = \mu(S)
\]
for computing the measure of a set $S$ given a density $\rho$. Therefore, the map $S \mapsto P_S$ is
called a projection-valued measure and $\rho$ the density of states (with respect to $d\lambda$). The
interpretation of $\rho$ is particularly clear when applied to sets $S$ that do not intersect the
continuous part, $S \cap C = \emptyset$. Then
\[
\int_S \rho(\lambda) \, d\lambda = |S \cap D|
\]
equals the number of eigenvalues of $A$ in $S$.
See Chapter 1 of Quantum Mechanics by Ballentine for a more careful, but not too
technical exposition. A rigorous version is the spectral theorem of functional analysis.
A.1.8 More on delta distributions

How to think about distributions
Our account of general eigendecompositions and distributions is not mathematically rig-
orous. It can be made precise, but doing so takes a lecture in functional analysis (c.f. the
spectral theorem and the theory of distributions). Given that we won’t take the time here
to go into more details, how should one deal with distributions that pop up in equations?
Some strategies:
1. Integrate against smooth functions that quickly vanish at infinity. As in (A.12),
even if the intermediate mathematical expression contains δ’s, they should have all
vanished after one has integrated the expression over smooth functions in order to
extract physical quantities. The mathematically rigorous approach is based on this
strategy, and it is the one we will have at the back of our heads in this document.
2. Think of $\delta$ as an idealization of “highly concentrated”. One can in principle replace
$\delta_x$ by functions $\delta_x^{(\epsilon)}$ that are supported on an $\epsilon$-ball around $x$, where $\epsilon$ is much
smaller than any relevant length scale. The final physical results should then only
weakly depend on the actual choice of $\epsilon$, and one should, in fact, be able to take a
limit $\epsilon \to 0$. In this sense, the actual distribution is an idealization that allows one
to directly obtain the limit, without first introducing an $\epsilon$ and eliminating it again in
the end.
3. Shut-up-and-calculate. The reason δ’s are so ubiquitous is that they work well as a
computational tool. So in reality, people just use them whenever they would have
used a Kronecker delta in a discrete analogue, and pretend that all algebraic manip-
ulations that are valid for Kronecker deltas also extend to distributions. This mostly
works.
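The second strategy can be made concrete numerically. The sketch below replaces $\delta_y$ by a normalized Gaussian of width $\epsilon$ (an assumption: a Gaussian is sharply concentrated around $y$ rather than strictly supported on an $\epsilon$-ball) and integrates it against a smooth test function on a fine grid. The test function, the point $y$ and the values of $\epsilon$ are arbitrary choices.

```python
import numpy as np

def delta_eps(x, y, eps):
    """Normalized Gaussian of width eps centered at y (regularized delta, an assumption)."""
    return np.exp(-(x - y) ** 2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

x = np.linspace(-2.0, 2.0, 400001)   # fine grid, spacing 1e-5
dx = x[1] - x[0]
y = 0.3
psi = np.cos(x)                      # smooth test function

errors = []
for eps in (0.1, 0.01, 0.001):
    # <delta_y^(eps)|psi> as a Riemann sum; should approach psi(y) as eps -> 0
    value = np.sum(delta_eps(x, y, eps) * psi) * dx
    errors.append(abs(value - np.cos(y)))

print(errors)
```

The deviation from $\psi(y)$ shrinks as $\epsilon$ decreases, which is the sense in which the result depends only weakly on the regularization.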
Derivatives of delta functions

While the mathematicians look the other way, let’s get adventurous and represent the mo-
mentum operator in position basis.
The derivative of the delta function $\delta_y'(x)$ is a formal object whose inner product with
smooth functions vanishing at infinity is defined so that formally the rule of integration by
parts holds:
\[ \langle\delta_y'|\psi\rangle = \int \delta_y'(x)\,\psi(x)\,dx := -\int \delta_y(x)\,\psi'(x)\,dx = -\psi'(y), \]
and therefore
\[ P = i\hbar \int |\delta_x\rangle\langle\delta_x'|\,dx \]
is valid in the sense that for all smooth ψ, ϕ vanishing at infinity,
\[ i\hbar\int \langle\psi|\delta_x\rangle\langle\delta_x'|\phi\rangle\,dx = -i\hbar\int \bar\psi(x)\,\phi'(x)\,dx = \langle\psi|P|\phi\rangle. \tag{A.19} \]
Other expressions are

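\[ P = -i\hbar\int |\delta_x'\rangle\langle\delta_x|\,dx = -i\hbar\int |\delta_x\rangle\,\partial_x\langle\delta_x|\,dx = i\hbar\iint |\delta_y\rangle\,\delta_y'(z)\,\langle\delta_z|\,dy\,dz. \]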
The first holds because shifting the derivative to the bra means that in (A.19), ψ instead of
ϕ gets differentiated, and to remedy that, we need to use integration by parts once more,
which causes the change in sign. The second one holds because ∂x δx (y) = ∂x δ(y − x) =
−δx′ (y), so differentiating the index rather than the argument of the delta function also
incurs a sign change. A similar argument verifies the third expression. This last one is
interesting, because it is a formal generalization of (A.9) to continuous bases. It expresses
P in terms of its “matrix elements”
\[ \langle\delta_y|P|\delta_z\rangle = -i\hbar\int\delta_y(x)\,\delta_z'(x)\,dx = i\hbar\int\delta_y'(x)\,\delta_z(x)\,dx = i\hbar\,\delta_y'(z). \]
Using these formulas, the kinetic energy operator reads
\[ \frac{P^2}{2m} = -\frac{\hbar^2}{2m}\int|\delta_x\rangle\,\partial_x^2\langle\delta_x|\,dx = \frac{\hbar^2}{2m}\int|\delta_x'\rangle\langle\delta_x'|\,dx. \]
A.1.9 More on Fourier transforms

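Let’s have a closer look at the n-dimensional Fourier basis $\phi_k(x) = (2\pi)^{-n/2} e^{ikx}$, for
$k \in \mathbb{R}^n$, and the associated transforms
\[ \begin{aligned}
\tilde\psi(k) &:= \langle k|\psi\rangle = (2\pi)^{-n/2}\int e^{-ikx}\,\psi(x)\,d^nx,\\
\psi(x) &:= \langle x|\psi\rangle = (2\pi)^{-n/2}\int e^{ikx}\,\tilde\psi(k)\,d^nk.
\end{aligned} \tag{A.20} \]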
Fourier transforms in finite regions

The Fourier basis for functions on Rn is continuous, which, as discussed above, comes
with technical difficulties. Things are much easier for spaces of functions in finite regions.
Concretely, choose some length L and consider the box B = [−L/2, L/2]n with side
length L centered at the origin. Let L2 (B) be the space of functions defined on the region
B with cyclic boundary conditions (i.e. functions take the same values on opposite faces
of the box) and with inner products given by integrals over B only:
\[ \langle\phi|\psi\rangle = \int_B \bar\phi(x)\,\psi(x)\,d^nx. \]
A plane wave $e^{ikx}$ complies with the boundary conditions if and only if every component
$k_i$ of the wave vector is an integer multiple of $\frac{2\pi}{L}$. Indeed, the discrete set of functions
\[ \phi_k(x) := \frac{1}{L^{n/2}}\,e^{ikx}, \qquad k\in\frac{2\pi}{L}\mathbb{Z}^n, \]
forms an ONB for $L^2(B)$ and the formulas for the Fourier transform become
\[ \begin{aligned}
\tilde\psi(k) &= \frac{1}{L^{n/2}}\int_B e^{-ikx}\,\psi(x)\,d^nx,\\
\psi(x) &= \frac{1}{L^{n/2}}\sum_{k\in\frac{2\pi}{L}\mathbb{Z}^n}\tilde\psi(k)\,e^{ikx}.
\end{aligned} \tag{A.21} \]
Comparison with (A.20) shows that, formally, the transition between a finite and an unbounded
volume Fourier transform is facilitated by the substitution
\[ \frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n} d^nk \;\leftrightarrow\; \frac{1}{L^{n/2}}\sum_{k\in\frac{2\pi}{L}\mathbb{Z}^n} \tag{A.22} \]
Note the asymmetry in (A.21): Fourier transformation takes the compact domain B to
the discrete domain $\frac{2\pi}{L}\mathbb{Z}^n$. We can of course reverse the interpretation of the two functions
in (A.21). The formula then says that functions ψ(x) defined on a lattice $\frac{2\pi}{L}\mathbb{Z}^n$ can be
expanded in terms of plane waves $\phi_k(x) = \frac{1}{L^{n/2}}e^{-ikx}$ with wave vectors k ∈ B. In
this context, B is sometimes called the Brillouin zone and k the crystal momentum or
quasi-momentum.
Of course, the universe isn’t actually a finite box with cyclic boundary conditions...
...but we may as well pretend it were! Physics is local, so we can assume that all phenom-
ena we are interested in take place in some box that is sufficiently large that the boundary
does not affect the predictions we extract from the theory.
Translation symmetry
Fourier transforms are intimately connected to translation symmetry. Let Ta be the trans-
lation operator that shifts functions along the vector a
(Ta ψ)(x) = ψ(x − a).
The Fourier basis diagonalizes translations:

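\[ \langle x|T_a|\phi_k\rangle = (2\pi)^{-n/2}e^{ik(x-a)} = e^{-ika}\langle x|\phi_k\rangle \quad\Rightarrow\quad T_a = \int e^{-ika}\,|\phi_k\rangle\langle\phi_k|\,d^nk. \]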
It is the unique common eigenbasis of all Ta (why?). Therefore, if A is any operator that
commutes with translations
[Ta , A] = 0 ∀ a, (A.23)
then A must be diagonal in the Fourier basis, too. Explicitly, (A.23) implies that A is fully
specified by its “first column”
f (z) := ⟨δz |A|δ0 ⟩
in the sense that
⟨δx |A|δy ⟩ = ⟨δx |ATy T−y |δy ⟩ = ⟨δx |Ty A|δ0 ⟩ = ⟨δx−y |A|δ0 ⟩ = f (x − y).
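It then follows that the eigenvalues of A are proportional to the Fourier transform of f :
\[ \begin{aligned}
\langle\delta_x|A|\phi_k\rangle &= (2\pi)^{-n/2}\int\langle x|A|y\rangle\,e^{iky}\,d^ny\\
&= e^{ikx}(2\pi)^{-n/2}\int f(x-y)\,e^{-ik(x-y)}\,d^ny = (2\pi)^{n/2}\tilde f(k)\,\langle\delta_x|\phi_k\rangle
\end{aligned} \]
so that, summarizing,
\[ A = (2\pi)^{n/2}\int\tilde f(k)\,|\phi_k\rangle\langle\phi_k|\,d^nk. \tag{A.24} \]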
Fourier transform for functions depending on space and time

Common notation and sign conventions slightly differ when one coordinate has the interpretation
of a time. Write $x = (t, \mathbf{x}) \in \mathbb{R}^n$, with t the “temporal” coordinate and
$\mathbf{x} \in \mathbb{R}^{n-1}$ the “spatial” ones. Wave vectors are denoted by $k = (\omega, \mathbf{k})$. To compute inner
products, we use the Minkowski form
\[ \langle k, x\rangle = \omega t - \mathbf{k}\mathbf{x}, \]
which (at least in the case of n = 4) determines the space-time metric in relativity. The
commonly used basis of plane waves is
\[ \phi_k(x) = (2\pi)^{-n/2} e^{-i\langle k,x\rangle} = (2\pi)^{-n/2} e^{-i\omega t+i\mathbf{k}\mathbf{x}} \]
so that the forward and inverse Fourier transform are, respectively,
\[ \begin{aligned}
\tilde\psi(\omega,\mathbf{k}) &= (2\pi)^{-n/2}\int e^{i\omega t-i\mathbf{k}\mathbf{x}}\,\psi(t,\mathbf{x})\,dt\,d^{n-1}x,\\
\psi(t,\mathbf{x}) &= (2\pi)^{-n/2}\int e^{-i\omega t+i\mathbf{k}\mathbf{x}}\,\tilde\psi(\omega,\mathbf{k})\,d\omega\,d^{n-1}k.
\end{aligned} \tag{A.25} \]
This convention extends to the case n = 1. That is, if a function ψ depends only on
time, then its FT is taken to be $\tilde\psi(\omega) = \frac{1}{\sqrt{2\pi}}\int e^{i\omega t}\psi(t)\,dt$, whereas if the single parameter is
interpreted as a spatial coordinate or a generic parameter, then $\tilde\psi(k) = \frac{1}{\sqrt{2\pi}}\int e^{-ikx}\psi(x)\,dx$.
Finite Fourier transform

Occasionally, we’ll come across the finite Fourier transform. It is defined for functions
ψ : ZN → C, where ZN = {0, . . . , N − 1} with arithmetic done modulo N . The
standard basis on this space is given by delta functions δx (y) = δxy so that
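\[ |\psi\rangle = \sum_{x\in\mathbb{Z}_N}\psi(x)\,|\delta_x\rangle. \]
The Fourier basis is
\[ |\phi_k\rangle = \frac{1}{\sqrt N}\sum_{x\in\mathbb{Z}_N} e^{ikx}\,|\delta_x\rangle, \qquad k\in\frac{2\pi}{N}\mathbb{Z}_N. \]
The Fourier transform and its inverse thus take the form
\[ \tilde\psi(k) = \langle\phi_k|\psi\rangle = \frac{1}{\sqrt N}\sum_{x\in\mathbb{Z}_N} e^{-ikx}\,\psi(x), \qquad \psi(x) = \langle\delta_x|\psi\rangle = \frac{1}{\sqrt N}\sum_{k\in\frac{2\pi}{N}\mathbb{Z}_N} e^{ikx}\,\tilde\psi(k). \]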
The theory developed above can be easily translated to the finite case.
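As a numerical sanity check of the finite case, the following NumPy sketch builds the Fourier matrix with entries ⟨ϕk|δx⟩, applies the transform and its inverse, and verifies the finite analogue of (A.24): a translation-invariant matrix with first column f is diagonal in the Fourier basis, with eigenvalues √N f̃(k). The size N = 8, the random test data and all variable names are arbitrary choices made for illustration.

```python
import numpy as np

N = 8                                   # arbitrary illustrative size
x = np.arange(N)                        # positions in Z_N
k = 2 * np.pi * np.arange(N) / N        # wave numbers in (2*pi/N) Z_N

# F[j, x] = <phi_{k_j}|delta_x> = exp(-i k_j x) / sqrt(N)
F = np.exp(-1j * np.outer(k, x)) / np.sqrt(N)

rng = np.random.default_rng(0)
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi_tilde = F @ psi                     # forward transform
psi_back = F.conj().T @ psi_tilde       # inverse transform

# Translation-invariant operator: A[x, y] = f(x - y mod N)
f = rng.normal(size=N)
A = f[(x[:, None] - x[None, :]) % N]
A_fourier = F @ A @ F.conj().T          # matrix of A in the Fourier basis
f_tilde = F @ f

print(np.allclose(psi_back, psi))
print(np.allclose(A_fourier, np.diag(np.sqrt(N) * f_tilde)))
```

Up to the factor 1/√N, `F @ psi` agrees with `np.fft.fft(psi)`, which uses the same sign convention in the exponent.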
A.1.10 Functions of operators

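Let
\[ A = \sum_i \lambda_i\,|\phi_i\rangle\langle\phi_i| \]
be the eigendecomposition of an operator. Then
\[ A^2 = \sum_{ij}\lambda_i\lambda_j\,|\phi_i\rangle\underbrace{\langle\phi_i|\phi_j\rangle}_{\delta_{ij}}\langle\phi_j| = \sum_i\lambda_i^2\,|\phi_i\rangle\langle\phi_i| \]
and likewise
\[ A^k = \dots = \sum_i\lambda_i^k\,|\phi_i\rangle\langle\phi_i|. \]
Thus, if $p(x) = \sum_k c_k x^k$ is a polynomial, then
\[ p(A) = \sum_k c_k A^k = \sum_i p(\lambda_i)\,|\phi_i\rangle\langle\phi_i|. \]
For an arbitrary function f : C → C, one can thus consistently define its action on
operators with an eigendecomposition as
\[ f(A) := \sum_i f(\lambda_i)\,|\phi_i\rangle\langle\phi_i|. \]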
(This convention is sometimes called the spectral calculus).
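The definition can be tried out numerically. The sketch below assumes a finite-dimensional Hermitian matrix, so that `np.linalg.eigh` supplies the eigendecomposition; the helper name `operator_function` and the random test matrix are illustrative choices. It compares a polynomial evaluated by matrix products with the spectral definition, and then applies a non-polynomial function.

```python
import numpy as np

def operator_function(A, f):
    """Return f(A) for a Hermitian matrix A via its eigendecomposition."""
    lam, phi = np.linalg.eigh(A)            # columns of phi are the |phi_i>
    return (phi * f(lam)) @ phi.conj().T    # sum_i f(lam_i) |phi_i><phi_i|

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2                    # random Hermitian test matrix

# Polynomial p(x) = x^3 - 2x + 1: matrix products versus spectral definition
p_direct = A @ A @ A - 2 * A + np.eye(4)
p_spectral = operator_function(A, lambda x: x**3 - 2 * x + 1)
print(np.allclose(p_direct, p_spectral))

# A non-polynomial function: exp(A) and exp(-A) should be inverse to each other
exp_A = operator_function(A, np.exp)
exp_minus_A = operator_function(A, lambda x: np.exp(-x))
print(np.allclose(exp_A @ exp_minus_A, np.eye(4)))
```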

A.1.11 Unitary operators

Unitary operators are the Hilbert space analogue of orthogonal rotations in Euclidean vec-
tor spaces: Invertible linear operators that preserve inner products. Let’s work out what
that means.
The inner product between U |ϕ⟩, U |ψ⟩ is ⟨ϕ|U † U |ψ⟩ (recall Eq. (A.10)). Thus U
preserves the inner product between any pair of vectors if and only if
⟨ϕ|U † U |ψ⟩ = ⟨ϕ|ψ⟩ ∀ ϕ, ψ ∈ H.
We thus define: An operator is unitary if it is invertible and fulfills U † U = 1.

One can work out that these characterizations are equivalent:
1. U is unitary.
2. U has a spectral decomposition of the form
\[ U = \sum_i e^{i\phi_i}\,|\psi_i\rangle\langle\psi_i|, \qquad \phi_i\in\mathbb{R}, \]
i.e. all eigenvalues $\lambda_i = e^{i\phi_i}$ have absolute value equal to 1.

3. There is a Hermitian operator A such that U = eiA (in the sense of Sec. A.1.10).
4. U is such that U † U = 1 and U U † = 1 (in which case U is automatically invertible,
so we do not have to list this as an extra requirement).
5. If {|ei ⟩}i is an ONB, then so is {U |ei ⟩}i .
In quantum mechanics, unitary operators describe symmetries. The most important
symmetry is of course time evolution! The Hermitian operator that generates time evolution
U (t) in the sense that $U(t) = e^{-itH/\hbar}$ (as in Point 3.) is nothing but −1/ℏ times the
Hamiltonian.
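Some of these characterizations are easy to check numerically in finite dimensions. The following sketch assumes a random 3×3 Hermitian matrix A (an arbitrary choice), builds U = e^{iA} from the eigendecomposition of A as in Point 3, and tests the properties listed in Points 2 and 4 as well as the preservation of inner products.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (M + M.conj().T) / 2                       # Hermitian generator
a, psi = np.linalg.eigh(A)
U = (psi * np.exp(1j * a)) @ psi.conj().T      # U = e^{iA} via Sec. A.1.10

identity = np.eye(3)
is_unitary = (np.allclose(U.conj().T @ U, identity)
              and np.allclose(U @ U.conj().T, identity))
unit_modulus = np.allclose(np.abs(np.linalg.eigvals(U)), 1.0)

# Inner products of arbitrary vectors are preserved: <Uv|Uw> = <v|w>
v = rng.normal(size=3) + 1j * rng.normal(size=3)
w = rng.normal(size=3) + 1j * rng.normal(size=3)
preserves_inner = np.isclose(np.vdot(U @ v, U @ w), np.vdot(v, w))

print(is_unitary, unit_modulus, preserves_inner)
```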
A.1.12 Projections
Recall (see Fig. A.1) that in $\mathbb{R}^d$ with Euclidean scalar product $(u, v) = \sum_i u_i v_i$, there is a
one-one relation between
• Subspaces V ⊂ Rd , and
• orthogonal projections P , i.e. linear maps fulfilling P = P t , P 2 = P .
The Hilbert space analogue works like this: An operator P is a projector (or projection)
if
1. P = P † , and
2. P 2 = P .
The first property means that P has a spectral decomposition. The second property then
implies that the eigenvalues are elements of {0, 1}. Thus,
X
P = |ψi ⟩⟨ψi |,
i
where the {|ψi ⟩} form an ONB for the subspace V ⊂ H onto which P projects.
Examples:
Figure A.1: Orthogonal projection of u onto the x-y-plane.
• For every normalized vector |ψ⟩ ∈ H, the outer product P = |ψ⟩⟨ψ| is the projec-
tion onto the one-dimensional subspace V = {z|ψ⟩ | z ∈ C}.
• Define the parity operator Π on H = L2 (R) by
Π|δx ⟩ = |δ−x ⟩, that is (Πϕ)(x) = ϕ(−x).
Then it’s easy to see that $P_\pm = \frac{1}{2}(1 \pm \Pi)$ are projection operators onto the space of
even and odd functions respectively (why?).
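The parity example can be illustrated on a discretized line. In the sketch below, the symmetric grid, its size, and the test function are assumptions made for the illustration; on such a grid Π simply reverses the order of the sample points.

```python
import numpy as np

n = 201
x = np.linspace(-5.0, 5.0, n)               # symmetric grid, so -x is a grid point too
Pi = np.eye(n)[::-1]                        # (Pi phi)(x) = phi(-x) on this grid
P_plus = (np.eye(n) + Pi) / 2
P_minus = (np.eye(n) - Pi) / 2

phi = np.exp(-(x - 1.0) ** 2)               # neither even nor odd
even, odd = P_plus @ phi, P_minus @ phi     # even and odd parts of phi

print(np.allclose(P_plus @ P_plus, P_plus), np.allclose(P_plus @ P_minus, 0))
print(np.allclose(even[::-1], even), np.allclose(odd[::-1], -odd))
```

The two projections are complementary: `even + odd` reproduces `phi`, reflecting P₊ + P₋ = 1.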
A.1.13 The trace

The trace of an operator is the sum over its eigenvalues. It can be expressed as
X
tr A = ⟨i|A|i⟩,
i
where the sum is over any ONB {|i⟩}i .

Some properties:
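• Cyclic invariance:
\[ \operatorname{tr} AB = \sum_{ij}\langle i|A|j\rangle\langle j|B|i\rangle = \sum_{ij}\langle j|B|i\rangle\langle i|A|j\rangle = \operatorname{tr} BA. \]
• Traces of outer products are inner products:
\[ \operatorname{tr} |\alpha\rangle\langle\beta| = \sum_i\langle i|\alpha\rangle\langle\beta|i\rangle = \sum_i\langle\beta|i\rangle\langle i|\alpha\rangle = \langle\beta|\alpha\rangle. \]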
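Both properties, and the statement that the trace is the sum of the eigenvalues, can be checked on random matrices. The dimension and the random inputs below are arbitrary; note that `np.vdot` conjugates its first argument, so `np.vdot(beta, alpha)` is the inner product with the bra taken from β.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
alpha = rng.normal(size=d) + 1j * rng.normal(size=d)
beta = rng.normal(size=d) + 1j * rng.normal(size=d)

cyclic = np.isclose(np.trace(A @ B), np.trace(B @ A))

outer = np.outer(alpha, beta.conj())            # the operator |alpha><beta|
inner = np.vdot(beta, alpha)                    # the number <beta|alpha>
outer_is_inner = np.isclose(np.trace(outer), inner)

# The trace equals the sum of the eigenvalues, also for non-Hermitian A
eig_sum = np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)))

print(cyclic, outer_is_inner, eig_sum)
```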
A.1.14 Commuting operators

Assume that two operators A, B have a joint eigenbasis {|ψi ⟩}:
A|ψi ⟩ = ai |ψi ⟩, B|ψi ⟩ = bi |ψi ⟩.
Then
[A, B]|ψi ⟩ = (AB − BA)|ψi ⟩ = (ai bi − bi ai )|ψi ⟩ = 0 ∀ i.

so the operators commute.

Less obvious, but still true is that the converse also holds: If two operators commute,
one can construct a joint eigenbasis. In fact, the conclusion also holds for any set of
mutually commuting operators.
Warning: If two operators commute then it does not follow that every eigenbasis of
one is also an eigenbasis of the other (why?).
A.2 Some concrete systems

A.2.1 A single harmonic oscillator
Classical Hamiltonian mechanics
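Let’s first retrace the solution of a harmonic oscillator
\[ H = \frac{P^2}{2m} + \frac{1}{2}m\omega^2X^2 \]
in classical mechanics. Choose problem-adapted units for length and momentum:
\[ \tilde X = \sqrt{\frac{m\omega}{\hbar}}\,X, \qquad \tilde P = \sqrt{\frac{1}{m\hbar\omega}}\,P \quad\Rightarrow\quad H = \frac{1}{2}\hbar\omega(\tilde P^2 + \tilde X^2). \]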
Wait, what’s ℏ doing in a classical calculation? Well, it’s convenient to work with dimen-
sionless quantities X̃, P̃ . But then XP/(X̃ P̃ ) is a constant having the dimension of an
action. There’s no preferred scale of action in classical mechanics – but ℏ does the job and
facilitates the later transition to QM.
Next, introduce complex coordinates
\[ a := \frac{1}{\sqrt2}(\tilde X + i\tilde P) \quad\Rightarrow\quad a^\dagger = \frac{1}{\sqrt2}(\tilde X - i\tilde P), \]
where we use the “dagger” superscript to denote complex conjugation. These complex
coordinates may not have a direct physical interpretation, but they are easy to work with
and we can recover the original position and momentum coordinates as
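\[ X = \sqrt{\frac{\hbar}{2m\omega}}\,(a + a^\dagger) = \sqrt{\frac{2\hbar}{m\omega}}\,\mathrm{Re}(a), \qquad P = -i\sqrt{\frac{m\hbar\omega}{2}}\,(a - a^\dagger) = \sqrt{2m\hbar\omega}\,\mathrm{Im}(a). \]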
The Poisson bracket {X, P } = 1 implies
\[ \{a, a^\dagger\} = \frac{1}{2}\Big(-i\{\tilde X,\tilde P\} + i\{\tilde P,\tilde X\}\Big) = \frac{1}{i\hbar}, \]
so the coordinate change (X, P ) → (a, a† ) is canonical up to the factor 1/(iℏ). The
Hamilton function reads in complex coordinates
\[ H = \frac{1}{2}\hbar\omega(aa^\dagger + a^\dagger a) = \hbar\omega|a|^2 \tag{A.26} \]
and the equations of motion are (using standard properties of Poisson brackets)
∂t a = {a, H} = ℏω{a, a† a} = ℏω(a† {a, a} + {a, a† }a) = −iωa,
solved by a(t) = a(0)e−iωt .

Quantum mechanics
Now assume that the X, P are not classical phase space coordinates, but instead position
and momentum operators on $L^2(\mathbb{R})$. Replacing Poisson brackets {·, ·} by commutators
$\frac{1}{i\hbar}[\cdot,\cdot]$ and complex conjugates by Hermitian conjugates, the above derivation goes through
verbatim for the quantum case, up until Eq. (A.26). There, the fact that $a^\dagger$ and $a$ do not
commute means that we cannot simplify $\frac12(a^\dagger a + aa^\dagger)$ as $|a|^2$. Using the commutation
relation $[a, a^\dagger] = 1$ of the ladder operators instead, the Hamiltonian becomes
\[ H = \hbar\omega\,\frac{1}{2}(a^\dagger a + aa^\dagger) = \hbar\omega\Big(a^\dagger a + \frac{1}{2}\Big). \]
Momentarily switching back to the position-space representation of the operators, one can
easily see that there is a unique ground state |0⟩, with wave function $\langle\tilde x|0\rangle = \pi^{-1/4}e^{-\tilde x^2/2}$.
From the commutation relations of the ladder operators, it then follows that with
\[ |n\rangle := \frac{1}{\sqrt{n!}}(a^\dagger)^n|0\rangle, \]
the set {|n⟩}n≥0 forms an ONB of the Hilbert space. It is indeed the eigenbasis of H:
\[ a|n\rangle = \sqrt{n}\,|n-1\rangle, \qquad a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle \quad\Rightarrow\quad H|n\rangle = \hbar\omega\Big(n + \frac{1}{2}\Big)|n\rangle. \]
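The ladder-operator algebra can be explored with finite matrices. The sketch below assumes a truncation of the number basis to D = 30 levels (an arbitrary cutoff) and measures energies in units of ℏω. The truncation is visible only in the last diagonal entry of the commutator, which equals 1 − D instead of 1.

```python
import numpy as np

D = 30                                         # truncation dimension (assumption)
hbar_omega = 1.0                               # energies in units of hbar * omega
n = np.arange(D)
a = np.diag(np.sqrt(n[1:]), k=1)               # a|n> = sqrt(n)|n-1>
a_dag = a.conj().T                             # a_dag|n> = sqrt(n+1)|n+1>

comm = a @ a_dag - a_dag @ a                   # identity, except for the last entry
H = hbar_omega * (a_dag @ a + 0.5 * np.eye(D))
energies = np.linalg.eigvalsh(H)

print(np.allclose(comm[:-1, :-1], np.eye(D - 1)), comm[-1, -1])
print(energies[:4])
```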
A.2.2 Normal modes

Classical Hamiltonian mechanics
In classical mechanics, consider the Hamilton function
\[ H = \sum_k\frac{P_k^2}{2m} + \frac{1}{2}\sum_{kl}V_{kl}X_kX_l, \]
where the potential is given in terms of some coupling matrix V = (Vkl ). Without loss of
generality (why?), we can assume that V is symmetric and thus there exists an orthogonal
O that diagonalizes V :
(OOt )kl = δkl , (OV Ot )kl = δkl λk .

Instead of with the eigenvalues $\lambda_k$ directly, we’ll work with $\omega_k := \sqrt{\lambda_k/m}$. (The energy of
the system is bounded below only if all λk ≥ 0. Only this case is physically interesting,
so don’t worry about imaginary ωk ). The rows of O form an ortho-normal basis called the
normal modes of the interaction. Expressing position and momentum in this basis gives
the normal coordinates
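\[ \phi_k = \sum_l O_{kl}X_l, \quad \pi_k = \sum_l O_{kl}P_l \quad\Rightarrow\quad X_l = \sum_k O_{kl}\phi_k, \quad P_l = \sum_k O_{kl}\pi_k. \]
This transformation is canonical:
\[ \{\phi_k,\pi_l\} = \sum_{ij}O_{ki}O_{lj}\underbrace{\{X_i,P_j\}}_{\delta_{ij}} = (OO^t)_{kl} = \delta_{kl}, \qquad \{\pi_k,\pi_l\} = \{\phi_k,\phi_l\} = 0. \]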
Plugging in, the Hamiltonian decouples

\[ H = \sum_k\Big(\frac{1}{2m}\pi_k^2 + \frac{1}{2}m\omega_k^2\phi_k^2\Big). \]
Each summand can now be treated as in Sec. A.2.1:

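\[ a_k := \sqrt{\frac{m\omega_k}{2\hbar}}\,\phi_k + i\sqrt{\frac{1}{2m\hbar\omega_k}}\,\pi_k \quad\Rightarrow\quad H = \sum_k\hbar\omega_k|a_k|^2, \]
solved by $a_k(t) = a_k(0)e^{-i\omega_kt}$. The transformation back to the original coordinates reads
\[ \begin{aligned}
X_l(t) &= \sqrt{\frac{\hbar}{m}}\sum_k\frac{1}{\sqrt{2\omega_k}}\big(a_ke^{-i\omega_kt}O_{kl} + a_k^\dagger e^{i\omega_kt}O_{kl}\big),\\
P_l(t) &= -i\sqrt{m\hbar}\sum_k\sqrt{\frac{\omega_k}{2}}\big(a_ke^{-i\omega_kt}O_{kl} - a_k^\dagger e^{i\omega_kt}O_{kl}\big).
\end{aligned} \]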
In words, the configuration X(t) of the particles is a linear combination of the normal
modes, with the coefficients oscillating with the eigenfrequencies ωk .
Some remarks:
• Even though V is real, it is sometimes convenient to choose the normal modes to be
a set of complex eigenvectors. The most important examples are translation-invariant
couplings V , which are diagonalized by suitable complex exponentials (Sec. A.1.9). The
calculations above can be easily adapted to this case.
• With a bit more effort, one can see that any Hamiltonian that is (i) a quadratic ex-
pression in position and momenta and (ii) has a lower bound on the ground state
energy can be diagonalized in a related way by a canonical change of coordinates.
Check out Williamson’s Theorem for details.
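The classical normal-mode construction can be followed step by step in a small example. The sketch below assumes three equal masses coupled by identical springs to each other and to two walls (a hypothetical system chosen for illustration, in units with m = 1 and spring constant 1). It diagonalizes V, reads off the eigenfrequencies, and checks that the motion assembled from the normal coordinates satisfies Newton's equation m Ẍ = −V X.

```python
import numpy as np

m, kappa = 1.0, 1.0                               # mass and spring constant (assumed units)
V = kappa * np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, -1.0],
                      [0.0, -1.0, 2.0]])          # symmetric coupling matrix

lam, vecs = np.linalg.eigh(V)                     # V = vecs diag(lam) vecs^t
O = vecs.T                                        # rows of O are the normal modes
omega = np.sqrt(lam / m)                          # eigenfrequencies

X0, P0 = np.array([1.0, 0.0, 0.0]), np.zeros(3)   # initial condition
phi0, pi0 = O @ X0, O @ P0                        # normal coordinates at t = 0

def X(t):
    """Positions at time t: each normal coordinate oscillates independently."""
    phi_t = phi0 * np.cos(omega * t) + pi0 / (m * omega) * np.sin(omega * t)
    return O.T @ phi_t                            # X_l = sum_k O_kl phi_k

# Check Newton's equation with a finite-difference second derivative
t, h = 0.8, 1e-3
accel = (X(t + h) - 2 * X(t) + X(t - h)) / h**2
newton_residual = np.max(np.abs(m * accel + V @ X(t)))
print(omega, newton_residual)
```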
Quantum mechanics
As was done in the n = 1 case of Sec. A.2.1, assume now that the Xi , Pi are position and
momentum operators on L2 (Rn ). Then, again, the above applies verbatim to the quantum
case, except that the Hamiltonian reads
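\[ H = \sum_k\hbar\omega_k\Big(a_k^\dagger a_k + \frac{1}{2}\Big). \]
In terms of normal coordinates
\[ \sqrt{\frac{m\omega_k}{\hbar}}\,\phi_k|\tilde x\rangle = \tilde x_k|\tilde x\rangle, \]
the ground state wave function is
\[ \langle\tilde x|0\rangle = \prod_{i=1}^n\pi^{-1/4}e^{-\tilde x_i^2/2} = \pi^{-n/4}e^{-\|\tilde x\|^2/2} \]
and the eigenbasis arises from laddering (don’t confuse the quantum numbers $n_i$ with the
number n of degrees of freedom)
\[ |n_1,\dots\rangle = \prod_i\frac{1}{\sqrt{n_i!}}(a_i^\dagger)^{n_i}|0\rangle \quad\Rightarrow\quad H|n_1,\dots\rangle = \sum_k\hbar\omega_k\Big(n_k + \frac{1}{2}\Big)|n_1,\dots\rangle. \tag{A.27} \]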
In principle, it is possible to work out the wave function ⟨x|n1 , . . . ⟩ in terms of the original
coordinates. But this gets ugly pretty quickly, so one usually tries to extract physical
predictions without having to go there.
A.2.3 Central potentials

For some constant κ, consider the Hamiltonian
\[ H = \frac{P^2}{2m} - \frac{\kappa}{\|X\|}. \]
Because the Hamiltonian is rotationally invariant, we can find joint eigenvectors
H|E, l, m⟩ = E|E, l, m⟩,
L2 |E, l, m⟩ = ℏ2 l(l + 1)|E, l, m⟩,
Lz |E, l, m⟩ = ℏm|E, l, m⟩.
Negative energy solutions correspond to bound states. These energies are quantized:
\[ E_n = -\frac{E_I}{n^2}, \quad n\in\mathbb{N}, \qquad E_I = \frac{m\kappa^2}{2\hbar^2} \quad\text{(ionization energy)}. \]
The angular momentum quantum number l takes integer values between 0 and n − 1 (as
always, the magnetic quantum number m is an integer between −l and l). In spherical
coordinates, the eigenfunctions are of the form
\[ \langle x|n,l,m\rangle = Y_{lm}(\theta,\phi)\,e^{-\frac{r}{a_0n}}\,y_{nl}(r/a_0), \qquad a_0 = \frac{\hbar^2}{m\kappa} \quad\text{(Bohr radius)} \]
where Ylm (θ, ϕ) are the spherical harmonics, and ynl a polynomial of degree n − 1 that
one can explicitly work out in terms of generalized Laguerre polynomials. (Though just
about nobody is thrilled by the prospect of “working things out in terms of generalized
Laguerre polynomials” – and fortunately, one can often invoke more elegant arguments so
that one doesn’t have to).
Hydrogen atom, fine structure constant

The Hamiltonian for the electron of a hydrogen atom corresponds to $m = m_e$ (electron
mass) and $\kappa = \frac{e^2}{4\pi\epsilon_0}$ (Coulomb attraction). For atomic problems, it makes sense to express
quantities in the natural units that can be formed by combining the constants ℏ, m, c, and
the fine structure constant
\[ \alpha = \frac{e^2}{4\pi\epsilon_0\hbar c} \simeq \frac{1}{137}. \]
In particular, the natural scales are $mc^2$ for energy; $mc$ for momentum; $\frac{\hbar}{mc}$ for length; and
$\frac{\hbar}{mc^2}$ for time.
Then the factor in front of the potential term reads κ = ℏcα, and we get $a_0 = \frac{\hbar}{mc}\frac{1}{\alpha}$
(Bohr radius), and $E_I = \frac{1}{2}mc^2\alpha^2$ (ionization energy of hydrogen).
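To get a feeling for the numbers, the two formulas can be evaluated in SI units. The constants below are CODATA values typed in by hand, which is an assumption of this sketch rather than something taken from the notes; the variable names are illustrative.

```python
hbar = 1.054571817e-34      # reduced Planck constant in J s
m_e = 9.1093837015e-31      # electron mass in kg
c = 299792458.0             # speed of light in m / s
alpha = 7.2973525693e-3     # fine structure constant (dimensionless)
eV = 1.602176634e-19        # one electron volt in J

length_scale = hbar / (m_e * c)         # natural length hbar / (m c)
energy_scale = m_e * c**2               # natural energy m c^2

a0 = length_scale / alpha               # Bohr radius
E_I = 0.5 * energy_scale * alpha**2     # ionization energy of hydrogen

print(f"a0  = {a0:.4e} m")
print(f"E_I = {E_I / eV:.3f} eV")
```

The results should come out close to the familiar values of about half an ångström and 13.6 eV.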
A.2.4 Fermionic oscillator

Let a be such that a, a† fulfill the canonical anti-commutation relations (CAR)
[a, a† ]+ = 1, [a, a]+ = 2a2 = 0, [a† , a† ]+ = 2(a† )2 = 0. (A.28)
The Fermionic number operator N = a† a is a projection:
N 2 = a† aa† a = a† (1 − a† a)a = N − (a† )2 a2 = N.
It follows that the Fermionic occupation numbers can only be 0 and 1. Likewise,
aa† = 1 − a† a = 1 − N (A.29)
is the complementary projection.

As in the Bosonic case, the Hamiltonian H = ℏωN gives rise to the Heisenberg-picture
time evolution a(t) = e−iωt a(0). To see this, plug the expression into the Heisenberg
equation of motion iℏ∂t a(t) = [a(t), H] and use
[a, a† a]− = aa† a − a† aa = a(a† a) = a(1 − aa† ) = a.
Using (A.29),
\[ H' := H - \frac{\hbar\omega}{2}\,1 = \frac{\hbar\omega}{2}(a^\dagger a - aa^\dagger), \]
so the right hand side generates the same time evolution.
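The simplest representation of the Fermionic oscillator is on $H = \mathbb{C}^2$, with
\[ a^\dagger = \frac{1}{2}(\sigma_x + i\sigma_y) = \begin{pmatrix}0&1\\0&0\end{pmatrix} \quad\Rightarrow\quad a = \frac{1}{2}(\sigma_x - i\sigma_y) = \begin{pmatrix}0&0\\1&0\end{pmatrix}. \]
Then the occupation number operator and the occupation number basis are
\[ N = \begin{pmatrix}1&0\\0&0\end{pmatrix}, \qquad |0\rangle = \begin{pmatrix}0\\1\end{pmatrix}, \qquad |1\rangle = \begin{pmatrix}1\\0\end{pmatrix} \]
and the Hamiltonian
\[ H' = \frac{\hbar\omega}{2}\sigma_z = \frac{\hbar\omega}{2}\begin{pmatrix}1&0\\0&-1\end{pmatrix}. \]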
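The two-dimensional representation is small enough to verify every relation of this section by direct matrix multiplication. The sketch below uses the standard Pauli matrices and the convention a† = (σx + iσy)/2; the helper `anti` for the anti-commutator is an illustrative name.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
one = np.eye(2)

a_dag = (sx + 1j * sy) / 2      # creation operator
a = (sx - 1j * sy) / 2          # annihilation operator
N = a_dag @ a                   # number operator

def anti(A, B):
    """Anti-commutator [A, B]_+ = AB + BA."""
    return A @ B + B @ A

print(np.allclose(anti(a, a_dag), one), np.allclose(a @ a, 0))   # CAR (A.28)
print(np.allclose(N @ N, N))                                     # N is a projection
print(np.allclose(a_dag @ a - a @ a_dag, sz))                    # H' is proportional to sigma_z
```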
A.3 Perturbation theory

A.3.1 Fermi’s golden rule
Here’s the problem we want to solve. Consider a quantum system that starts in some
initial state |ψ(t = 0)⟩ = |i⟩. Choose a projection operator PF onto a set of final states
orthogonal to the initial state PF |i⟩ = 0. The goal is to estimate the probability
Pi→F (t) = ⟨ψ(t)|PF |ψ(t)⟩
of finding the system in one of the final states when measured at time t.
We consider the situation where the Hamiltonian is of the form H = H0 + λV , where

H0 is sufficiently simple that the stationary Schrödinger equation can be solved
H0 |f ⟩ = Ef |f ⟩,
and where λV is a “small” perturbation.

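As is standard in perturbation theory, we assume (without much in the way of proof)
that one can expand
\[ |\psi(t)\rangle = \sum_{s=0}^\infty\lambda^s|\psi_s(t)\rangle \]
as a power series in λ and that low orders give meaningful answers. Separating the
Schrödinger equation
\[ i\hbar\partial_t\sum_s\lambda^s|\psi_s\rangle = (H_0 + \lambda V)\sum_s\lambda^s|\psi_s\rangle \]
by degrees of λ gives
\[ \begin{aligned}
i\hbar\partial_t|\psi_0\rangle &= H_0|\psi_0\rangle && \text{0th order}\\
i\hbar\partial_t|\psi_1\rangle &= H_0|\psi_1\rangle + V|\psi_0\rangle && \text{1st order}\\
&\;\;\vdots
\end{aligned} \]
With initial condition |ψ(t = 0)⟩ = |i⟩, the zeroth-order equation is solved by
\[ |\psi_0(t)\rangle = e^{\frac{t}{i\hbar}E_i}|i\rangle. \]
Plugging this into the first-order one and projecting onto an eigenstate |f ⟩ gives
\[ i\hbar\partial_t\langle f|\psi_1(t)\rangle = E_f\langle f|\psi_1(t)\rangle + e^{\frac{t}{i\hbar}E_i}\langle f|V|i\rangle \]
which is solved by
\[ \langle f|\psi_1(t)\rangle = \langle f|V|i\rangle\,\frac{1 - e^{\frac{1}{i\hbar}(E_i-E_f)t}}{E_f - E_i}\,e^{\frac{1}{i\hbar}E_ft} \qquad\text{for } E_f\neq E_i, \tag{A.30} \]
\[ \langle f|\psi_1(t)\rangle = \frac{t}{i\hbar}\,\langle f|V|i\rangle\,e^{\frac{1}{i\hbar}E_ft} \qquad\text{for } E_f = E_i,\ f\neq i. \tag{A.31} \]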
Using L’Hôpital’s rule, one verifies that (A.30) tends to (A.31) for Ef → Ei . In this
sense, it suffices to work with (A.30) alone. Its modulus squared is
\[ |\langle f|\psi(t)\rangle|^2 = 4|\langle f|V|i\rangle|^2\,\frac{\sin^2\!\big((E_i-E_f)\frac{t}{2\hbar}\big)}{(E_i-E_f)^2} \qquad (i\neq f). \]
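The first-order amplitude (A.30) can be compared with an exact calculation in a small toy model. The sketch below assumes ℏ = 1, a four-level H0 with arbitrarily chosen energies, a random Hermitian V, and a small coupling λ = 10⁻³; all of these are illustrative choices. The exact state is obtained by diagonalizing H0 + λV, and the deviation from λ times (A.30) should be of second order in λ.

```python
import numpy as np

hbar = 1.0
E = np.array([0.0, 0.7, 1.3, 2.1])                 # spectrum of H0 (arbitrary)
rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
V = (M + M.conj().T) / 2                           # Hermitian perturbation
lam, t, i = 1e-3, 1.0, 0                           # coupling, time, initial state

# Exact evolution under H = H0 + lam * V, starting from |i>
w, U = np.linalg.eigh(np.diag(E) + lam * V)
psi_exact = U @ (np.exp(-1j * w * t / hbar) * U.conj().T[:, i])

# First-order amplitudes from (A.30) for all final states f != i
f = np.arange(4) != i
dE = E[f] - E[i]
c1 = V[f, i] * (1 - np.exp(1j * dE * t / hbar)) / dE * np.exp(-1j * E[f] * t / hbar)

error = np.max(np.abs(psi_exact[f] - lam * c1))
print(error, np.max(np.abs(lam * c1)))
```

Repeating the comparison with a tenfold smaller λ should shrink the error by roughly a factor of one hundred, as expected for a second-order remainder.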
With $\epsilon = E_f - E_i$, $\tau = \frac{t}{2\hbar}$, the fraction is $\sin^2(\epsilon\tau)/\epsilon^2$, the square of the “sinc
function” (Fig. A.2). It has a central peak of height $\tau^2$, zeroes at $\epsilon = \pm\frac{\pi}{\tau}$, and shows
oscillations of quadratically decreasing amplitude for ϵ → ±∞. It is known (by the
Dirichlet integral) that the area under the curve is τ π. Therefore, the family of functions
$f_\tau(\epsilon) := \frac{1}{\pi\tau}\sin^2(\epsilon\tau)/\epsilon^2$ converges to a δ-function centered at 0 as τ → ∞.
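The convergence to a δ-function can be probed by integrating the family against a smooth test function. In the sketch below, the Gaussian test function g with g(0) = 1, the grid, and the sample values of τ are assumptions made for illustration; the integrals should approach g(0) as τ grows.

```python
import numpy as np

def f_tau(eps, tau):
    # sin^2(eps*tau) / (pi*tau*eps^2); np.sinc(u) = sin(pi*u)/(pi*u) is regular at eps = 0
    return tau / np.pi * np.sinc(eps * tau / np.pi) ** 2

eps = np.linspace(-20.0, 20.0, 400001)       # fine enough to resolve the oscillations
d_eps = eps[1] - eps[0]
g = np.exp(-eps**2)                          # smooth test function with g(0) = 1

results = []
for tau in (1.0, 10.0, 100.0):
    smeared = np.sum(f_tau(eps, tau) * g) * d_eps   # Riemann sum for the integral of f_tau * g
    results.append(smeared)
    print(tau, smeared)
```

The approach to 1 is slow, with a deviation that falls off like 1/τ, because of the heavy 1/ϵ² tails of the squared sinc function.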
Qualitatively, we can now describe which parameters enter the probability Pi→F (t).
By the above, only states |f ⟩ with energy Ef in the range $E_i \pm \frac{2\pi\hbar}{t}$ pick up significant
weight. For such states, the modulus squared is proportional to t and the squared coupling
coefficient |⟨f |V |i⟩|2 .
Figure A.2: Squared sinc function $\sin^2(\epsilon\tau)/\epsilon^2$. x axis in units of τ , y axis in units of $\frac{1}{\tau}$.
To get a more quantitative statement, let ρ(f ) be a measure such that
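\[ P_F = \int_F|f\rangle\langle f|\,\rho(f)\,df. \]
In other words, ρ(f ) is the “density of states”, in the sense of Sec. A.1.7. Then
\[ \langle\psi(t)|P_F|\psi(t)\rangle = \int_F|\langle f|\psi(t)\rangle|^2\rho(f)\,df = \int_F 4|\langle f|V|i\rangle|^2\,\frac{\sin^2\!\big((E_i-E_f)\frac{t}{2\hbar}\big)}{(E_i-E_f)^2}\,\rho(f)\,df. \]
Given the discussion above, optimistically,
\[ \langle\psi(t)|P_F|\psi(t)\rangle \simeq t\,\frac{2\pi}{\hbar}\int_F|\langle f|V|i\rangle|^2\,\delta(E_i-E_f)\,\rho(f)\,df =: t\,\Gamma. \tag{A.32} \]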
Let’s suspend disbelief for a while and take (A.32) at face value. It is called Fermi’s Golden
Rule: The probability Pi→F (t) increases linearly, with slope Γ proportional to the squared
coupling and the density of states, integrated over all final states with the right energy.
The “≃”–step in (A.32) involved quite the leap of faith. The squared-sinc-construction
gives a delta function only in the limit of large times, but first-order perturbation theory is
valid, at most, at short times. It’s unclear whether there’s an intermediate regime where
both approximations simultaneously hold. Also, if the spectrum is discrete, the density of
states ρ(f ) is itself a sum of delta functions (Sec. A.1.7), so that the integral has no obvi-
ous meaning. The cleanest (but not only) way around this issue is to restrict attention to
energies Ei that lie in the continuous part of the spectrum of H0 . This frequently involves
letting the “quantization region” L3 go to ∞ (c.f. Sec. A.1.9). One could analyze the con-
ditions for (A.32) to hold more carefully – but this is rarely done in practice. Experience
has shown that the “golden” rule gives the right answer more often than one could have
hoped, hence the moniker.
Appendix B
Miscellaneous Integrals
B.1 Gaussian and Fresnel integrals

Starting point is the famous formula due to Gauss
\[ \int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt\pi, \]
which can be obtained by evaluating its square in polar coordinates.

From there, one finds the general form
\[ \int_{-\infty}^\infty e^{-\alpha x^2+\beta x+\gamma}\,dx = \sqrt{\frac{\pi}{\alpha}}\,e^{\frac{\beta^2}{4\alpha}+\gamma} \tag{B.1} \]
which holds for all complex α, β, γ such that the integral converges: either Re[α] > 0;
or Re[α] = 0 and Re[β] = 0 (though in the latter case, the integral is not absolutely
convergent, so it should be handled with care). In the formula, $\sqrt{\pi/\alpha}$ is the principal
square root, defined to be the unique root with argument in (−π/2, π/2]. For real α, β, γ’s, the
above can be proven by completing the square and using the substitution rule. For complex
coefficients, one has to use a suitable contour integration. The special case α = ∓i and
β = γ = 0 is the complex asymptotic Fresnel integral
\[ \int_{-\infty}^\infty e^{\pm ix^2}\,dx = \sqrt\pi\,e^{\pm i\pi/4}. \tag{B.2} \]
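Formula (B.1) is easy to test numerically for complex coefficients with Re[α] > 0. The parameter values and the integration grid in the sketch below are arbitrary choices; a plain Riemann sum suffices because the integrand decays rapidly.

```python
import numpy as np

alpha, beta, gamma = 1.0 + 2.0j, 0.5 - 1.0j, 0.3   # arbitrary values with Re(alpha) > 0

x = np.linspace(-15.0, 15.0, 300001)
dx = x[1] - x[0]
numeric = np.sum(np.exp(-alpha * x**2 + beta * x + gamma)) * dx

# np.sqrt returns the principal square root for complex arguments
closed_form = np.sqrt(np.pi / alpha) * np.exp(beta**2 / (4 * alpha) + gamma)

print(numeric, closed_form)
```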
The Gaussian integral (B.1) is taken over the entire real line x → ±∞, but in
fact, it is already close to its asymptotic value if the limits of the integral are large compared
to $1/\sqrt{|\alpha|}$. This is obvious if α has a large real part (because the absolute value of the
integrand is decaying with $e^{-\mathrm{Re}(\alpha)x^2}$). Imaginary parts of α also aid convergence, but for
a more subtle reason: They cause the integrand to oscillate rapidly for large arguments, so
that its contributions to the integral tend to cancel.
To visualize this effect, consider the non-asymptotic real Fresnel integrals
\[ C(x) := \int_0^x\cos(t^2)\,dt, \qquad S(x) := \int_0^x\sin(t^2)\,dt. \]
Separating real and imaginary parts in (B.2) gives
\[ \lim_{x\to\infty}C(x) = \lim_{x\to\infty}S(x) = \sqrt{\frac{\pi}{8}}. \tag{B.3} \]
Their convergence is shown in Fig. B.1.
Figure B.1: The Fresnel integrals C(x) (orange) and S(x) (blue). The integrals quickly
converge towards their asymptotic value $\sqrt{\pi/8}$ (black line), with contributions of larger
arguments canceling due to the oscillating behavior of the integrand.
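The behavior of the Fresnel integrals can be reproduced with a few lines of NumPy. The sketch below assumes a running trapezoidal rule on a uniform grid up to x = 20 (both arbitrary choices; the helper name is illustrative) and prints the distance of C(x) and S(x) from the limit (B.3) at a few upper limits.

```python
import numpy as np

t = np.linspace(0.0, 20.0, 200001)
dt = t[1] - t[0]

def cumulative_trapezoid(y):
    """Running trapezoidal integral of y over the uniform grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) / 2) * dt))

C = cumulative_trapezoid(np.cos(t**2))
S = cumulative_trapezoid(np.sin(t**2))
limit = np.sqrt(np.pi / 8)

for x_max in (2.0, 5.0, 20.0):
    idx = np.searchsorted(t, x_max)          # first grid point at or above x_max
    print(x_max, C[idx] - limit, S[idx] - limit)
```

The remaining deviations oscillate with an envelope of roughly 1/(2x), which follows from one integration by parts of the tail integral.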
B.2 Some Fourier transforms

Rotationally invariant functions
Let V (x) = V (∥x∥) be a rotationally-invariant function in R3 . Its Fourier transform
Ṽ (k) is computed most easily in the coordinate system (r, µ = cos θ, ϕ) where (r, θ, ϕ)
are spherical coordinates with polar vector parallel to k. The volume element is
r2 sin θ dr dθ dϕ = r2 dr dµ dϕ
so that the Fourier transform

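\[ \begin{aligned}
\tilde V(k) &= (2\pi)^{-3/2}\int e^{-ikx}\,V(r)\,d^3x\\
&= (2\pi)^{-1/2}\int_0^\infty dr\,r^2V(r)\int_{-1}^1d\mu\,e^{-ikr\mu}\\
&= (2\pi)^{-1/2}\int_0^\infty dr\,r^2V(r)\left[\frac{e^{-ikr\mu}}{-ikr}\right]_{-1}^{1}\\
&= \frac{i}{(2\pi)^{1/2}k}\int_0^\infty dr\,rV(r)\big(e^{-ikr}-e^{ikr}\big) && \text{(B.4)}\\
&= \sqrt{\frac{2}{\pi}}\,\frac{1}{k}\int_0^\infty rV(r)\sin(kr)\,dr && \text{(B.5)}
\end{aligned} \]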
reduces to a one-dimensional integral.
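Formula (B.5) can be checked against a case where the three-dimensional transform is known in closed form: with the normalization used here, the Gaussian $e^{-r^2/2}$ is its own Fourier transform. The function name `radial_fourier`, the cutoff `r_max`, and the grid size in the sketch below are illustrative assumptions.

```python
import numpy as np

def radial_fourier(V, k, r_max=40.0, num=400001):
    """Evaluate (B.5) by a Riemann sum on [0, r_max]; V must decay fast enough."""
    r = np.linspace(0.0, r_max, num)
    dr = r[1] - r[0]
    return np.sqrt(2 / np.pi) / k * np.sum(r * V(r) * np.sin(k * r)) * dr

def gaussian(r):
    return np.exp(-r**2 / 2)

for k in (0.5, 1.0, 2.0):
    print(k, radial_fourier(gaussian, k), np.exp(-k**2 / 2))
```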

Appendix C
Function spaces and distributions
In this chapter, we take a more pedantic look at the function spaces that occur in QM. For
simplicity of presentation, we’ll mainly restrict attention to the one-dimensional case.
C.1 Square-integrable functions

What mathematical properties should a “wave function” ψ : R → C for a particle in one
dimension have?
First, according to the Born interpretation, p(x) := |ψ(x)|2 is the probability density
describing the distribution of position measurement outcomes. For this interpretation to
make sense, $\int|\psi(x)|^2\,dx$ must equal 1.
Next, physical predictions only depend on integrals involving ψ. Integrals stay the
same if the value of the integrand is changed on a set of measure zero. Therefore, two
functions that agree almost everywhere (i.e. everywhere except on a set of measure zero)
define the same physical state and should therefore be identified. For any function ψ :
R → C, write [ψ] for the set of functions that agree with ψ almost everywhere.
These two conditions suggest that wave functions should belong to the space
\[ L^2(\mathbb{R}) = \Big\{[\psi] \;\Big|\; \psi:\mathbb{R}\to\mathbb{C},\ \int|\psi(x)|^2\,dx<\infty\Big\} \]
of equivalence classes of square-integrable functions.¹ This is indeed the standard choice.

In practice, the identification of functions agreeing almost everywhere is usually left
implicit. That is, L2 (R) is called “the space of square-integrable functions” instead of the
more precise “space of equivalence classes of square-integrable functions”, and one writes
ψ ∈ L2 (R) as a short-hand for [ψ] ∈ L2 (R). We will also follow this convention.
The Cauchy-Schwarz inequality says that
\[ \left|\int\psi(x)^*\phi(x)\,dx\right| \le \left(\int\psi(x)^*\psi(x)\,dx\right)^{1/2}\left(\int\phi(x)^*\phi(x)\,dx\right)^{1/2}, \tag{C.1} \]
so that
\[ \langle\psi|\phi\rangle := \int\psi(x)^*\phi(x)\,dx \]
is well-defined as a sesquilinear form $L^2(\mathbb{R})\times L^2(\mathbb{R})\to\mathbb{C}$.
¹ See any textbook on analysis, e.g. Folland’s Modern analysis, Chapter 2 for more details on integration
theory. Just two comments on terminology: (1) All integrals in the theory of function spaces are to be understood
in the sense of Lebesgue. (2) A function f is integrable if the integral $\int f$ exists and is finite. (So,
counter-intuitively, “f is integrable” and “the integral of f exists” are different statements!)
Remarks on the use of equivalence classes

Identifying functions that lead to the same physical predictions makes sense. Let’s re-
iterate, though, that consequently elements of L2 (R) aren’t strictly speaking functions,
but rather equivalence classes of functions. In particular, “the value ψ(x)” of an element
[ψ] ∈ L2 (R) at a point x is not a well-defined concept! This might be surprising, be-
cause in practice, we work with point-wise values ψ(x) all the time. We get away with
this because either: (1) We use ψ(x) in a context (e.g. under an integral) where it does not
matter which representative of the equivalence class has been chosen. Or, (2), there is an
(implicit) convention that fixes a representative. For example, it is easy to see that every
equivalence class contains at most one continuous function (Fig. ??). Thus, if we agree to
use continuous representatives whenever possible, there is no ambiguity for such classes.
The identification also makes the mathematical theory cleaner. For example, for a
function ψ : R → C, the integral $\|[\psi]\|^2 := \int|\psi(x)|^2\,dx$ vanishes if and only if ψ is
supported on a set of measure zero, i.e. iff [ψ] = [0]. The implication ∥[ψ]∥ = 0 ⇒
[ψ] = 0 is part of the mathematical definition of a norm. It is frequently invoked in
physics arguments: For example, in the algebraic treatment of the harmonic oscillator,
one typically shows that ∥a|0⟩∥ = 0 and concludes that a|0⟩ = 0, i.e. that the attempt to
construct a negative-energy eigenstate by laddering leads to the 0 function.
C.1.1 Why go beyond L2 ?

The choice of L2 (Rn ) as the space of wave functions was physically well-motivated. But
it turns out that for the purpose of doing some calculations, it is “too small”, while for
others, “too large”.
Too small: L2 (R) does not contain the eigenfunctions of some important operators.
The eigenfunctions of the momentum operator are plane waves, which have norm ∞, and
therefore do not belong to L2 . The eigenfunctions of the position operator are supported
only on one single point. As elements of L2 , they are therefore equivalent to the function
that is identically 0.
Too large: L2 (R) contains elements for which important operators are undefined. For
example, elements of L2 (R) can have discontinuities, in which case the action of the
momentum operator is not well-defined. For an example involving the position operator,
take the function
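\[ \psi(x) = \frac{1}{\sqrt\pi\,(x+i)}. \tag{C.2} \]
Then
\[ \int|\psi(x)|^2\,dx = \frac{1}{\pi}\int\frac{1}{x^2+1}\,dx = \frac{1}{\pi}\big[\arctan(x)\big]_{-\infty}^{\infty} = 1, \]
so ψ ∈ L2 (R). But (by comparison with $\int_a^\infty\frac{1}{x}\,dx = \infty$), one can easily see that the
integral $\langle\psi|X^k|\psi\rangle = \frac{1}{\pi}\int\frac{x^k}{x^2+1}\,dx$ is infinite for even k ∈ N and undefined for odd k ∈ N.
In particular, $\|X\psi\|^2 = \langle\psi|X^2|\psi\rangle = \infty$, implying that Xψ ̸∈ L2 (R).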
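The contrast between the finite norm and the divergent second moment shows up clearly when the integrals are cut off at ±R. The sketch below uses a simple Riemann sum; the function name, the grid size, and the sample cutoffs are illustrative assumptions. The k = 0 column should approach 1, while the k = 2 column keeps growing roughly linearly in R.

```python
import numpy as np

def truncated_moment(k, R, num=2_000_001):
    """Riemann-sum estimate of the integral of x^k |psi(x)|^2 over [-R, R]."""
    x = np.linspace(-R, R, num)
    dx = x[1] - x[0]
    return np.sum(x**k / (np.pi * (x**2 + 1))) * dx

for R in (10.0, 100.0, 1000.0):
    print(R, truncated_moment(0, R), truncated_moment(2, R))
```

For comparison, the truncated integrals are known in closed form: (2/π) arctan R for k = 0 and (2/π)(R − arctan R) for k = 2.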

Figure C.1: Rigged Hilbert spaces are “rigged” in the sense of “fully equipped” (like
Imperator Furiosa’s War Rig, pictured above), not in the sense of “manipulated with the
goal to deceive”, like a loaded die. (OK, maayybe I was just looking for an excuse to
include that picture in my lecture notes).
Discussion
Do these issues mean that L2 (R) is not an appropriate mathematical model for the space
of wave functions? Arguably not!
For the eigenfunction examples, note that infinitely extended or infinitely concentrated
states are unphysical, so we cannot complain that the space L2 (R), designed to model
physical wave functions, does not contain them.
Now let’s look at the function ψ defined in (C.2). The fact that Xψ ̸∈ L2 (R) does not
mean that position measurements aren’t well-defined. To the contrary, $p(x) = |\psi(x)|^2 = \frac{1}{\pi(x^2+1)}$
is a perfectly good probability density describing position measurement outcomes.
It’s just that none of the moments ⟨X k ⟩ (including the expectation value, k = 1) exist and
are finite. But nobody ever promised us that all probability distributions can be character-
ized via moments, so there is no fundamental issue with this. Likewise, any ψ ∈ L2 (R),
even if it exhibits discontinuities, has a Fourier transform ψ̃, and thus a probability density
p(ℏk) = |ψ̃(k)|2 over momentum measurement outcomes.
However, the discussion does suggest that for the purpose of doing calculations, it
would be good to identify a “sandwich of spaces”
Φ ⊂ L2 (R) ⊂ Φ′ , (C.3)
where Φ is “sufficiently small” that all relevant operators are well-defined on it, and Φ′ is
“large enough” that it contains a complete set of eigenvectors for all relevant operators.
As we’ll see, the spaces Φ and Φ′ are usually constructed together. Elements of Φ are
called test functions and those of Φ′ distributions. Constellations as in (C.3) are studied as
Gelfand triples or rigged Hilbert spaces (Fig. C.1).
Which spaces of functions are the best choice for Φ, Φ′ depends on the problem one
wants to solve. An important set for quantum mechanics is Schwartz space (after Laurent
Schwartz, not to be confused with Hermann Schwarz of Cauchy-Schwarz-inequality fame)
for Φ and the associated space of tempered distributions for Φ′ . We’ll look at this case next,
and briefly sketch the general theory in Sec. C.3.
C.2 Distributions
C.2.1 Schwartz space
The most important set of test functions Φ in QM is Schwartz space S, the “smooth func-
tions whose derivatives vanish rapidly”:
$$\mathcal{S} = \Big\{ \phi \in C^\infty(\mathbb{R}) \;\Big|\; \forall \alpha, \beta \in \mathbb{N}_0 : \sup_x |x^\alpha \partial_x^\beta \phi(x)| < \infty \Big\}. \tag{C.4}$$
The condition $\phi \in C^\infty(\mathbb{R})$ means that elements of Schwartz space are infinitely differentiable, while the second condition says that ϕ and its derivatives vanish faster than any inverse power of |x| as |x| → ∞. It follows that S is invariant under P and X. It is also
easy to see that any square-integrable function can be arbitrarily-well approximated by
Schwartz-class functions, i.e. for every ψ ∈ L2 (R) and every ϵ > 0, there exists a ϕ ∈ S
such that ∥ψ − ϕ∥ ≤ ϵ. (Technically: S is dense in L2 (R) w.r.t. norm topology).
This already solves half of our problems: Because well-behaved functions are dense,
there is little loss of generality in assuming that any wave function of physical interest lies
in S. One can then apply X and P without any issue.
C.2.2 Tempered distributions

Constructing the space that contains the generalized eigenvectors requires us to take a
little detour: We will first have to study linear functionals S → C.
A function u : R → C is locally integrable if for any compact set K ⊂ R,
$$\int_K |u(x)|\,dx < \infty.$$
For example, continuous functions are locally integrable, whereas 1/x isn’t (e.g. $\int_0^1 |1/x|\,dx = \infty$). Now, for any locally integrable function u that grows at most polynomially as |x| → ∞, and for any $l \in \mathbb{N}_0$, define a functional $T_{D^l u} : \mathcal{S} \to \mathbb{C}$ by
$$T_{D^l u}(\phi) := \int u(x)\,(-\partial_x)^l \phi(x)\,dx. \tag{C.5}$$
(The notation $D^l u$ will be explained below). Then $T_{D^l u}$ is well-defined as a linear functional S → C. That’s because ϕ ∈ S implies that $\partial_x^l \phi \in \mathcal{S}$ as well; local integrability of u and continuity of $\partial_x^l \phi$ imply that the integrand is locally integrable; and finally the fact that $\partial_x^l \phi$ vanishes faster than any inverse polynomial, together with the matching growth restriction on u, means that the integral remains finite as |x| → ∞. A functional of this form is called a tempered distribution, and the space of all tempered distributions is denoted by S′.
In contrast, note that TDl u is rarely well-defined as a functional on L2 (R). For one,
elements ψ ∈ L2 (R) aren’t in general differentiable, and even if they are, they generally
vanish too slowly for the integral to converge. So we see that S, on account of being
smaller than L2 (R), allows for a larger set of linear functionals! Recall that we’re out to
find a set larger than L2 (R), so this seems like a promising direction to explore. Let’s look
at some examples.
Plane waves: The function $e^{ikx}$ is bounded, so $T_{e^{ikx}}$ defines a linear functional on Schwartz space. Because $e^{ikx}$ is not normalizable, it does not define one on $L^2(\mathbb{R})$.
Delta functional: Let θ(x) be the step function that is 0 for x < 0 and 1 for x ≥ 0.
Then, using integration by parts,
$$T_{D\theta}(\phi) = -\int \theta(x)\,\partial_x \phi(x)\,dx = -\int_0^\infty \partial_x \phi(x)\,dx = \phi(0). \tag{C.6}$$
The operation only makes sense for functions ϕ that are differentiable at 0 – so certainly
for elements of S, but not necessarily elements of L2 (R).
“Bra vectors”: For every $\psi \in L^2(\mathbb{R})$, the “bra” $\phi \mapsto \langle\psi|\phi\rangle = T_{\psi^*}(\phi)$ defines a tempered
distribution. (Indeed, every square-integrable function is also locally integrable. That’s an
easy consequence of the Cauchy-Schwarz inequality).
The principal value is important in the theory of partial differential equations, where one often wants to associate a distribution with the function $u(x) = \frac{1}{x}$ in some way (c.f. Chap. D). Unfortunately, $\frac{1}{x}$ is not locally integrable, and indeed, $\int \frac{\phi(x)}{x}\,dx$ does not in general exist. But as we’ll see, the principal value
$$\mathrm{pv}\Big(\frac{1}{x}\Big)(\phi) := \lim_{\epsilon\to 0^+} \int_{\mathbb{R}\setminus(-\epsilon,\epsilon)} \frac{\phi(x)}{x}\,dx \tag{C.7}$$
is finite for all ϕ ∈ S and, what is more, is given by the tempered distribution $T_{D\log|x|}(\phi)$.
To see that this makes sense, we first need to convince ourselves that log |x|, even though
it diverges as x → 0, is locally integrable. This follows from the fact that the anti-
derivative of log |x| is F (x) = x log |x| − x + C, which remains finite at the singularity:
limx→0 F (x) = C. Therefore, TD log |x| is indeed a tempered distribution. It remains to
be shown that it evaluates to the principal value:
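$$\begin{aligned}
T_{D\log|x|}(\phi) &= -\int \log|x|\,\phi'(x)\,dx \\
&= \lim_{\epsilon\to0^+}\Big[ -\int_{-\infty}^{-\epsilon} \log(-x)\,\phi'(x)\,dx - \int_{\epsilon}^{\infty} \log x\,\phi'(x)\,dx \Big] \\
&= \lim_{\epsilon\to0^+}\Big[ \int_{-\infty}^{-\epsilon} \frac{\phi(x)}{x}\,dx - \phi(-\epsilon)\log\epsilon + \int_{\epsilon}^{\infty} \frac{\phi(x)}{x}\,dx + \phi(\epsilon)\log\epsilon \Big] \\
&= \mathrm{pv}\Big(\frac{1}{x}\Big)(\phi) + \lim_{\epsilon\to0^+} \log(\epsilon)\big(\phi(\epsilon)-\phi(-\epsilon)\big) \\
&= \mathrm{pv}\Big(\frac{1}{x}\Big)(\phi) + 2\phi'(0)\underbrace{\lim_{\epsilon\to0^+}\epsilon\log(\epsilon)}_{=0} = \mathrm{pv}\Big(\frac{1}{x}\Big)(\phi).
\end{aligned}$$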
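The identity just derived can be sanity-checked numerically. The sketch below compares both sides for one hand-picked test function; the Gaussian `phi`, the cutoff, and the midpoint rule are assumptions made for illustration only. Both integrals are folded onto x > 0, which builds the symmetric limit of (C.7) into the integrand.

```python
import math

def phi(x):
    """A (hypothetical) Schwartz test function, deliberately not symmetric about 0."""
    return math.exp(-(x - 1.0) ** 2)

def dphi(x):
    """Derivative of phi."""
    return -2.0 * (x - 1.0) * phi(x)

def midpoint_integral(f, a, b, n=200_000):
    """Midpoint rule; never evaluates f at the end points, so x = 0 is avoided."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

CUTOFF = 12.0  # phi is negligible beyond |x| = 12

def principal_value():
    """pv(1/x)(phi), written as the integral of (phi(x) - phi(-x)) / x over x > 0."""
    return midpoint_integral(lambda x: (phi(x) - phi(-x)) / x, 0.0, CUTOFF)

def log_derivative():
    """T_{D log|x|}(phi) = -int log|x| phi'(x) dx, folded onto x > 0."""
    return -midpoint_integral(lambda x: math.log(x) * (dphi(x) + dphi(-x)), 0.0, CUTOFF)

print("pv(1/x)(phi)      =", principal_value())
print("T_{D log|x|}(phi) =", log_derivative())
```

The two numbers agree up to the discretization error of the logarithmic singularity, which shrinks linearly with the grid spacing.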
Powers of 1/r in higher dimensions: In contrast to the previous example, $u(x) = \|x\|^{-k}$ is locally integrable as a function on $\mathbb{R}^n$, as long as n > k. This can be seen by switching to n-dimensional spherical coordinates, where the volume element is proportional to $r^{n-1}$, which lifts the singularity at 0. The definition (C.5) is easily adapted to
higher dimensions, and integrating against such a u thus defines a tempered distribution.
Unsurprisingly, the case k = 1, n = 3 is important due to its relation to the Coulomb and
the gravitational potential.
Regular distributions
Distributions of the form Tu (i.e. those that can be expressed without differentiating the
argument before integrating) are called regular. For regular distributions, it is common
to use the same symbol for both the distribution S → C and for the function R→C
defining it:
$$T(\phi) = \int T(x)\,\phi(x)\,dx. \tag{C.8}$$
You might complain that such an overloading of notation is not a nice thing to do. And
you’d be right. But things are about to get worse. Such a convention is even used for
non-regular distributions!
Consider e.g. the delta distribution δ(ϕ) = ϕ(0) discussed above. It is not regular.
(Because a hypothetical function giving rise to it would have to be zero everywhere except
at x = 0 – but an integral over a function supported on only one point is zero). But, in
analogy to (C.8), one still writes
$$\delta(\phi) = \int \delta(x)\,\phi(x)\,dx.$$
The r.h.s. is not an integral and δ(x) not a function – the entire r.h.s. is to be read as an
elaborate notation for δ(ϕ). Whether this convention is genius (because it allows practi-
tioners to work with distributions without having to learn the abstract theory) or horrific
(because the one job of mathematics is to be rigorous and not to pretend that objects exist
when in fact they don’t) is a question that may be controversially debated.
C.2.3 Operations on distributions

Our goal is still to find generalized eigenvectors for X and P . These will turn out to be
distributions. For that to even make sense, we have to define what it means for an operator
to act on distributions.
Let A be any operator that maps S to S. There is a unique operator At , the transpose
of A, such that, for ϕ, ψ ∈ S,
$$\int (A\psi)(x)\,\phi(x)\,dx = \int \psi(x)\,(A^t\phi)(x)\,dx.$$
(This is the bilinear analogue of the definition of the adjoint for sesquilinear inner prod-
ucts). It directly follows that for regular distributions with u ∈ S, TAu (ϕ) = Tu (At ϕ).
Using the notation in (C.8), this means
(AT )(ϕ) = T (At ϕ). (C.9)
We take Eq. (C.9) as the general definition for the action of an operator on distributions.
In words: Operations on distributions are defined by shifting them onto the argument.
Derivatives of distributions
The most important application is the differentiation operator $(D\phi)(x) = \partial_x \phi(x)$. By integration by parts, $D^t = -D$, from which we get
$$(D T_u)(\phi) = \int u(x)\,(-\partial_x)\phi(x)\,dx = T_{Du}(\phi)$$
and, more generally, $D^l T_u = T_{D^l u}$ (which justifies the notation $D^l u$, as promised).

With these conventions in place, we can explain the notion of “derivative in the sense
of distribution” that you will likely have come across before. Take for example the step
function θ. Seen as a function R → C, it is not differentiable, due to the discontinuity at 0.

But the distribution Tθ does have a derivative: DTθ = δ, as computed in (C.6). Identifying
θ with Tθ , this fact is often expressed as “∂x θ(x) = δ(x) in the sense of distribution”. Note
that every locally integrable function is infinitely differentiable in the sense of distribution
(by virtue of the elements of the test function space S having this property).
As an application, let’s derive a famous identity that expresses the principal value in
terms of a “side limit of a deformed version of 1/x”, namely

$$\lim_{\epsilon\to0^+} \frac{1}{x \pm i\epsilon} = \mathrm{pv}\Big(\frac{1}{x}\Big) \mp i\pi\delta. \tag{C.10}$$
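First, recall that the principal complex logarithm
$$\mathrm{Log}(x + iy) = \log\sqrt{x^2 + y^2} + i\,\arg(x + iy), \qquad \arg(x+iy) \in (-\pi, \pi],$$
is an analytic continuation of the logarithm to the complex numbers, except for a branch cut on the negative real axis (Fig. ??). It follows that
$$\lim_{\epsilon\to0^+} \mathrm{Log}(x \pm i\epsilon) = \log|x| \pm i\pi\,\theta(-x),$$
which immediately implies (C.10) by differentiating both sides in the sense of distribution, using $\partial_x \log|x| = \mathrm{pv}(1/x)$ and $\partial_x \theta(-x) = -\delta(x)$.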
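As a plausibility check of (C.10), one can evaluate the regularized integral for a small but finite ϵ and compare it with the right-hand side. The following sketch does this for one test function; `phi`, the cutoff, the grid size, and the two values of ϵ are assumptions chosen for illustration. Since the limit is approached only at a rate of order ϵ, the agreement is approximate.

```python
import math

def phi(x):
    """A (hypothetical) Schwartz test function."""
    return math.exp(-(x - 1.0) ** 2)

def midpoint_integral(f, a, b, n=120_000):
    """Midpoint rule; works for real- and complex-valued integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

CUTOFF = 12.0  # phi is negligible beyond |x| = 12

def regularized(eps, sign=+1):
    """Integral of phi(x) / (x + sign * i * eps) over the real line."""
    return midpoint_integral(lambda x: phi(x) / complex(x, sign * eps), -CUTOFF, CUTOFF)

def sokhotski_plemelj(sign=+1):
    """Right-hand side of (C.10) applied to phi: pv(1/x)(phi) -/+ i pi phi(0)."""
    pv = midpoint_integral(lambda x: (phi(x) - phi(-x)) / x, 0.0, CUTOFF)
    return complex(pv, -sign * math.pi * phi(0.0))

rhs = sokhotski_plemelj()
for eps in (0.1, 0.01):
    print(f"eps={eps}: |lhs - rhs| = {abs(regularized(eps) - rhs):.4f}")
```

The printed gap shrinks roughly in proportion to ϵ, consistent with the side limit in (C.10).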
Generalized eigenvectors
We say that a distribution T is a generalized eigenvector of an operator A : S → S if
A T = λ T.
Plane waves are therefore eigenvectors of the differentiation operator D or the momen-
tum operator P = −iD:
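$$D\,T_{e^{ikx}} = T_{\partial_x e^{ikx}} = T_{ik e^{ikx}} = ik\,T_{e^{ikx}}, \qquad P\,T_{e^{ikx}} = k\,T_{e^{ikx}}.$$
Likewise, if $\delta_a = \partial_x \theta(x - a) : \phi \mapsto \phi(a)$ is the delta distribution at a ∈ R, then, using $X^t = X$,
$$(X\delta_a)(\phi) = \delta_a(X\phi) = a\,\phi(a) = a\,\delta_a(\phi) \quad\Rightarrow\quad X\delta_a = a\,\delta_a.$$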
So, with all these preparations in the bag, it was pretty easy to identify the generalized
eigenvectors!
Fourier transforms of tempered distributions

Because the Fourier transform exchanges X and P , the characterization (C.4) of S, and
hence the space itself, is invariant under Fourier transforms. Applying the general scheme
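(C.9), the Fourier transform of a tempered distribution T is $(\mathcal{F}T) : \phi \mapsto T(\mathcal{F}^t\phi)$.
To get more explicit formulas, first note that $\mathcal{F}^t = \mathcal{F}$:
$$\int (\mathcal{F}\psi)(k)\,\phi(k)\,dk = \frac{1}{\sqrt{2\pi}} \int\!\!\int e^{-ikx}\,\psi(x)\,\phi(k)\,dx\,dk = \int \psi(x)\,(\mathcal{F}\phi)(x)\,dx.$$
Thus, using the shorthand “tilde notation” for the Fourier transform,
$$\tilde T(\phi) = T(\tilde\phi).$$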
Delta distribution. For δ, compute

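$$\tilde\delta(\phi) = \delta(\tilde\phi) = \tilde\phi(0) = \frac{1}{\sqrt{2\pi}} \int \phi(x)\,dx = T_{1/\sqrt{2\pi}}(\phi), \tag{C.11}$$
that is, the FT of δ is a regular distribution, arising from the constant function
$$\tilde\delta(k) = \frac{1}{\sqrt{2\pi}}. \tag{C.12}$$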
One could be tempted to use the following formal calculation to arrive at the same conclusion:
$$\tilde\delta(k) = \text{“}\frac{1}{\sqrt{2\pi}} \int e^{-ikx}\,\delta(x)\,dx\text{”} = \frac{1}{\sqrt{2\pi}}.$$
But, unlike (C.11), this is not a rigorous argument given our development of the theory so
far! That’s because we have defined δ(ϕ) only for ϕ ∈ S, but e−ikx is most definitely not
an element of Schwartz space. The integral is therefore only heuristically defined. One
can sometimes make sense of products of distributions – but the issue is subtle and we will
not pursue it here.
Constant functions. The constant function 1(x) = 1 does not have a Fourier transform in the ordinary sense. For one, the integral $(2\pi)^{-1/2}\int dx$ that would define $\tilde 1(0)$ is infinite. However, because $T_1$ defines a tempered distribution, it does have a FT. Slightly abusing language once again, we call it the FT of 1 (in the sense of distribution).
We can find it by expressing $\mathcal{F}^{-1}$ in terms of $\mathcal{F}$ and applying it to (C.12). To this end, let Π be the parity operator, which mirrors functions about the origin: (Πϕ)(x) = ϕ(−x). Then it is easy to see that $\mathcal{F}^\dagger = \Pi\mathcal{F}^t$ and hence unitarity of $\mathcal{F}$ implies
$$(\Pi\mathcal{F})\mathcal{F} = \mathcal{F}^\dagger\mathcal{F} = 1 \quad\Rightarrow\quad \mathcal{F}^{-1} = \Pi\mathcal{F}. \tag{C.13}$$
Applying this to (C.12) gives
$$\mathcal{F}(1) = \sqrt{2\pi}\,\delta.$$
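The statement that the FT of 1 is √(2π) δ can be tested directly from the definition T̃(ϕ) = T(ϕ̃): integrating the Fourier transform of a test function over all k should return √(2π) ϕ(0). The sketch below does this by brute-force quadrature; the shifted Gaussian `phi`, the integration windows, and the grid sizes are illustrative assumptions.

```python
import cmath
import math

def phi(x):
    """A (hypothetical) Schwartz test function, centred at x = 1."""
    return math.exp(-0.5 * (x - 1.0) ** 2)

def midpoint_integral(f, a, b, n):
    """Midpoint rule; works for real- and complex-valued integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def fourier_transform(f, k, lo=-9.0, hi=11.0, n=400):
    """(F f)(k) = (2 pi)^(-1/2) * integral of exp(-i k x) f(x) dx."""
    integral = midpoint_integral(lambda x: cmath.exp(-1j * k * x) * f(x), lo, hi, n)
    return integral / math.sqrt(2.0 * math.pi)

def fourier_of_one(f):
    """(F 1)(f) = T_1(F f), i.e. the integral of (F f)(k) over all k."""
    return midpoint_integral(lambda k: fourier_transform(f, k), -10.0, 10.0, 400)

print("T_1(F phi)          =", fourier_of_one(phi))
print("sqrt(2 pi) * phi(0) =", math.sqrt(2.0 * math.pi) * phi(0.0))
```

Up to rounding, the first number is real and matches the second, as the distributional identity predicts.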
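The principal value. From an easy contour integration, the FT of 1/(x + iϵ) is
$$\frac{1}{\sqrt{2\pi}} \int \frac{1}{x + i\epsilon}\,e^{-ikx}\,dx = -i\sqrt{2\pi}\,e^{-\epsilon k}\,\theta(k) \;\to\; -i\sqrt{2\pi}\,\theta(k) \qquad (\epsilon \to 0^+).$$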
Using (C.10), we then find that the FT of the principal value is a regular distribution:

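$$\mathcal{F}\big(\mathrm{pv}(1/x)\big)(k) = \lim_{\epsilon\to0^+} \mathcal{F}\Big(\frac{1}{x + i\epsilon}\Big)(k) + i\pi\,\mathcal{F}(\delta)(k) = i\sqrt{\frac{\pi}{2}}\,\big(-2\theta(k) + 1\big) = -i\sqrt{\frac{\pi}{2}}\,\mathrm{sign}(k). \tag{C.14}$$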
2 2
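Combining this result with (C.13), and using that the principal value is odd, $\Pi\,\mathrm{pv}(1/x) = -\mathrm{pv}(1/x)$, gives further transforms of common distributions:
$$(\mathcal{F}\,\mathrm{sign})(k) = -i\sqrt{\frac{2}{\pi}}\,\mathrm{pv}(1/k), \tag{C.15}$$
$$(\mathcal{F}\theta)(k) = \mathcal{F}\Big(\tfrac{1}{2}(\mathrm{sign} + 1)\Big)(k) = -\frac{i}{\sqrt{2\pi}}\,\big(\mathrm{pv}(1/k) + i\pi\delta\big). \tag{C.16}$$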
2 2π
Coulomb and Yukawa potentials. Up to constants, the Coulomb potential is $u(x) = -\frac{1}{\|x\|}$ in $\mathbb{R}^3$. Just like the constant function treated above, it does not have an ordinary Fourier transform. For example, ũ(0) would be given by
$$-(2\pi)^{-3/2} \int \frac{1}{\|x\|}\,d^3x = -2(2\pi)^{-1/2} \int r\,dr = -\infty. \tag{C.17}$$
But as discussed in Sec. C.2.2, u(x) defines a regular distribution whose Fourier transform turns out to be regular again, given by
$$\tilde u(k) = -\sqrt{\frac{2}{\pi}}\,\frac{1}{\|k\|^2}. \tag{C.18}$$
Here’s how to find (C.18). Express
$$T_u(\tilde\phi) = -\int \frac{\tilde\phi(k)}{\|k\|}\,d^3k = -\lim_{s\to0^+} \int \frac{e^{-s\|k\|}}{\|k\|}\,\tilde\phi(k)\,d^3k \tag{C.19}$$
as a limit of integrals involving the “regularizing” factor $e^{-s\|k\|}$.
This is valid, because the integral, interpreted as a function of s ∈ [0, ∞), is continuous at 0. In fact, it is even differentiable:
$$-\partial_s\Big|_0 \int \frac{e^{-s\|k\|}}{\|k\|}\,\tilde\phi(k)\,d^3k = \int \tilde\phi(k)\,d^3k, \tag{C.20}$$
which is finite for ϕ̃ ∈ S. (Note that the same regularization does not work for the
integral in (C.17), which formally corresponds to the case ϕ̃(k) = 1. Of course,
the constant function is not an element of Schwartz space, and indeed, this choice
would cause (C.20) to diverge).
Plugging in the definition of the FT and exchanging integrals,
$$T_u(\tilde\phi) = \lim_{s\to0^+} \int \Big( -(2\pi)^{-3/2} \int e^{-ik\cdot x}\,\frac{e^{-s\|k\|}}{\|k\|}\,d^3k \Big)\,\phi(x)\,d^3x.$$
The expression in parentheses is the FT of
$$V(x) = -\frac{1}{\|x\|}\,e^{-s\|x\|}$$
which, up to constants, is known as the Yukawa potential. Its Fourier transform follows from the general formula (B.4) for rotationally-invariant functions in terms of k = ∥k∥ (the overall minus sign is the one carried by V):
$$\tilde V(k) = -\frac{i}{(2\pi)^{1/2}\,k} \int_0^\infty \big( e^{-sr-ikr} - e^{-sr+ikr} \big)\,dr.$$
The one-dimensional integral can immediately be solved as
$$\Big[\frac{e^{r(-s-ik)}}{-s-ik}\Big]_0^\infty - \Big[\frac{e^{r(-s+ik)}}{-s+ik}\Big]_0^\infty = -\frac{1}{-s-ik} + \frac{1}{-s+ik} = -\frac{2ik}{s^2+k^2}.$$
Collecting constants, we get the FT of the Yukawa potential, which gives (C.18) as s → 0:
$$\tilde V(k) = -\sqrt{\frac{2}{\pi}}\,\frac{1}{s^2+k^2}. \tag{C.21}$$
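The closed form (C.21) is easy to cross-check numerically. The sketch below assumes the standard reduction of the three-dimensional Fourier transform of a rotationally invariant function to a one-dimensional sine integral (which should be equivalent to the formula cited as (B.4), though that formula is not reproduced here), and evaluates it with a midpoint rule. Function names, the radial cutoff, and the sample values of k and s are illustrative.

```python
import math

def yukawa_ft_numeric(k, s, r_max=60.0, n=200_000):
    """Radial Fourier integral for V(r) = -exp(-s r) / r in three dimensions.

    Assumes V~(k) = sqrt(2/pi) / k * integral_0^inf V(r) r sin(k r) dr.
    """
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += -math.exp(-s * r) * math.sin(k * r)  # V(r) * r * sin(k r)
    return math.sqrt(2.0 / math.pi) / k * h * total

def yukawa_ft_closed_form(k, s):
    """Right-hand side of (C.21)."""
    return -math.sqrt(2.0 / math.pi) / (s * s + k * k)

for k, s in ((1.3, 0.5), (0.7, 1.0)):
    print(k, s, yukawa_ft_numeric(k, s), yukawa_ft_closed_form(k, s))

# Coulomb limit s -> 0 of the closed form, cf. (C.18) at k = 1:
for s in (1e-1, 1e-2, 1e-3):
    print(s, yukawa_ft_closed_form(1.0, s))
```

Note that the numerical integral itself is only evaluated for s > 0; the Coulomb case is reached through the closed form, mirroring the regularization argument in the text.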
Figure C.2: Sequence continuity is equivalent to the more familiar “ϵ-δ-definition” of

continuity for functions f : R → R. It is more general, though, and can also be applied to
spaces whose topologies do not derive from a distance measure.
C.3 Topological aspects, more pedantry, and generalizations

Our definition of tempered distributions in Eq. (C.5) was constructive: We showed how
to build distributions concretely given a function u and derivatives ∂xl . The mathematical
theory is usually formulated axiomatically. Distributions are defined indirectly, as linear
functionals on test function spaces, subject to some abstract properties. These properties
are phrased in the language of point set topology. In this section, we briefly introduce this
more abstract point of view.
Topological spaces
Consider a set X. A topology on X is a rule that allows us to decide when a sequence
xk : N → X converges to an element x ∈ X.
As a first example, assume that X is a vector space equipped with a norm ∥ · ∥. This
covers an extremely wide range of spaces, from X = R, the real numbers with norm
∥x∥ = |x| the absolute value, to $X = L^2(\mathbb{R})$ with norm $\|x\| = \sqrt{\langle x|x\rangle}$ derived from the inner product. We say that a sequence $x_k$ converges in norm topology to x,
$$x_k \to x, \quad\text{if}\quad \lim_{k\to\infty} \|x_k - x\| = 0. \tag{C.22}$$
We’ll use these concepts to give very general definitions of continuity and complete-
ness.
Continuity
A function f between two topological spaces is continuous if it maps convergent sequences
to convergent sequences (Fig. C.2), i.e. if
xk → x ⇒ f (xk ) → f (x).
Example (important!): If ψ ∈ L2 (R), then the linear functional ⟨ψ| is continuous.

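The proof reduces to the Cauchy-Schwarz inequality. If $\lim_{k\to\infty} \|\phi_k - \phi\| = 0$, then, with $\|\cdot\| = \langle\cdot|\cdot\rangle^{1/2}$,
$$\big|\langle\psi|\phi_k\rangle - \langle\psi|\phi\rangle\big| = \big|\langle\psi|\big(|\phi_k\rangle - |\phi\rangle\big)\big| \le \|\psi\|\,\|\phi_k - \phi\| \to 0 \qquad (k \to \infty).$$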
Completeness and Hilbert spaces

Now for completeness. A sequence xk is Cauchy if “its elements eventually become arbi-
trarily close” in the sense that
∀ ϵ > 0, ∃ n ∈ N such that ∀ k, l > n, it holds that ∥xk − xl ∥ ≤ ϵ.
A space X is complete if every Cauchy sequence converges to an element of X.
As an example of a Cauchy sequence, let $x_k$ be the approximation of $\sqrt{2}$ to k decimal places. This example shows that the rational numbers are not complete: There is no q ∈ Q such that $x_k \to q$ (for then q would have to equal $\sqrt{2}$, which, famously, is not rational).
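The decimal truncations of √2 can be played with in exact rational arithmetic. The following sketch builds them with Python’s `fractions` module and `math.isqrt` (the helper name `sqrt2_truncated` is made up); it illustrates that the elements get arbitrarily close to each other while none of them squares to 2.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncated(k):
    """Approximation of sqrt(2) to k decimal places, as an exact rational number."""
    # isqrt(2 * 10^(2k)) is the integer part of sqrt(2) * 10^k.
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

for k in (1, 3, 6, 9):
    x = sqrt2_truncated(k)
    # The gap 2 - x^2 is strictly positive for every k, but tends to zero.
    print(k, x, float(2 - x * x))
```

Since every element is rational and the squares approach 2 from below without ever reaching it, the sequence is Cauchy in Q but has no limit there.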
In the mathematical literature, a Hilbert space is defined as a
• complex vector space,

• with a sesquilinear inner product ⟨·|·⟩,
• that is complete with respect to the norm derived from the inner product.
The final condition is often glossed over in physics presentations. It is important, though.
For one, it means that series like
$$|\psi(t)\rangle = \sum_{k=0}^\infty \frac{1}{k!}\,(itH)^k\,|\psi(0)\rangle := \lim_{K\to\infty} \sum_{k=0}^K \frac{1}{k!}\,(itH)^k\,|\psi(0)\rangle,$$
used ubiquitously, are actually well-defined. Another reason is that the equivalence of
“kets” and “bras” requires this property: The set of continuous linear functionals on a
Hilbert space H is denoted by H′ . If |ψ⟩ ∈ H, then we’ve shown above that ⟨ψ| is
continuous, i.e. an element of H′ . The Riesz representation theorem says that the converse
is also true: Every continuous linear functional of a Hilbert space is given by some “bra
vector”.
One can show that L2 (R) is complete, i.e. actually a Hilbert space.
Contrast this with Schwartz space S. It, too, is a complex vector space with the same sesquilinear inner product as $L^2(\mathbb{R})$. But it is not complete in norm topology and hence no Hilbert space. The argument works just like the $\sqrt{2}$-example above. Because S is dense in $L^2(\mathbb{R})$, for every $\psi \in L^2(\mathbb{R})$, there exists a sequence $\phi_k : \mathbb{N} \to \mathcal{S}$ converging to ψ in norm. Thus, if ψ ∉ S, the sequence $\phi_k$ has no limit point in S.
Topology on Schwartz space

Return to Schwartz space S. Because it is a subspace of L2 (R), we can use the norm
topology also for S. However, there’s a second, important, topology on that space. For
α, β ∈ N0 , define the (semi)-norms
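$$\|\phi\|_{\alpha,\beta} := \sup_{x\in\mathbb{R}} |x^\alpha \partial_x^\beta \phi(x)|.$$
A sequence $\phi_k : \mathbb{N} \to \mathcal{S}$ converges with respect to this family of semi-norms,
$$\phi_k \xrightarrow{\;\mathcal{S}\;} \phi, \quad\text{if}\quad \lim_{k\to\infty} \|\phi - \phi_k\|_{\alpha,\beta} = 0 \quad \forall \alpha, \beta \in \mathbb{N}_0. \tag{C.23}$$
To avoid confusion, we’ll write $\phi_k \xrightarrow{L^2} \phi$ if we mean convergence with respect to the Hilbert space norm. It is easy to see that $\phi_k \xrightarrow{\mathcal{S}} \phi$ implies $\phi_k \xrightarrow{L^2} \phi$, but not the other way round. One says that the topology (C.23) is finer than norm topology.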
There’s a non-trivial regularity theorem (Reed-Simon, Thm. V.10), which states that
the constructive definition (C.5) of tempered distribution characterizes exactly the space of linear functionals on Schwartz space that are continuous in the sense of (C.23).
Generalizations
The topological formulation above is the basis of generalizations. The common recipe
is to choose a test function space Φ (often norm-dense in L2 (R)), endow it with a finer
topology, and then consider the continuous dual Φ′ .
The most important choice is to take Φ to be the space of bump functions Cc∞ (R):
smooth functions with compact support. “Compact support” means that these functions
are identically zero for |x| large enough. (It is not obvious that one can define functions
that transition smoothly from being identically zero in some region to being non-zero in
other regions, but such functions do exist). In the context of distributions, the space of
bump functions is usually denoted by D.
Recall that Schwartz functions vanish faster than any polynomial, and thus integrals
against locally integrable functions u(x) that grow at most polynomially are finite. Be-
cause bump functions vanish identically for large x, integrals against any locally integrable
u are well-defined. This suggests, correctly, that the space of distributions D′ is even larger
than the space of tempered distributions S ′ .
The structure of D′ is somewhat more complicated than was the case for S ′ . We will
not discuss it here, but, for completeness, give the topology from which it derives. It is
defined in terms of the (semi-)norms
$$\|\phi\|_{K,\alpha} := \sup_{x\in K} |\partial_x^\alpha \phi(x)|$$
indexed by compact subsets K ⊂ R and a number $\alpha \in \mathbb{N}_0$, in the same way as (C.23):
$$\phi_k \xrightarrow{\;\mathcal{D}\;} \phi \quad\text{if}\quad \lim_{k\to\infty} \|\phi - \phi_k\|_{K,\alpha} = 0 \quad \forall \alpha, K.$$
Because elements of D are smooth, distributions in D′ are arbitrarily often differentiable. However, the space D is easily seen not to be invariant under Fourier transforms,
so the Fourier transform is not defined on D′ . This is the reason the space plays a less
prominent role in quantum theory.
Terminology
The word “distributions” used without qualification is most likely to refer to D′ , but can
also mean a general continuous dual space Φ′ , and may also refer to tempered distributions
S ′ , depending on context. Making matters worse, S is always called “Schwartz space”,
but the name “Schwartz” is also associated with the general mathematical theory of dis-
tributions and in particular also with D′ . “Tempered distributions” always means S ′ , at
least.
Lastly, if a science professor answers an inquiry about a questionable derivation by
claiming that it is to be understood “in the sense of distribution”, they likely mean neither
D′ nor Φ′ nor S ′ . Instead, they are probably vaguely aware of the fact that what they are
doing isn’t quite rigorous, but are optimistic that a smart mathematician could figure it
out, and in any case, want to get through their lecture with their dignity intact and have
found that “distribution” is a fully general incantation that reliably suppresses follow-up
questions.
Needless to say, I would never engage in such tactics.
Appendix D
Green’s functions
D.1 Introduction
In this chapter, we are interested in the affine equation
Lu = f (D.1)
for u, given L and f . We’ll restrict attention to the most important special case, where L
is a translation-invariant differential operator on Rn .
Example: The damped harmonic oscillator. Newton’s equation for the position
u(t) of a particle subject to a driving force mf(t), viscous damping coefficient 2mγ, and undamped eigenfrequency $\omega_0$ is
$$\big(\partial_t^2 + 2\gamma\partial_t + \omega_0^2\big)\,u = f.$$
The problem is to find u(t) given f (t) and the boundary condition u(−∞) = 0.
Now here’s the basic idea: Formally, $f(x) = \int f(x')\,\delta(x - x')\,dx'$ is a superposition of “delta impulses”. Thus, if we could work out how the system reacts to a delta impulse, we should be able to solve the general problem by linearity. Exploiting translational invariance, we may even get away with treating just the case of f(x) = δ(x).
This indeed works out. Assume we can find a G such that
LG = δ. (D.2)
Then defining u to be a superposition of shifted solutions G(x − x′), weighted by f(x′),
$$u(x) = \int G(x - x')\,f(x')\,d^nx', \tag{D.3}$$
we get a solution of Lu = f, as anticipated:
$$(Lu)(x) = \int (LG)(x - x')\,f(x')\,d^nx' = \int \delta(x - x')\,f(x')\,d^nx' = f(x).$$
Some terminology: G is the Green’s function of L.1 The expression (D.3) for u is
known as the convolution G ⋆ f of G and f .
1 Yes, it’s “the Green’s function of L”, not “the Green function of L” as would be more in line with the
standard naming convention in mathematics (or, I guess, English grammar).
Technically, (D.2) should be interpreted in the language of Chapter C. One assumes

(often implicitly) that f is an element of some space Φ of test functions. Then a Green’s
function G is a distribution in Φ′ and LG = δ is to be understood in the sense of distribu-
tion.
If h is a solution to the homogeneous equation Lh = 0, then G is a Green’s function
if and only if G + h is one. Thus G is unique if and only if L is invertible, in which
case G(x − x′ ) = ⟨x|L−1 |x′ ⟩ gives the matrix element of the inverse. Else, one can find
an entire affine space worth of Green’s functions (just as there’s an entire affine space of
solutions to Lu = f ). Typically, physically motivated boundary conditions are used to
select a particular one.
Elementary examples
The simplest example is L = ∂t . The general solution to Lu = f is, of course, the integral
$$u(t) = \int_a^t f(t')\,dt'.$$
Because ∂t is not invertible on the space of all differentiable functions, there is an ambigu-
ity in the solution, represented by a. Fixing a amounts to choosing the boundary condition
u(a) = 0. In these notes, we only treat the translationally-invariant theory, so we will
restrict attention to the choices a = ±∞. For a = −∞,
$$u(t) = \int_{-\infty}^t f(t')\,dt' = \int \theta(t - t')\,f(t')\,dt'.$$
The Green’s function $G^+$ of $\partial_t$ under the boundary condition u(−∞) = 0 is therefore the
step function θ. And indeed, by Sec. C.2.3, ∂t θ(t) = δ(t) holds in the sense of distribution,
so that the step function fulfills (D.2).
An analogous calculation for a = +∞ leads to $G^-(t) = -\theta(-t)$. Because the solutions u(t) constructed using $G^+$ only depend on the values f(t′) for t′ ≤ t, one usually calls $G^+$ a retarded Green’s function and, likewise, $G^-$ an advanced Green’s function. Their difference $h(t) := G^+(t) - G^-(t) = 1$ is a solution of the homogeneous equation $\partial_t h(t) = 0$, consistent with our reasoning above.
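For another example, take a quick look at $L = \partial_t^2$. Because δ(t)t = 0 as distributions,
$$\partial_t^2\big(t\,\theta(t)\big) = \partial_t\big(t\,\delta(t) + \theta(t)\big) = \partial_t\theta(t) = \delta(t),$$
which shows that $G^+(t) = t\,\theta(t)$ is a (retarded) Green’s function for $\partial_t^2$. Other solutions are
$$G^-(t) = -t\,\theta(-t), \qquad G = \tfrac{1}{2}G^- + \tfrac{1}{2}G^+ = \tfrac{1}{2}|t|.$$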
D.2 Green’s functions from Fourier transforms

Fourier transforms turn derivatives into multiplications. Thus, for every differential oper-
ator L with constant coefficients, the equation Lu = f is equivalent to an equation
$$P(k)\,\tilde u(k) = \tilde f(k) \tag{D.4}$$
involving a polynomial multiplication operator P(k) and the Fourier transforms of the functions. Formally, (D.4) is trivial to solve:
$$\tilde u(k) = \frac{\tilde f(k)}{P(k)} \quad\Rightarrow\quad u(x) = (2\pi)^{-n/2} \int e^{ikx}\,\frac{\tilde f(k)}{P(k)}\,d^nk. \tag{D.5}$$
Likewise, the Fourier transform of the defining equation LG = δ for Green’s function is
$$P\,\tilde G = (2\pi)^{-n/2}\,1, \tag{D.6}$$
which has the formal solution
$$\tilde G(k) = (2\pi)^{-n/2}\,\frac{1}{P(k)} \quad\Rightarrow\quad G(x) = (2\pi)^{-n} \int e^{ikx}\,\frac{1}{P(k)}\,d^nk. \tag{D.7}$$
The trouble is, of course, that if P has real zeros, the integrals in Eqs. (D.5, D.7) might not exist. In the next sections, we’ll go through a variety of methods for extracting solutions anyway by modifying these equations.
The problem of characterizing all G̃ ∈ Φ′ that satisfy (D.6) is known as the “prob-
lem of division” in the theory of distributions. In the univariate case, n = 1, it is
fairly easy to solve (we’ll introduce all necessary ingredients in Sec. D.2.4). Essen-
tially, this case is simple because univariate polynomials have finitely many roots. If
P has continuous sets of zeros, the problem can become very complicated, though.
D.2.1 Direct integration

No real zeros
The easiest case is the one where P (k) has no real roots. Then L is invertible (at least as
long as one assumes that u has a Fourier transform). Therefore, there is a unique Green’s
function. It is given by (D.7), which is absolutely integrable for all x.
As an example, let’s treat the damped harmonic oscillator with strictly positive viscous
damping coefficient, γ > 0. It corresponds to
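$$L = \partial_t^2 + 2\gamma\partial_t + \omega_0^2 \quad\Rightarrow\quad P = -\omega^2 - 2i\gamma\omega + \omega_0^2.$$
The polynomial factorizes as
$$P = -(\omega - \omega_+)(\omega - \omega_-), \qquad \omega_\pm = -i\gamma \pm \sqrt{\omega_0^2 - \gamma^2}.$$
For $\omega_0 \neq 0$, both roots lie in the lower half-plane, which is why the result below is proportional to θ(t). Thus (employing the sign convention for the FT of time variables, as in (A.25)),
$$G(t) = -\frac{1}{2\pi} \int \frac{e^{-i\omega t}}{(\omega - \omega_+)(\omega - \omega_-)}\,d\omega.$$
A simple exercise in contour integration gives
$$G(t) = i\theta(t)\,e^{-\gamma t} \Big( -\frac{1}{\omega_+ - \omega_-}\,e^{i\sqrt{\omega_0^2 - \gamma^2}\,t} + \frac{1}{\omega_+ - \omega_-}\,e^{-i\sqrt{\omega_0^2 - \gamma^2}\,t} \Big).$$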
Since we’re done integrating in Fourier space, we can re-use the letter ω, defining it to be $\sqrt{|\omega_0^2 - \gamma^2|}$. Then the above may be simplified (using l’Hôpital for the equality case) to
$$G(t) = \theta(t)\,e^{-\gamma t} \begin{cases} \sin(\omega t)/\omega & \omega_0 > \gamma \\ t & \omega_0 = \gamma \\ \sinh(\omega t)/\omega & \omega_0 < \gamma \end{cases} \tag{D.8}$$
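One way to gain confidence in (D.8) is to check numerically that the convolution (D.3) really solves the oscillator equation. The sketch below does this for the underdamped case with a Gaussian driving force: it computes u = G ⋆ f by a trapezoidal sum and then evaluates the residual u″ + 2γu′ + ω0²u − f by central finite differences. The parameter values, the driving force, and the grid spacing are assumptions chosen for illustration.

```python
import math

GAMMA, OMEGA0 = 0.3, 1.0   # hypothetical underdamped parameters (OMEGA0 > GAMMA)
H = 0.01                   # grid spacing for quadrature and finite differences

def green(t):
    """Retarded Green's function (D.8) for OMEGA0 > GAMMA."""
    if t < 0.0:
        return 0.0
    w = math.sqrt(OMEGA0 ** 2 - GAMMA ** 2)
    return math.exp(-GAMMA * t) * math.sin(w * t) / w

def drive(t):
    """Hypothetical driving force f(t)."""
    return math.exp(-t * t)

def response(t, s_max=30.0):
    """u(t) = integral_0^inf G(s) f(t - s) ds, trapezoidal rule (G(0) = 0)."""
    n = int(round(s_max / H))
    return H * sum(green(j * H) * drive(t - j * H) for j in range(1, n + 1))

def residual(t):
    """Finite-difference estimate of u'' + 2 gamma u' + omega0^2 u - f at time t."""
    u_minus, u_zero, u_plus = response(t - H), response(t), response(t + H)
    acceleration = (u_plus - 2.0 * u_zero + u_minus) / H ** 2
    velocity = (u_plus - u_minus) / (2.0 * H)
    return acceleration + 2.0 * GAMMA * velocity + OMEGA0 ** 2 * u_zero - drive(t)

for t in (-1.0, 0.5, 3.0):
    print(f"t={t:5.2f}  u={response(t):.6f}  residual={residual(t):.2e}")
```

The residuals are of the order of the squared grid spacing, i.e. zero up to discretization error. The same harness can be pointed at the other two cases of (D.8) by swapping out `green`.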
Locally integrable case

Even if P(k) does have real roots, (D.7) might be locally integrable and thus well-defined as a distribution (c.f. Sec. C.2.2). The most important example is the Laplace operator $L = -\Delta = -\sum_{i=1}^3 \partial_{x_i}^2$. Then $P(k) = \|k\|^2$, and, by (C.18),
$$G(x) = (2\pi)^{-3/2}\,\mathcal{F}^{-1}\Big(\frac{1}{\|k\|^2}\Big)(x) = \frac{1}{4\pi}\,\frac{1}{\|x\|}$$
is a Green’s function. (As the homogeneous equation −∆h = 0 has plenty of solutions, G
is far from unique).
D.2.2 Complex integration

Univariate case
Given a univariate polynomial P (ω), choose a deformation γ of the real axis in the com-
plex plane that avoids the zeros of P and consider the complex integral
$$G_\gamma(t) := \frac{1}{2\pi} \int_\gamma \frac{e^{-i\omega t}}{P(\omega)}\,d\omega. \tag{D.9}$$
The good news is that the integral now exists. The bad news is that it is not clear any more
that it has anything to do with Green’s functions. But actually, it does! Applying L,
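$$LG_\gamma(t) = \frac{1}{2\pi} \int_\gamma P(\omega)\,\frac{e^{-i\omega t}}{P(\omega)}\,d\omega = \frac{1}{2\pi} \int_\gamma e^{-i\omega t}\,d\omega = \frac{1}{2\pi} \int e^{-i\omega t}\,d\omega = \delta(t).$$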
The cool trick that makes this calculation work is that after multiplying with P (ω), the
integrand is an entire function, so we can move the integration path right back to the real
line without changing the value of the integral.
For the same reason, two paths γ, γ ′ will lead to the same Green’s function if they can
be deformed into each other without crossing a pole. In general, however, Gγ does depend
on the choice of γ.
“Infinitesimal” deformations
There’s a variant of this construction that can be interpreted as “shifting the roots of P away from the real axis” instead of “deforming the integration path to avoid the roots”. Write the complex ω’s on the path γ as ω = u + iv(u). Then, suppressing the Jacobian factor 1 + iv′(u) (which equals 1 for the constant v considered below),
$$\frac{1}{2\pi} \int_\gamma \frac{e^{-i\omega t}}{P(\omega)}\,d\omega = \frac{1}{2\pi} \int \frac{e^{-iut}}{P(u + iv(u))}\,e^{v(u)t}\,du.$$
If there are no zeros between γ and the real line, the integral does not change under the substitution v(u) ↦ ϵv(u) for 0 < ϵ ≤ 1. In particular, by continuity of the exponential,
$$\frac{1}{2\pi} \lim_{\epsilon\to0^+} \int \frac{e^{-iut}}{P(u + i\epsilon v(u))}\,e^{\epsilon v(u)t}\,du = \frac{1}{2\pi} \lim_{\epsilon\to0^+} \int \frac{e^{-iut}}{P(u + i\epsilon v(u))}\,du.$$
In the common special case where v(u) is constant, the limit is typically written as
$$G^\pm(t) = \frac{1}{2\pi} \int \frac{e^{-i\omega t}}{P(\omega \pm i0)}\,d\omega, \tag{D.10}$$
with the sign depending on sgn v. (This construction should remind you of the formula
(C.10), expressing the principal value as the “side limit” 1/(x ± i0)). Of course, ω 7→
P (ω ± iϵ) is a polynomial whose roots are shifted by ∓iϵ compared to the ones of P .
Multivariate case
We can reduce the problem for arbitrary n to the n = 1-case. While the technique works
in general, it is particularly natural if one of the variables is distinguished in some way. In
physics applications, this is typically the time. With this in mind, we’ll use x = (t, x) for
the arguments of u, f, G, and k = (ω, k) for the arguments of Fourier transforms.
Define pk (ω) = P (ω, k). Then pk is a univariate polynomial, and we can just repeat
the n = 1-construction from above, but in a k-dependent way. That is to say, choose
deformations γk avoiding the zeros of pk and define
$$G_\gamma(x) := (2\pi)^{-n} \int \Big( \int_{\gamma_k} \frac{e^{-i\omega t}}{p_k(\omega)}\,d\omega \Big)\,e^{ikx}\,d^{n-1}k. \tag{D.11}$$
The proof that Gγ is a Green’s function works exactly as in the univariate case.
Example: The Klein-Gordon equation

The Klein-Gordon operator $L = \partial_t^2 - \sum_{i=1}^3 \partial_{x_i}^2 + m^2$ corresponds to the polynomial
$$P = -\omega^2 + \sum_{i=1}^3 k_i^2 + m^2 = -(\omega - \omega_k)(\omega + \omega_k), \qquad \omega_k = \sqrt{m^2 + \|k\|^2}. \tag{D.12}$$
There are three natural choices for deformations γk avoiding the poles. The simplest ones
are γk± : straight lines parallel to the real axis with imaginary parts ±ϵ. From the discussion
above, the integral does not depend on the value of ϵ > 0. The third contour is γkF , which
moves around −ωk in the lower half-plane and around +ωk in the upper half-plane (the
superscript “F ” is for Feynman; see Fig. ??).
Let’s look at $G^+(x) = G_{\gamma^+}(x)$. The frequency integral can be evaluated exactly as in the damped harmonic oscillator example above, leading to
$$2\pi i\,\theta(t)\,\frac{e^{-i\omega_k t} - e^{i\omega_k t}}{2\omega_k}$$
so that the full integral is
$$G^+(x) = i\theta(t) \int \frac{1}{2\omega_k}\,\big( e^{-i\omega_k t} - e^{+i\omega_k t} \big)\,e^{-ikx}\,\frac{d^3k}{(2\pi)^3}.$$
The k-integral can be expressed in terms of Bessel functions (but the result isn’t pretty). In any case, the θ term means that the convolution $u = G^+ \star f$ at time t only depends on the values f(t′, x′) for t′ ≤ t. We have thus constructed a retarded Green’s function.
D.2.3 Using the resolvent

Given a linear operator L, the function that maps complex numbers z to
R(z; L) := (L − z 1)−1
(if the inverse exists) is called the resolvent. It turns out that one can learn a lot about an
operator by studying its resolvent, and the concept is a central tool in functional analysis.
From the discussion in Sec. D.1, it is clear that if L is invertible, its Green’s function is
$$G(x) = \langle 0|R(0; L)|x\rangle.$$
In the general case R(0; L) does not exist, but one might hope that suitable limits of
$$G(z; x) = \langle 0|R(z; L)|x\rangle$$
as z → 0 make sense as a distribution and can be used to construct Green’s functions.

This is indeed the case (and some books approach the subject of Green’s functions entirely
through this perspective, e.g. Economou’s Green’s functions in quantum physics). Here,
we will just give one example.
Example: Klein-Gordon revisited

Let’s look again at the Klein-Gordon equation and consider
\[ G(i\epsilon; x) = \langle 0 | \frac{1}{L - i\epsilon} | x \rangle = \frac{1}{(2\pi)^2} \int \frac{1}{P(\omega) - i\epsilon}\, e^{ikx}\, d^4k. \]
In complete analogy to (D.12),
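\[ P - i\epsilon = -(\omega - \omega_{k,\epsilon})(\omega + \omega_{k,\epsilon}), \qquad \omega_{k,\epsilon} = \sqrt{m^2 + \|k\|^2 - i\epsilon}, \]
where, for definiteness, we take $\omega_{k,\epsilon}$ to be the principal square root, i.e. the one with
positive real part. Then $\omega_{k,\epsilon}$ sits below the positive real axis and $-\omega_{k,\epsilon}$ above the negative
real axis. By comparison with Fig. ??,
\[ \lim_{\epsilon \to 0^+} G(i\epsilon; x) = G_{\gamma^F}(x) \]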
is Feynman’s Green’s function for the Klein-Gordon equation.
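The location of the shifted poles can be illustrated with a short numerical sketch. It is not part of the original notes; the function names and the sample values $m = 1$, $\|k\| = 2$ are hypothetical. It relies on `cmath.sqrt` returning the principal square root, which is the branch chosen in the text.

```python
import cmath


def shifted_root(m, k_norm, eps):
    """Principal square root omega_{k,eps} = sqrt(m^2 + |k|^2 - i*eps)."""
    return cmath.sqrt(complex(m * m + k_norm * k_norm, -eps))


def shifted_symbol(omega, m, k_norm, eps):
    """P(omega) - i*eps with P(omega) = -omega^2 + |k|^2 + m^2."""
    return -omega * omega + k_norm * k_norm + m * m - 1j * eps


if __name__ == "__main__":
    m, k_norm = 1.0, 2.0
    for eps in (1e-1, 1e-3, 1e-6):
        root = shifted_root(m, k_norm, eps)
        # +root lies just below the positive real axis, -root just above the negative one.
        print(eps, root, -root)
```

As `eps` shrinks, the two roots approach $\pm\omega_k$ from below and above the real axis respectively. This is the same pole arrangement, relative to the integration path, as the one described for the contour $\gamma_k^F$.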
D.2.4 Using the principal value

One can generalize the principal value (Sec. C.2.2) to find Green’s functions. We’ll outline
how one can use this approach to construct all Green’s functions for a given
one-dimensional problem.
Start with L = ∂t . A Green’s function is a solution to
\[ \omega\, \tilde G(\omega) = \frac{i}{\sqrt{2\pi}}. \tag{D.13} \]
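Thus we’re looking for distributions $\tilde G$ “proportional to $1/\omega$”. That was exactly our
motivation for introducing the principal value in Eq. (C.7). And indeed,
\[ \left( \omega\, \mathrm{pv}\frac{1}{\omega} \right)(\tilde f) = \lim_{\epsilon \to 0^+} \int_{|\omega| \ge \epsilon} \omega\, \frac{1}{\omega}\, \tilde f(\omega)\, d\omega = \int 1 \cdot \tilde f(\omega)\, d\omega \quad \Rightarrow \quad \omega\, \mathrm{pv}\frac{1}{\omega} = 1. \]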
Using that ωδ(ω) = 0, an affine space of solutions to (D.13) is given by

\[ \tilde G_\lambda(\omega) = \frac{i}{\sqrt{2\pi}}\, \mathrm{pv}\frac{1}{\omega} + \lambda\, \delta(\omega), \]
and one can show that these are all. An inverse Fourier transform gives
\[ G_\lambda(t) = \frac{1}{2}\, \mathrm{sign}(t) + \frac{\lambda}{\sqrt{2\pi}}. \]
Choosing $\lambda = \pm\sqrt{\pi/2}$, we recover the retarded/advanced Green’s functions $G^\pm$ found
before in Sec. D.1.
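The choice of $\lambda$ can be checked in a few lines. This is a minimal sketch, not part of the original notes, and the helper names are hypothetical. It evaluates $G_\lambda(t) = \tfrac12\,\mathrm{sign}(t) + \lambda/\sqrt{2\pi}$ away from $t = 0$ and compares it with $\theta(t)$ and $-\theta(-t)$.

```python
import math


def sign(t):
    """Sign function; the value at t = 0 is irrelevant for this check."""
    return (t > 0) - (t < 0)


def theta(t):
    """Heaviside step function for t != 0."""
    return 1.0 if t > 0 else 0.0


def green_lambda(t, lam):
    """G_lambda(t) = sign(t)/2 + lam/sqrt(2*pi)."""
    return 0.5 * sign(t) + lam / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    lam = math.sqrt(math.pi / 2.0)
    for t in (-1.0, -0.3, 0.3, 1.0):
        # lam = +sqrt(pi/2) reproduces theta(t); lam = -sqrt(pi/2) reproduces -theta(-t).
        print(t, green_lambda(t, lam), green_lambda(t, -lam))
```

Any two members of the family differ by a constant, which is a solution of the homogeneous equation $\partial_t u = 0$, as expected for two Green's functions of the same operator.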
Let’s now sketch how to generalize this construction to solve $P \tilde G = (2\pi)^{-1/2}$ for a
general polynomial $P$.
First assume that all real roots of $P$ are simple, i.e. that
\[ P(\omega) = Q(\omega) \prod_{l=1}^{d} (\omega - a_l), \]
where $Q$ is a polynomial with only non-real roots and the $a_l$ are distinct real numbers.
Define
\[ \tilde G(\tilde f) = \frac{1}{\sqrt{2\pi}}\, \mathcal{P}\!\!\int \frac{\tilde f(\omega)}{P(\omega)}\, d\omega, \]
where the symbol $\mathcal{P}\!\int$ denotes the principal value integral that is computed by approaching
each of the singularities $a_l$ symmetrically, the same way integration around 0 is handled
in $\mathrm{pv}(1/\omega)$. The proof that $\tilde G$ is indeed a Green’s function then works as the one for
$\mathrm{pv}(1/\omega)$ given above. The ambiguity in defining $\tilde G$ corresponds to adding multiples of
delta distributions supported on the real zeros of $P$:
\[ \tilde G \mapsto \tilde G + \sum_{l=1}^{d} \lambda_l\, \delta_{a_l}. \]
It remains to treat roots with higher multiplicity. Here, we only discuss the case
$P(\omega) = -\omega^2$; the general case works similarly. The trick is to write
\[ -\frac{1}{\omega^2} = \partial_\omega \frac{1}{\omega}. \]
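But we already know how to associate a distribution with $1/\omega$ (the principal value) and
how to differentiate distributions (by applying minus the derivative to the test function). Indeed, with
\[ \tilde G = \frac{1}{\sqrt{2\pi}}\, D\, \mathrm{pv}(1/\omega) \]
we get, for a test function $\phi$,
\begin{align*}
\big( (-\omega^2) \tilde G \big)(\phi)
&= \frac{1}{\sqrt{2\pi}}\, \mathcal{P}\!\!\int \frac{1}{\omega}\, (-\partial_\omega) \big( -\omega^2 \phi(\omega) \big)\, d\omega \\
&= \frac{1}{\sqrt{2\pi}}\, \mathcal{P}\!\!\int \frac{1}{\omega} \big( 2\omega \phi(\omega) + \omega^2 \partial_\omega \phi(\omega) \big)\, d\omega \\
&= \frac{1}{\sqrt{2\pi}} \left( 2 \int \phi(\omega)\, d\omega - \int \phi(\omega)\, d\omega \right) = \frac{1}{\sqrt{2\pi}} \int \phi(\omega)\, d\omega,
\end{align*}
where the last line uses integration by parts, $\int \omega\, \partial_\omega \phi(\omega)\, d\omega = -\int \phi(\omega)\, d\omega$.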
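The identity for $P(\omega) = -\omega^2$ can also be checked numerically. This is a minimal sketch, not part of the original notes. It assumes a shifted Gaussian as test function (the shift `CENTER` is a hypothetical choice that makes $\phi$ non-even), and it evaluates the principal value by folding the integral onto $\omega > 0$, which is the symmetric limit written out explicitly.

```python
import math

CENTER = 0.3  # hypothetical shift, chosen so that the test function is not even


def phi(w):
    """Test function: a shifted Gaussian."""
    return math.exp(-(w - CENTER) ** 2)


def dphi(w):
    """Derivative of phi."""
    return -2.0 * (w - CENTER) * phi(w)


def dpsi(w):
    """Derivative of psi(w) = -w^2 * phi(w)."""
    return -(2.0 * w * phi(w) + w * w * dphi(w))


def pv_over_omega(g, half_width=12.0, n=6000):
    """Principal value integral of g(w)/w, folded onto w > 0 as
    (g(w) - g(-w))/w; the midpoint nodes never hit w = 0."""
    h = half_width / n
    nodes = ((j + 0.5) * h for j in range(n))
    return h * sum((g(w) - g(-w)) / w for w in nodes)


def lhs():
    """((-w^2) G)(phi) = G(psi) = (2*pi)^(-1/2) * pv(1/w) applied to -psi'."""
    return pv_over_omega(lambda w: -dpsi(w)) / math.sqrt(2.0 * math.pi)


def rhs():
    """(2*pi)^(-1/2) times the integral of phi, which equals sqrt(pi)."""
    return math.sqrt(math.pi) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    print(lhs(), rhs())
```

Both sides agree to within the quadrature error, as expected from the integration-by-parts step in the derivation.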
One may verify that the ambiguity is given by
\[ \tilde G \mapsto \tilde G + \lambda_1 \delta + \lambda_2 \delta'. \]
