aqm-23
— Incomplete Notes —
DAVID G ROSS
Institute for Theoretical Physics
University of Cologne
May 5, 2024
Contents
2 Indistinguishable particles
2.1 Bosonic and Fermionic Hilbert spaces
2.1.1 Permutations and occupation numbers
2.1.2 Single-particle operators
2.1.3 The exchange interaction
2.2 Second quantization
2.2.1 Fock space
2.2.2 Creation and annihilation operators
2.2.3 Single- and two-particle operators
2.3 Quasiparticles and collective excitations
2.3.1 Phonons
2.3.2 Global phase gauge symmetry and particle number conservation
2.4 Bose gas: Take 1
2.4.1 Approximate solution part 1
2.5 Detour: Spontaneous symmetry breaking
2.5.1 Ferromagnetism
2.5.2 SSB and Bose-Einstein condensation
This symbol indicates that you may skip forward without missing much.
Warm-up
To warm up, let’s recall the most basic notations from undergraduate quantum mechanics.
For more details, consult Sec. A of the Appendix.
where the {|ϕi ⟩}i are an ortho-normal basis for H and the λi ∈ R the eigenvalues
of A. The possible numerical outcomes of a measurement process are then the λi ,
the i-th one occurring with probability
\[
\Pr_\psi[i] = |\langle \phi_i | \psi \rangle|^2
\]
if the system is in a state described by |ψ⟩. If one repeats the measurement many
times, the average of the observed outcomes will then tend to the expectation value
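\[
\langle A \rangle_\psi = \sum_i \lambda_i \Pr_\psi[i] = \sum_i \lambda_i |\langle \phi_i | \psi \rangle|^2 = \langle \psi | A | \psi \rangle = \operatorname{tr}(|\psi\rangle\langle\psi| A).
\]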
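If you want to convince yourself of this chain of equalities numerically, here is a minimal sketch in Python with NumPy. The random observable, the random state and the dimension 3 are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: a random Hermitian observable A on a 3-dimensional space.
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (M + M.conj().T) / 2

# A random normalized state vector |psi>.
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)

# Eigenvalues lambda_i and orthonormal eigenvectors |phi_i> (columns of phi).
lam, phi = np.linalg.eigh(A)

# Born rule: Pr[i] = |<phi_i|psi>|^2.
probs = np.abs(phi.conj().T @ psi) ** 2

# Three ways of writing the same expectation value.
exp_born = float(np.sum(lam * probs))
exp_sandwich = float(np.real(psi.conj() @ A @ psi))
exp_trace = float(np.real(np.trace(np.outer(psi, psi.conj()) @ A)))

print(exp_born, exp_sandwich, exp_trace)
```

The three printed numbers agree up to floating-point rounding.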
• Also, every system is associated with a distinguished Hermitian operator, the Hamil-
tonian H. It serves two roles:
– It is the observable describing energy measurements.
– It determines the time evolution of the system via Schrödinger’s equation
• We usually choose a preferred basis for every Hilbert space H, ideally with a clear
physical interpretation. If the Hamiltonian is non-degenerate, the eigenbasis of H is
a natural choice. In this case, saying that the system is in an eigenstate with given
energy completely specifies the state vector.
– Example: The harmonic oscillator, with |n⟩ defined by
\[
H|n\rangle = \hbar\omega\left(n + \frac{1}{2}\right)|n\rangle.
\]
If the Hamiltonian is degenerate, it is natural to add additional observables commut-
ing with H until their common eigenbasis is unique.
– Example: The bound states of the hydrogen atom, for which |n, l, m⟩ is defined
by
Imagine a process that prepares the state |ψj ⟩ with probability qj . The probabilities
could e.g. reflect fluctuations of control fields, see below. The collection of states |ψj ⟩ and
probabilities qj is called an ensemble. We do not require that the states |ψj ⟩ be orthogonal
to each other.
If we measure an observable A on this ensemble, the expected value will be
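\[
\sum_j q_j \operatorname{tr}\big(|\psi_j\rangle\langle\psi_j| A\big) = \operatorname{tr}\Big(\Big(\sum_j q_j |\psi_j\rangle\langle\psi_j|\Big) A\Big).
\]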
Thus, the statistics of the experiment are described by replacing the projection |ψ⟩⟨ψ| with
the more general density operator
\[
\rho := \sum_j q_j |\psi_j\rangle\langle\psi_j| \qquad (1.1)
\]
so that ⟨A⟩ = tr(ρA). The density operator ρ has the following properties:
1. It is Hermitian ρ† = ρ,
2. Its eigenvalues form a probability distribution (which is equal to the qj if and only
if the states |ψj ⟩ are orthogonal).
Conversely, every operator with these two properties can be realized by an ensemble as in
(1.1).
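A small numerical illustration of (1.1) and of property 2, as a sketch in Python with NumPy. The ensemble (the states |0⟩ and (|0⟩ + |1⟩)/√2, each with probability 1/2) is an assumption chosen only because its states are not orthogonal.

```python
import numpy as np

# Hypothetical ensemble of two non-orthogonal qubit states.
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [ket0, ket_plus]
q = np.array([0.5, 0.5])

# Density operator rho = sum_j q_j |psi_j><psi_j|, as in (1.1).
rho = sum(qj * np.outer(psi, psi.conj()) for qj, psi in zip(q, states))

# Its eigenvalues form a probability distribution ...
eigvals = np.linalg.eigvalsh(rho)

# ... and <A> = tr(rho A) equals the ensemble average of <psi_j|A|psi_j>.
sz = np.diag([1.0, -1.0])
exp_trace = float(np.trace(rho @ sz))
exp_ensemble = float(sum(qj * psi.conj() @ sz @ psi for qj, psi in zip(q, states)))

print(eigvals, exp_trace, exp_ensemble)
```

Because the two states are not orthogonal, the eigenvalues come out different from the q_j, even though they still sum to one.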
CHAPTER 1. MULTI-PARTITE QUANTUM SYSTEMS 6
If ρ is a density operator with only one non-zero eigenvalue, then ρ = |ψ⟩⟨ψ|. In this
case, we say that ρ describes a pure state. Otherwise, the state is mixed.
Example: Canonical ensemble. Consider a classical system where the i-th microstate
has energy Ei . Then, in the Gibbs ensemble, we expect to find the i-th state with probabil-
ity
\[
p_i = \frac{1}{Z} e^{-E_i/(kT)}, \qquad Z = \sum_i e^{-E_i/(kT)}.
\]
Here, k is the Boltzmann constant, T the temperature, and Z the partition function. Now
let H = Σ_i E_i |E_i⟩⟨E_i| be a quantum-mechanical Hamiltonian. The quantum Gibbs
ensemble is, by definition, the one described by the density operator
\[
\rho = \sum_i p_i |E_i\rangle\langle E_i| = \frac{1}{Z} e^{-H/(kT)}, \qquad Z = \operatorname{tr} e^{-H/(kT)}.
\]
Thus, ρ is the operator that is diagonal in the eigenbasis of the Hamiltonian and has the
classical canonical probabilities as eigenvalues. Convince yourself: ρ is pure if and only if
T = 0 and there is a unique ground state.
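For those who prefer to "convince themselves" numerically, here is a minimal sketch in Python with NumPy. The three-level Hamiltonian, the function names `gibbs_state` and `purity`, and the units (energies measured in units of kT) are assumptions made for illustration.

```python
import numpy as np

def gibbs_state(H, kT):
    """Return rho = exp(-H/kT) / Z, built in the eigenbasis of H."""
    E, V = np.linalg.eigh(H)
    w = np.exp(-(E - E.min()) / kT)   # shifting by E.min() avoids overflow
    p = w / w.sum()                   # classical canonical probabilities p_i
    return (V * p) @ V.conj().T       # V diag(p) V^dagger

def purity(rho):
    """tr(rho^2): equals 1 for pure states, 1/d for the maximally mixed one."""
    return float(np.real(np.trace(rho @ rho)))

# Hypothetical Hamiltonian with a unique ground state.
H = np.diag([0.0, 1.0, 2.0])

for kT in (0.01, 1.0, 1e6):
    print(kT, purity(gibbs_state(H, kT)))
```

For small kT the purity approaches 1 (only the ground state is populated), while for very large kT it approaches 1/3.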
If ρ is a density operator, then the von Neumann entropy H(ρ) is defined as the Shannon
entropy of its eigenvalues. In addition to its central role in statistical physics, von Neumann
entropy can also be used to quantify entanglement, as we will see later.
Figure 1.1: Up to a factor of ℏ/2, the i-th component a_i = tr(ρσ_i) of the Bloch vector
is the expectation value of the angular momentum along the e_i-axis. The length of the
Bloch vector encodes the “purity” of the state. Take an ensemble decomposition
ρ = Σ_j q_j |ψ_j⟩⟨ψ_j| of a density operator ρ. If a^(j) is the Bloch vector of the j-th
state, then the Bloch representation of ρ is the convex combination a = Σ_j q_j a^(j).
(this is just saying that the Pauli matrices form a basis of the linear space of matrices). One
directly sees that the matrix is Hermitian iff the ai are real and has trace equal to one iff
a0 = 1. Thus density operators are of the form
\[
\rho = \frac{1}{2}(\mathbb{1} + a \cdot \sigma), \qquad (1.5)
\]
where a ∈ ℝ³ is the Bloch vector. The eigenvalues of ρ are non-negative iff the Bloch
vector lies in the unit ball of ℝ³; it lies on the unit sphere exactly if ρ is pure.
To see this, use (1.4) to compute det ρ = ¼(1 − ‖a‖²). Because tr ρ = 1, the
eigenvalues are of the form λ, (1 − λ). The determinant is the product of the
eigenvalues, so that
\[
\lambda(1 - \lambda) = \frac{1}{4}(1 - \|a\|^2) \quad\Leftrightarrow\quad \lambda = \frac{1}{2}(1 \pm \|a\|).
\]
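The eigenvalue formula is easy to check numerically. The following is a sketch in Python with NumPy; the helper names `bloch_vector` and `from_bloch` and the particular vector a are assumptions made for illustration.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def from_bloch(a):
    """rho = (1 + a . sigma) / 2, as in (1.5)."""
    return 0.5 * (np.eye(2) + sum(ai * s for ai, s in zip(a, paulis)))

def bloch_vector(rho):
    """Components a_i = tr(rho sigma_i)."""
    return np.array([np.real(np.trace(rho @ s)) for s in paulis])

# Hypothetical Bloch vector inside the unit ball.
a = np.array([0.3, -0.4, 0.5])
rho = from_bloch(a)

r = np.linalg.norm(a)
eigs = np.linalg.eigvalsh(rho)                    # ascending order
expected = np.array([(1 - r) / 2, (1 + r) / 2])   # (1 -/+ ||a||) / 2

print(eigs, expected, np.real(np.linalg.det(rho)), (1 - r**2) / 4)
```

Choosing a with ‖a‖ = 1 makes the smaller eigenvalue vanish, which is the pure-state case.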
The maximally mixed state. The center point of the Bloch ball seems special. From
(1.5), it corresponds to ρ = ½ 𝟙. For a d-dimensional Hilbert space, ρ = (1/d) 𝟙 is called the
maximally mixed state. It has eigenvalues (1/d, . . . , 1/d) and thus entropy log d, which
is the highest one can get in d dimensions. In statistical physics language, the maximally
mixed state is thus the Gibbs state for T → ∞.
for any ONB {|ψj ⟩}j . (This is just the completeness relation for the basis). What seems
like a geometric curiosity at this point is in fact fundamental for a number of uniquely
quantum phenomena, in particular to quantum steering. We’ll come back to this point
later.
It is called so, because it is the quantum analogue of the classical Liouville equation ∂t ρ =
{H, ρ}, which governs the time evolution of a probability density ρ on phase space. Up to
a sign, the quantum Liouville equation is the same as the Heisenberg picture time evolution
for observables (why?):
\[
A(t) = e^{-\frac{t}{i\hbar} H} A \, e^{\frac{t}{i\hbar} H}, \qquad i\hbar \, \partial_t A = [A, H].
\]
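\[
H = -\frac{\gamma\hbar}{2} B \cdot \sigma
\]
and the Bloch ball description of the state ρ = ½(𝟙 + a · σ) into the Liouville equation
gives
\[
i\hbar \, \frac{1}{2} \, \partial_t a(t) \cdot \sigma = [H, \rho] = -\frac{\gamma\hbar}{4} \sum_{i,j=1}^{3} [B_i \sigma_i, a_j \sigma_j] = -i \frac{\gamma\hbar}{2} \sum_{ijk} \epsilon_{ijk} B_i a_j \sigma_k,
\]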
Thus, the main diagonal of the density matrix (corresponding to the spin component
parallel to the field) remains constant, while the off-diagonal (corresponding to the spin
components orthogonal to the field) picks up a complex phase factor oscillating with the
Larmor frequency.
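Here is a quick numerical check of this statement, as a sketch in Python with NumPy. It assumes a field along the z-axis, units with ℏ = 1, and writes ω = γB for the Larmor frequency; the initial state and the numbers are made up for illustration.

```python
import numpy as np

omega = 2.0                        # assumed Larmor frequency gamma * B (hbar = 1)
sz = np.diag([1.0, -1.0])
H = -0.5 * omega * sz              # H = -(gamma hbar / 2) B sigma_z

def evolve(rho0, t):
    """rho(t) = U rho0 U^dagger with U = exp(-i H t); H is diagonal here."""
    U = np.diag(np.exp(-1j * np.diag(H) * t))
    return U @ rho0 @ U.conj().T

# Hypothetical pure initial state with non-zero off-diagonal elements.
psi0 = np.array([np.cos(0.4), np.sin(0.4)], dtype=complex)
rho0 = np.outer(psi0, psi0.conj())

t = 0.7
rho_t = evolve(rho0, t)

print(np.real(np.diag(rho_t)), np.real(np.diag(rho0)))    # unchanged populations
print(rho_t[0, 1], rho0[0, 1] * np.exp(1j * omega * t))   # phase exp(i omega t)
```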
Dephasing of a spin
So far, we have just re-packaged undergrad calculations in new language. Let’s go further,
by treating a noisy time evolution of a spin-1/2 system in a magnetic field.
Assume that during the time period t ∈ [0, T], the field strength is not B, but B + ∆B.
Then the Larmor frequency adapts accordingly, so that the phase factor picked up by the
upper-right term of the density matrix during the time interval changes as
\[
e^{iT\omega} \mapsto e^{iT\omega} e^{i\phi}, \qquad \phi = \gamma T \Delta B.
\]
Figure 1.2: Left panel: Decoherence time measurement on a spin qubit operated at the
Research Center Jülich and RWTH Aachen with support of the project Matter and Light
for Quantum Computing. From [Struck et al., Low-frequency spin qubit energy splitting
noise in highly purified ²⁸Si/SiGe, npj Quantum Information (2020)]. Right panel: The
trajectory of a dephasing spin in the Bloch ball.
It is instructive to work out how the Liouville equation has to be modified to take
dephasing into account. Noting that one can write the projection of the density
matrix onto its off-diagonal as
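\[
\frac{1}{2}(\rho - \sigma_z \rho \sigma_z^\dagger),
\]
it is easy to verify that ρ(t) satisfies the differential equation
\[
\partial_t \rho(t) = -\frac{i}{\hbar}[H, \rho] + \frac{\lambda}{2}\left(\sigma_z \rho \sigma_z^\dagger - \rho\right).
\]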
Such differential equations that describe the time evolution of noisy quantum sys-
tems are called quantum master equations.
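To see what such an equation does, one can integrate it numerically. Below is a sketch in Python with NumPy using a hand-written fourth-order Runge-Kutta step. The values of ω and λ, the units ℏ = 1, the field along z, and the initial state are all assumptions for illustration.

```python
import numpy as np

sz = np.diag([1.0, -1.0]).astype(complex)
omega, lam = 2.0, 0.5               # assumed Larmor frequency and rate lambda
H = -0.5 * omega * sz               # hbar = 1, field along z

def rhs(rho):
    """Right-hand side: -i [H, rho] + (lambda / 2) (sz rho sz^dagger - rho)."""
    comm = H @ rho - rho @ H
    return -1j * comm + 0.5 * lam * (sz @ rho @ sz.conj().T - rho)

def rk4(rho, dt, steps):
    """Classical 4th-order Runge-Kutta integration of d rho / dt = rhs(rho)."""
    for _ in range(steps):
        k1 = rhs(rho)
        k2 = rhs(rho + 0.5 * dt * k1)
        k3 = rhs(rho + 0.5 * dt * k2)
        k4 = rhs(rho + dt * k3)
        rho = rho + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return rho

# Start in the equal superposition, where the off-diagonal elements are 1/2.
rho0 = 0.5 * np.ones((2, 2), dtype=complex)
T = 3.0
rho_T = rk4(rho0, dt=1e-3, steps=3000)

# Expected: rotation with exp(i omega T), damped by exp(-lambda T).
expected_offdiag = rho0[0, 1] * np.exp((1j * omega - lam) * T)
print(rho_T[0, 1], expected_offdiag)
```

The diagonal entries stay at 1/2, while the magnitude of the off-diagonal entry shrinks exponentially: in the Bloch ball, the state spirals in toward the z-axis.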
Summary
Goals
We will introduce tensor product Hilbert spaces, and argue why this is the right
space for multiple distinguishable particles. We’ll have to spend a lot of time on
notation (boring, but necessary) and have a first look at entanglement.
it makes physical sense to prepare the first particle in the state |α⟩, the second one in the
state |β⟩, and perform the measurements A and B. We also demand that in this case, the
outcome probabilities are independent:
We now construct the Hilbert space H12 associated with the combined system. The
above implies that H12 contains vectors associated with the outcomes ai , bj . Let’s call
them |ei , fj ⟩. Because they correspond to different outcomes of an observable, they have to
be orthogonal. The Hilbert space must also contain a vector associated with the preparation
procedure, let’s call it |α, β⟩. The independence condition (1.8) is fulfilled if, for
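\[
|\alpha\rangle = \sum_i \alpha_i |e_i\rangle, \qquad |\beta\rangle = \sum_j \beta_j |f_j\rangle,
\]
we define
\[
|\alpha, \beta\rangle = \sum_{ij} \alpha_i \beta_j |e_i, f_j\rangle \qquad (1.9)
\]
(One can show that this is essentially the only way to satisfy independence.) The resulting
Hilbert space
\[
H_{12} = \Big\{ \sum_{ij} \psi_{ij} |e_i, f_j\rangle \;\Big|\; \psi_{ij} \in \mathbb{C} \Big\},
\]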
together with the rule (1.9), is called the tensor product space H1 ⊗ H2 .
States that describe independent preparations of the particles, i.e. those of the form
given in (1.9), are called product states. Alternative notations:
This defines A, B on all of H12 , because the product vectors |ei , fj ⟩ form a basis.
Notation and conventions If not clear from context, the system on which an operator
acts is explicitly specified
There’s also the “tensor product of operators” notation (sometimes called the Kronecker
product, in particular in computer algebra systems):
This implies that “tensor products of outer products” equal “outer products of tensor prod-
ucts” (yeah, I know... you’ll get used to it):
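Since the Kronecker product is available in any numerical library, these rules are easy to try out. The sketch below uses Python with NumPy and `np.kron`. The random vectors and matrices, the dimensions 2 and 3, and the specific identities checked (the standard product-operator rule and the outer-product rule) are my own choice of illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_vec(d):
    """Random normalized complex vector (illustrative helper)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

alpha, alpha2 = rand_vec(2), rand_vec(2)    # vectors in H1
beta, beta2 = rand_vec(3), rand_vec(3)      # vectors in H2
A = rng.normal(size=(2, 2))                 # operator on H1
B = rng.normal(size=(3, 3))                 # operator on H2

# Product vector: coefficients alpha_i * beta_j in the basis |e_i, f_j>, cf. (1.9).
ab = np.kron(alpha, beta)

# (A tensor B) applied to a product vector acts factor by factor.
lhs_ops = np.kron(A, B) @ ab
rhs_ops = np.kron(A @ alpha, B @ beta)

# Tensor product of outer products = outer product of tensor products.
lhs_outer = np.kron(np.outer(alpha, alpha2.conj()), np.outer(beta, beta2.conj()))
rhs_outer = np.outer(np.kron(alpha, beta), np.kron(alpha2, beta2).conj())

print(np.allclose(lhs_ops, rhs_ops), np.allclose(lhs_outer, rhs_outer))
```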
Example: The singlet state. In the theory of the addition of angular momentum (every
student’s favorite topic!), one comes across the singlet state
\[
|\Psi^-\rangle = \frac{1}{\sqrt{2}} \big( |{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle \big)
\]
in H1 ⊗ H2 , where the Hi are two-dimensional with basis {| ↑⟩, | ↓⟩}.
If one has access only to the first variable, one can obtain its distribution from the joint one
by summing over the irrelevant outcomes
\[
p^{(1)}(x_1) = \Pr[X_1 = x_1] = \sum_{x_2} p^{(2)}(x_1, x_2). \qquad (1.11)
\]
describes measurements performed on the first particle alone. More precisely, for every
observable A on H1 , we demand
\[
\operatorname{tr}\big(\rho^{(1)} A\big) = \operatorname{tr}\big(\rho^{(12)} (A \otimes \mathbb{1})\big). \qquad (1.12)
\]
To solve this problem, define the partial trace tr2 of a product operator by computing
the usual trace of the second factor only:
tr2 (C ⊗ D) = C tr(D).
Note that the partial trace maps an operator on the tensor product Hilbert space to an
operator on the first system alone. Next, because any operator M on H1 ⊗ H2 can be
expanded in terms of product operators
\[
M = \sum_{ijkl} M_{ijkl} |ij\rangle\langle kl| = \sum_{ijkl} M_{ijkl} |i\rangle\langle k| \otimes |j\rangle\langle l|,
\]
The singlet state. The partial trace of the singlet state is much more interesting:
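\[
\operatorname{tr}_2 |\Psi^-\rangle\langle\Psi^-| = \frac{1}{2} \operatorname{tr}_2 \Big( |{\uparrow\downarrow}\rangle\langle{\uparrow\downarrow}| - |{\uparrow\downarrow}\rangle\langle{\downarrow\uparrow}| - |{\downarrow\uparrow}\rangle\langle{\uparrow\downarrow}| + |{\downarrow\uparrow}\rangle\langle{\downarrow\uparrow}| \Big)
= \frac{1}{2} |{\downarrow}\rangle\langle{\downarrow}| + \frac{1}{2} |{\uparrow}\rangle\langle{\uparrow}| = \frac{1}{2} \mathbb{1}.
\]
While the global state ρ = |Ψ−⟩⟨Ψ−| was pure, the partial trace tr2 ρ = ½ 𝟙 is maximally
mixed! If this were a thermodynamic equilibrium state, the total system would be
at temperature 0, while Alice’s subsystem had temperature ∞. In classical physics, this is
impossible. This example shows that mixed states can occur in QM even in the absence
of any form of classical randomness. We’ll explore the conceptual implications in the next
section.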
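The partial trace is a one-liner on a computer once the density matrix is reshaped into a four-index array. Here is a sketch in Python with NumPy; the function names `partial_trace_2` and `entropy` are hypothetical helpers, and the index ordering assumes the basis |i⟩ ⊗ |j⟩ produced by `np.kron`.

```python
import numpy as np

def partial_trace_2(rho, d1, d2):
    """Trace out the second factor of an operator on C^d1 (x) C^d2."""
    # Indices after reshape: rho[i, j, k, l] = <ij| rho |kl>; sum over j = l.
    return np.einsum("ijkj->ik", rho.reshape(d1, d2, d1, d2))

def entropy(rho):
    """Von Neumann entropy: Shannon entropy (natural log) of the eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                 # drop zeros, using 0 * log 0 = 0
    return float(-np.sum(p * np.log(p)))

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

rho = np.outer(singlet, singlet.conj())   # pure global state
rho1 = partial_trace_2(rho, 2, 2)         # state of the first spin alone

print(rho1)
print(entropy(rho), entropy(rho1), np.log(2))
```

The global entropy vanishes while the reduced state has the maximal entropy log 2.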
The rest of this chapter presents some topics in multi-partite quantum systems,
which are, I think, conceptually highly interesting. But we won’t build on them
in the remainder. So it’s fine to skip ahead to Chapter 2.
Goals
Things will get much more interesting! Our direct objective in this section is to
work out a model in which entangled states arise naturally. Even though the model
is extremely simple, we will, as a by-product, be able to make progress on issues
that seem to pose conceptual problems to QM: The measurement problem and the
question of why the world looks classical, even though it seems to be fundamen-
tally governed by QM.
Figure 1.3: The formalism of QM divides the universe into degrees of freedom that are
modeled quantum-mechanically and those that are classical. The boundary between these
two regimes is the Heisenberg cut. For a Stern-Gerlach experiment, the quantum side could
include just the spin (1), but also the motional degrees of freedom (2) of the silver atom, or
the measurement device (3) that records its final position, or even the experimentalist (4)
observing the outcome.
Given that these are completely different, quantum physicists take great care to very care-
fully explain when to use the one and when to use the other. ... Huh huh, just kidding.
Check out your introductory textbook and try to find a definition of which properties ex-
actly a physical process has to fulfill in order to qualify as a “measurement”. I wish you
good luck!
The standard presentation of quantum mechanics divides the world into a “quantum
part” and a “classical part”. The measurement rules connect the two. But it is not clear
which degrees of freedom belong to which side of this cut.
Example: In the standard treatment of the Stern-Gerlach experiment, the spin is mod-
eled quantum mechanically, but the spatial position of the atom classically. The spin-
dependent movement of the atom is treated as a measurement. But it also seems reasonable
to put the atom’s position to the quantum side of the cut (Fig. 1.3). The interaction between
spin and spatial coordinates is then described by a coherent Hamiltonian time evolution.
A measurement only takes place once an observer records the atom’s position.
We can now state two aspects of quantum mechanics’ measurement problem:
• The pragmatic problem: Why can physicists get away with being so vague about
the notion of “measurement”? Why don’t different modeling decisions produce
different predictions? (We’ll be able to answer this).
• The philosophical problem: Given that quantum mechanics is supposedly more fun-
damental than classical theories, how do we deal with the fact that its predictions
are stated with respect to a classical world? Who’s measuring the wave function of
the universe? (We won’t make progress here. In fact, there’s no agreement what’s
the best solution to this issue. Or whether there is a solution. Or whether there was
a problem in the first place. It’s a mess.)
Figure 1.4: Why do planets and electrons behave differently? An unconventional take.
Source: xkcd.com.
Likewise, why do marbles seem to be in one place at any one time, while from the
perspective of elementary QM, it would be much more natural to assign a momentum
eigenstate to them (which diagonalizes the free Hamiltonian)? Due to their macroscopic
mass, it is compatible with Heisenberg’s uncertainty relation that a marble can be
in a state in which both position and momentum are very precisely determined – but it is by
no means necessary that such a state be adopted. So why then does this seem to happen?
More generally: Which process breaks the unitary invariance of quantum state space
and selects the basis in which we encounter physical objects?
\[
H = \frac{P^2}{2m} - \frac{\gamma\hbar}{2} B \cdot \sigma
\]
Assume that B = B z e_z. Then only the z-coordinate participates in the interaction, so
nothing is lost by only treating the spin and the spatial z-coordinate explicitly. The time
evolution is best calculated in the interaction picture. Decompose the Hamiltonian as
\[
H = H_0 + H_I, \qquad H_0 = \frac{P_z^2}{2m}, \qquad H_I = -\frac{\gamma\hbar B}{2} z \sigma_z.
\]
Then the Schrödinger and the interaction-picture wave functions are
\[
|\psi_S(t)\rangle = e^{\frac{1}{i\hbar} tH} |\psi_S(0)\rangle, \qquad |\psi_I(t)\rangle = e^{-\frac{1}{i\hbar} tH_0} |\psi_S(t)\rangle = e^{\frac{1}{i\hbar} tH_I} |\psi_S(0)\rangle,
\]
where |ψ_I(t)⟩ describes the change of dynamics caused by an interaction term H_I.
First treat the case where the particle is initially in a momentum-0 eigenstate:
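Then, with δ := ℏγB/2,
\[
\begin{aligned}
|\psi_I(t)\rangle &= e^{\frac{i\gamma}{2} tBz\sigma_z} |\psi_S(t = 0)\rangle \\
&= \alpha \, e^{\frac{i\gamma}{2} tBz} |{\uparrow}\rangle |k = 0\rangle + \beta \, e^{-\frac{i\gamma}{2} tBz} |{\downarrow}\rangle |k = 0\rangle
\end{aligned}
\]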
This is an entangled state! A measurement of spin and momentum gives correlated out-
comes:
\[
\Pr[s, k + dk] = \begin{cases} |\alpha|^2 & (s, k) = (\uparrow, +\delta t) \\ |\beta|^2 & (s, k) = (\downarrow, -\delta t) \end{cases}.
\]
\[
\Pr[s] = \begin{cases} |\alpha|^2 & s = \uparrow \\ |\beta|^2 & s = \downarrow \end{cases}.
\]
This is exactly what we would have obtained by treating just the spin quantum mechan-
ically! Thus: Using a quantum model for the spatial z-component does not change the
prediction about the measured spin state. All it does is to entangle the measured and the
measuring degree of freedom so that the global state becomes a superposition of consistent
configurations. Indeed, we could have included further degrees of freedom – e.g. the
experimentalist observing the particle momentum. If we model them – simplifying slightly
– as a two-dimensional system with (mental) states |☺⟩ when seeing an upwards moving
atom, and |☹⟩ when encountering one moving downwards, an analogous calculation
would have resulted in
Then
\[
e^{\frac{i\gamma}{2} tBz} |\psi_{k_0}\rangle = |\psi_{k_0 + \delta t}\rangle
\]
The correlations between spin and position now build up over time. Indeed:
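\[
\Pr[s, k + dk] = \frac{1}{\sqrt{2\pi}} \begin{cases} |\alpha|^2 \, e^{-\frac{(k - \delta t)^2}{2}} \, dk & s = \uparrow \\ |\beta|^2 \, e^{-\frac{(k + \delta t)^2}{2}} \, dk & s = \downarrow \end{cases}.
\]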
At t = 0, the momentum distribution is independent of the spin state. For times t ≃ 1/δ,
the two spin-dependent Gaussian distributions become distinct, but overlap significantly.
Only for t ≫ 1/δ does the sign of a measured momentum value identify the spin state
with certainty.
Let’s summarize: The coupling term (zσz ) caused the spin and the positional degree
of freedom to become entangled over time. A measurement on the entangled state in
the eigenbases of the two factors z, σz leads to correlated outcomes. Asymptotically, the
correlations are perfect, and a direct measurement of one observable on the initial state
is equivalent to a measurement of the other observable on the final state. One can then
define a measurement to be any process to which the above analysis applies. In this case,
the measuring degree of freedom (called a pointer in this context) can be treated either
classically or quantum-mechanically.
The same framework can be used to identify the basis in which objects present themselves. To see
how, compute the reduced density matrix for the spin. From
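\[
|\psi_I(t)\rangle\langle\psi_I(t)| = |\alpha|^2 \, |{\uparrow}\rangle\langle{\uparrow}| \otimes |\psi_{\delta t}\rangle\langle\psi_{\delta t}| + \alpha\beta^* \, |{\uparrow}\rangle\langle{\downarrow}| \otimes |\psi_{\delta t}\rangle\langle\psi_{-\delta t}| + \dots
\]
and
\[
\begin{aligned}
\operatorname{tr} |\psi_{\delta t}\rangle\langle\psi_{-\delta t}| = \langle\psi_{-\delta t}|\psi_{\delta t}\rangle &= \frac{1}{\sqrt{2\pi}} \int e^{-\frac{(-\delta t - k)^2 + (\delta t - k)^2}{4}} \, dk \\
&= \frac{1}{\sqrt{2\pi}} \int e^{-\frac{k^2 + (\delta t)^2}{2}} \, dk = e^{-(\delta t)^2/2},
\end{aligned}
\]
we can read off the reduced density matrix in the {|↑⟩, |↓⟩}-basis:
\[
\rho_{\mathrm{spin}}(t) = \operatorname{tr}_{\mathrm{space}} |\psi_I(t)\rangle\langle\psi_I(t)| = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* e^{-(\delta t)^2/2} \\ \alpha^*\beta \, e^{-(\delta t)^2/2} & |\beta|^2 \end{pmatrix}.
\]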
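Both the overlap integral and the decay of the off-diagonal elements can be checked numerically. The sketch below uses Python with NumPy. It assumes Gaussian wave packets ψ_{k0}(k) = (2π)^{-1/4} e^{-(k-k0)²/4}, which is what the integrand above suggests, and α = β = 1/√2 as an example. The function names and the integration grid are my own choices.

```python
import numpy as np

def psi(k, k0):
    """Gaussian wave packet centred at k0; |psi|^2 has unit variance."""
    return (2 * np.pi) ** -0.25 * np.exp(-((k - k0) ** 2) / 4)

def overlap(x):
    """<psi_{-x}|psi_{x}> for x = delta * t, via a plain Riemann sum."""
    k = np.linspace(-40.0, 40.0, 80001)
    dk = k[1] - k[0]
    return float(np.sum(psi(k, -x) * psi(k, x)) * dk)

def rho_spin(alpha, beta, x):
    """Reduced spin density matrix for x = delta * t."""
    c = np.exp(-(x ** 2) / 2)
    return np.array([[abs(alpha) ** 2, alpha * np.conj(beta) * c],
                     [np.conj(alpha) * beta * c, abs(beta) ** 2]])

def entropy(rho):
    """Von Neumann entropy (natural log) from the eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

a = b = 1 / np.sqrt(2)
print(overlap(1.5), np.exp(-(1.5 ** 2) / 2))
for x in (0.0, 1.0, 10.0):
    print(x, entropy(rho_spin(a, b, x)))
```

For this choice of α and β the entropy grows from 0 at δt = 0 toward log 2 for δt ≫ 1.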
Thus, the state of the spin part alone dephases from a pure state at t = 0 to a probabilistic
mixture of |↑⟩ and |↓⟩ for times t ≫ 1/δ. The entropy (of entanglement) gradually builds
up from S(t = 0) = 0 to
Let’s again interpret this calculation from a broader perspective. After the dephasing
time, an unrelated observer will find the spin in a σz -eigenstate and will not encounter
superpositions. Recall what distinguishes the z-axis: It is the one in which the interac-
tion takes place! The bases which we perceive as “classical” are the ones in which the
interaction terms are diagonal, and the emergence of probabilistic mixtures is a result of
entanglement building up. Interactions are local, which is why quantum systems usually
appear to be well-localized in space. However, some interactions select for different bases:
e.g. electrons bound in an atom couple to the environment via the electromagnetic field.
This interaction is sensitive to atomic energy scales and angular momentum – but the
wavelengths of the involved photons are too large for the position of the electron within the atom
to make a meaningful difference. Therefore, the semi-classical description of electrons
in terms of atomic quantum numbers (“n, l, m”) makes sense in this case. In contrast,
whether or not a photon is scattered off the surface of Venus depends on the planet’s posi-
tion within its orbit, not on its internal energy or angular momentum.
Further conceptual points:
• Q.: Are measurements discontinuous in time?
A.: Nope! Correlations between the measured system and the environment are built
up at a time scale proportional to the inverse coupling strength. The instantaneous
process postulated in introductory QM can be understood as an effective description
valid for times much larger than that.
Goals
Quantum computing is all the rage! We’ll introduce the basic concepts here and
discuss one very cool and comparatively simple application: Grover’s algorithm,
which can search through N items in √N time. You heard me right.
One can iterate the construction of the two-particle Hilbert space to find the space for
n > 2 systems. Assume, for simplicity, that every single-system Hilbert space Hi has
dimension d and basis {|1⟩, . . . , |d⟩}. Then a general state vector in the joint Hilbert space
H = H1 ⊗ · · · ⊗ Hn is of the form
\[
|\psi\rangle = \sum_{i_1, i_2, \dots, i_n = 1}^{d} \psi_{i_1, \dots, i_n} |i_1, \dots, i_n\rangle.
\]
You should immediately notice that the sum is over d^n terms, i.e. the dimension of the
joint space is exponentially large in the number of constituents! For a collection of
spin-1/2s arranged on a cube with side length only 10, this gives a
way-bigger-than-merely-astronomical 2^1000. This is:
• Bad news if you work in computational physics. It is absolutely out of the question
even to just store the coefficients ψi1 ,...,in in memory. Fortunately, one can some-
times use clever tricks to make statements about large-n systems without having to
work with explicit representations. More on this: See rest of these notes.
• Potentially good news if you can carefully control large quantum systems. Because
Nature seems to be able to track quantum states that our classical computers can’t, it
stands to reason that quantum systems could be used to solve otherwise intractable
computational problems. More on this: See Sec. 1.4.
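To get a feeling for the numbers, here is a back-of-the-envelope sketch in plain Python. It assumes 16 bytes per complex amplitude (double-precision real and imaginary parts); the function name is hypothetical.

```python
def state_vector_bytes(n, d=2, bytes_per_amplitude=16):
    """Memory needed to store all d**n complex amplitudes of an n-body state."""
    return bytes_per_amplitude * d ** n

for n in (10, 30, 50, 1000):
    b = state_vector_bytes(n)
    # len(str(b)) - 1 is the order of magnitude in powers of ten.
    print(f"n = {n}: roughly 10^{len(str(b)) - 1} bytes")
```

Thirty spins already need 16 GiB; fifty spins need about 18 petabytes; for n = 1000 the number of bytes has more than 300 digits.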
Many-body Hamiltonians can usually be efficiently represented, though. The reason
is that physical interactions involve only few particles at a time. A Hamiltonian with only
single- and two-body terms is of the form
\[
H = \sum_k h^{(k)} + \frac{1}{2} \sum_{k \neq l} h^{(k,l)}
\]
where h^(k,l) acts non-trivially only on the k-th and l-th Hilbert spaces and can therefore be
specified as a d^2 × d^2-matrix (or, if d = ∞, will typically be a simple function of position
and momentum operators).
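The contrast between the compact description and the exponentially large matrix can be made concrete. The sketch below (Python with NumPy) builds the full matrix of a small spin chain from its local terms. The nearest-neighbour Heisenberg coupling σ·σ and the helper names `embed` and `heisenberg_chain` are assumptions chosen for illustration; they do not appear in the notes.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(ops, n):
    """Tensor product over n sites: ops[k] on site k, identity elsewhere."""
    return reduce(np.kron, [ops.get(k, I2) for k in range(n)])

def heisenberg_chain(n):
    """H = sum over neighbouring sites k, k+1 of sigma^(k) . sigma^(k+1)."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for k in range(n - 1):
        for s in (sx, sy, sz):
            H += embed({k: s, k + 1: s}, n)
    return H

H2 = heisenberg_chain(2)
print(np.linalg.eigvalsh(H2))        # singlet at -3, triplet at +1
print(heisenberg_chain(6).shape)     # (64, 64): grows as 2^n x 2^n
```

The input is a handful of 2 × 2 matrices and a list of neighbours, yet the matrix that comes out doubles in size with every added spin.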
Given the Hamiltonian, typical questions of interest are:
1. Obtain information about the eigenvalues of H, e.g. the energies of the ground states
and of low-lying excitations.
2. Compute thermodynamical potentials, e.g. the free energy
3. Compute the expectation value ⟨ψ(t)|A^(i)|ψ(t)⟩ of a local observable. Here,
|ψ(t)⟩ = e^{tH/(iℏ)} |ψ(0)⟩ is the time evolution of a state that started out in a
simple form, say |ψ(0)⟩ = |i1, . . . , in⟩.
In general, finding answers to these questions is intractable. The task of quantum many-
body theory is to find special cases or approximations where progress can be made.
Quantum algorithms
It is not obvious that simulating the time evolution of quantum many-body systems actu-
ally is classically intractable. Sure, we have argued above that storing a many-body wave
function in memory is impossible. But we have also seen that any physical time evolution
can be described using only a small number of parameters (the local terms of the Hamilto-
nian, a simple initial state). So it is conceivable that there exists a smart universal way of
keeping track of |ψ(t)⟩ that does not involve working with the full state vector.
Today, there is strong evidence that such a universal strategy does not exist¹.
One piece of evidence is given by the existence of quantum algorithms. These are
methods that allow one to solve a difficult classical computational problem efficiently by
outsourcing parts of the calculation to a quantum device.
¹ There is no rigorous proof of this, though! The issue is that it has so far been beyond the wit of humankind
to prove any reasonable problem to be computationally hard. For example, the infamous “P vs NP” problem asks
for a proof that finding solutions to problems is generally harder than verifying that a proposed solution indeed
works. An imprecise analogue would be: Appreciating classical music is easier than becoming the next Mozart.
Of course this is true – so the fact that there is no mathematical proof that “P ≠ NP” is not generally taken to
be indicative of there being serious doubts about the statement, but rather as testament to the limitations of the
human mind. Sadly, a detailed account is beyond the scope of this lecture.
Figure 1.5: Stored SHA-512 hash of my actual university login. If you find a pre-image,
you can read my emails and adjust your grades. Knock yourself out! [If you do succeed,
you could also answer my emails. Come to think of it, maybe I should just post my
password...].
Overview
There are some computational puzzles for which the best-known approach is to just try
every possible input to see whether it is a solution.
The most clear-cut cases are used in cryptography. For example, your computer does
not actually know your password! Instead, it stores an n-bit image y⋆ = h(x⋆) of the
password x⋆ under a cryptographic hash function h. It is designed such that computing
y = h(x) for an input x is easy, but the best-known way of finding a pre-image
x ∈ h^{-1}({y}) given y is to try ≃ 2^n random inputs (Fig. 1.5). To authenticate a user who
claims their password is x, the computer compares y = h(x) to the hash y⋆ on file. The
advantage of such an indirect procedure is that not much harm is done if the stored hashes
fall into the wrong hands: A typical value of n is 512 and 2^512 ≫ (hadrons in universe),
so recovering the passwords x⋆ is impractical. (Unless, of course, the user chooses a
password that can be guessed with reasonable effort. No hash magic makes “birthday-of-
romantic-partner123lol” a secure choice.)
Finding an inverse by trying random inputs does not require that we understand any-
thing about the inner workings of h. All we need is the ability to compute h(x) given x.
Methods that interact with h only in this way are called black box (or oracle) algorithms.
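To make the black box picture concrete, here is a toy sketch in Python using the standard-library `hashlib`. The assumptions are loud ones: the "hash function" is SHA-512 truncated to n = 16 bits so that the search finishes in a fraction of a second, the password `hunter2` is made up, and candidates are tried in sequence rather than at random.

```python
import hashlib
from itertools import count

N_BITS = 16   # toy output length; real systems use e.g. n = 512

def h(x: bytes, n_bits: int = N_BITS) -> int:
    """Toy hash: the first n_bits of SHA-512(x), as an integer."""
    digest = hashlib.sha512(x).digest()
    return int.from_bytes(digest, "big") >> (512 - n_bits)

def brute_force_preimage(y: int, n_bits: int = N_BITS):
    """Black box search: calls h on candidates, never looks inside h."""
    for queries in count(1):
        x = str(queries).encode()
        if h(x, n_bits) == y:
            return x, queries

# The "stored" hash of a hypothetical password.
y_star = h(b"hunter2")

x_found, n_queries = brute_force_preimage(y_star)
print(x_found, n_queries, 2 ** N_BITS)
```

The number of queries is typically of the order 2^n, so every extra output bit doubles the expected work. The pre-image found is usually not the original password, just some input with the same truncated hash.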
Are black box algorithms really the best way to invert a hash function? Don’t take my
word for it! The vast wealth stored in “cryptocurrencies” is secure only to the degree that
this assumption is true. Bitcoin is effectively a multi-billion dollar bounty on an improved
algorithm. It hasn’t been claimed as of 2023 (Fig. 1.6).
In light of this, it is truly remarkable that in 1996, Lov Grover showed that a quantum
computer can find x from y in roughly √(2^n) = 2^{n/2} time steps. In fact, this square root
speedup is possible for any puzzle for which a solution can be efficiently recognized!
Here’s a high-level overview: We model “puzzle for which a solution can be recog-
nized” by a function f that maps n-bit strings x (“candidates”) to 0 (“no solution”) or 1
(“solution!”). In the above example: f (x) = 1 if h(x) = y ⋆ and 0 else. Assume you
have a piece of code that evaluates f on a classical computer in T_n time steps. Then
Grover’s recipe turns that code into a time-dependent two-body Hamiltonian H(t) such
that if |ψ(0)⟩ = |0, . . . , 0⟩, then
\[
\big|\psi\big(t = c \, T_n \sqrt{2^n}\big)\big\rangle \simeq |x^\star_1, \dots, x^\star_n\rangle,
\]
Figure 1.6: Left: Cryptocurrency mines consist of racks of computers that try random
inputs hoping to find a solution to a mathematical puzzle. Right: If you can do better,
there’s 300b dollars on the table (as of early 2023). Credit: Wikipedia, Statista.
where x⋆ is such that f (x⋆ ) = 1 and c a (reasonably small) constant. A measurement will
then reveal the bits of x⋆ with high probability.
Grover’s algorithm is also “black box” in the sense that no understanding of “the inner
workings” of f beyond the ability to compute it is required. So how is it possible to find a
solution in drastically less time than it would take to consider a fixed fraction of all inputs?
The answer is that Grover constructs a quantum black box U_f : |x, 0⟩ ↦ |x, f(x)⟩ that
can be run on a superposition of inputs
\[
U_f \sum_x c_x |x, 0\rangle = \sum_x c_x |x, f(x)\rangle.
\]
Thus, just a single invocation of the quantum black box results in a wave function that
carries information about all possible inputs. The tricky part is then to read this information
out. Grover’s contribution was to find a clever trick for getting the amplitudes for all
|x, f (x)⟩ with f (x) = 0 to interfere destructively, so that only the solutions survive.
We’ll work our way through the details next.
Figure 1.7: Classical gates and circuits. (i) The NOT gate inverts the state of a single
bit. (ii) The XOR gate computes the exclusive or x ⊕ y of its inputs. (iii) The CNOT
(or controlled not) gate toggles the state of the second bit if and only if the first bit is
in the 1-state. Note that the CNOT and the NOT gate are reversible: I.e. the input can
be reconstructed given the output. (iv) A reversible circuit. It turns out that anything a
classical computer can do can be represented in this way.
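Likewise, the matrix that is represented in the {|00⟩, |01⟩, |10⟩, |11⟩}-basis by
\[
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} : \qquad
\begin{array}{c|c}
|x_i x_j\rangle & \mathrm{CNOT}|x_i x_j\rangle \\ \hline
|00\rangle & |00\rangle \\
|01\rangle & |01\rangle \\
|10\rangle & |11\rangle \\
|11\rangle & |10\rangle
\end{array}
\]
acts like the CNOT gate, but on qubits. Because U is unitary, it can be implemented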
by a suitable two-qubit Hamiltonian (homework). This construction generalizes to all
reversible logic gates and, as per our previous comment, any classical computation can
thus be realized by a time-dependent Hamiltonian on qubits.
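As a sanity check, the following sketch (Python with NumPy) verifies that the matrix reproduces the classical truth table on basis states, that it is unitary, and what it does to a superposition. The helper `basis` and the choice of input superposition are my own.

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def basis(x1, x2):
    """Basis vector |x1 x2> in the ordering |00>, |01>, |10>, |11>."""
    v = np.zeros(4)
    v[2 * x1 + x2] = 1.0
    return v

# Classical behaviour: |x1, x2> is mapped to |x1, x1 XOR x2>.
truth_table_ok = all(
    np.array_equal(CNOT @ basis(x1, x2), basis(x1, x1 ^ x2))
    for x1 in (0, 1) for x2 in (0, 1)
)

# Unitarity (a permutation matrix is real orthogonal).
is_unitary = np.allclose(CNOT @ CNOT.T, np.eye(4))

# On qubits, superpositions are allowed as inputs:
# (|0> + |1>)/sqrt(2) tensor |0>  is mapped to  (|00> + |11>)/sqrt(2).
control = np.array([1.0, 1.0]) / np.sqrt(2)
out = CNOT @ np.kron(control, np.array([1.0, 0.0]))

print(truth_table_ok, is_unitary, out)
```

The last output is not a product vector: a gate that is perfectly classical on basis states produces entanglement when fed a superposition.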
But of course, most unitaries are not permutations! A quantum gate is any unitary that
acts on a small number of qubits. Prominent examples with no classical counterpart are
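\[
Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{Z-gate}, \qquad (1.15)
\]
\[
P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \quad \text{phase gate}, \qquad (1.16)
\]
\[
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \quad \text{Hadamard gate}. \qquad (1.17)
\]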
The Hadamard gate, e.g., turns basis states into uniform superpositions:
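A quick numerical illustration of this, as a sketch in Python with NumPy. The Hadamard matrix is called `Hd` here to avoid a clash with a Hamiltonian, and n = 3 qubits is an arbitrary choice.

```python
import numpy as np
from functools import reduce

Z = np.diag([1.0, -1.0]).astype(complex)
P = np.diag([1.0, 1j])
Hd = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate

# All three gates are unitary.
all_unitary = all(np.allclose(U.conj().T @ U, np.eye(2)) for U in (Z, P, Hd))

# One qubit: H|0> = (|0> + |1>)/sqrt(2).
h_zero = Hd @ np.array([1.0, 0.0])

# n qubits: applying H to every qubit of |0...0> gives the uniform superposition.
n = 3
Hn = reduce(np.kron, [Hd] * n)
zero = np.zeros(2 ** n)
zero[0] = 1.0
plus = Hn @ zero

print(all_unitary, h_zero, plus)
```

Every one of the 2^n amplitudes of `plus` equals 2^{-n/2}.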
Grover iterations
Let f : {0, 1}^{×n} → {0, 1} be a classical function as introduced in Sec. 1.4.1. To represent
f in a quantum computer, we have to consider a reversible version. The common choice
is this:
As is the case for any reversible function, it can be expressed as a circuit consisting of
reversible classical gates. Re-interpreting these as quantum gates, we arrive at the (n + 1)-
qubit unitary
That’s promising, because a single invocation of the quantum black box did indeed leave
information about x⋆ in the output. But it’s not yet useful, because the coefficient in
front of |x⋆ , 1⟩ is exponentially small. Performing a measurement will reveal it only with
probability 2−n , exactly the same as a classical random guess would give.
Grover found a way to amplify the coefficient in front of the solution. His construction
involves the following elements, whose relevance will become clear soon:
1. Instead of Uf , which indicates whether a solution has been found by flipping an auxiliary qubit, use
which changes the sign of the coefficient for the solution. One can construct Vf
from Uf by throwing in an extra Hadamard gate. Verifying this is homework.
2. Introduce a second unitary
where f is replaced by the “Kronecker delta for bit-strings”, i.e. Vδ flips the sign of
the coefficient for x = 0.
CHAPTER 1. MULTI-PARTITE QUANTUM SYSTEMS 25
The big claim now is that starting from $H^{\otimes n}|0\rangle$, every application of the Grover operator
G will rotate the state vector closer to |x⋆⟩, hitting the target after $\simeq \frac{\pi}{4}\sqrt{2^n}$ iterations.
Proof: Define
\[
|☹\rangle = \frac{1}{\sqrt{2^n - 1}} \sum_{x \neq x_\star} |x\rangle
\]
to be the uniform superposition of all non-solutions. Then {|☹⟩, |☺⟩ := |x⋆⟩} form an
ONB for a two-dimensional subspace. Remarkably, the state vector will regularly end up
in this 2-dimensional space, so we can track the progress of the algorithm solely by
considering the dynamics in this small space. (This makes Grover comparatively easy to
analyze. Don’t get your hopes up, though. This never happens again). Indeed, the state
$|+\rangle = H^{\otimes n}|0\rangle$ can be expanded as
\[
|+\rangle = \sqrt{\frac{2^n - 1}{2^n}}\,|☹\rangle + \frac{1}{\sqrt{2^n}}\,|☺\rangle.
\]
As you can see, the initial superposition is almost parallel to the non-solutions |☹⟩. The
angle they enclose is
\[
\theta := \angle\big(|\psi_0\rangle, |☹\rangle\big) = \arcsin\frac{1}{\sqrt{2^n}} \simeq \frac{1}{\sqrt{2^n}}
\]
(an excellent approximation for reasonably large n). Now the application of Vf changes
the sign of the coefficient in front of |☺⟩. Geometrically, this corresponds to a reflection
about the plane orthogonal to |☺⟩. Likewise,
−H ⊗n Vδ H ⊗n = H ⊗n (−1 + 2|0⟩⟨0|)H ⊗n = −1 + 2|+⟩⟨+|
is a reflection about |+⟩. The combination of two reflections is a rotation, and a simple
geometric analysis in the |☹⟩–|☺⟩–plane (Fig. 1.8) shows it is by an angle of 2θ toward
the solution vector |☺⟩. It is reached after k iterations of G, for
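\[
\theta + k\,2\theta = \frac{\pi}{2}
\quad\Leftrightarrow\quad
k = \frac{\pi}{4\theta} - \frac{1}{2} \simeq \frac{\pi}{4}\sqrt{2^n} \tag{1.18}
\]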
as claimed.
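The rotation picture is easy to test numerically. Here is a minimal sketch, assuming that one Grover iteration consists of the sign flip on the solution followed by the reflection −1 + 2|+⟩⟨+| about the uniform superposition (the two reflections discussed in the proof). The function name and the choices n = 8 and target = 5 are arbitrary, picked only for illustration.

```python
import math

def grover_success_probabilities(n, target, iterations):
    """Probability of measuring `target` after 0, 1, ..., `iterations` Grover steps."""
    dim = 2 ** n
    state = [1 / math.sqrt(dim)] * dim      # uniform superposition |+>
    probs = [state[target] ** 2]
    for _ in range(iterations):
        state[target] = -state[target]      # first reflection: flip the sign of the solution
        mean = sum(state) / dim             # second reflection: -1 + 2|+><+| maps a -> 2*mean - a
        state = [2 * mean - a for a in state]
        probs.append(state[target] ** 2)
    return probs

n = 8
theta = math.asin(1 / math.sqrt(2 ** n))
k = round(math.pi / (4 * theta) - 0.5)      # Eq. (1.18), rounded to the nearest integer
probs = grover_success_probabilities(n, target=5, iterations=2 * k)
```

In this sketch, `probs[k]` should be very close to 1 and should agree with sin²((2k+1)θ), while `probs[2 * k]` is small again: running the iteration for twice as long rotates the state past the solution.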
Remarks:
• Don’t run Grover for too long! Otherwise, you’ll rotate past the solution |☺⟩.
• Don’t worry if (1.18) has no integer solution. If ⟨ψ|x⋆ ⟩ = 1 − ϵ, you’ll get a
wrong solution x ̸= x⋆ with probability ≃ 2ϵ. But by assumption, we can check
the solution efficiently by computing f (x). If f (x) = 0, just rerun the quantum
algorithm.
• While impressive in its generality, the practical utility of Grover’s algorithm is lim-
ited. The “square root speedup” isn’t as large as the exponential speedup some
quantum algorithms promise. What is more, quantum computers are much harder
to build than classical ones and might require a substantial overhead to compensate
for errors. On top of all that, Grover, unlike an exhaustive classical search, cannot
be parallelized. Thus the square root advantage might only materialize for n’s such
that not only $2^n$, but already $\sqrt{2^n}$ is astronomical.
Figure 1.8: Time evolution of the Grover algorithm. Left panel: A Grover iteration performs
two reflections that combine to a rotation by 2θ toward the target state |☺⟩ = |x⋆⟩.
Angles not to typical scale! Right panel: The effect of consecutive Grover rotations.
Summary
• The exponential size of the many-body Hilbert space can potentially be put
to use to solve classically hard computational problems.
• Time evolutions of few qubits are described by small unitaries, which are
called quantum gates and generalize classical logic gates.
• Classical computations can be made reversible and reversible gates
re-interpreted as unitaries. This way, classical subroutines can be
evaluated on superpositions of inputs. The resulting state carries information
about their global behavior. Putting this information into a form that can be
read out may require non-trivial efforts (e.g. Grover iterations).
has been experimentally falsified as a general property of Nature! On top of the surprising
conclusion, this is remarkable because (1.19) feels like a philosophical statement that is
too vague to have testable implications. Yet here we are.
In the following derivation, we have to keep in mind that we want to reason about
theories different from quantum mechanics. This means that we cannot use any concept
Figure 1.9: Left panel: 1935 New York Times headline reporting on Einstein-Podolsky-
Rosen paper arguing that quantum mechanics was incomplete. I wonder how Podolsky
and Rosen felt about the framing. Right panel: 2015 New York Times headline reporting
on Einstein being wrong.
that has a meaning only in the context of QM. “Hilbert space”, “entanglement”, “commutators”,
even “photon”... all these terms are verboten until further notice.²
Goals
The goals of this section? You got to be kidding me! Understand that, of course.
This has got to be one of the coolest things physics has to offer.
• Q.: So what’s up with the talk of “systems”? What are these? Photons? Spins?
A.: Unspecified. For now, these could be puffs of hot air and the measurement
2 Physicists talking about Bell inequalities have a tendency of emphasizing entanglement, or the singlet state
and how the fact that it’s spin-0 means that angular momentum measurements are anti-correlated, and some such
things. These are not wrong and even mildly helpful for the design of experiments that lead to the falsification
we are after. All this is also completely secondary to the main point; a case of people sticking to their comfort
zone.
Figure 1.10: The ingredients of the CHSH scenario (for Clauser, Horne, Shimony and
Holt). Two experimentalists are located at different ends of a laboratory. Each can perform
one of two measurements on systems emanating from a box in the middle. Surprisingly, the
analysis of the set of correlations that are compatible with this extremely vaguely defined
scenario offers profound insights!
devices random number generators. Our analysis does not depend on assumptions
about their nature. (Also, what’s a photon?)
• Q.: Are Alice’s devices 1 and 2 different? Is Alice’s device 1 different from Bob’s
device 1?
A.: We do not need to make any assumptions about this.
• Q.: Why are the outcomes labeled ±1?
A.: That’s not really essential. This particular choice will work well with our analy-
sis, though.
• Q.: Can Alice rig her boxes together such that she can perform both measurements
on the same incoming system?
A.: For all we know at this point... maybe?
• Q.: Look man. You are clearly just avoiding my questions. Why don’t you study your
system first, and come back once you can give specific answers?!
A.: You got it backwards! The fewer assumptions I need to make, the more generally
applicable my conclusions will be.3
• Q.: How in the world does one come up with this?
A.: Well, it took physics a few decades. Also, literal Einstein missed it.
With the setup established, let’s look at the lab book produced by A&B. Here’s a
possible snapshot:
Alice Bob
i A1 A2 B1 B2
1 + −
2 + +
3 − −
4 + +
.. .. .. .. ..
. . . . .
3 I once had a long discussion with a colleague who refused to concede this point, despite me applying all the
logic, persuasion, and appeals to authority I could muster. Very frustrating.
Obviously, in each round i, both Alice and Bob can fill out only the column corresponding
to the measurement they chose to make.
We will now argue that Assumption (1.19) puts quantitative constraints on the type
of data that can appear in this setting. Later, we will see that there are experiments that
violate these constraints—thereby disproving the general validity of (1.19). (Also, QM
predicts the violations correctly. That’s also interesting, but less relevant, because at this
point, we’re open to the idea that QM could be mistaken).
Concretely, if physical properties exist independently of observations, then there exists
a complete table, say
        Alice       Bob
  i    A1   A2    B1   B2
  1    +    −     −    −
  2    +    −     +    +
  3    −    −     +    −
  4    +    +     +    −
  ⋮    ⋮    ⋮     ⋮    ⋮
and in each round, A&B just decide which of the pre-existing values to uncover.
In what may feel like an unmotivated move even by the standards of the present discussion,
associate the expression
\[
C = A_1 B_1 + A_1 B_2 + A_2 B_1 - A_2 B_2
\]
with each complete row. There’s an elegant geometric construction that leads to this
particular formula (the keyword is Bell polytope) – but it takes some time to develop, so
let’s just work with it regardless of where it comes from. In our example:
        Alice       Bob
  i    A1   A2    B1   B2     C
  1    +    −     −    −     −2
  2    +    −     +    +      2
  3    −    −     +    −     −2
  4    +    +     +    −      2
  ⋮    ⋮    ⋮     ⋮    ⋮      ⋮
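Despite being the sum of four terms each valued ±1, the expression (in fact: its absolute
value) is upper-bounded by 2: Factoring out Alice’s variables and applying the triangle
inequality,
\[
|C| = |A_1 (B_1 + B_2) + A_2 (B_1 - B_2)| \leq |B_1 + B_2| + |B_1 - B_2| = 2.
\]
The last step holds because, for $B_1, B_2 = \pm 1$, one of $B_1 \pm B_2$ vanishes while the other equals ±2.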
It may seem that we can’t extract observable predictions out of this discussion, because
the expression C involves all four variables, and by assumption, we only have access to
two of them in every round. But there’s a nice trick to get around this! Indeed, if C ≤ 2 in
every run, then so is the average
\[
\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C^{(i)}
\]
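over N runs. But averages are linear, and therefore ⟨C⟩ equals
\[
\langle C \rangle = \langle A_1 B_1 \rangle + \langle A_1 B_2 \rangle + \langle A_2 B_1 \rangle - \langle A_2 B_2 \rangle.
\]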
Each of the four terms ⟨Ai Bj ⟩ can be estimated by A&B! If they choose their settings
at random, then by the law of large numbers (or, quantitatively, by the Chernoff bound),
their observed mean will converge to the true expected value in the limit of large N . Thus,
Assumption (1.19) implies that the linear combination of these four experimentally acces-
sible numbers be no larger than 2, up to statistical fluctuations that vanish in the large-N
limit. Such a test of (1.19) is called a Bell inequality.
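Both halves of the argument can be played through on a computer: the bound |C| ≤ 2 for every complete row, and the fact that A&B can estimate ⟨C⟩ although they uncover only two entries per round. The sketch below is a toy model. The hidden table is generated by a made-up rule (`random_row`), and the settings are drawn independently of the table, which is exactly the assumption the estimate relies on.

```python
import random
from itertools import product

def chsh(a1, a2, b1, b2):
    """C = A1 B1 + A1 B2 + A2 B1 - A2 B2 for one complete row."""
    return a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2

# Brute force over all 16 complete rows: C only ever takes the values +2 and -2.
row_values = {chsh(*row) for row in product((+1, -1), repeat=4)}

def random_row(rng):
    """Hypothetical rule for filling the hidden table (any rule would do)."""
    s = rng.choice((+1, -1))
    if rng.random() < 0.5:
        return (s, s, s, s)                     # rows of this form have C = +2
    return tuple(rng.choice((+1, -1)) for _ in range(4))

def estimate_chsh(table, rng):
    """Estimate <C> when only one A column and one B column is uncovered per round."""
    sums = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for row in table:
        i, j = rng.randrange(2), rng.randrange(2)   # settings chosen independently of the row
        sums[i, j] += row[i] * row[2 + j]
        counts[i, j] += 1
    mean = {key: sums[key] / counts[key] for key in sums}
    return mean[0, 0] + mean[0, 1] + mean[1, 0] - mean[1, 1]

rng = random.Random(0)                          # fixed seed, only for reproducibility
table = [random_row(rng) for _ in range(20000)]
true_average = sum(chsh(*row) for row in table) / len(table)
estimate = estimate_chsh(table, rng)
```

Here `true_average` uses the full table (which A&B never see), while `estimate` uses only the uncovered entries. The two should agree up to statistical fluctuations, and no choice of `random_row` can push `true_average` beyond 2.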
Following up on pioneering works that led to the 2022 Nobel Prize, it is today fairly
routine to perform experiments that are compatible with the CHSH setup and yield a value
of ⟨C⟩ ≃ 2.7.
Thus, Assumption (1.19) must be rejected as a general feature of Nature.
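For comparison, here is what quantum mechanics predicts. This is a sketch that steps outside the theory-independent argument: it assumes the textbook result that spin measurements along directions α and β on a singlet state have correlation −cos(α − β), and it uses one standard choice of angles. Neither the formula nor the angles are taken from the text above.

```python
import math

def singlet_correlation(alpha, beta):
    """Assumed textbook QM prediction <A B> = -cos(alpha - beta) for a singlet state."""
    return -math.cos(alpha - beta)

# Measurement angles in radians, all in one plane (an assumed, standard choice).
alice = (0.0, math.pi / 2)
bob = (math.pi / 4, -math.pi / 4)

E = {(i, j): singlet_correlation(alice[i], bob[j]) for i in (0, 1) for j in (0, 1)}
chsh_value = E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]
```

With these angles, |`chsh_value`| comes out as 2√2 ≈ 2.83, above the bound of 2 and a little above the experimental ≃ 2.7 quoted here.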
Joint measurements
Recall the Heisenberg uncertainty principle $\mathrm{Var}_\psi[X]\,\mathrm{Var}_\psi[P] \geq \hbar^2/4$. It is often verbally
summarized as stating that “position and momentum can’t be measured simultaneously.”
But the relation says no such thing. (Rather, it says that there’s no state |ψ⟩ that would
cause both position and momentum measurements to produce arbitrarily sharply concen-
trated outcomes.)
It is still true, however, that position and momentum cannot be measured simultane-
ously. What is more, this is true for any pair of observables that Alice can use in an
experiment that violates the CHSH inequality. Even better: This no-go statement does not
assume the validity of QM, but is an empirical fact about the universe we live in.
To state the result, we first have to say what we mean by “joint measurement”, again
without using quantum-mechanical concepts. Let’s say two measurement devices are
equivalent if they give the same probability distribution over outcomes for every possible input
(Fig 1.11). Now consider two measurements 1, 2, say with two outcomes each. A joint
measurement machine for 1, 2 is a device J with two pairs of outcomes (Fig. 1.12). It
must be such that if one only considers the first pair, one obtains a measurement proce-
dure equivalent to 1; and if one only considers the second pair, one obtains a measurement
procedure equivalent to 2. The two original machines are said to be jointly measurable if
there exists a joint measurement machine for them.
Now assume that the two properties probed by Alice in the CHSH scenario are jointly
measurable and that the same is true for the two properties measured by Bob. They could
then use joint measurement machines to produce a complete table, with all properties
A1 , A2 , B1 , B2 provided in every round. The definition of a joint measurement machine
and of equivalent measurement implies that for each pair i, j, the marginal distributions for
Figure 1.11: Top panel: Each physical property can be measured in many equivalent ways.
Bottom panel: Formalization of this observation for probabilistic theories. Two measurement
devices 1, 1′ are equivalent if for every preparation procedure P , measuring 1 or 1′ leads
to identical probability distributions over outcomes.
Figure 1.12: (i) Two two-outcome measurement devices, 1 and 2, like the ones held by
Alice in the CHSH scenario. They are jointly measurable if there exists a measurement
device J that produces two pairs of outcomes such that: (ii) The first pair (cyan) alone
defines a measurement that is equivalent to 1, and (iii) The second pair (pink) alone defines
a measurement that is equivalent to 2.
Ai Bj that arise this way are identical to the ones that the original measurement devices
realize. In particular, the correlation function C must be the same in both cases. But, as
proven above, in this case |C| ≤ 2.
The contrapositive: In a universe where the CHSH inequality can be violated (such as
ours), there must be pairs of physical properties that cannot, as a matter of principle, be
jointly measured. This is a remarkably far-reaching statement to follow from empirical
observations alone!
• Q.: Wait. In our earlier Q&A, you said that as far as you knew, Alice could measure
her two properties jointly.
A.: And that was the right answer at that point in the analysis! We didn’t have to
assume incompatibility. We derived it. Like the cool kids.
No cloning
Define a universal cloning machine to be a process that takes one physical system as input
and outputs two systems such that: Applying any measurement device to the first or to
the second output is equivalent to applying it to the input. It is clear that the existence of
a universal cloner implies the existence of a joint measurement machine for any pair of
properties (Fig. 1.13). Again, we conclude that in a universe where CHSH violations are
observed, cloning is impossible.
Figure 1.13: Top: A universal cloning machine (i) is a device that takes one physical
system as input and outputs two physical systems, where each of the outputs is indistin-
guishable from the input under any measurement (ii), (iii). Bottom: A cloner can be used
to construct a joint measurement machine.
There’s a famous paper (cited in an academic publication about once every day!) that
derives the no-cloning theorem from quantum mechanics. Here’s their proof: If U is an
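operator that “clones two orthogonal states” in that
\[
U|0\rangle = |00\rangle, \qquad U|1\rangle = |11\rangle,
\]
then by linearity,
\[
U \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big)
= \tfrac{1}{\sqrt{2}}\big(|00\rangle + |11\rangle\big)
\neq \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big) \otimes \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big),
\]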
so it necessarily fails to clone superpositions of the two states. That’s cool and all, but note
that it assumes the validity of quantum mechanics, whereas our argument doesn’t!
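The linearity argument is a two-line computation, spelled out below as a sketch. It assumes, as the linearity step requires, that the would-be cloner acts as U|0⟩ = |00⟩ and U|1⟩ = |11⟩; vectors are plain lists of amplitudes in the order |00⟩, |01⟩, |10⟩, |11⟩.

```python
import math

def kron(u, v):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [a * b for a in u for b in v]

ket0, ket1 = [1.0, 0.0], [0.0, 1.0]
s = 1 / math.sqrt(2)
plus = [s, s]                                   # (|0> + |1>)/sqrt(2)

# Linearity fixes the action of U on |+> once U|0> = |00> and U|1> = |11>.
u_plus = [s * a + s * b for a, b in zip(kron(ket0, ket0), kron(ket1, ket1))]

# What a perfect clone of |+> would have to be.
cloned_plus = kron(plus, plus)

# Overlap between what U produces and what a cloner should produce.
overlap = sum(a * b for a, b in zip(u_plus, cloned_plus))
```

The overlap is 1/√2 rather than 1, so the two vectors differ by more than a phase.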
True randomness
Assume I put a die in a cup, shake it vigorously, and put the cup upside down on a table.
Nobody will have any idea how many pips the die shows, so one might well model the
situation by ascribing a probability of 1/6 to any of the possible outcomes. But note that
this description only reflects my ignorance about the true state of the die. There is no
doubt that some side is facing up even before I lift the cup. In fact, it is conceivable
in principle that a computer coupled to a camera that captured my motions might solve
Newton’s equations and predict the state of the die accurately. Let’s refer to a variable as
pseudo-random if such a prediction is possible in principle, and as truly random otherwise.
A priori, it is unclear whether true randomness exists at all. (Laplace’s demon refers to
a thought experiment that suggests that none does).
But CHSH violations are only possible if the outcomes of Alice and Bob are truly
random. For if some process could predict the outcomes, it could do so independently of
which property they choose to measure. It could therefore predict the full table, and we
are back at the proof by contradiction outlined above.
The fact that no outside observer can predict the outcomes of Alice and Bob means
that they are, in this sense, “private” to them. This observation is the basis of provably
secure quantum key distribution protocols.
1.5.3 Interpretations
We have presented a negative argument that rules out the classical model of a world that
evolves independently from observations. This argument is widely accepted today. However,
there is no positive agreement what, if anything, should replace it. Below are some
common reactions as I see them.
The orthodox position is to say that the purpose of science is to make empirically
testable predictions. QM excels at this task. Counterfactual questions about “what would
have happened had you measured something else” just amount to storytelling and lie out-
side the remit of science. So Bell is interesting for its operational consequences (Sec. 1.5.2),
but philosophically, there’s not much to be done other than to shrug and move on.
Problems with this position: (1) It is rather unambitious. Theoretical physics has his-
torically offered more than just the ability to predict detector click patterns. To just dis-
allow hypotheticals feels like giving up too early. (2) The elements of reality critique
explained next.
The Bohmians point out that sometimes, one can predict the outcome of a measure-
ment with 100% certainty. (E.g., for a system in the singlet state, when Alice measured
spin along one axis and obtained ↑, Bob will definitely obtain ↓ w.r.t. the same axis). They
argue that in such situations, reality doesn’t change if the now somewhat redundant mea-
surement is performed – so that if we consider outcomes to be real, there must already
have been some element of reality representing them before the measurement. Speaking
in terms of the lab books we analyzed above, they therefore posit that there always is a full
table representing the true state of all elements of reality at any time, measured or not.
By Bell’s argument, the table can’t be independent of the measurements made. A
more detailed analysis shows that one can accommodate CHSH violations only if Bob’s
variables change as a result of Alice interacting with her side of the joint system (and / or
vice versa). There is a simple model developed by David Bohm showing that QM can in
principle be interpreted in such a realistic (i.e. properties have values whether measured
or not) but non-local (i.e. the unmeasured parameters change due to actions far away)
way. In Bohm’s model, the change of unmeasured parameters happens in a subtle way
that is strong enough to enable CHSH violations, but too weak to allow for the exchange
of faster-than-light signals between far away parties.
Therefore, the Bohmians argue, such a description is both necessary (by the
elements-of-reality argument) and possible (by Bohm’s model). There is thus no paradox, and we
should concentrate on working out the details.
The problem with this position is that you get into tension with special relativity even in
the absence of superluminal signals. Recall that if A&B’s actions are space-like separated,
their order in time is observer-dependent. So how can I think about Alice’s actions causing
change at Bob’s end, when in some reference frames, Bob acted first?
The loopholers maintain that there are further implicit assumptions in the analysis,
some of which have to be rejected. After improved experimental techniques in the past
few years, the (unfortunately-named) free will loophole is the last major one standing.
Recall that we have assumed that A&B choose their settings randomly. More precisely,
the empirical means for Ai Bj only converge to the expected values ⟨Ai Bj ⟩ if the probabil-
ity of choosing a setting is independent of its value. (Think of an election pollster calling
random citizens on their landlines during work hours, to ask about their voting intentions.
Retired people are more likely to answer the phone—potentially skewing the result, as their
voting preferences are different from the population as a whole). But A&B are physical
systems, too! They share a common history with the central box. It is therefore unjustified,
it is argued, to assume that they can make independent choices.
Problems: (i) The position “proves too much”. It seems like it can be used as a general
argument against all of empirical science (“apples mostly fall up, but we only look when
they fall down”). (ii) One can design the choice function of A&B in such a way that it
would take one sophisticated cosmic conspiracy to still produce a CHSH value of 2.7.
People have performed Bell experiments where the settings were driven by fluctuations in
the cosmic background radiation measured at different sections of the night sky, XOR’ed
against the input of internet users participating in an online action game.
The many-worlders contend that QM anyway has a philosophical problem (the one
we didn’t address in Sec. 1.3.1), so let’s fix all issues in one fell swoop. They then throw
out the measurement postulate and posit that there exists a “wave function of the universe”
that evolves under a global Hamiltonian. The reality we experience is an emergent feature
of this wave function – not a pre-existing concept like in standard QM.
Without a measurement postulate that will probabilistically pick one “branch of a su-
perposition”, all of them have an equal right to being considered as “real”. For example,
if |☺⟩ is the state of all of my elementary particles that correlates with me feeling happy,
then summands in a superposition state like α|↑⟩|δt⟩|☺⟩ + β|↓⟩|−δt⟩|☹⟩ (encountered
in Eq. (1.14)) should be interpreted as different co-existing “worlds” in which my feelings
are correlated with other degrees of freedom of the universe. In particular, in a CHSH
experiment, all possible outcomes are simultaneously realized in different branches of the
wave function. Any philosophical problem tied to the assumption that only one branch
actually happens is thus spurious.
The problem here is that the measurement postulate, clunky as it may be, is what
connects the formalism to reality! If you claim it’s unnecessary, it’s on you to re-derive
the empirical content of the theory in this reduced framework. One important touchstone
is the Born rule which says in this language that “if my wave function splits into two
branches with amplitudes α and β, I experience these with probability |α|², |β|² respectively”.
Researchers working on many-world formulations therefore spend a lot of time
thinking about probabilities and their interpretation (but, to my personal taste, haven’t
cracked this nut yet).
Indistinguishable particles
There’s a simple construction that seems to account for all fundamental particles. Let
H(1) be a single-particle Hilbert space with basis {|i⟩}. If the particles were distinguish-
able, a general element of the n-body joint Hilbert space would be
\[
|\psi\rangle = \sum_{i_1, \dots, i_n} \psi_{i_1, \dots, i_n} |i_1, \dots, i_n\rangle \in (\mathcal{H}^{(1)})^{\otimes n}.
\]
Let’s look for subspaces of (H(1) )⊗n that make sense for indistinguishable particles. Let
τkl be the operator that exchanges the k-th and the l-th factor:
τkl (| . . . , ik , . . . , il , . . . ⟩) = | . . . , il , . . . , ik , . . . ⟩.
If the particles are indistinguishable, then |ψ⟩ and τkl |ψ⟩ should describe the same physics.
This is certainly true if they differ at most by a phase factor. Because $\tau_{kl}^2 = 1$, such a
CHAPTER 2. INDISTINGUISHABLE PARTICLES 37
phase must be ±1. The totally symmetric or Bosonic subspace Symn (H(1) ) consists of all
vectors such that
At this point, many texts “prove” that the constructions leading to Fermions and
Bosons are the only conceivable ways of building a quantum theory of indistinguishable
particles. I find all these arguments inconsistent and unhelpful to the
degree that I’m prepared to claim the world would be better if they were all just
forgotten. Ask me about it, or maybe don’t.
The Bosonic and Fermionic Hilbert spaces can therefore also be defined as the sets of
vectors such that
You have encountered this concept before, in the definition of the determinant of an
(n × n)-matrix:
\[
\det M = \sum_{\pi \in S_n} \operatorname{sgn}(\pi) \prod_{i=1}^{n} M_{i, \pi(i)}. \tag{2.1}
\]
Figure 2.1: (a) A permutation can be represented as a graph, where each position indicates a letter,
and where the arrows point to where each letter is mapped. (b) One can multiply permutations σ1
and σ2 by performing one after the other.
How many permutations of n letters are there? There are n ways of choosing a new
place for the first symbol, then n − 1 ways for the second symbol (as we can’t repeat the
first one), etc, for a total of
|Sn | = n(n − 1) · · · 2 · 1 = n! .
This explains the various “factorials” that will appear in formulas below.
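To get a feeling for sums over the symmetric group, it can help to evaluate the determinant formula (2.1) literally, term by term. The following is a minimal sketch; the sign is computed by counting inversions (one of several equivalent definitions), and the matrix `M` is an arbitrary example.

```python
from itertools import permutations
from math import factorial, prod

def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_leibniz(m):
    """Determinant via Eq. (2.1): a signed sum over all n! permutations."""
    n = len(m)
    return sum(
        sign(p) * prod(m[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )

M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
determinant = det_leibniz(M)
number_of_permutations = len(list(permutations(range(4))))
```

The sum has n! terms, so this is only practical for small n; it is meant as an illustration of the formula, not as a way to compute determinants.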
We can now find bases for the Bosonic / Fermionic subspaces. Indeed, if
\[
|\psi\rangle = \sum_{i_1, \dots, i_n} \psi_{i_1, \dots, i_n} |i_1, \dots, i_n\rangle
\]
The vector in parentheses only depends on the number of times nk each single-particle
basis element |k⟩ appears in the product |i1 ⟩ . . . |in ⟩. This motivates the definition of the
occupation number basis. For $n_i \in \{0, 1, 2, \dots\}$ such that $\sum_i n_i = n$, set
\[
|n_1, n_2, \dots\rangle
:= \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n}
\pi\, |\underbrace{1, \dots, 1}_{n_1 \times}, \underbrace{2, \dots, 2}_{n_2 \times}, \dots\rangle
= \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n}
\pi\big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \dots\big). \tag{2.3}
\]
The funky factorial factor makes the vector normalized (check it!). By (2.2), any Bosonic
state vector can be expanded in the occupation number basis.
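The “check it!” can also be done by brute force for small examples. The sketch below builds the symmetrized vector as a dictionary from product-basis labels to amplitudes; `bosonic_state` and `squared_norm` are names invented for this illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod, sqrt

def bosonic_state(occupations):
    """Amplitudes of |n1, n2, ...> as in Eq. (2.3), keyed by product-basis labels."""
    # E.g. occupations (2, 1) -> labels [1, 1, 2]
    labels = [k for k, nk in enumerate(occupations, start=1) for _ in range(nk)]
    n = len(labels)
    norm = sqrt(factorial(n) * prod(factorial(nk) for nk in occupations))
    # Sum over all n! permutations; identical arrangements pile up in the Counter.
    counts = Counter(permutations(labels))
    return {arrangement: c / norm for arrangement, c in counts.items()}

def squared_norm(state):
    """Sum of squared amplitudes (all amplitudes are real here)."""
    return sum(amp ** 2 for amp in state.values())

state = bosonic_state((2, 1))       # two particles in |1>, one in |2>
bigger = bosonic_state((3, 0, 2))   # five particles spread over three levels
```

Each distinct arrangement appears ∏ₖ nₖ! times in the sum over permutations, which is where the extra factorials in the normalization come from.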
Anti-symmetry makes things a bit more exciting, though: Again look at the vector in
parentheses for some choice i1 , . . . , in of single-particle states. If one state occurs twice
(say ik = il ), then
| . . . , ik , . . . , il , . . . ⟩ + sgn(τkl ) τkl | . . . , ik , . . . , il , . . . ⟩ = 0
which implies that the sum is 0. Therefore, in the Fermionic occupation number basis
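\[
|n_1, n_2, \dots\rangle := \frac{1}{\sqrt{n!}} \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, \pi \big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \dots\big), \tag{2.5}
\]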
nk must be either 0 or 1. This explains the Pauli principle! Beware that in the Fermi
case, the sign of the occupation number basis elements (2.5) depends on an ordering of
single-particle basis vectors.
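The cancellation behind the Pauli principle can be watched in a small example. This is a sketch with invented helper names: `antisymmetrize` forms the signed sum over all permutations of a list of single-particle labels and returns the surviving amplitudes.

```python
from itertools import permutations
from math import factorial, sqrt

def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def antisymmetrize(labels):
    """Amplitudes of (1/sqrt(n!)) * sum_pi sgn(pi) pi |i_1, ..., i_n>."""
    n = len(labels)
    state = {}
    for perm in permutations(range(n)):
        arrangement = tuple(labels[p] for p in perm)
        state[arrangement] = state.get(arrangement, 0) + sign(perm) / sqrt(factorial(n))
    # Drop arrangements whose contributions cancelled.
    return {key: amp for key, amp in state.items() if abs(amp) > 1e-12}

distinct = antisymmetrize((1, 2, 3))    # all labels different: 6 terms survive
doubled = antisymmetrize((1, 1, 2))     # label 1 occurs twice: everything cancels
```

Swapping two entries of a surviving arrangement flips the sign of its amplitude, which is also why the overall sign of a basis vector depends on the chosen ordering.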
For the anti-symmetrization of general single-particle vectors |α1 ⟩, . . . , |αn ⟩, one also
uses the wedge product notation
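\[
|\alpha_1\rangle \wedge \dots \wedge |\alpha_n\rangle := \frac{1}{\sqrt{n!}} \sum_{\pi \in S_n} \operatorname{sgn}(\pi)\, \pi \big(|\alpha_1\rangle \otimes \dots \otimes |\alpha_n\rangle\big)
\]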
pronounced “alpha one, wedge alpha two, ...”. Wedge products are also called Slater
determinants. That’s because one can express the wedge product as a “formal determinant”:
Here, the super-scripts indicate which tensor factor the vector belongs to.
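The singlet state $\tfrac{1}{\sqrt{2}}\big(|{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle\big) = |{\uparrow}\rangle \wedge |{\downarrow}\rangle$ is clearly anti-symmetric. In occupation
number notation with respect to the |↑⟩, |↓⟩-basis, it is given by |1, 1⟩.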
Assume dim H(1) = d < ∞. In both the Bose and the Fermi case, the occupation
number bases give us a combinatorial way to compute the dimension of the Hilbert
spaces.
Fermions: Basis elements are labeled by subsets S ⊂ {1, . . . , d} of size |S| = n.
Thus
\[
\dim \wedge^n \mathbb{C}^d = \binom{d}{n}.
\]
Can you find it? (Spoiler: Search for “stars and bars”).
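If you want to test a conjectured formula before (or instead of) proving it, the basis vectors can simply be enumerated for small d and n. This is a brute-force sketch; `occupation_vectors` is a made-up helper, and d = 4, n = 3 are arbitrary.

```python
from itertools import product

def occupation_vectors(d, n, max_occupation):
    """All (n_1, ..., n_d) with n_1 + ... + n_d = n and every n_k <= max_occupation."""
    return [occ for occ in product(range(max_occupation + 1), repeat=d)
            if sum(occ) == n]

d, n = 4, 3
fermionic = occupation_vectors(d, n, max_occupation=1)   # Pauli: each n_k is 0 or 1
bosonic = occupation_vectors(d, n, max_occupation=n)     # no restriction beyond the total

fermionic_dimension = len(fermionic)
bosonic_dimension = len(bosonic)
```

The Fermionic count should reproduce the binomial coefficient above; the Bosonic count is the number your closed formula has to match.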
The occupation number basis adds another possible meaning to the heavily overloaded
notation of “a list of numbers in a ket”. In particular, in
\[
|n_1, n_2, \dots\rangle = \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi} \pi\, |1, \dots, 1, 2, \dots, 2, \dots\rangle
\]
the numbers in the ket on the l.h.s. count occupations, while the numbers in the ket
on the r.h.s. are indices of some single-particle basis. Which of these definitions is
meant, and which single-particle basis it
is relative to, and whether the occupation numbers are for Fermions or for Bosons,
or whether the numbers have nothing to do with these many-body concepts and are
more general “quantum numbers” (like the labels |n, l, m⟩ of the atomic basis) has
to be inferred from context. There’s no general, reliable rule.
Look. If I were the emperor of physics, I’d outlaw this mess. But I’m not and
everybody is using it. Once you get used to it, you’ll find that this convention
causes surprisingly few catastrophic misunderstandings.
Summary
Let H(1) be a single-body Hilbert space with basis {|i⟩}. Then a general state of n
indistinguishable particles can be expressed in the occupation number basis as
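\[
|\psi\rangle = \sum_{n_1, n_2, \dots} c_{n_1, n_2, \dots} |n_1, n_2, \dots\rangle,
\]
\[
|n_1, n_2, \dots\rangle = \frac{1}{\sqrt{n! \prod_k n_k!}} \sum_{\pi \in S_n} (\operatorname{sgn} \pi)^{\zeta}\, \pi \big(|1\rangle^{\otimes n_1} |2\rangle^{\otimes n_2} \dots\big), \tag{2.7}
\]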
\[
\pi A^{(i)} \pi^{-1} = A^{(\pi_i)}.
\]
A measurement on any one particle is thus equal to the average over all of them – the
formalism no longer allows us to pick out the properties of individual particles.
Now assume A has an eigendecomposition
\[
A = \sum_i \lambda_i |i\rangle\langle i|.
\]
Then for an element of the occupation number basis with respect to the eigenbasis {|i⟩} of
A, one computes from (2.7)
\[
\sum_{j=1}^{n} A^{(j)} |n_1, n_2, \dots\rangle = \Big(\sum_i \lambda_i n_i\Big) |n_1, n_2, \dots\rangle. \tag{2.9}
\]
In particular, single-body operators are diagonal in the occupation number basis. If the
single-body eigenvalues are sorted λ0 ≤ λ1 ≤ . . . , then the lowest n-body eigenvalue in
the Bosonic case is nλ0 and in the Fermionic case λ0 + · · · + λn−1 . For Fermions, if the
λi ’s describe energies, then λn−1 , the largest energy still occupied in the ground state, is
called the Fermi energy.
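Since a single-body operator is diagonal in the occupation number basis, its lowest n-body eigenvalue is found by distributing n particles over the single-particle levels as cheaply as possible. This is a minimal sketch of that bookkeeping, with made-up function names and hypothetical level energies.

```python
def ground_state_energy(levels, n, fermions):
    """Lowest eigenvalue of sum_j A^(j), i.e. the minimum of sum_i lambda_i n_i."""
    levels = sorted(levels)
    if fermions:
        if n > len(levels):
            raise ValueError("Pauli principle: not enough single-particle states")
        return sum(levels[:n])      # fill the n lowest levels, one particle each
    return n * levels[0]            # all Bosons occupy the lowest level

def fermi_energy(levels, n):
    """Largest single-particle energy still occupied in the Fermionic ground state."""
    return sorted(levels)[n - 1]

levels = [0.5, 1.5, 2.5, 3.5]       # hypothetical single-particle energies
bose_energy = ground_state_energy(levels, 3, fermions=False)
fermi_sea_energy = ground_state_energy(levels, 3, fermions=True)
highest_occupied = fermi_energy(levels, 3)
```

For these numbers, three Bosons cost 3 · 0.5 while three Fermions cost 0.5 + 1.5 + 2.5, with the Fermi energy at 2.5.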
The Coulomb repulsion term between two electrons, $h^{(1,2)} \propto \|x_1 - x_2\|^{-1}$, does
not depend on spin. However, when combined with the anti-symmetrization postulate
for Fermions, an effective coupling between electron spins arises. It is important,
e.g. in magnetism and atomic physics. We’ll look at a simple case: the electrons
of the Helium atom in first-order perturbation theory.
Treating the nucleus as fixed, the Hamiltonian for the Helium atom is
We could also define operators τ (space) and τ (spin) that only act on one of them:
τ (space) (|ϕ1 ⟩|s1 ⟩)(|ϕ2 ⟩|s2 ⟩) = (|ϕ2 ⟩|s1 ⟩)(|ϕ1 ⟩|s2 ⟩),
τ (spin) (|ϕ1 ⟩|s1 ⟩)(|ϕ2 ⟩|s2 ⟩) = (|ϕ1 ⟩|s2 ⟩)(|ϕ2 ⟩|s1 ⟩)
so that τ = τ (space) τ (spin) . The Hamiltonian H commutes not only with τ , but (in this
case) with τ (space) and τ (spin) individually. We can therefore find a common eigenbasis, i.e.
energy eigenvectors that also have well-defined parity with respect to the exchange of each
of the spatial and the spin parts. To get anti-symmetry under τ , exactly one of these two
parts has to be anti-symmetric. That’s what happened in (2.10).
Excited states
The first excited states of H0 are the ones where one electron remains in the ground state
and one is in |ϕ2,0,0 ⟩ =: |2⟩ (spectroscopic: “1s, 2s”). Taking spin into account, the first
excited energy of the non-interacting Hamiltonian is thus four-fold degenerate:
As discussed above, we can choose a basis of states that are symmetric / anti-symmetric in
the spatial and spin degrees individually:
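\[
\begin{array}{ll}
\tfrac{1}{\sqrt{2}}\big(|1\rangle|2\rangle + |2\rangle|1\rangle\big)\, \tfrac{1}{\sqrt{2}}\big(|{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle\big)
& (S = 0,\ \text{“singlet”}) \\[6pt]
\left.
\begin{array}{l}
\tfrac{1}{\sqrt{2}}\big(|1\rangle|2\rangle - |2\rangle|1\rangle\big)\, \tfrac{1}{\sqrt{2}}\big(|{\uparrow\downarrow}\rangle + |{\downarrow\uparrow}\rangle\big) \\
\tfrac{1}{\sqrt{2}}\big(|1\rangle|2\rangle - |2\rangle|1\rangle\big)\, |{\uparrow\uparrow}\rangle \\
\tfrac{1}{\sqrt{2}}\big(|1\rangle|2\rangle - |2\rangle|1\rangle\big)\, |{\downarrow\downarrow}\rangle
\end{array}
\right\}
& (S = 1,\ \text{“triplet”})
\end{array}
\]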
Again, the energy correction only depends on the spatial part. In particular it is the same
for the last three vectors. For the first two, we get
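\[
\tfrac{1}{2}\big(\langle 1|\langle 2| \pm \langle 2|\langle 1|\big)\, h^{(1,2)}\, \big(|1\rangle|2\rangle \pm |2\rangle|1\rangle\big)
= \langle 1|\langle 2| h^{(1,2)} |1\rangle|2\rangle \pm \operatorname{Re} \langle 1|\langle 2| h^{(1,2)} |2\rangle|1\rangle.
\]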
The first matrix element is again a “Coulomb integral”
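\[
I := \langle 1|\langle 2| h^{(1,2)} |1\rangle|2\rangle
= \frac{2e^2}{4\pi\epsilon_0} \int \frac{1}{\|x_1 - x_2\|}\, |\langle x_1|1\rangle|^2\, |\langle x_2|2\rangle|^2 \, d^3x_1\, d^3x_2 > 0,
\]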
which allows for the same probabilistic interpretation as given for Eq. (2.11). The second
one is called the exchange integral
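\[
J := \langle 1|\langle 2| h^{(1,2)} |2\rangle|1\rangle
= \frac{2e^2}{4\pi\epsilon_0} \int \frac{1}{\|x_1 - x_2\|}\, \langle 1|x_1\rangle \langle 2|x_2\rangle \langle 2|x_1\rangle \langle 1|x_2\rangle \, d^3x_1\, d^3x_2.
\]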
The exchange integral is also positive, although that’s less obvious.
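\[
J = \frac{2e^2}{\epsilon_0} \int \frac{1}{4\pi \|x_1 - x_2\|}\, \langle 1|x_1\rangle \langle 2|x_1\rangle \langle 1|x_2\rangle \langle 2|x_2\rangle \, d^3x_1\, d^3x_2.
\]
Defining
\[
\phi(x) := \langle 1|x\rangle \langle 2|x\rangle, \qquad
A := \int \frac{1}{4\pi \|x_1 - x_2\|}\, |x_1\rangle \langle x_2| \, d^3x_1\, d^3x_2,
\]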
\[
J = \frac{2e^2}{\epsilon_0} \int \frac{|\langle \phi|k\rangle|^2}{\|k\|^2}\, d^3k > 0.
\]
The effect of the interaction is thus twofold: (i) It uniformly increases the energies by
the Coulomb term I, which describes the repulsion between the two electrons (as one
would expect). (ii) It introduces a splitting by 2J between the energies of the symmetric
S = 1 and anti-symmetric S = 0 spin states. The physical way to think about the second
effect is that anti-symmetry in the spatial part “allows the electrons to avoid each other”,
thus decreasing the energy penalty due to electron-electron repulsion.
In this two-spin Hilbert space, the effective Hamiltonian is, up to an irrelevant global shift
of energies,
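\[
H_{\mathrm{eff}} = -J\tau .
\]
In terms of the Pauli matrices (with $\sigma_0 = 1$), the swap operator of the two spins is
\[
\tau = \frac12 \sum_{j\in\{0,x,y,z\}} \sigma_j^{(1)} \sigma_j^{(2)} = \frac12\big(1 + \sigma^{(1)}\cdot\sigma^{(2)}\big) .
\]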
The exchange principle can thus be described as an effective interaction between the two
spins. Equation (2.12) is an embryonic version of the Heisenberg model of magnetism.
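The relation between the spin swap τ and the Pauli matrices can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`kron`, `pauli_swap`, `max_deviation`) and the basis ordering |↑↑⟩, |↑↓⟩, |↓↑⟩, |↓↓⟩ are choices made here for illustration, and the factor 1/2 is the normalization for which the sum over j ∈ {0, x, y, z} reproduces the swap exactly.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

# Identity and Pauli matrices in the basis (up, down).
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def pauli_swap():
    """Return (1/2) * sum_j sigma_j (x) sigma_j for j in {0, x, y, z}."""
    total = [[0j] * 4 for _ in range(4)]
    for s in (I2, SX, SY, SZ):
        term = kron(s, s)
        for r in range(4):
            for c in range(4):
                total[r][c] += 0.5 * term[r][c]
    return total

# Permutation matrix exchanging the two spins,
# basis order |uu>, |ud>, |du>, |dd>.
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

def max_deviation():
    """Largest entry-wise difference between the Pauli expansion and SWAP."""
    tau = pauli_swap()
    return max(abs(tau[r][c] - SWAP[r][c]) for r in range(4) for c in range(4))

print("max deviation from swap:", max_deviation())
```

Since τ has eigenvalue +1 on the triplet and −1 on the singlet, −Jτ indeed places the S = 1 states below the S = 0 state by 2J.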
Goals
This section is mostly formal (definitions, generic constructions). Not too excit-
ing? Maybe. But familiarizing you with the formalism of “second quantization” is
one of the most important goals of this lecture. Much builds on it. Be alert!
So far, we have considered systems with a fixed number n of particles. We will now
treat the particle number as variable. Mathematically, this actually simplifies some cal-
culations (we won’t have to worry about combinatorial expressions like (2.6) any more).
Physically, this step is necessary e.g. for relativistic theories, where different species of
particles can be converted into each other.
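with $|\psi_n\rangle \in \mathrm{Sym}^n(\mathcal{H}^{(1)})$ (Bosons) or $|\psi_n\rangle \in \wedge^n(\mathcal{H}^{(1)})$ (Fermions). Terms corresponding
to different particle numbers are taken to be orthogonal, so that inner products are
\[
\langle\psi|\psi'\rangle = \sum_{n=0}^{\infty} \langle\psi_n|\psi_n'\rangle .
\]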
Wait, n = 0 is included? That’s right, we allow for systems with zero particles. To make
sense of that, define
\[
(\mathcal{H}^{(1)})^{\otimes 0},\ \wedge^{0}(\mathcal{H}^{(1)}),\ \mathrm{Sym}^{0}(\mathcal{H}^{(1)}) := \mathbb{C}^1 ,
\]
the Hilbert space of one-component vectors. Up to a phase, it only contains a single
normalized vector, which is called the vacuum and denoted as |vac⟩ or |0⟩.
This construction is very transparent in the occupation number basis, where it basically
amounts to removing the constraint $\sum_i n_i = n$ (and all the combinatorial nastiness that
comes with it). With respect to a basis {|i⟩} of H(1) , Fock space is the Hilbert space with
basis |n1 , n2 , . . . ⟩, where ni ∈ {0, 1, 2, . . . } for Bosons and ni ∈ {0, 1} for Fermions.
The vacuum is |0, 0, . . . ⟩ = |vac⟩ = |0⟩.
We’ll usually fix a basis {|i⟩} of the single-body Hilbert space and work in the as-
sociated occupation number basis, where the ladder operators act in a transparent way.
Eq. (2.14) implies
\[
a_i^\dagger\, |\dots n_{i-1}, n_i, n_{i+1} \dots\rangle = \sqrt{n_i + 1}\,(-1)^{\zeta \sum_{j<i} n_j}\, |\dots n_{i-1}, n_i + 1, n_{i+1} \dots\rangle .
\]
Here, we use the convention that |n1 . . . ⟩ equals 0 if one of the occupation numbers is
negative, or, in the Fermionic case, additionally if one occupation number exceeds 1. Ex-
plicitly, for Bosons:
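\[
\begin{aligned}
a_i^\dagger\, |\dots n_{i-1}, n_i, n_{i+1} \dots\rangle &= \sqrt{n_i + 1}\; |\dots n_{i-1}, n_i + 1, n_{i+1} \dots\rangle ,\\
a_i\, |\dots n_{i-1}, n_i, n_{i+1} \dots\rangle &= \sqrt{n_i}\; |\dots n_{i-1}, n_i - 1, n_{i+1} \dots\rangle ,
\end{aligned}
\tag{2.15}
\]
and for Fermions
\[
\begin{aligned}
a_i^\dagger\, |\dots n_{i-1}, n_i, n_{i+1} \dots\rangle &= (-1)^{\sum_{j<i} n_j}\; |\dots n_{i-1}, n_i + 1, n_{i+1} \dots\rangle ,\\
a_i\, |\dots n_{i-1}, n_i, n_{i+1} \dots\rangle &= (-1)^{\sum_{j<i} n_j}\; |\dots n_{i-1}, n_i - 1, n_{i+1} \dots\rangle .
\end{aligned}
\tag{2.16}
\]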
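The action of the ladder operators on occupation number states is easy to play with on a computer. The sketch below is one possible encoding, not part of the notes: a vector is a Python dict mapping occupation tuples to amplitudes, and the names `apply_ladder` and `anticommutator_on` are made up for this illustration. It implements the rules (2.15) and (2.16), including the convention that states with forbidden occupation numbers are zero.

```python
import math
from itertools import product

def apply_ladder(vec, i, dagger, fermionic):
    """Apply a_i (dagger=False) or a_i^dagger (dagger=True) to a vector.

    vec maps occupation tuples (n_1, n_2, ...) to amplitudes.
    Mode indices are 0-based here.
    """
    out = {}
    for occ, amp in vec.items():
        n_new = occ[i] + (1 if dagger else -1)
        # Convention from the text: the result is 0 if an occupation number
        # becomes negative or, for Fermions, exceeds 1.
        if n_new < 0 or (fermionic and n_new > 1):
            continue
        if fermionic:
            factor = (-1) ** sum(occ[:i])            # sign rule of (2.16)
        else:
            factor = math.sqrt(max(occ[i], n_new))   # sqrt(n_i + 1) or sqrt(n_i), cf. (2.15)
        new_occ = occ[:i] + (n_new,) + occ[i + 1:]
        out[new_occ] = out.get(new_occ, 0.0) + factor * amp
    return out

def anticommutator_on(occ, i, j):
    """Return (a_i a_j^dagger + a_j^dagger a_i) applied to the Fermionic basis vector |occ>."""
    v = {occ: 1.0}
    first = apply_ladder(apply_ladder(v, j, True, True), i, False, True)
    second = apply_ladder(apply_ladder(v, i, False, True), j, True, True)
    keys = set(first) | set(second)
    return {k: first.get(k, 0.0) + second.get(k, 0.0) for k in keys}

# Check on three Fermionic modes that the anticommutator acts as delta_ij times the identity.
ok = True
for occ in product([0, 1], repeat=3):
    for i in range(3):
        for j in range(3):
            res = {k: a for k, a in anticommutator_on(occ, i, j).items() if abs(a) > 1e-12}
            ok = ok and (res == ({occ: 1.0} if i == j else {}))
print("Fermionic anticommutation relations hold on 3 modes:", ok)
```

The sign factor is what makes operators on different modes anticommute; dropping it gives operators that commute for i ≠ j, which is not the Fermionic algebra.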
Iterating, any basis element can be written using creation operators acting on the vacuum:
into (2.14) shows that “creation operators can be expanded like kets and annihilation op-
erators like bras”:
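\[
a_\alpha^\dagger = \sum_i \langle i|\alpha\rangle\, a_i^\dagger \quad\Rightarrow\quad a_\alpha = \sum_i \langle\alpha|i\rangle\, a_i . \tag{2.18}
\]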
We don’t need to restrict ourselves to normalizable states. For example, if |α⟩ = |x⟩
is a delta function centered at x ∈ R3 and |i⟩ = |ϕi ⟩ for some smooth function ϕi (x) in
L2 (R3 ), then the above reads
\[
|x\rangle = \sum_i |\phi_i\rangle\langle\phi_i|x\rangle = \sum_i \bar\phi_i(x)\,|\phi_i\rangle ,
\]
\[
a_x^\dagger = \sum_i \bar\phi_i(x)\, a_i^\dagger \quad\Rightarrow\quad a_x = \sum_i \phi_i(x)\, a_i .
\]
Recall that a classical field is any physical quantity that depends on points in space.
The ax are quantum operators depending on points in space, and thus a first example
of a quantum field. These annihilation field operators and their Heisenberg-picture time
evolution are commonly written as
\[
\hat\Psi(x) := a_x, \qquad \hat\Psi(t, x) := a_x(t) = e^{-\frac{t}{i\hbar} H}\, a_x\, e^{\frac{t}{i\hbar} H} .
\]
Despite the similarity in notation, the field operators Ψ̂(x) should not be confused with
wave functions ψ(x) ∈ L2 (R3 )!
All the caveats that apply to delta functions (App. A.1.8) likewise apply to the
Ψ̂(x). In particular, formulas involving field operators have physical content only
when integrated against smooth functions. (In the mathematical literature, the Ψ̂(x)
are therefore referred to as operator-valued distributions, to indicate that they give
proper operators only after an integration). See the discussion around (2.23) for an
example of how this pans out.
The converse of the above construction also works. From the completeness relation for
delta functions (A.13):
\[
|\alpha\rangle = \int \alpha(x)\, |x\rangle \, d^3x \quad\Rightarrow\quad a_\alpha^\dagger = \int \alpha(x)\, \hat\Psi^\dagger(x) \, d^3x . \tag{2.19}
\]
Commutation relations
As in the ladder-operator treatment of the harmonic oscillator, the commutation relations
of the creation and annihilation operators are essential in calculations.
To treat the Bosonic and Fermionic cases in parallel, introduce the notation
so that
How should one interpret Eq. (2.22)? Recall the general rule that expressions in-
volving delta functions carry meaning only when integrated against smooth func-
tions. Viewed this way, (2.22) turns out to be an equivalent restatement of the un-
problematic version (2.21). Indeed, for smooth functions α(x), β(x), combining
Eq. (2.19) and Eq. (2.22) gives
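\[
\begin{aligned}
[a_\alpha, a_\beta^\dagger] &= \int\!\!\int \bar\alpha(x)\beta(y)\,[\hat\Psi(x), \hat\Psi(y)^\dagger] \, d^3x \, d^3y\\
&= \int\!\!\int \bar\alpha(x)\beta(y)\,\delta(x - y)\, 1 \, d^3x \, d^3y\\
&= \int \bar\alpha(y)\beta(y)\, 1 \, d^3y = \langle\alpha|\beta\rangle\, 1 .
\end{aligned}
\tag{2.23}
\]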
for a single-particle term h(k) (e.g. h(k) = Pk2 /(2m)) and an interaction term h(k,l) (e.g.
h(k,l) = V (xk − xl )). On Fock space, we have to sum over all possible particle numbers
n, so that, e.g., the single-particle term becomes
\[
\bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)} .
\]
These formulas become much cleaner when expressed in terms of creation and annihilation
operators.
Indeed, choose a single-particle basis {|i⟩} and consider the expansion
\[
h = \sum_{ij} \langle i|h|j\rangle\, |i\rangle\langle j| = \sum_{ij} h_{ij}\, |i\rangle\langle j| . \tag{2.24}
\]
We claim that for both Bosons and Fermions, the following holds:
\[
\bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)} = \sum_{ij} h_{ij}\, a_i^\dagger a_j . \tag{2.25}
\]
In other words: We can formally move from single-body operators to many-body operators
by replacing “kets by creation operators and bras by annihilation operators”.
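The claim (2.25) can be tested numerically in a small example. The sketch below, with hypothetical names and an arbitrary real symmetric matrix `H1` standing in for h, compares the two sides on the two-Fermion sector of three modes: the left-hand side acts on antisymmetrized wave functions, the right-hand side on occupation tuples using the Fermionic sign rule. It assumes the identification of |n_p = 1, n_q = 1⟩ (p < q) with (|p⟩|q⟩ − |q⟩|p⟩)/√2.

```python
import math
from itertools import combinations

D = 3  # number of single-particle modes (toy size chosen for illustration)
H1 = [[0.5, 0.2, -0.1],
      [0.2, -0.3, 0.4],
      [-0.1, 0.4, 1.0]]  # arbitrary real symmetric matrix h_ij

def second_quantized(occ):
    """Apply sum_ij h_ij a_i^dagger a_j to the Fermionic basis vector |occ>."""
    out = {}
    for i in range(D):
        for j in range(D):
            if occ[j] == 0:
                continue                      # a_j annihilates the state
            sign = (-1) ** sum(occ[:j])
            mid = occ[:j] + (0,) + occ[j + 1:]
            if mid[i] == 1:
                continue                      # a_i^dagger hits an occupied mode
            sign *= (-1) ** sum(mid[:i])
            new = mid[:i] + (1,) + mid[i + 1:]
            out[new] = out.get(new, 0.0) + H1[i][j] * sign
    return out

def slater(p, q):
    """First-quantized amplitudes of (|p>|q> - |q>|p>)/sqrt(2) for p < q."""
    s = 1 / math.sqrt(2)
    return {(p, q): s, (q, p): -s}

def first_quantized(psi):
    """Apply h^(1) + h^(2) to a two-particle wave function."""
    out = {}
    for (x, y), amp in psi.items():
        for z in range(D):
            out[(z, y)] = out.get((z, y), 0.0) + H1[z][x] * amp
            out[(x, z)] = out.get((x, z), 0.0) + H1[z][y] * amp
    return out

def occupation(p, q):
    """Occupation tuple with modes p and q filled."""
    return tuple(1 if m in (p, q) else 0 for m in range(D))

def max_mismatch():
    """Largest difference between matrix elements of the two sides of (2.25)."""
    worst = 0.0
    pairs = list(combinations(range(D), 2))
    for p, q in pairs:
        lhs = first_quantized(slater(p, q))
        rhs = second_quantized(occupation(p, q))
        for r, s in pairs:
            bra = slater(r, s)
            element = sum(bra[key] * lhs.get(key, 0.0) for key in bra)
            worst = max(worst, abs(element - rhs.get(occupation(r, s), 0.0)))
    return worst

print("largest mismatch:", max_mismatch())
```

The signs in `second_quantized` are exactly what is needed for the off-diagonal matrix elements to agree; this is a check on one small sector, not a proof.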
This is not so surprising if we look at (2.24) in the right way. The bra ⟨j| is a linear
map from H(1) to the complex numbers, a space that we have since identified as
the “vacuum sector”. In this sense, ⟨j| maps the single-particle state |j⟩ to |vac⟩.
Dually, we can re-interpret the ket |i⟩ as a linear map C(1) → H(1) , (z) 7→ z|i⟩, or
|vac⟩ 7→ |i⟩. Thus, the familiar matrix element expansion (2.24) can be interpreted
as a superposition of processes that “destroy a particle in state |j⟩ and create one in
state |i⟩, weighted by the amplitude hij ”. From this point of view, (2.25) amounts to
the claim that the same description remains valid in higher particle number sectors.
To verify (2.25) start with the case where {|i⟩} is an eigenbasis of h. We have already
found in (2.9) that in this case, the occupation number basis diagonalizes the single-body
operator, so that
\[
\bigoplus_{n=1}^{\infty} \sum_{k=1}^{n} h^{(k)}\, |n_1 \dots\rangle = \sum_i \lambda_i n_i\, |n_1, n_2, \dots\rangle = \sum_i \lambda_i\, a_i^\dagger a_i\, |n_1, n_2, \dots\rangle
\]
as claimed. The general case follows from the fact that, as remarked around (2.18), “cre-
ation operators transform like kets and annihilation operators like bras”: If {|αi ⟩} is an-
other single-particle basis, then inserting completeness relations and using (2.18) gives
\[
\sum_i \langle i|h|i\rangle\, a_i^\dagger a_i = \sum_{ij} \langle i|h|j\rangle\, a_i^\dagger a_j \qquad (h \text{ is diagonal in the } \{|i\rangle\}\text{-basis})
\]
where the super-script denotes the two particles on which the operator acts non-trivially.
The factor 1/2 is there to avoid double-counting of (k, l) and (l, k). As above, one can
show that
\[
\frac12 \bigoplus_{n=1}^{\infty} \sum_{k\neq l=1}^{n} h^{(k,l)} = \frac12 \sum_{ijrs} h_{ijrs}\, a_i^\dagger a_j^\dagger a_s a_r , \qquad h_{ijrs} = \langle ij|h|rs\rangle .
\]
Note that the indices s, r of the annihilation operators are reversed as compared to the
indices in the matrix element! This makes the sign come out right in the Fermionic case.
We omit the proof.
leading to
\[
(2\pi)^{-3/2} \int \tilde U(k' - k)\, a_{k'}^\dagger a_k \, d^3k' \, d^3k . \tag{2.26}
\]
Read that as: A potential term can change the momentum of particles. The amplitude
associated with a change of q = k′ − k is proportional to the Fourier transform Ũ (q) of
the potential.
If one works in a box of finite volume V = L3 , then (in the sense of App. A.1.9), the
expression becomes
\[
\frac{1}{\sqrt V} \sum_{k, k' \in \frac{2\pi}{L}\mathbb{Z}^3} \tilde U(k' - k)\, a_{k'}^\dagger a_k .
\]
\[
\frac{P^2}{2m} = \frac{\hbar^2}{2m} \int \|k\|^2\, |k\rangle\langle k| \, d^3k \ \mapsto\ \frac{\hbar^2}{2m} \int \|k\|^2\, a_k^\dagger a_k \, d^3k .
\]
In the sense of App. A.1.8, one can also express these in position basis:
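\[
P \mapsto -i\hbar \int \hat\Psi^\dagger(x)\, \nabla\, \hat\Psi(x) \, d^3x ,
\]
\[
\frac{P^2}{2m} \mapsto \frac{-\hbar^2}{2m} \int \hat\Psi^\dagger(x)\, \nabla^2\, \hat\Psi(x) \, d^3x = \frac{\hbar^2}{2m} \int (\nabla\hat\Psi(x)^\dagger)(\nabla\hat\Psi(x)) \, d^3x .
\]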
These expressions are very suggestive, but also easy to misinterpret. Keep in mind
that $\hat\Psi(x) = a_{\delta_x}$ is not a complex function on R3 , but rather a field of annihilation
operators for delta functions indexed by x. If you are confused, read the explanation
in App. A.1.8. If you are not confused, then you’re probably missing something
(confusion is the natural state at this point!), so you should really read App. A.1.8!
Chemical potential. The point of Fock space is that the particle number is variable. The
problem with Fock space is that the particle number is variable. Let’s say you want to find
the ground state of a gas (as we’ll do later). There will be some mechanism (walls of a
container, pressure exerted by other gases, ...) that controls at least the average number of
particles in the gas. We could explicitly describe this mechanism (sounds complicated),
or just follow the lead of the grand canonical ensemble of stat mech and add an effective
term −µN̂ that formally adjusts the energy carried by a particle, and then vary µ until the
ground state shows the right average particle number. The operator implementing this is
just
\[
\int \hat\Psi^\dagger(x)\, (-\mu)\, \hat\Psi(x) \, dx = \int a_k^\dagger\, (-\mu)\, a_k \, dk .
\]
where the amplitude f (k1 , k2 , q) remains to be found. Comparison with (2.26) suggests
that f might be related to the Fourier transform of the potential. That turns out to be true:
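\[
\begin{aligned}
\langle k_1', k_2'|V|k_1, k_2\rangle
&= \int \frac{d^3x_1}{(2\pi)^3} \frac{d^3x_2}{(2\pi)^3}\, e^{i(k_1 - k_1')x_1 + i(k_2 - k_2')x_2}\, V(x_1 - x_2)\\
&= \int \frac{d^3x_1}{(2\pi)^3} \frac{d^3x_2}{(2\pi)^3}\, e^{i(k_1 - k_1')x_1 + i(k_2 - k_2')x_2} \int \frac{d^3q}{(2\pi)^{3/2}}\, e^{iq(x_1 - x_2)}\, \tilde V(q)\\
&= \int \frac{d^3q}{(2\pi)^{3/2}}\, \tilde V(q) \int \frac{d^3x_1}{(2\pi)^3}\, e^{i(k_1 - k_1' + q)x_1} \int \frac{d^3x_2}{(2\pi)^3}\, e^{i(k_2 - k_2' - q)x_2}\\
&= \int \frac{d^3q}{(2\pi)^{3/2}}\, \tilde V(q)\, \delta(k_1 + q - k_1')\, \delta(k_2 - q - k_2') .
\end{aligned}
\]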
Summary
• Commutation relations
\[
a_\alpha^\dagger = \sum_i \langle i|\alpha\rangle\, a_i^\dagger
\]
Formally, one says that the two systems are unitarily equivalent. Define a linear map
U from $L^2(\mathbb{R}^n)$ to $\mathcal{F}_S(\mathbb{C}^n)$ by requiring that it sends an element $|n_1, \dots\rangle_{L^2(\mathbb{R}^n)}$
The most elementary case is that of lattice vibrations, or phonons. Let’s have a look.
2.3.1 Phonons
Goals
The phonon Hamiltonian is conceptually easy to solve (by undergrad mechan-
ics tools), but has much to teach us! Here, phonons will serve as an example of
how Fock space describes collective excitations, rather than arising from a single-
particle space. We’ll also have the opportunity to recall normal mode expansions.
A continuum limit will later motivate rules for field quantization.
We have to specify boundary conditions. If the chain is longer than the length scale of any
phenomenon we’ll be studying, boundary effects shouldn’t matter much (c.f. App. A.1.9).
We therefore opt for the mathematically simplest case: cyclic boundary conditions, i.e. we
assume that the indices of the operators in (2.28) only depend on r modulo N .
The chain Hamiltonian is quadratic in positions and momenta and can therefore be
diagonalized using canonical transformations (App. A.2.2). Working out the details is an
excellent exercise, so we only present the final result here.
For n = 1 . . . N and $k = n\frac{2\pi}{L}$ with L = N a the total length, define
\[
\phi_k = \sqrt{\frac{1}{N}} \sum_{r=1}^{N} e^{-ikra}\, X_r , \qquad \pi_k = \sqrt{\frac{1}{N}} \sum_{r=1}^{N} e^{ikra}\, P_r .
\]
In the sense of App. A.2.2, the ϕk , πk correspond to complex normal coordinates associ-
ated with standing waves with quasi-momentum k. Then
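\[
a_k = \frac{1}{\sqrt2}\left( \sqrt{\frac{m\omega_k}{\hbar}}\, \phi_k + i \sqrt{\frac{1}{m\hbar\omega_k}}\, \pi_{-k} \right), \qquad \omega_k = 2\sqrt{\frac{\kappa}{m}}\, |\sin(ka/2)|
\]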
define annihilation operators ([ak , a†k′ ] = δk,k′ ) that diagonalize the Hamiltonian
\[
H = \sum_k \left( \frac{1}{2m}\, \pi_k \pi_{-k} + 2\kappa \sin^2(ka/2)\, \phi_k \phi_{-k} \right) = \sum_k \hbar\omega_k \left( a_k^\dagger a_k + \frac12 \right) .
\]
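The dispersion relation ω_k = 2√(κ/m)|sin(ka/2)| can be checked against the classical equations of motion. The sketch below assumes that the chain Hamiltonian has the usual nearest-neighbour form Σ_r [P_r²/(2m) + (κ/2)(X_{r+1} − X_r)²] with cyclic indices (an assumption made here, chosen to be consistent with the mode expansion of H above); the function names are hypothetical. It verifies that each plane wave e^{ikra} is an eigenvector of the coupling matrix with eigenvalue mω_k².

```python
import cmath
import math

def dispersion(k, kappa, m, a):
    """Phonon frequency omega_k = 2 sqrt(kappa/m) |sin(k a / 2)|."""
    return 2.0 * math.sqrt(kappa / m) * abs(math.sin(k * a / 2.0))

def max_residual(N=8, kappa=1.3, m=0.7, a=1.0):
    """Largest violation of  kappa (2 v_r - v_{r+1} - v_{r-1}) = m omega_k^2 v_r
    over all allowed wave numbers k = 2 pi n / (N a) and all sites r."""
    worst = 0.0
    for n in range(1, N + 1):
        k = 2.0 * math.pi * n / (N * a)
        v = [cmath.exp(1j * k * a * r) for r in range(N)]   # plane wave on the ring
        omega = dispersion(k, kappa, m, a)
        for r in range(N):
            # Restoring force from the two neighbouring springs (indices mod N).
            restoring = kappa * (2 * v[r] - v[(r + 1) % N] - v[(r - 1) % N])
            worst = max(worst, abs(restoring - m * omega ** 2 * v[r]))
    return worst

print("largest residual:", max_residual())
```

The cyclic boundary conditions enter through the index arithmetic modulo N; they are also what restricts k to multiples of 2π/L.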
The (Heisenberg picture) equations of motion iℏ∂t ak (t) = [ak , H] are then solved by
ak (t) = e−iωk t ak (0). For the original observables this means
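\[
\begin{aligned}
X_r(t) &= \sqrt{\frac{\hbar}{Nm}} \sum_k \frac{1}{\sqrt{2\omega_k}} \left( a_k\, e^{-i\omega_k t + ikar} + a_k^\dagger\, e^{i\omega_k t - ikar} \right),\\
P_r(t) &= -i \sqrt{\frac{m\hbar}{N}} \sum_k \sqrt{\frac{\omega_k}{2}} \left( a_k\, e^{-i\omega_k t + ikar} - a_k^\dagger\, e^{i\omega_k t - ikar} \right).
\end{aligned}
\tag{2.29}
\]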
In these expressions, we’ve grouped adjoint terms together, to emphasize that Xr is Her-
mitian. Sometimes it’s more advantageous to group terms by complex normal modes
instead:
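\[
\begin{aligned}
X_r(t) &= \sqrt{\frac{\hbar}{Nm}} \sum_k \frac{1}{\sqrt{2\omega_k}} \left( a_k(t) + a_{-k}^\dagger(t) \right) e^{ikar} ,\\
P_r(t) &= -i \sqrt{\frac{m\hbar}{N}} \sum_k \sqrt{\frac{\omega_k}{2}} \left( a_k(t) - a_{-k}^\dagger(t) \right) e^{ikar} .
\end{aligned}
\tag{2.30}
\]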
Finally, note that every formula in this section equally applies to the classical case,
with the only exception that the Hamilton function reads (c.f. App. A.2.2):
\[
H = \sum_k \hbar\omega_k\, |a_k|^2 .
\]
is the total particle number operator, then U (1) acts on Fock space as eiϕN̂ . Thus, U (1)-
transformations induce relative phases between subspaces of different particle numbers.
These will change the expectation values of observables that do not commute with N̂ .
So can we observe global phase changes of single-particle states when working with
many-body systems?
For non-relativistic massive particles (i.e. the kind of systems treated in undergraduate
QM courses), the answer is “no”. Loosely speaking, we expect that in a “non-relativistic
theory” deserving of that name, massive particles cannot be created or destroyed. We should
then require that all physical observables commute with total particle number. The re-
quirement that all physical observables obey an extra symmetry (i.e. [A, N̂ ] = 0) is called
a superselection rule. In particular, because
linear expressions in ladder operators are not directly observable in the presence of this
superselection rule.
The Fock space for phonons was not constructed starting from a single-particle Hilbert
space of a non-relativistic massive particle, so the argument does not apply in this case.
And indeed, the observable (2.29) corresponding to the displacement of the r-th particle
(clearly a measurable quantity, at least in principle) is a linear combination of ladder operators.
Also, as we’ll see next, when the particle number tends to infinity, the physical and
mathematical definition of N̂ becomes iffy, which may lead non-relativistic systems to
behave as if particle number conservation were violated.
Figure 2.2: The motion in a Newton cradle is determined by energy and momentum
conservation alone. (Figure adapted from Wikipedia.)
\[
H = \sum_k (\epsilon_k - \mu)\, a_k^\dagger a_k + \frac{U}{2V} \sum_{k, k', q} a_{k+q}^\dagger a_{k'-q}^\dagger a_{k'} a_k . \tag{2.32}
\]
This is still difficult to treat, so let’s get some intuition first, to guide our analysis.
Superfluidity
At very low temperature, Helium becomes superfluid: A particle slowly passing through
it does not experience friction. Here’s a way to think about that: Recall the Newton cradle
(Fig. 2.2), where one can uniquely determine the number of balls being excited merely
from energy and momentum conservation. Likewise, one may model the interaction be-
tween the particle and the gas as a scattering process, where the particle transfers energy
and momentum to the gas. Now imagine that the energy-momentum relations of the par-
ticle and the excitations of the gas are “out of tune” in the sense that there is no process
that would respect both conservation laws. In this case, no scattering is possible and one
would expect the particle to pass through the gas uninhibited.
With this model in mind, we set it as our goal to work out the energy-momentum
relation of the low-lying excitations of H.
Bose-Einstein condensation
Recall that for non-interacting Bosons (i.e. when U = 0), the ground state is achieved
when all particles are in the lowest-energy state of the single-particle term. It is plausible
(though a very difficult question to treat rigorously) that remnants of this behavior persist
for non-zero interaction U and for low-lying states. We will thus treat H under the
assumption that there is a finite density
\[
\rho = \frac{n_0}{V} = \frac{1}{V} \langle a_0^\dagger a_0\rangle \tag{2.33}
\]
of particles occupying the k = 0 mode. To achieve this, we add a “chemical potential
term” −µN̂ to the Hamiltonian and will later adjust µ to achieve (2.33).
If you are already fully convinced, or if “mildly convincing” is anyway all you
aim for at this moment, you can skip ahead to Sec. 2.6.
“One cannot implement operators that act on macroscopically many particles.” (2.35)
2.5.1 Ferromagnetism
The guiding phenomenological example is ferromagnetism. If cooled below its Curie temperature,
a ferromagnet develops a magnetic moment M ≠ 0. In the absence of external
fields, the moment M is equally likely to point in any direction. Thus, statistically, the
behavior is rotationally invariant. But every time the magnet is cooled down, it “sponta-
neously” singles out one direction in space, thereby “breaking the symmetry”.
The simplest case of a model exhibiting ferromagnetic behavior is the Ising model. It
involves N spin-1/2 particles – and in fact, we can learn a lot by looking at their Hilbert
space in the limit N → ∞, even before introducing the Hamiltonian.
Indeed, consider the two states (depending on the relative phase)
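\[
|\psi_\pm^N\rangle = \frac{1}{\sqrt2} \left( |{\uparrow}\rangle^{\otimes N} \pm |{\downarrow}\rangle^{\otimes N} \right) .
\]
The $|\psi_\pm^N\rangle$ are eigenvectors of $\sigma_x^{\otimes N}$ with eigenvalue +1 and −1 respectively. Despite them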
being orthogonal, I claim that as N gets macroscopic, the two states become effectively
indistinguishable.
To justify this outrageous claim, assume that just one of the macroscopically many par-
ticles is lost (as will always, realistically, be the case). Then any measurement effectively
takes place on the reduced density matrix
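\[
\mathrm{tr}_1\, |\psi_\pm^N\rangle\langle\psi_\pm^N| = \frac12 \left( |{\uparrow}\rangle\langle{\uparrow}| \right)^{\otimes (N-1)} + \frac12 \left( |{\downarrow}\rangle\langle{\downarrow}| \right)^{\otimes (N-1)} ,
\]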
which is a uniform mixture of |↑ . . . ⟩, |↓ . . . ⟩, and independent of the relative phase. In this
sense: The operator σx⊗N does not actually describe a physically realizable measurement
in the limit N → ∞.
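The loss of the relative phase can be made concrete for small N. The sketch below (plain Python, hypothetical helper names, spins encoded as 0 for ↑ and 1 for ↓) builds the two states with relative sign ±, traces out the first spin, and compares the resulting density matrices.

```python
import math
from itertools import product

def ghz(N, sign):
    """Amplitudes of (|up...up> + sign * |down...down>)/sqrt(2); 0 = up, 1 = down."""
    return {(0,) * N: 1 / math.sqrt(2), (1,) * N: sign / math.sqrt(2)}

def trace_out_first(state):
    """Reduced density matrix of spins 2..N as a dict {(row, column): value}."""
    rho = {}
    for (b1, a1), (b2, a2) in product(state.items(), repeat=2):
        if b1[0] == b2[0]:                      # the traced-out spin must agree
            key = (b1[1:], b2[1:])
            rho[key] = rho.get(key, 0.0) + a1 * a2.conjugate()
    return rho

N = 4
rho_plus = trace_out_first(ghz(N, +1.0))
rho_minus = trace_out_first(ghz(N, -1.0))
same = all(abs(rho_plus[k] - rho_minus.get(k, 0.0)) < 1e-12 for k in rho_plus)
print("reduced states agree:", same and len(rho_plus) == len(rho_minus))
```

Only the two diagonal entries survive the partial trace: the cross terms |↑…↑⟩⟨↓…↓|, which carry the sign, require the first spin to be both up and down and therefore drop out.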
Sometimes, it is beneficial to keep idealized mathematical objects around (like δ func-
tions) even if they are not directly physical. In this case, however, it turns out that we’ll
attain a cleaner understanding of ferromagnetism, superfluidity, and many other impor-
tant quantum many-body phenomena, if we commit to the principle (2.35) and declare
operators like σx⊗N to be unphysical in the limit N → ∞.
Let’s explore this further. Define H↑ to be the space of states that can be reached by
physical operations starting from |↑⟩⊗N and define H↓ analogously. For microscopic N ,
the two spaces are identical, but as N → ∞, they become orthogonal. A good way to see
this is to consider the average magnetization. For a state |ψ⟩, it is defined as
\[
m := \frac{1}{N} \sum_{k=1}^{N} \langle\psi|\sigma_z^{(k)}|\psi\rangle . \tag{2.36}
\]
We immediately see that the Hamiltonian is invariant under a simultaneous flip of all spins,
realized by the operator σx⊗N . Also, the ground state energy is −J times the number of
neighboring pairs. It is attained on the subspace with basis |↑⟩⊗N , |↓⟩⊗N , or, equivalently,
with basis the $|\psi_\pm^N\rangle$.
Given the discussion above, it is now easy to see what happens. The spin flip symmetry
of the Hamiltonian is implemented by σx⊗N , which “breaks” in the sense that it becomes
unphysical for N → ∞. For microscopic N , the $|\psi_\pm^N\rangle$ are pure ground states that are
invariant under the spin flip symmetry (up to phase). As N → ∞, they remain invariant,
but they become effectively mixed. In fact, any ground state α|↑⟩⊗N + β|↓⟩⊗N becomes
a mixture of the two non-symmetric ones |↑⟩⊗N , |↓⟩⊗N . We can now connect back to the
loose definition of “symmetry breaking” in the very beginning: The restriction on physical
observables in macroscopic systems means that there is no longer a pure ground state that
shares the symmetry of the Hamiltonian.
Consider a Bose gas contained in a box of volume V . Recall from (2.33) that we are
interested in states |ψ⟩ that have a fixed density ρ = n0 /V of particles in the k = 0-mode:
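\[
\frac{1}{V} \langle\psi|a_0^\dagger a_0|\psi\rangle = \langle\psi|\, \frac{1}{\sqrt V} a_0^\dagger\, \frac{1}{\sqrt V} a_0\, |\psi\rangle = \rho . \tag{2.37}
\]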
In the limit V → ∞, measuring the precise occupation number
⟨ψ|a†0 a0 |ψ⟩ = V ρ → ∞
would require us to count a macroscopic number of particles. Consistent with the principle
(2.35), we reject this as unphysical. The density, however, should be measurable. Hence
we posit that an observable is physical only if it can be expressed in terms of the re-scaled
ladder operators
\[
\frac{1}{\sqrt V}\, a_0 , \qquad \frac{1}{\sqrt V}\, a_0^\dagger \tag{2.38}
\]
as well as the ak , a†k for k ̸= 0 (with coefficients that do not depend on V , of course).
This seemingly minor restriction has dramatic effects in the limit V → ∞. Indeed,
\[
\left[ \frac{1}{\sqrt V}\, a_0 ,\ \frac{1}{\sqrt V}\, a_0^\dagger \right] = \frac{1}{V} \to 0 , \tag{2.39}
\]
so that in the thermodynamic limit, the operators (2.38) commute! But then all physical
observables commute with $\frac{1}{\sqrt V} a_0$ (why?). This operator therefore plays the same role as
the average magnetization in the Ising model: Its eigenspaces are physically separated
in the sense that relative phases between them are not observable and no vector can be
mapped from one eigenspace to another. If $\frac{a_0}{\sqrt V}|\psi\rangle = \lambda|\psi\rangle$ then (2.37) implies that
$\lambda = \sqrt{\rho}\, e^{i\theta}$ for some θ ∈ [0, 2π).
Thus, we may always assume that the dynamics takes place in one of the eigenspaces
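\[
\mathcal{H}_\theta = \left\{ |\psi\rangle \ \middle|\ \frac{a_0}{\sqrt V}\, |\psi\rangle = \sqrt{\rho}\, e^{i\theta}\, |\psi\rangle \right\} ,
\]
where $\frac{a_0}{\sqrt V}$ acts like $\sqrt{\rho}\, e^{i\theta}$. This is what we set out to justify.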
\[
e^{i\phi\hat N}\, \mathcal{H}_\theta = \mathcal{H}_{\theta + \phi} .
\]
(the positive phase gets applied to one more particle than the negative one). Some conse-
quences:
By (2.40), the expectation value $\langle \frac{a_0}{\sqrt V} \rangle$ vanishes in any state that is U (1) invariant. The
operator $\frac{a_0}{\sqrt V}$ therefore constitutes an order parameter. Now comes a big difference to the
Ising example. On Fock space for massive non-relativistic particles, we have a second
condition for an observable to be physical: In addition to fulfilling (2.35), observables
also have to be gauge invariant (Sec. 2.3.2). Hence in this case, the order parameter is
not measurable (unlike the average magnetization, which is the central physical quantity
associated with the Ising magnet). It also means that the physical behavior of the Bose gas
can only depend on ρ, not on θ, so we are free to restrict to the case θ = 0 below.
We found that a state |ψ⟩ is pure with respect to the physical observables only if it is
contained in one of the Hθ spaces. But then, it isn’t U (1)-invariant. Only the mixed state
\[
\rho = \int_0^{2\pi} e^{i\phi\hat N}\, |\psi\rangle\langle\psi|\, e^{-i\phi\hat N}\, \frac{d\phi}{2\pi}
\]
is. We’re again encountering the dichotomy that states are symmetric or pure, but not both.
The effect of ρ on the energy will, in the limit V → ∞, be dominated by the first
term, which is the only one proportional to V . Thus, low-lying states will have a density ρ
minimizing that term. Setting its derivative to zero gives the relation ρ = µ/U . We keep ρ
and eliminate µ, to get
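\[
H = \mathrm{const.} + \sum_{k\neq 0} (\epsilon_k + U\rho)\, a_k^\dagger a_k + \frac{U\rho}{2} \sum_{k\neq 0} \left( a_k a_{-k} + a_k^\dagger a_{-k}^\dagger \right) . \tag{2.41}
\]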
The result of the following “2½ easy steps” is summarized in (2.46). In principle,
one can just check directly that the form of H given there is indeed equal to
(2.41). Below, we only describe a somewhat natural thought process that leads
to (2.46). If you’re in a hurry, skip ahead.
Step 1: Decouple. Start with the rightmost term. It creates / destroys pairs of particles
of opposite momentum. This suggests switching the basis of the single-particle space to
one that consists of superpositions of states moving in opposite directions. Remembering
that | ± k⟩ are represented in position space by complex exponentials that are each other’s
conjugates, the cosine / sine basis
\[
\frac{1}{\sqrt2} \left( |k\rangle + |{-k}\rangle \right) , \qquad \frac{-i}{\sqrt2} \left( |k\rangle - |{-k}\rangle \right) \tag{2.42}
\]
seems like a natural candidate. Let’s agree that a vector k is positive if its first non-zero
component is. Then there is exactly one positive wave vector in every pair +k, −k. For
k > 0, define the annihilation operators
\[
\begin{aligned}
b_k &= \frac{1}{\sqrt2} (a_k + a_{-k}) && \text{(“positive k label the cosines”)}\\
b_{-k} &= \frac{i}{\sqrt2} (a_k - a_{-k}) && \text{(“negative k label the sines”)}
\end{aligned}
\]
associated with the new basis. Inverting,
\[
a_k = \frac{1}{\sqrt2} (b_k - i b_{-k}) , \qquad a_{-k} = \frac{1}{\sqrt2} (b_k + i b_{-k}) \qquad k > 0 . \tag{2.43}
\]
Plugging in, the pair term decouples, as hoped:
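\[
H = \mathrm{const.} + \sum_{k\neq 0} \left[ (\epsilon_k + U\rho)\, b_k^\dagger b_k + \frac{U\rho}{2} \left( b_k b_k + b_k^\dagger b_k^\dagger \right) \right] .
\]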
Step 2: Solve harmonic oscillator. It turns out that each summand represents a har-
monic oscillator and that a simple re-scaling of position and momentum coordinates will
put it into standard form. To see how this works, we switch to Hermitian operators for the
moment:
\[
X = \frac{1}{\sqrt2} (b_k + b_k^\dagger) , \qquad P = \frac{-i}{\sqrt2} (b_k - b_k^\dagger) .
\]
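Abbreviating A = ϵk + U ρ, B = U ρ one directly finds, up to an additive constant that we
absorb into “const.”,
\[
A\, b_k^\dagger b_k + \frac{B}{2} \left( b_k b_k + b_k^\dagger b_k^\dagger \right) = \frac{A - B}{2}\, P^2 + \frac{A + B}{2}\, X^2 . \tag{2.44}
\]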
Well, we know how to solve these using undergrad methods (App. A.2.1)! The transfor-
mation
\[
\tilde X = \sqrt[4]{\frac{A + B}{A - B}}\; X , \qquad \tilde P = \sqrt[4]{\frac{A - B}{A + B}}\; P , \qquad E_k = \sqrt{A^2 - B^2}
\]
is obviously canonical, [X̃, P̃ ] = [X, P ], and puts the oscillator into standard form:
\[
\frac12 E_k \left( \tilde X^2 + \tilde P^2 \right) = \frac12 \sqrt{(A + B)(A - B)} \left( \sqrt{\frac{A - B}{A + B}}\, P^2 + \sqrt{\frac{A + B}{A - B}}\, X^2 \right) = (2.44) .
\]
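Therefore, setting $\tilde b_k = \frac{1}{\sqrt2}(\tilde X + i\tilde P)$, we have diagonalized H (that wasn’t too hard):
\[
H = \mathrm{const.} + \sum_{k\neq 0} E_k\, \tilde b_k^\dagger \tilde b_k , \qquad E_k = \sqrt{\epsilon_k^2 + \epsilon_k\, 2U\rho} . \tag{2.45}
\]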
Step 2.5: Cleanup. Because E−k = Ek , the Hamiltonian is degenerate and any
unitary transformation within the ±k-subspaces will leave its form invariant. Choosing
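\[
c_k := \frac{1}{\sqrt2} \left( \tilde b_k - i\, \tilde b_{-k} \right) \ \text{for } k > 0 , \qquad c_k := \frac{1}{\sqrt2} \left( \tilde b_{-k} + i\, \tilde b_{k} \right) \ \text{for } k < 0
\]
(the same combinations that invert the change of basis in (2.43))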
turns out to lead to the cleanest theory. Plugging in all the nested definitions gives
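\[
\begin{aligned}
&c_k = u_k a_k - v_k a_{-k}^\dagger , \qquad u_k = \frac12 \left( \sqrt{\frac{\epsilon_k}{E_k}} + \sqrt{\frac{E_k}{\epsilon_k}} \right) , \qquad v_k = \frac12 \left( \sqrt{\frac{\epsilon_k}{E_k}} - \sqrt{\frac{E_k}{\epsilon_k}} \right) ,\\
&E_k = \sqrt{\epsilon_k^2 + \epsilon_k\, 2U\rho} , \qquad H = \mathrm{const.} + \sum_{k\neq 0} E_k\, c_k^\dagger c_k .
\end{aligned}
\tag{2.46}
\]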
ak = uk ck + vk c†−k . (2.47)
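The coefficients u_k, v_k can be sanity-checked numerically. The sketch below uses hypothetical function names and A = ϵ_k + Uρ, B = Uρ as in Step 2. It tests three conditions: u² − v² = 1 (so that the c_k obey Bosonic commutation relations), and the two relations one obtains when substituting a_k = u c_k + v c†_{−k} into one ±k pair of (2.41), namely that the pair-creation coefficient 2Auv + B(u² + v²) vanishes and that the coefficient A(u² + v²) + 2Buv of c†_k c_k equals E_k. These two relations were worked out for this illustration and are not quoted from the notes.

```python
import math

def bogoliubov(eps, U_rho):
    """Return (E, u, v) as in (2.46) for single-particle energy eps > 0
    and interaction scale U*rho >= 0."""
    E = math.sqrt(eps ** 2 + 2.0 * eps * U_rho)
    u = 0.5 * (math.sqrt(eps / E) + math.sqrt(E / eps))
    v = 0.5 * (math.sqrt(eps / E) - math.sqrt(E / eps))
    return E, u, v

def checks(eps, U_rho):
    """Return the deviations of the three consistency conditions from zero."""
    E, u, v = bogoliubov(eps, U_rho)
    A, B = eps + U_rho, U_rho
    canonical = u * u - v * v - 1.0                       # commutation relations preserved
    off_diagonal = 2 * A * u * v + B * (u * u + v * v)    # pair terms must cancel
    diagonal = A * (u * u + v * v) + 2 * B * u * v - E    # quasi-particle energy
    return canonical, off_diagonal, diagonal

for eps in (0.001, 0.5, 3.0):
    print(eps, checks(eps, U_rho=1.0))
```

Note that v_k ≤ 0 in this convention, and that v_k → 0 for ϵ_k ≫ Uρ, where quasi-particles become ordinary particles.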
Discussion
We have found that the elementary excitations of the Bose gas are given by quasi-particles
created by the c†k . The ground state is the quasi-particle vacuum characterized by
\[
c_k\, |0\rangle^{(q)} = 0 \qquad \forall k .
\]
It is not to be confused with the particle vacuum $|0\rangle^{(p)}$ characterized by $a_k |0\rangle^{(p)} = 0$! For
example, using (2.47), the expected number of particles with momentum k in the quasi-particle
vacuum is
\[
\langle 0|^{(q)}\, a_k^\dagger a_k\, |0\rangle^{(q)} = \langle 0|^{(q)}\, (u_k c_k^\dagger + v_k c_{-k})(u_k c_k + v_k c_{-k}^\dagger)\, |0\rangle^{(q)} = v_k^2 .
\]
While the above shows that quasi-particle occupation number states | . . . nk . . . ⟩(q) do
not have definite particle numbers, it turns out that they do have definite momentum! In
the exercise, you will show that c†k creates quasi-particles with momentum ℏk. Thus, E(k)
found in (2.45) describes their energy-momentum (or dispersion) relation. Compared to a
free particle, E(k) involves the additional term ϵk 2U ρ. It dominates if
\[
\epsilon_k = \frac{\hbar^2 \|k\|^2}{2m} \ll 2U\rho \quad\Leftrightarrow\quad \frac{\|\hbar k\|}{m} \ll \sqrt{\frac{U\rho}{m}} =: c ,
\]
i.e. for velocities much smaller than c. In this regime, we have Ek ≃ c∥ℏk∥, that is, energy
scales linearly with momentum. Beyond that, Ek is convex (“bends upwards”, Fig. 2.4),
so that Ek ≥ c∥ℏk∥ holds in general.
As alluded to in the very beginning, this means that a particle moving through the Bose
gas at low velocity cannot slow down by transferring energy and momentum to a quasi-
particle. Quantitatively: Let M be the mass of the test particle and p its initial momentum.
Assume it excites a quasi-particle of momentum q. Then energy conservation demands
which has a solution only if the test particle has initial velocity ∥p∥/M at least c.
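The critical velocity can also be read off numerically. A standard way to phrase the argument (filled in here as an assumption, since it is not spelled out above): combining energy and momentum conservation, p²/(2M) = (p − q)²/(2M) + E_q, gives p·q/M = q²/(2M) + E_q, which requires ∥p∥/M ≥ E_q/∥q∥. Emission is therefore impossible below the minimum of E_k/(ℏ∥k∥). The sketch below (hypothetical function names, units with ℏ = 1 by default) evaluates that minimum on a grid and compares it with c = √(Uρ/m).

```python
import math

def excitation_energy(k, m, U_rho, hbar=1.0):
    """Quasi-particle energy E_k = sqrt(eps_k^2 + 2 U rho eps_k), eps_k = (hbar k)^2 / (2m)."""
    eps = (hbar * k) ** 2 / (2.0 * m)
    return math.sqrt(eps ** 2 + 2.0 * eps * U_rho)

def critical_velocity(m, U_rho, hbar=1.0, k_max=10.0, steps=10000):
    """Minimum of E_k / (hbar k) over a grid of wave numbers 0 < k <= k_max."""
    ks = [k_max * (n + 1) / steps for n in range(steps)]
    return min(excitation_energy(k, m, U_rho, hbar) / (hbar * k) for k in ks)

m, U_rho = 1.0, 1.0
print("grid minimum of E_k/(hbar k):", critical_velocity(m, U_rho))
print("sound velocity sqrt(U rho/m):", math.sqrt(U_rho / m))
```

For a free particle (U = 0) the same ratio is ℏ∥k∥/(2m), whose infimum is zero: without interactions there is no finite critical velocity.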
Figure 2.4: Blue line: Dispersion relation E(∥k∥) for the Bose gas. Orange line: E =
cℏ∥k∥ is a good approximation for small ∥k∥, and a lower bound for all k. The x-axis is
in units of mc/ℏ, y-axis in units of mc2 .
Our goal is to construct a quantum theory for the EM field. Since quantum mechanics
is more fundamental than classical physics, one cannot hope to derive a quantum theory
from its classical limit. “Quantization” thus always involves educated guesses.
To educate ourselves, we’ll first have another look at lattice vibrations (Sec. 2.3.1). For
both their classical and their quantum model, one can easily construct a continuum limit.
The result is a classical and a quantum field theory. Their relation will serve as a template
for quantizing other fields.
CHAPTER 3. FIELD QUANTIZATION AND QUANTUM THEORY OF LIGHT 64
Let’s rewrite it in a form suitable for our limit. The product N m is just the total mass,
invariantly expressed as Lρ. Also, it makes sense to label the particles not by their index
r = 1, . . . N , but by their equilibrium position x = ra ∈ [0, L]. With these substitutions,
we obtain the “displacement field”
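\[
\phi(t, x) = \sqrt{\frac{\hbar}{L\rho}} \sum_k \frac{1}{\sqrt{2\omega_k}} \left( a_k\, e^{-i\omega_k t + ikx} + a_k^\dagger\, e^{i\omega_k t - ikx} \right) . \tag{3.1}
\]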
There’s some trouble brewing in (3.2): The “constant” is $\sum_k \frac12 \hbar c|k|$, which diverges.
This is the first of the many infinities of quantum field theory. This one is
easy to deal with: For finite N , the sum over the ground state energies of the harmonic
oscillators is finite. Subtracting this constant from the total energy does not
alter physical predictions, as long as we do not dynamically change the ground
state energy (e.g. by putting stress on the material in a way that affects the equilibrium
separation) or get into the realm of general relativity. Thus, the renormalization
$\sum_k \hbar c|k| \left( a_k^\dagger a_k + \frac12 \right) \mapsto \sum_k \hbar c|k|\, a_k^\dagger a_k$, while maybe not very principled,
does not affect predictions and makes the continuum limit converge. So let’s adopt
this convention. (We’ll encounter more troubling infinities later).
As in Sec. 2.3.1 and App. A.2.1, the definitions so far make sense equally in classical
and in quantum mechanics. In QM, the ak ’s are annihilation operators that are taken to
act on Fock space with occupation number basis | . . . nk . . . ⟩. Classically, the ak ’s are
complex numbers and (3.1) is the most general real-valued solution of the wave equation
\[
\left( \frac{1}{c^2}\, \partial_t^2 - \partial_x^2 \right) \phi(t, x) = 0 \tag{3.3}
\]
under cyclic boundary conditions.
We went through this exercise in order to find a strategy for quantizing Maxwell’s
equations. The relation between the classical and the quantum continuum model found
here suggests the following recipe for quantizing classical wave equations:
Summary
• Choose normalization such that $H = \sum_k \hbar\omega_k\, a_k^\dagger a_k$ is the energy of the field.
Fields for which this program can be implemented are called free. We’ll only work
with free fields in this course. General, interacting fields are treated in the QFT courses.
How to decide whether to use Fermionic or Bosonic Fock spaces will be a major topic in
Chap. 4.
Further comments
It is also of interest to write down a momentum field π(x) which describes the continuum
limit of the Pr . Because the mass of the individual particles goes to 0 for λ → ∞, only
the momentum density defines an interesting quantity in the limit. Thus, starting from
\[
\frac{1}{a}\, P_r = -i\, \frac{1}{a} \sqrt{\frac{\hbar m}{N}} \sum_k \sqrt{\frac{\omega_k}{2}} \left( a_k\, e^{ikar} - a_k^\dagger\, e^{-ikar} \right) ,
\]
In the continuum limit, the commutation relation (or iℏ times the Poisson bracket) between
the displacement and the momentum density fields is
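\[
\begin{aligned}
[\phi(x), \pi(y)] &= \frac{-i\hbar}{2L} \sum_{k, k'} \left[ \left( a_k e^{ikx} + a_k^\dagger e^{-ikx} \right) , \left( a_{k'} e^{ik'y} - a_{k'}^\dagger e^{-ik'y} \right) \right]\\
&= \frac{i\hbar}{L} \sum_{k, k'} e^{ikx - ik'y}\, [a_k, a_{k'}^\dagger] = \frac{i\hbar}{L} \sum_{k \in \frac{2\pi}{L}\mathbb{Z}} e^{ik(x - y)} = i\hbar\, \delta(x - y) .
\end{aligned}
\]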
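The last step, (1/L) Σ_k e^{ik(x−y)} = δ(x − y), only makes sense when integrated against a smooth function. The sketch below illustrates this with hypothetical names: it truncates the sum to |n| ≤ K, integrates the result against a smooth L-periodic test function by a Riemann sum, and compares with the value of the function at y. The test function is a low-order trigonometric polynomial, chosen here so that the truncation introduces no error.

```python
import cmath
import math

def truncated_delta(x, y, L, K):
    """(1/L) * sum over |n| <= K of exp(i k_n (x - y)), with k_n = 2 pi n / L."""
    total = sum(cmath.exp(2j * math.pi * n * (x - y) / L) for n in range(-K, K + 1))
    return total.real / L   # the imaginary parts cancel between n and -n

def smear(f, y, L, K, M=400):
    """Riemann-sum approximation of the integral of f(x) * truncated_delta(x, y) over [0, L)."""
    dx = L / M
    return sum(f(j * dx) * truncated_delta(j * dx, y, L, K) for j in range(M)) * dx

def sample(x, L):
    """A smooth L-periodic test function (an arbitrary choice for this example)."""
    return 1.0 + math.cos(2 * math.pi * x / L) + 0.5 * math.sin(4 * math.pi * x / L)

L, y = 2.0, 0.3
print("smeared value:", smear(lambda x: sample(x, L), y, L, K=5))
print("f(y):         ", sample(y, L))
```

Pointwise, the truncated sum grows like (2K + 1)/L at x = y and oscillates elsewhere; only the smeared values have a limit.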
B = ∇ × A, E = −∇Φ − ∂t A. (3.4)
A 7→ A + ∇χ, Φ 7→ Φ − ∂t χ
with an arbitrary function χ. Here, we get rid of the ambiguity by adopting the Coulomb
gauge, fixed by the gauge condition
∇ · A(t, x) = 0. (3.5)
Further, we restrict to the free-space version of Maxwell’s equations, i.e. we assume that
there are no charges or currents: ρ = j = 0. In this case, the Maxwell equations become
\[ \Phi(t, x) = 0, \qquad \left( \frac{1}{c^2}\,\partial_t^2 - \partial_x^2 - \partial_y^2 - \partial_z^2 \right) A(t, x) = 0. \tag{3.6} \]
In a box with side length L and cyclic boundary conditions, the space of complex
solutions to Eq. (3.6) is spanned by plane waves of the form
\[ A_k\, e^{\pm i\omega_k t + ikx}, \qquad A_k \in \mathbb{C}^3, \quad k \in \frac{2\pi}{L}\mathbb{Z}^3, \quad \omega_k := c\|k\|. \]
The gauge condition (3.5) requires the coefficients Ak to be “transversal” to the wave
vector k:
We can take this into account by choosing, for each k, an ortho-normal basis (the polar-
ization vectors)
for the space orthogonal to k (Fig. ??). Then a general real-valued solution to the Maxwell
equations in Coulomb gauge is
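\[ A(t, x) = \sqrt{\frac{\hbar}{\epsilon_0 L^3}} \sum_{k,\lambda} \frac{1}{\sqrt{2\omega_k}}\, e_\lambda(k) \left( a_{k\lambda}\, e^{-i\omega_k t + ikx} + a_{k\lambda}^\dagger\, e^{+i\omega_k t - ikx} \right), \tag{3.7} \]
where the sum is over wave vectors k ∈ (2π/L) Z³ and polarization directions λ ∈ {1, 2}. As
discussed before for phonons (Eq. (2.30)), it is often convenient to re-arrange the sum in
(3.7) so that terms corresponding to the same complex mode are grouped together:
\[ A(t, x) = \sqrt{\frac{\hbar}{\epsilon_0 L^3}} \sum_{k,\lambda} \frac{1}{\sqrt{2\omega_k}}\, e_\lambda(k) \left( a_{k\lambda}(t) + a_{-k\lambda}^\dagger(t) \right) e^{ikx}. \tag{3.8} \]
The time evolution of the E and B-fields follows by applying (3.4). Setting κ = k/∥k∥,
\[ E(t, x) = i \sqrt{\frac{\hbar}{\epsilon_0 L^3}} \sum_{k,\lambda} \sqrt{\frac{\omega_k}{2}}\, e_\lambda(k) \left( a_{k\lambda}\, e^{-i\omega_k t + ikx} - a_{k\lambda}^\dagger\, e^{+i\omega_k t - ikx} \right) \tag{3.9} \]
\[ \phantom{E(t, x)} = i \sqrt{\frac{\hbar}{\epsilon_0 L^3}} \sum_{k,\lambda} \sqrt{\frac{\omega_k}{2}}\, e_\lambda(k) \left( a_{k\lambda}(t) - a_{-k\lambda}^\dagger(t) \right) e^{ikx}, \tag{3.10} \]
\[ B(t, x) = i \sqrt{\frac{\hbar}{\epsilon_0 L^3 c^2}} \sum_{k,\lambda} \sqrt{\frac{\omega_k}{2}}\, \kappa \times e_\lambda(k) \left( a_{k\lambda}\, e^{-i\omega_k t + ikx} - a_{k\lambda}^\dagger\, e^{+i\omega_k t - ikx} \right) \tag{3.11} \]
\[ \phantom{B(t, x)} = i \sqrt{\frac{\hbar}{\epsilon_0 L^3 c^2}} \sum_{k,\lambda} \sqrt{\frac{\omega_k}{2}}\, \kappa \times e_\lambda(k) \left( a_{k\lambda}(t) + a_{-k\lambda}^\dagger(t) \right) e^{ikx}. \tag{3.12} \]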
for the energy of the EM field, one finds after some calculations
\[ H_{em} = \sum_{k,\lambda} \hbar\omega_k\, |a_{k\lambda}|^2. \]
The A-field is thus of the form discussed in Sec. 3.1 so that one can perform a free-field
quantization. From now on, we will thus treat the akλ ’s as annihilation operators for a
collection of harmonic oscillators acting on the Fock space Hem .
Notation
For increased legibility, we’ll now write k for (k, λ), with the convention that −k corre-
sponds to (−k, λ). Also, for an element | . . . nk . . . ⟩ of the occupation number basis of
the harmonic oscillators, write |{n}⟩.
Zero on average does not imply zero with probability one. Indeed, compute the variance:
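\[ \langle\{n\}|E(x) \cdot E(x)|\{n\}\rangle \]
\[ = \frac{-\hbar}{\epsilon_0 L^3} \sum_{k,k'} \frac{\sqrt{\omega_k \omega_{k'}}}{2}\, e_k \cdot e_{k'}\, \langle\{n\}| \left( a_k e^{ikx} - a_k^\dagger e^{-ikx} \right) \left( a_{k'} e^{ik'x} - a_{k'}^\dagger e^{-ik'x} \right) |\{n\}\rangle \]
\[ = \frac{\hbar}{2\epsilon_0 L^3} \sum_k \omega_k\, \langle\{n\}| a_k e^{ikx} a_k^\dagger e^{-ikx} + a_k^\dagger e^{-ikx} a_k e^{ikx} |\{n\}\rangle = \sum_k \frac{\hbar\omega_k}{\epsilon_0 L^3} \left( n_k + 1/2 \right), \]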
has finite fluctuations, if ρ is sufficiently spread out. This is physically plausible. The
sum diverges because there are infinitely many summands with increasingly large wave
vector k. But these correspond to fields that oscillate rapidly, so that cancellations over
any finite region cause the net force to be small. Mathematically speaking, we found again
(c.f. Sec. 2.2.2) that field operators should be thought of as distributions that have to be
integrated against smooth functions to be meaningful.
Figure 3.1: Net force is zero, so QFT would presumably be OK with it. (Scene from the
Caucasian Chalk Circle, as depicted in this poster).
Is this a satisfactory solution?
Yes, in that it gives a good reason for why extended bodies don’t regularly get acceler-
ated into orbit due to vacuum fluctuations. No, because it paints quite the violent picture
of the microscopic world, where, supposedly, unbounded forces constantly tear at objects
and only cancellations prevent mayhem (Fig. 3.1). It sure feels like an indication that our
current theories of light and matter become invalid at very short length scales.
A coherent state |{α}⟩ of the entire EM field is one where each mode k ≡ (k, λ) is in
a coherent state |αk ⟩. Let’s compute the expectation value of the E-field:
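\[ \langle\{\alpha\}|\hat{E}(x, t)|\{\alpha\}\rangle = -i \sqrt{\frac{2\hbar\pi}{L^3}} \sum_k \sqrt{\omega_k}\, e_k\, \langle\{\alpha\}| \left( a_k^\dagger e^{-i(kx - \omega_k t)} - a_k e^{i(kx - \omega_k t)} \right) |\{\alpha\}\rangle \]
\[ = -i \sqrt{\frac{2\hbar\pi}{L^3}} \sum_k \sqrt{\omega_k}\, e_k \left( \alpha_k^* e^{-i(kx - \omega_k t)} - \alpha_k e^{i(kx - \omega_k t)} \right) \]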
It acts on a total Hilbert space H = Hpar ⊗ Hem that is the tensor product between the
spaces of the particle Hpar = L2 (R3 ) and of the field Hem . Here, A(X) is defined by
(3.7), where the ladder operators ak , a†k act on Hem , but the parameter x is evaluated on
the position of the particle. In other words
\[ A(X)\, |x\rangle|\psi\rangle = |x\rangle\, A(x)|\psi\rangle, \qquad x \in \mathbb{R}^3,\ |\psi\rangle \in \mathcal{H}_{em}. \tag{3.14} \]
\[ \frac{1}{2m} \left( P - qA(X) \right)^2 = \frac{P^2}{2m} - \frac{q}{2m} \left( P\,A(X) + A(X)\,P \right) + \frac{q^2}{2m} A(X)^2. \]
As a first step, we will neglect the square A(X)2 , which describes two-photon processes.
Next, verify that in Coulomb gauge, momentum and the vector potential commute:
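\[ P \cdot A(X)|\phi\rangle = -i\hbar\nabla \cdot \left( A(X)|\phi\rangle \right) = -i\hbar \left( \nabla \cdot A(X) \right)|\phi\rangle + A(X) \cdot P|\phi\rangle = A(X) \cdot P|\phi\rangle, \]
\[ H_{par} = \frac{P^2}{2m} + U(X), \qquad H_{em} = \sum_k \hbar\omega_k\, a_k^\dagger a_k, \qquad H_I = -\frac{q}{m}\, P \cdot A(X). \]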
So far, we have worked in a “mixed picture”, where the EM field was expressed in
second quantization, but only a single particle in first quantization was present. We now
also pass to the second-quantized picture for the particle. To this end, let {|ϕi ⟩}i be an
eigenbasis of Hpart and denote the corresponding creation operators as b†i , so that
\[ H_{part} = \sum_i E_i\, b_i^\dagger b_i. \]
It remains to treat the interaction Hamiltonian. Even without doing any calculations, we
can see from (3.8) that HI will be of the form
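\[ H_I = -\frac{q}{m} \sqrt{\frac{\hbar}{\epsilon_0 L^3}} \sum_{ijk} \int \phi_i^\dagger(x)\, \frac{1}{\sqrt{2\omega_k}} \left( a_k + a_{-k}^\dagger \right) e^{-ikx}\, e_k \cdot P\, \phi_j(x)\, d^3x \;\; b_i^\dagger b_j \tag{3.15} \]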
so that
\[ g_{ijk} = -\frac{q}{m} \sqrt{\frac{\hbar}{\epsilon_0 L^3\, 2\omega_k}} \int \phi_i^\dagger(x)\, e^{-ikx}\, e_k \cdot P\, \phi_j(x)\, d^3x. \]
The wavelengths associated with atomic transitions are much longer than the length scales
of the atoms themselves. This justifies the dipole approximation, in which the dependence
of the EM field on position is neglected by substituting e^{ikx} ≃ 1. Then
\[ g_{ijk} \simeq -\frac{q}{m} \sqrt{\frac{\hbar}{\epsilon_0 L^3\, 2\omega_k}} \int \phi_i^\dagger(x)\, e_k \cdot P\, \phi_j(x)\, d^3x. \]
In this expression, the momentum operator acts on energy eigenfunctions in position
representation. One can eliminate momentum using
\[ [X, H_{part}] = \frac{i\hbar}{m} P \quad\Rightarrow\quad P = \frac{m}{i\hbar} [X, H_{part}], \]
so that the coupling constants become
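\[ g_{ijk} = \frac{-q}{m} \sqrt{\frac{\hbar}{\epsilon_0 L^3\, 2\omega_k}}\, \langle\phi_i|(e_k \cdot P)|\phi_j\rangle = iq \sqrt{\frac{1}{\epsilon_0 L^3\, 2\hbar\omega_k}}\, (E_j - E_i)\, \langle\phi_i|(e_k \cdot X)|\phi_j\rangle. \tag{3.16} \]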
Because this expression is symmetric under inversion of k, the minus sign of a†−k in (3.15)
can be dropped, so that
Here, |i⟩ = |ϕ2,l,m ⟩|0⟩ (we’ll choose l and m later). The delta function ensures that total
energy is conserved. Because the EM field is already in its lowest-energy state, only final
states where the atom has transitioned into its ground state and has emitted photons are
permitted. Because, by Eq. (3.17), HI is linear in ladder operators, the coupling matrix
element is non-zero only for final states that contain a single photon: |f ⟩ = |ϕ1,0,0 ⟩|k⟩,
where k = (k, λ) labels the state of the emitted photon. (This is an artifact of the approxi-
mations we have made – multiple-photon processes are, in principle, possible).
The energy difference between the two lowest levels (the Lyman-α line) is (Sec. A.2.3)
\[ E_{1,2} := \left( 1 - \frac{1}{4} \right) E_I = \frac{3}{4} E_I = \frac{3\alpha^2}{8}\, mc^2. \]
The photon energy is ℏω_k = ℏc∥k∥ and energy conservation is thus equivalent to
∥k∥ = E_{1,2}/(ℏc).
It follows that the integral in (3.18) is over states labeled by f = (k, λ), where k lies
on a sphere of radius E_{1,2}/(ℏc). For fixed λ, the density of states in k-space is
ρ(k) d³k = (L/2π)³ d³k. Switching to spherical coordinates,
\[ \rho(k)\, d^3k = \left( \frac{L}{2\pi} \right)^3 d^3k = \left( \frac{L}{2\pi} \right)^3 r^2\, dr\, \sin\theta\, d\phi\, d\theta = \left( \frac{L}{2\pi} \right)^3 \frac{E^2}{\hbar^3 c^3}\, dE\, \sin\theta\, d\phi\, d\theta. \]
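\[ |\langle\phi_{2,l,m}|\langle 0|H_I|\phi_{1,0,0}\rangle|k\rangle|^2 = \frac{e^2 E_{1,2}}{2\epsilon_0 L^3}\, \left| \langle\phi_{2,l,m}|(e_k \cdot X)|\phi_{1,0,0}\rangle \right|^2. \]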
To evaluate the matrix element, we need to borrow some results on atomic eigenstates.
Four facts: (F1) The dipole matrix elements ⟨ϕ2,l,m |e · X|ϕ1,0,0 ⟩ are non-zero only
if l = 1. (F2) ⟨ϕ2,l,0 |x|ϕ1,0,0 ⟩ = ⟨ϕ2,l,0 |y|ϕ1,0,0 ⟩ = 0. (F3) States that differ only
in the magnetic quantum number m can be mapped onto each other by a rotation.
(F4) Using the explicit form of the functions ϕn,l,m (x), a tedious integral gives
\[ |\langle\phi_{2,1,0}|z|\phi_{1,0,0}\rangle|^2 = \frac{2^{15}}{3^{10}}\, a_0^2 = \frac{\hbar^2}{m^2 c^2}\, \frac{2^{15}}{3^{10} \alpha^2}. \]
Fact (F1) implies that in first-order perturbation theory, the states |ϕ2,l,0 ⟩ have infinite
life time unless l = 1, i.e. only the 2p → 1s transition can be computed in this approx-
imation. By (F3), m can be changed by rotating the atom. But the life time of a level is
independent of the atom’s orientation and hence of m. We trust that our approximations
reproduce this rotational invariance (they do), and compute Γ only for m = 0:
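\[ \Gamma = \frac{2\pi}{\hbar}\, \frac{e^2}{2\epsilon_0 L^3} \left( \frac{L}{2\pi} \right)^3 \left( \frac{E_{1,2}}{\hbar c} \right)^3 \sum_\lambda \int \left| \langle\phi_{2,1,0}|(e_k \cdot X)|\phi_{1,0,0}\rangle \right|^2 \sin\theta\, d\phi\, d\theta. \]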
Then by (F2, F4), only the z-component of ek · X = eλ (k) · X gives a non-zero contri-
bution, namely
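\[ \sum_\lambda \left| \langle\phi_{2,1,0}|(e_\lambda(k) \cdot X)|\phi_{1,0,0}\rangle \right|^2 = \frac{2^{15}}{3^{10}}\, a_0^2 \sum_\lambda (e_\lambda(k))_z^2. \]
To evaluate the sum, note that with e_0(k) := k/∥k∥, the set {e_λ(k)}_{λ=0}^{2} forms an
orthonormal basis. Expressing the length-squared of e_z in that basis gets us
\[ 1 = \sum_{\lambda=0}^{2} |e_\lambda(k) \cdot e_z|^2 = \cos^2\theta + \sum_{\lambda=1}^{2} (e_\lambda(k))_z^2 \quad\Rightarrow\quad \sum_{\lambda=1}^{2} (e_\lambda(k))_z^2 = \sin^2\theta. \]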
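Using the identity sin³θ dθ = − sin²θ d(cos θ) = (z² − 1) dz with z = cos θ (which runs
from 1 to −1 as θ runs from 0 to π), the integration results in
\[ \int_{\phi=0}^{2\pi} \int_{\theta=0}^{\pi} \sin^3\theta\, d\theta\, d\phi = 2\pi \int_{1}^{-1} (z^2 - 1)\, dz = 2\pi \int_{-1}^{1} (1 - z^2)\, dz = 2\pi\, \frac{4}{3}. \]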
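To express all quantities in relativistic units, eliminate e² in favor of the fine structure
constant α = e²/(4πϵ0 ℏc). Now brew some coffee, close the door, and plug in:
\[ \Gamma = \frac{2\pi}{\hbar}\, \frac{\alpha\, 4\pi\epsilon_0 \hbar c}{2\epsilon_0 L^3}\, \frac{\hbar^2}{m^2 c^2}\, \frac{2^{15}}{3^{10} \alpha^2} \left( \frac{L}{2\pi} \right)^3 \left( \frac{3\alpha^2 mc^2}{8\hbar c} \right)^3 2\pi\, \frac{4}{3} \qquad \text{(don’t think, just copy)} \]
\[ = 2^{17}\, 8^{-3}\, 3^{-8}\, \pi^0 L^0 \epsilon_0^0\, \alpha^5 \hbar^{-1} m^1 c^2 \qquad \text{(sort by units)} \]
\[ = \left( \frac{2}{3} \right)^8 \alpha^5\, \frac{mc^2}{\hbar} = 6.27 \times 10^8\ \text{Hz} = 1/(1.6\ \text{ns}) \qquad \text{(yeah, go ahead and click).} \]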
Amazingly, given the number of approximations made, this is the accepted value [Radzig,
Smirnov, Reference Data on Atoms, Molecules, and Ions, Table 7.4].
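For readers who prefer to let a machine do the clicking, here is a minimal sketch that evaluates the final formula Γ = (2/3)^8 α^5 mc²/ℏ and cross-checks the angular integral 2π · 4/3 numerically. The numerical constants are CODATA 2018 values typed in by hand; the function names are illustrative and not taken from the notes.

```python
import math

ALPHA = 7.2973525693e-3      # fine structure constant (CODATA 2018)
MC2_EV = 0.51099895e6        # electron rest energy m c^2 in eV
HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s

def angular_integral(steps=20000):
    """2*pi * integral of sin^3(theta) over [0, pi], by the midpoint rule."""
    h = math.pi / steps
    return 2 * math.pi * h * sum(math.sin((j + 0.5) * h) ** 3 for j in range(steps))

def decay_rate_2p():
    """Gamma = (2/3)^8 * alpha^5 * m c^2 / hbar, in 1/s."""
    return (2 / 3) ** 8 * ALPHA ** 5 * MC2_EV / HBAR_EV_S

angle_factor = angular_integral()  # should be close to 8*pi/3
gamma = decay_rate_2p()            # about 6.27e8 per second
lifetime_ns = 1e9 / gamma          # about 1.6 ns
```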
treats time and space very differently. It is indeed not relativistically invariant. Therefore,
in the late 1920s, it became a popular pastime to come up with new wave equations with
the goal of finding a relativistic quantum theory of single particles.
The results of this program proved to be important, but not for the reasons their creators
intended. It turns out that the very idea of constructing a quantum theory of a single
relativistic particle leads to conceptual problems. These include the difficulty to define
a position operator that doesn’t lead to super-luminal signaling, and a Hamiltonian that
doesn’t exhibit states of unbounded negative energies. (See also Fig. ?? for a heuristic
argument that suggests that we shouldn’t be surprised by these issues).
Let’s take a closer look at the first problem. If we accept that there is no reasonable
position operator associated with a relativistic particle, we immediately face the next chal-
lenge: To avoid faster-than-light influences, physical interactions are local. For example,
in Eq. (3.14), the coupling term between a particle and the EM field was given by the
field A evaluated at the position of the particle (represented quantum-mechanically by its
position operator X). So, having thrown out the position operator, how does one couple
relativistic theories?
Fortunately, we already know one relativistic quantum theory: electro-magnetism. So
we can just check how things work there and copy them. And indeed, in Chapter 3, we
did not associate a notion of position with a photon. Instead, locality was incorporated
by specifying a collection of fields (E, B, . . . ) that represent the properties of the system
that are measurable at any given point (t, x) in space-time. Citing [Quantum Field Theory
Lectures of Sidney Coleman] “If we know where the observations are, we don’t have to
know where the particles are.”
This (radical!) shift of perspective also works for relativistic particles. We forget about
“electron-the-particle” as a fundamental object, and instead look for an “electron field”
that, as for electro-magnetism, generates at every point in space-time the observables for
those “electron-properties” that are locally measurable. Just like phonons and photons
before, particles get re-introduced as excitations of the field Hamiltonian.
Now that we decided that we actually want to construct quantum field theories, it would
sure be helpful to have a few relativistic wave equations lying around to which we could
CHAPTER 4. RELATIVISTIC QUANTUM MECHANICS 74
apply the free-field quantization procedure that served us well before. It is for this purpose
that the single-particle equations we previously tossed out as unsatisfactory get a second
lease on life! Re-interpreted as “classical” fields, their quantized versions give satisfactory
theories of relativistic particles.
One more conceptual point before we’ll look at the details: Just like the phonon field
(3.1) and the fields associated with photons (3.9 – 3.12), field operators for massive parti-
cles will also be linear combinations of creation and annihilation operators. The fact that
the theory now contains processes that change the number of massive particles is to be
expected in the relativistic regime. That is, unless the particles are charged! Unlike parti-
cle number N , the total charge Q seems to be conserved by all physical interactions. One
would get around this problem if every time a particle with charge q is created, another
one that is identical except for having charge −q got destroyed.
It may feel like we’re just making things up at this point. Not so! Nature has indeed
solved the problem of charge conservation in quantum fields by introducing such anti-
particles. And remember the negative energies we complained about before? It turns out
that upon free-field quantization, the wave functions that would have negative energies if
interpreted as quantum states become positive-energy modes of the quantum field, and are
associated with anti-particles.
Now for the details.
\[ T_a : x \mapsto x + a. \]
Rotations. For a point n ∈ R³ on the unit sphere and an angle θ, let R(n, θ) be the
3 × 3 matrix rotating points about n by θ. For example:
\[ R(e_z, \theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Figure 4.1: Lorentz boosts in 1+1 dimensions. Left: There are two light-like world-
lines through the origin (dotted lines). Lorentz boosts have to leave them invariant. This
is achieved in particular by maps that are diagonal in a basis (blue arrows) of light-like
vectors. Let’s thus consider the linear map B(α) that multiplies (1, 1)^T by e^α and (−1, 1)^T
by e^{−α} (red arrows). In the t-x-basis,
\[ B(\alpha) = \begin{pmatrix} \cosh(\alpha) & \sinh(\alpha) \\ \sinh(\alpha) & \cosh(\alpha) \end{pmatrix}. \]
Middle: The vertical world-line represents a particle at rest. The slanted lines are its
images under B(α) for various values of α. They represent uniform motion with velocity
β = ∆x/∆t = sinh(α)/cosh(α) = tanh(α) ∈ (−1, 1) (in units of c). We can also track the
image of one event (e.g. (1, 0), red dot) under B(α). It traces out the branch of a hyperbola
with the asymptotes given by the light-like lines. Right: In general, the orbit of a vector
under all boosts B(α) forms the branch of a hyperbola. Hyperbolas are thus to Lorentz
boosts what circles are to rotations.
\[ \Lambda_R = \begin{pmatrix} 1 & 0^t \\ 0 & R \end{pmatrix}. \]
Translations and rotations work the same in non-relativistic and in relativistic physics.
More interesting are boosts, which add a constant velocity. Non-relativistically, these are
implemented by Galileo boosts
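\[ G_v = \begin{pmatrix} 1 & 0 & 0 & 0 \\ v_1 & 1 & 0 & 0 \\ v_2 & 0 & 1 & 0 \\ v_3 & 0 & 0 & 1 \end{pmatrix} : \begin{pmatrix} t \\ x \end{pmatrix} \mapsto \begin{pmatrix} t \\ x + tv \end{pmatrix}. \]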
Gv Gw = Gv+w
and can, in particular, take arbitrarily large values. Relativistic theories are not invariant
under these transformations, as they violate the central axioms of special relativity: There
is no motion faster than the speed of light, and objects traveling at the speed of light do so
in every coordinate system.
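The additive composition law of Galileo boosts, and the resulting absence of any speed limit, can be checked with a few lines of code. This is a minimal sketch using plain nested lists; the helper names and the sample velocities are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def galileo_boost(v):
    """4x4 matrix G_v acting on (t, x1, x2, x3): t -> t, x -> x + t*v."""
    v1, v2, v3 = v
    return [
        [1, 0, 0, 0],
        [v1, 1, 0, 0],
        [v2, 0, 1, 0],
        [v3, 0, 0, 1],
    ]

v = (0.5, 0.0, 2.0)
w = (1.5, -1.0, 0.25)
v_plus_w = tuple(vi + wi for vi, wi in zip(v, w))

composed = matmul(galileo_boost(v), galileo_boost(w))  # G_v G_w
direct = galileo_boost(v_plus_w)                       # G_{v+w}

# Boosting ten times by the same v yields velocity 10*v: nothing caps the speed.
repeated = galileo_boost((0.0, 0.0, 0.0))
for _ in range(10):
    repeated = matmul(galileo_boost(v), repeated)
```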
These axioms imply that a change of relative motion has to be described by Lorentz
boosts. We won’t derive them here (see e.g. [Sexl-Urbantke]), but see Fig. 4.1 for some
motivation. A Lorentz boost along the ex -axis by a velocity β ∈ (−1, 1) (in units of the
Thus xµ (s) represents a world-line traversing space-time slower than / at / faster than the
speed of light depending on whether 1² − ∥v∥² is positive / zero / negative. To express this
geometrically, define the Minkowski inner product as
\[ \langle u, v \rangle = u^T \eta\, v, \qquad \eta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]
Then 1² − ∥v∥² = v^T ηv, i.e. the character of the curve xµ (t) is given by the “squared
Minkowski length” of its tangents.
A 4 × 4-matrix Λ leaves the Minkowski form invariant if
One can check directly that the elements of the Lorentz group leave the Minkowski form
invariant. The converse is also true: only Lorentz transformations have this property. This
explains the notation O(1, 3) for the Lorentz group: It is the group of isometries of the
symmetric bilinear form with 1 positive and 3 negative elements on the main diagonal,
generalizing the notion of O(n), the symmetry group of the standard Euclidean form.
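The invariance of the Minkowski form can be checked directly for the 1+1-dimensional boosts B(α) of Fig. 4.1. The sketch below restricts η to the t-x plane (an assumption made to keep the matrices 2 × 2) and also confirms that rapidities add, which is equivalent to the relativistic velocity addition law. All helper names and the sample rapidities are illustrative.

```python
import math

ETA = [[1.0, 0.0], [0.0, -1.0]]  # Minkowski form restricted to the t-x plane

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def boost(alpha):
    """B(alpha) in the t-x basis, with rapidity alpha."""
    c, s = math.cosh(alpha), math.sinh(alpha)
    return [[c, s], [s, c]]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a)))

alpha1, alpha2 = 0.8, 1.7
beta1, beta2 = math.tanh(alpha1), math.tanh(alpha2)

# Lambda^T eta Lambda should reproduce eta.
lam = boost(alpha1)
invariance_defect = max_abs_diff(matmul(transpose(lam), matmul(ETA, lam)), ETA)

# B(alpha1) B(alpha2) should equal B(alpha1 + alpha2).
composition_defect = max_abs_diff(
    matmul(boost(alpha1), boost(alpha2)), boost(alpha1 + alpha2)
)

# Velocities combine as (b1 + b2)/(1 + b1*b2) and never reach 1.
beta_total = math.tanh(alpha1 + alpha2)
beta_from_addition = (beta1 + beta2) / (1 + beta1 * beta2)
```

In contrast to the Galilean case, composing boosts adds rapidities rather than velocities, so the resulting β always stays inside (−1, 1).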
Gradient fields: Associated with every scalar function ϕ(x) is its gradient field u(x)
with components uµ (x) = ∂µ ϕ(x). A symmetry g will adjust the argument as for a scalar
field, and the components as for a covariant vector:
\[ u'(x) = \Lambda^{-T}\, u\big( g^{-1}(x) \big). \]
Wait. Didn’t we say that covariant vectors describe functionals? What do gradients have to
do with functionals? It turns out that geometrically, the right way to think about a gradient
∂µ ϕ is as a linear functional that maps a direction v µ to the directional derivative v µ ∂µ ϕ.
Contravariant vector fields assign a direction to every point in space-time. An example
is the electro-magnetic potential Aµ (x) = (ϕ(x), A(x)), for which one can show the
transformation law
\[ A'(x) = \Lambda\, A\big( g^{-1}(x) \big). \]
In QM, the momentum pi is associated with the operators −i∂i . Energy E is associated
with the Hamiltonian H, but by virtue of the Schrödinger equation i∂t ψ = Hψ, we can
also take E = i∂t . Thus, we arrive at the correspondence principle (justified in Sec. ??)
Applying this substitution to (4.5) and letting it act on a wave function ψ gives
\[ \left( i\partial_t + \frac{1}{2m} \sum_{i=1}^{3} \partial_{x_i}^2 \right) \psi = 0, \]
We could apply the correspondence principle to this expression and take the square root
of the resulting operator (as in Sec. A.1.10). However, the result would no longer be a
differential operator (i.e. a polynomial in xi ’s and ∂xi ’s). To maintain a close analogy to
the Schrödinger equation, we’d prefer to look for a differential equation, which we can
achieve by first squaring the relation
\[ E^2 - \sum_{i=1}^{3} p_i^2 - m^2 = 0 \]
indexed by momenta p ∈ R3 .
1 The argument follows the near-identical presentations in the quantum field theory books by Coleman and
by Lancaster-Blundell. Unfortunately, as far as I can tell, their presentation contains a mistake. Using the
notation below: They claim that the integral over the “large arcs” gives no contribution without introducing the ϵ-
regularization that we need. I can’t see how this holds. In particular, Lancaster-Blundell’s invocation of Jordan’s
Lemma does not seem to give a finite bound. If anyone can explain to me why their argument is actually correct,
I’m game.
Mathematica can’t solve that integral – never a good sign. The integrand is oscillating
and the non-linear dependency of the phase on the integration variable p means that one
cannot tell by inspection whether cancellations will cause the integral to vanish for x > t,
as needed to avoid superluminal signaling.
The matter can be decided by an exercise in contour integration. Let’s recap some
basics. Introduce polar coordinates z = ρe^{iϕ}, ϕ ∈ (−π, π] in the complex plane. The
choice √z = √ρ e^{iϕ/2} fixes a sign for the square root. It introduces a discontinuity on the
non-positive real axis R≤0 , where e^{iϕ/2} changes from +i to −i as we cross from the
upper to the lower half-plane. This is called a branch cut of the square root. (While we’re
free to change its position, it is easy to see that a discontinuity cannot be avoided).
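The jump across the branch cut is easy to see numerically. The following minimal sketch implements the convention √z = √ρ e^{iϕ/2} with ϕ ∈ (−π, π] and evaluates it just above and just below the negative real axis. The function name and the offset 1e-12 are illustrative choices; Python's built-in `cmath.sqrt` happens to use the same principal branch and is included for comparison.

```python
import cmath
import math

def principal_sqrt(z):
    """sqrt(rho) * exp(i*phi/2), with the polar angle phi taken in (-pi, pi]."""
    rho, phi = cmath.polar(z)
    return math.sqrt(rho) * cmath.exp(1j * phi / 2)

just_above = principal_sqrt(complex(-1.0, +1e-12))  # close to +i
just_below = principal_sqrt(complex(-1.0, -1e-12))  # close to -i
jump = just_above - just_below                      # close to 2i

# The library implementation agrees with the hand-written convention.
library_mismatch = abs(principal_sqrt(complex(-1.0, 1e-12))
                       - cmath.sqrt(complex(-1.0, 1e-12)))
```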
for ϵ > 0 and with p = u+iv now a complex argument. The integrand in (4.9) is recovered
as limϵ→0 Iϵ (p) for real p. The rays ±i(µ + R≥0 ) are mapped to the branch cuts of the
square root. Excluding those values, Iϵ is again an analytic function.
We went through all the trouble because the integral of an analytic function along a
closed contour in the complex plane vanishes.
Thus, the integrals over γ2 , γ6 do not contribute. Neither does the integral over γ4 , because
the length of γ4 goes to 0 as R → ∞, while Iϵ (p) remains bounded along that path.
Thus the only non-zero contributions come from γ3 and γ5 :
\[ \lim_{\epsilon \to 0^+} \lim_{R \to \infty} \int_{\gamma_3 / \gamma_5} I_\epsilon(p)\, dp = \mp \int_m^\infty iv\, e^{\pm t \sqrt{v^2 - m^2} - vx}\, d(iv). \]
In this formulation, the integrand is non-negative, which implies that ψ(t, x) ̸= 0 for all
x > t. Not what we were hoping for in a relativistic theory.
which become a free quantum field upon re-interpreting the ak ’s as annihilation operators.
Should we use Bosonic or Fermionic Fock space? Are we even free to choose? We’ll show
how to use locality considerations to settle that matter in the next section.
Let’s apply some cosmetic changes: (1) The normalization constant N is only meaningful
for concrete physical applications. We get the cleanest formula if we set it to 1/√V
(i.e. we’re now measuring the field in units of √V/N). (2) A “box with side length L” is
not Lorentz-invariant. As per (A.22), we take the limit L → ∞ and replace the sum
(1/√V) Σ by the integral (2π)^{−3/2} ∫ over all of R³. (3) Introduce the 4-vector
p = (Ep , p) = ℏ(ωk , k) = (ωk , k). This yields a more manifestly relativistic formula:
\[ \phi(x) = \int \frac{1}{\sqrt{2E_p}} \left( a_p\, e^{-ip_\mu x^\mu} + a_p^\dagger\, e^{ip_\mu x^\mu} \right) \frac{d^3p}{(2\pi)^{3/2}}, \qquad E_p = \sqrt{m^2 + \|p\|^2}. \]
In particular, no unbounded negative energies exist in the field interpretation. We’re one
problem down!
4.2.3 Microcausality
What about the second problem, those pesky superluminal particles? A field theory doesn’t
directly make statements about “positions of particles”. But the relativistic ban on signals
propagating faster than light does have implications for fields.
To see what the right consistency condition is in this case, consider two space-like
separated regions A, B and two observables Â, B̂ that can be expressed in terms of the
field at points in A and B respectively. Relativity demands that any interaction performed
in region A must not influence measurements taking place in region B. This is the case if
and only if [Â, B̂] = 0, a condition known as microcausality.
How can we ensure this condition? Well, assume the fields satisfy
⟨x − y, x − y⟩ < 0 ⇒ [ϕ(x), ϕ(y)]ζ = 0 (4.10)
for either ζ = + (commutators) or ζ = − (anti-commutators). In the first case, the field
operators themselves fulfill the microcausality condition, which therefore also holds for
general observables built from them.
Let’s look at the second, trickier, case. Because two minuses make a plus, operators
Â, B̂ in space-like separated regions now commute if they are polynomials of even degree
in the fields. If we declare that only operators of this type are physically observable, then
the anti-commutator case, too, gives rise to a causal theory.
Here are some general comments that apply whenever the field operators are linear
in creation/annihilation operators – which is certainly the case when they arise from our
quantization rule:
• Unsurprisingly, the commutator version of (4.10) can only hold if the ladder opera-
tors act on a Bosonic Fock space, and the anti-commutator case only on a Fermionic
Fock space.
• It thus follows that if the field operators are directly observable, the field must be
Bosonic. This explains why we used a Bosonic space for the EM field (because the
E-field is observable).
• An even-degree polynomial in the field operators can be expanded as an even-degree
polynomial in ladder operators. A product of an even number of ladder operators
can change N only by an even amount. In other words, they leave the parity (−1)N
invariant. In this way, the constraint on physical Fermionic observables can be
expressed as a superselection rule (in the sense of Sec. 2.3.2): Physical observables
have to commute with the Fermion parity operator.
• It seems wasteful to introduce Fermionic quantum fields, only to declare that they
are unobservable, and it is certain polynomial expressions in them that are actually
physical. But there’s precedent to the idea that seemingly superfluous constructs
make a theory easier, e.g. the vector potential in classical electrodynamics. One
can formulate the theory in such a way that the primary objects are the physical
observables rather than unobservable fields. This program goes by the name of
algebraic quantum field theory. While conceptually clean, this perspective comes
with its own difficulties and has remained somewhat niche.
Let’s now apply these general considerations to the Klein-Gordon case. To this end,
define the annihilation / creation parts of the field as
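\[ \phi^+(x) = \int \frac{1}{\sqrt{2E_p}}\, a_p\, e^{-ip_\mu x^\mu}\, \frac{d^3p}{(2\pi)^{3/2}}, \qquad \phi^-(x) = \int \frac{1}{\sqrt{2E_p}}\, a_p^\dagger\, e^{ip_\mu x^\mu}\, \frac{d^3p}{(2\pi)^{3/2}}. \]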
The use of the “−” sign to denote the part of the field that comes with creation oper-
ators and has a positive exponent is counter-intuitive (or, in the words of renowned
Harvard professor Sidney Coleman “completely bananas”), but it’s established no-
tation stemming from the fact that in the (now abandoned) single-particle wave
function interpretation, ϕ− corresponds to negative energy solutions.
We don’t yet know whether the KG field describes Bosons or Fermions and will there-
fore treat both cases. The commutator
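\[ [\phi^+(x), \phi^-(y)]_\zeta = \frac{1}{(2\pi)^3} \int \frac{1}{2\sqrt{E_p E_{p'}}}\, e^{-ip_\mu x^\mu + ip'_\mu y^\mu}\, [a_p, a_{p'}^\dagger]_\zeta\, d^3p\, d^3p' \]
\[ = \int \frac{1}{2E_p}\, e^{-ip_\mu (x^\mu - y^\mu)}\, \frac{d^3p}{(2\pi)^3} =: \Delta_+(x - y) \]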
of the Fermionic
commutators to be zero. It turns out that this works in the Bosonic, but not in the Fermionic
case. Indeed, (4.8, 4.9) give, for space-like x,
\[ \partial_t \Delta_+(t, x) = -\frac{i}{2}\, \psi(t, x) = -\frac{1}{(2\pi)^2\, 2x} \int_m^\infty v\, e^{-vx} \sinh\!\left( t \sqrt{v^2 - m^2} \right) dv \in \mathbb{R}. \]
\[ \Delta_+(0, x) = \int \frac{e^{-ip \cdot x}}{2 (2\pi)^3 E_p}\, d^3p \in \mathbb{R} \qquad \text{(because } p \mapsto -p \text{ conjugates the integrand).} \]
Integrating along the (space-like) path (τ, x), τ ∈ [0, t], we find that ∆+ is real for space-
like arguments.
Causality thus implies that the KG field describes Bosons!
4.2.4 Anti-particles
Unlike particle number, charge seems to be conserved in every interaction. There is thus
one more superselection rule we have to accommodate: Physical observables have to com-
mute with the charge operator
\[ Q = \sum_l q_l\, N^{(l)}, \qquad N^{(l)} = \int a_p^{(l)\dagger} a_p^{(l)}\, d^3p \]
where the sum is over all fields, with ql the charge carried by particles of type l and N (l)
the total number operator for particles of this type.
What are the consequences of the charge superselection rule? Adding / removing
charged particles obviously does not conserve Q. Therefore, any expression that is lin-
ear in ladder operators does not commute with Q and thus cannot represent a physical
observable. (For example, because the E-field is observable and linear in ladder opera-
tors, a photon cannot be charged. OK, we probably knew that before).
Just like the Fermion parity superselection rule, conservation of charge therefore forces
us to build physical observables from polynomials in the fields. But this time, it is far less
obvious how to write down polynomials in the field operators that commute with Q.
Nature, fantastically, solves this by associating with each particle an anti-particle of
opposite charge so that, whenever a particle is created, an anti-particle is destroyed.
Here are the details. We start with two KG fields, ϕ(i) (x), i = 1, 2. They are indepen-
dent, with Hilbert space H = H(1) ⊗ H(2) , and Hamiltonian H = H (1) + H (2) . Combine
them to form a complex field that has one of the fields as its real and one as its imaginary
part2 :
\[ \phi(x) = \frac{\phi^{(1)} + i\phi^{(2)}}{\sqrt{2}}. \tag{4.11} \]
To analyze the complex field, introduce new annihilation operators
\[ a_p := \frac{a_p^{(1)} + i a_p^{(2)}}{\sqrt{2}}, \qquad b_p := \frac{a_p^{(1)} - i a_p^{(2)}}{\sqrt{2}}. \]
This corresponds to a unitary basis change in the single-particle subspace of the joint
Hilbert space, similar to the basis change from the cosine/sine’s to complex exponentials
used in (2.43). The complex field takes on the form
\[ \phi(x) = \int \frac{1}{\sqrt{2E_p}} \left( a_p\, e^{-ip_\mu x^\mu} + b_p^\dagger\, e^{ip_\mu x^\mu} \right) \frac{d^3p}{(2\pi)^{3/2}}. \tag{4.12} \]
It’s customary to say that the a†p ’s create particles and the b†p ’s anti-particles (though that’s
an arbitrary assignment, as we haven’t put enough physics into the model to distinguish
one).
With this definition, the complex field (4.12) becomes a superposition of processes
destroying particles or creating anti-particles. Since both processes involve changing the
charge by −q, it is not surprising to find
\[ [Q, a_p] = q \int [a_{p'}^\dagger a_{p'}, a_p]\, d^3p' = -q\, a_p, \qquad [Q, b_p^\dagger] = -q\, b_p^\dagger \quad\Rightarrow\quad [Q, \phi(x)] = -q\, \phi(x) \]
2 This is common terminology, though “non-hermitian field”, “hermitian part” and “anti-hermitian part”
for suitable coefficients αi , m. The operator H should represent the energy which, by (4.3)
satisfies E 2 = m2 + ∥p∥2 . Invoking the correspondence principle, we thus demand
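\[ H^2 = \left( -i \sum_{i=1}^{3} \alpha_i \partial_i + m\beta \right)^2 \]
\[ = -\sum_{i=1}^{3} (\alpha_i)^2 \partial_i^2 + m^2 \beta^2 - \frac{1}{2} \sum_{i \neq j = 1}^{3} \left( \alpha_i \alpha_j + \alpha_j \alpha_i \right) \partial_i \partial_j - im \sum_{i=1}^{3} \left( \alpha_i \beta + \beta \alpha_i \right) \partial_i \]
\[ = -\sum_{i=1}^{3} \partial_i^2 + m^2, \]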
which is equivalent to
The diagonal case i = j says that the αi ’s square to identity, while the off-diagonal case
i ̸= j says that they anti-commute. There is clearly no way to satisfy these conditions
using numbers, but squaring-to-identity and anti-commuting sounds a lot like what Pauli
matrices do.
Before presenting a solution in terms of Paulis, let’s re-write the condition to look more
relativistic. Assume we could find four matrices γ 0 , ..., γ 3 such that
[γ µ , γ ν ]+ = 2η µν 1, (4.15)
i.e. that anti-commute and square to ±1. Then one may check directly that
β = γ0, αi = βγ i (4.16)
would solve (4.14). (In mathematical terms, the relations (4.15) say that the gamma matri-
ces are generators of the Clifford algebra associated with the Minkowski form).
To find a solution to (4.15), start with the Pauli matrices
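\[ \mathbb{1} = \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]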
There are only three non-trivial Pauli matrices, while we need four γ matrices. But tensor
products of two Paulis give us enough freedom. For example, one can verify directly that
a solution to (4.15) is given by
γ 0 = Z ⊗ 1,
γ 1 = i Y ⊗ X,
γ 2 = i Y ⊗ Y,
γ 3 = i Y ⊗ Z.
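\[ \gamma^0 = \begin{pmatrix} \mathbb{1} & 0 \\ 0 & -\mathbb{1} \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ -\sigma_i & 0 \end{pmatrix}. \tag{4.17} \]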
This solution is called the Dirac representation of the γ-matrices. Other representations
can be obtained by a unitary change of basis in C4 . For example, we’ll later have use for
the Weyl representation, which has the same γ i , but replaces γ 0 by
\[ \gamma^0_{\text{Weyl}} = X \otimes \mathbb{1} = \begin{pmatrix} 0 & \mathbb{1} \\ \mathbb{1} & 0 \end{pmatrix}. \]
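The claim that these tensor products satisfy the Clifford relations (4.15) can be verified mechanically. The following is a minimal sketch in plain Python, with small hand-written helpers for Kronecker and matrix products (the helper names are illustrative). It checks both the Dirac representation and the Weyl variant, and also that α_i = βγ^i from (4.16) squares to the identity.

```python
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
ETA_DIAG = [1, -1, -1, -1]  # diagonal of the Minkowski form

def kron(a, b):
    """Kronecker product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, a):
    return [[c * entry for entry in row] for row in a]

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a)))

IDENTITY4 = kron(I2, I2)

def clifford_defect(gammas):
    """Largest entry of {g_mu, g_nu} - 2 eta_{mu nu} 1 over all index pairs."""
    worst = 0.0
    for mu in range(4):
        for nu in range(4):
            ab = matmul(gammas[mu], gammas[nu])
            ba = matmul(gammas[nu], gammas[mu])
            anti = [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]
            target = scale(2 * ETA_DIAG[mu] if mu == nu else 0, IDENTITY4)
            worst = max(worst, max_abs_diff(anti, target))
    return worst

# gamma^0 = Z (x) 1, gamma^i = i Y (x) sigma_i  (Dirac representation)
dirac = [kron(Z, I2)] + [scale(1j, kron(Y, s)) for s in (X, Y, Z)]
# Weyl representation: same gamma^i, but gamma^0 = X (x) 1
weyl = [kron(X, I2)] + dirac[1:]

dirac_defect = clifford_defect(dirac)
weyl_defect = clifford_defect(weyl)

# alpha_i = beta gamma^i with beta = gamma^0 should square to the identity.
alphas = [matmul(dirac[0], dirac[i]) for i in (1, 2, 3)]
alpha_defect = max(max_abs_diff(matmul(a, a), IDENTITY4) for a in alphas)
```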
Now put things together. Start from i∂t ψ = Hψ, plug in the ansatz (4.13), multiply
with β from the left, and use (4.16) to pass to the gamma matrices, to obtain the Dirac
equation
\[ \left( i\gamma^\mu \partial_\mu - m \right) \psi = 0. \tag{4.18} \]
The compact notation hides quite a lot of complexity. Indeed, because the gamma’s are
4 × 4-matrices, the Dirac equation acts on vector-valued wave functions. Explicitly:
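\[ \begin{pmatrix} (i\partial_t - m)\,\mathbb{1} & +i\sigma \cdot \nabla \\ -i\sigma \cdot \nabla & (-i\partial_t - m)\,\mathbb{1} \end{pmatrix} \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \\ \psi_4(x) \end{pmatrix} = 0 \]
in terms of the 2 × 2 matrices
\[ \sigma \cdot \nabla = \sum_{i=1}^{3} \sigma_i\, \partial_{x_i} = \begin{pmatrix} \partial_z & \partial_x - i\partial_y \\ \partial_x + i\partial_y & -\partial_z \end{pmatrix}. \]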
That doesn’t look very transparent. We’ll check whether one can make physical sense
of it in the next sections.
Some comments:
• We have seen that, while the Dirac equation nominally acts on four-component wave
functions, it is better to think of ψ as a two-component function that has been em-
bedded into C4 . Which two-dimensional subspace of C4 is used depends on the
sign of the energy and the value of p. This is interesting, as “two component wave
function” might remind you of the theory of spin- 12 degrees of freedom. We will
indeed see in a moment that the Dirac equation describes spin- 12 particles.
• Recall how in the Klein-Gordon case, we initially wanted to implement the root
$\sqrt{m^2 + \|p\|^2}$, but settled for the squared relation because we didn’t want to take the
root of differential operators? Dirac gets around this using the remarkable property
$\mathrm{eigs}(\gamma^\mu p_\mu) = \pm\sqrt{p^\mu p_\mu}$ of Clifford algebras. Note that $\gamma^\mu p_\mu$ is linear in momentum,
but still has eigenvalues that show the right square root behavior! The price to
pay for this trick is having to switch to a higher-dimensional space of wave functions.
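The square-root property can be checked numerically. The following is a minimal sketch in plain Python, assuming the metric signature (+,−,−,−) and the tensor-product form of the Dirac representation quoted earlier; the helper names (`kron`, `slash`, `is_close`) are illustrative and not notation from these notes. It verifies the Clifford relations $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}$ and that $(\gamma^\mu p_\mu)^2 = p^\mu p_\mu \mathbb{1}$, which is what forces the eigenvalues to be $\pm\sqrt{p^\mu p_\mu}$.

```python
# Pauli matrices as nested lists (pure Python, no external libraries).
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# Assumed metric signature (+,-,-,-).
ETA = [1, -1, -1, -1]


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dirac_gammas():
    """gamma^0 = Z (x) 1, gamma^i = iY (x) sigma_i."""
    i_y = scale(1j, Y)
    return [kron(Z, I2), kron(i_y, X), kron(i_y, Y), kron(i_y, Z)]


def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))


def slash(p):
    """gamma^mu p_mu for covariant components p = (p_0, p_1, p_2, p_3)."""
    out = [[0] * 4 for _ in range(4)]
    for g, p_mu in zip(dirac_gammas(), p):
        out = add(out, scale(p_mu, g))
    return out


def is_close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Squaring `slash(p)` gives a multiple of the identity, so the matrix is linear in the momentum while its eigenvalues still carry the square root.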
The lecture notes are fairly complete until this point, and then again from the Ap-
pendix on. The remainder until the Appendix is under active construction, though.
The big question is whether one can choose $\Lambda \mapsto S_\Lambda$ such that ψ′ is a solution of the Dirac
equation whenever ψ is. Only in this case does the Dirac equation define a set of solutions
that is Lorentz-invariant. Plugging the ansatz for ψ ′ into the Dirac equation gives
The final expression equals the Dirac equation for ψ if and only if
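\[
S_\Lambda^{-1} \gamma^\mu S_\Lambda = \Lambda^\mu{}_\alpha \gamma^\alpha \tag{4.21}
\]
(because $\Lambda^\mu{}_\alpha \Lambda_\mu{}^\nu = (\Lambda^T \Lambda^{-T})_\alpha{}^\nu = \delta_\alpha{}^\nu$). To see whether this is possible, let’s recall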
some basics from the quantum theory of angular momentum.
\[
U_{\theta, e_i} = e^{-i\theta J_i}.
\]
Obvious question: What are the conditions on the generators Ji such that (4.22) holds?
Using the Baker-Campbell-Hausdorff formula, one can show that (4.22) is essentially3
equivalent to
[Ji , Jj ] = [Li , Lj ].
In other words, commutator relations encode the group law of symmetries. This partly
explains the obsession of physicists with commutators.
In the particular case of the rotation group, recall (or check)
\[
[L_i, L_j] = i\epsilon_{ijk} L_k
\]
defines a projective representation of the rotation group. This is the spin-1/2 representation.
We will make use of the following covariance property of the Pauli basis:
\[
U_R^\dagger \sigma^i U_R = R^i{}_j \sigma^j. \tag{4.23}
\]
Note: Common (admittedly illogical) convention has it that the placement of indices
on Pauli matrices has no significance (i.e. σ µ = σµ ), while for γ-matrices, γµ = ηµν γ ν .
This formula is, in fact, the reason that the Bloch representation of rotations works:
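\[
(a')^i = \mathrm{tr}\big( \sigma^i U_R |\phi\rangle\langle\phi| U_R^\dagger \big)
= \mathrm{tr}\big( U_R^\dagger \sigma^i U_R |\phi\rangle\langle\phi| \big) = R^i{}_j a^j.
\]
It is sufficient to prove the first-order version:
\[
\partial_\theta|_0 \, e^{i\frac{\theta}{2}\sigma^i} \sigma^j e^{-i\frac{\theta}{2}\sigma^i} = \frac{i}{2} [\sigma^i, \sigma^j] = -\epsilon_{ijk} \sigma^k,
\]
which should be compared to the transformation of a vector under rotations
\[
\partial_\theta|_0 \big( e^{-i\theta L_i} v \big)^j = (e_i \times v)^j = \epsilon_{jik} v^k = -\epsilon_{ijk} v^k.
\]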
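The covariance property (4.23) can also be checked at finite angle. Below is a small sketch in plain Python for the special case of a rotation about the z-axis, assuming $U_R = e^{-i\theta\sigma_z/2}$ and the usual 3 × 3 rotation matrix; the function names are made up for this illustration. It expands $U_R^\dagger \sigma^i U_R$ in the Pauli basis via $\frac{1}{2}\mathrm{tr}(\sigma^j \,\cdot\,)$ and compares the coefficients with $R^i{}_j$.

```python
import cmath
import math

SIGMA = [
    [[0, 1], [1, 0]],       # sigma^1
    [[0, -1j], [1j, 0]],    # sigma^2
    [[1, 0], [0, -1]],      # sigma^3
]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def spin_rotation_z(theta):
    """U_R = exp(-i theta sigma_z / 2), which is diagonal in the z basis."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def rotation_z(theta):
    """Rotation of R^3 about the z-axis by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def pauli_component(m, j):
    """Coefficient of sigma^j in the 2x2 matrix m, i.e. tr(sigma^j m) / 2."""
    prod = matmul(SIGMA[j], m)
    return (prod[0][0] + prod[1][1]) / 2


def conjugated_pauli_coefficients(theta):
    """Matrix c[i][j] with U^dagger sigma^i U = sum_j c[i][j] sigma^j."""
    u = spin_rotation_z(theta)
    ud = dagger(u)
    return [[pauli_component(matmul(matmul(ud, SIGMA[i]), u), j)
             for j in range(3)] for i in range(3)]
```

For any angle, `conjugated_pauli_coefficients(theta)` should agree with `rotation_z(theta)` up to rounding.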
And likewise:
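\[
K_1 = i \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
K_2 = i \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
K_3 = i \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.
\]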
σ̄ 0 = σ 0 , σ̄ i = −σ i
Then
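\[
A_\Lambda^\dagger \sigma^\mu A_\Lambda = \Lambda^\mu{}_\nu \sigma^\nu, \qquad
A_\Lambda^{-1} \bar\sigma^\mu A_\Lambda^{-\dagger} = \Lambda^\mu{}_\nu \bar\sigma^\nu.
\]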
We have already verified the formula for rotations. Now let Λ be a pure Lorentz
boost along the ei -axis.
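\[
\partial_\theta|_0 \, e^{\frac{\theta}{2}\sigma^i} \sigma^\mu e^{\frac{\theta}{2}\sigma^i} = \frac{1}{2} [\sigma^i, \sigma^\mu]_+ =
\begin{cases} \sigma^i & \mu = 0 \\ \sigma^0 & \mu = i \\ 0 & \text{else} \end{cases},
\]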
Weyl Spinors
Set
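\[
S_\Lambda = \begin{pmatrix} A_\Lambda^{-\dagger} & 0 \\ 0 & A_\Lambda \end{pmatrix}.
\]
Then
\[
S_\Lambda^{-1} \gamma^\mu S_\Lambda
= \begin{pmatrix} A_\Lambda^\dagger & 0 \\ 0 & A_\Lambda^{-1} \end{pmatrix}
\begin{pmatrix} 0 & \sigma^\mu \\ \bar\sigma^\mu & 0 \end{pmatrix}
\begin{pmatrix} A_\Lambda^{-\dagger} & 0 \\ 0 & A_\Lambda \end{pmatrix}
= \begin{pmatrix} 0 & A_\Lambda^\dagger \sigma^\mu A_\Lambda \\ A_\Lambda^{-1} \bar\sigma^\mu A_\Lambda^{-\dagger} & 0 \end{pmatrix}
= \Lambda^\mu{}_\nu \gamma^\nu
\]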
as required.
Non-relativistic limit
Now we assume that we are in the non-relativistic limit where all energies are small com-
pared to the rest mass:
E − m ≪ m, |qAµ | ≪ m, ...
Then
E − qϕ + m ≃ 2m, (4.28)
Thus, in the non-relativistic regime, the “small spinor” ψs has a much smaller norm than
the “large spinor” ψl and can therefore be neglected. We thus focus on (4.27), which, using
(4.28) becomes
\[
(E - m)\psi_l = \frac{1}{2m} (\sigma \cdot \Pi)^2 \psi_l + q\phi \, \psi_l,
\]
the Pauli equation.
You may be able to see the Schrödinger equation lurking through. The factor on the
left-hand side is just the non-relativistic energy. The rightmost term is the potential energy.
The middle term requires some more work, but it is quadratic in momenta, which looks
promising!
About that middle term. A tedious calculation gives
\[
(\sigma \cdot \Pi)^2 = \big( \hat{P} - qA \big)^2 - q\,\sigma \cdot B.
\]
Why did we break up the coupling constant q/(2m) into two factors µB times g? To
explain that, assume the magnetic field is constant and of the form
\[
B = \begin{pmatrix} 0 \\ 0 \\ B \end{pmatrix}.
\]
Then
\[
(-i\nabla - qA)^2 = -\nabla^2 + iq \big( \nabla \cdot A + A \cdot \nabla \big) + q^2 A^2.
\]
For the middle summand, realize that $A_i$ commutes with $\partial_i$ so that it can be written as
\[
-2qA \cdot (-i\nabla) = -q \big( xP_y - yP_x \big) B = -qL \cdot B,
\]
Appendix A
In this chapter, we recall some facts that should be familiar from linear algebra and intro-
ductory quantum mechanics courses. The textbook Quantum Mechanics by L. Ballentine
is a good source for this material.
α, β, γ ∈ H
as well as
\[
\langle\alpha|\beta\rangle = \overline{\langle\beta|\alpha\rangle}. \tag{A.3}
\]
From this, it follows that
i.e. the inner product is anti-linear w.r.t. the first entry and linear w.r.t. the second one.
Beware that mathematicians usually employ the opposite convention, where the
sesquilinear inner product is linear in the first entry!
∥α∥ > 0 ∀α ̸= 0.
APPENDIX A. QUANTUM MECHANICS RECAP 99
There are two examples of Hilbert spaces you should be acquainted with: column
vectors and square-integrable functions. Let’s look at both in turn.
The vector space Cd is formed by d-dimensional complex column vectors
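\[
\alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_d \end{pmatrix}.
\]
Such Hilbert spaces appear e.g. in the description of spin degrees of freedom.
More involved is the Hilbert space $L^2(\mathbb{R}^n)$ of square-integrable complex functions on
$\mathbb{R}^n$. Given two functions $\alpha, \beta : \mathbb{R}^n \to \mathbb{C}$, we can define a “continuous analogue” of
Eq. (A.4):
\[
\langle\alpha|\beta\rangle = \int \bar\alpha(x) \beta(x) \, d^n x. \tag{A.5}
\]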
For the non-pedantic physicist, the space of all wave functions, together with (A.5) defines
a Hilbert space. It is associated with a point particle with n degrees of freedom.
There are three technical problems that one has to address to define the Hilbert space
of functions with mathematical rigor.
The first problem is that the integral is not actually defined for all functions. Set, for
example
\[
\psi(x) = \begin{cases} \sin(1/x) & x \neq 0, \\ 0 & x = 0. \end{cases}
\]
Then
\[
\int |\psi(x)|^2 \, dx
\]
does not exist (in either the Riemann or the Lebesgue sense). The second problem
is that the integral may be defined, but infinite – take e.g. α(x) = 1 and compute
⟨α|α⟩. To get rid of both problems, we define a function α to be square-integrable
if
\[
\|\alpha\|^2 = \langle\alpha|\alpha\rangle = \int |\alpha(x)|^2 \, d^n x
\]
exists and is finite. If α, β are square-integrable, then the product ᾱβ is integrable,
and the Cauchy-Schwarz inequality says that
Another technical issue with function spaces concerns physical units. Let me say
upfront that one can represent all physical quantities just by real numbers relative to
some fixed set of units, and that in this case, none of the issues below arise. (This is
what we will mainly do in this document). However, attaching a dimension to every
physical quantity has some value in that it can highlight certain inconsistencies and
guide heuristic arguments. So let’s briefly discuss how this would be done in QM.
For example, we may want ψ(x) to be defined not on the set of real numbers, but
on a set representing physical positions ([x] = L) measured in some concrete unit
of length, say meters m. Then [dx] = L as well, and for the normalization property
to work out, we can either stick with the scalar product
\[
\langle\phi|\psi\rangle = \int_{\mathbb{R}} \bar\phi(x) \psi(x) \, dx,
\]
in which case the wave function needs to have the dimension $[\psi(x)] = L^{-1/2}$, or we
retain dimensionless wave functions, in which case we have to redefine the scalar
product
\[
\langle\phi|\psi\rangle = \int_{\mathbb{R}\cdot\mathrm{m}} \bar\phi(x) \psi(x) \, d\mu(x),
\]
with respect to a dimension-free measure
\[
d\mu(x) := \frac{1}{\mathrm{m}} \, dx.
\]
When working in another continuous representation (e.g. momentum, see below),
the units will have to be adapted accordingly. Unlike functions that depend on
continuous parameters, discrete coefficients remain dimensionless (so that $[\sum_i |\alpha_i|^2] = 1$)
and thus do not carry information about their physical interpretation.
In QM, linear maps between Hilbert spaces are traditionally called operators.
Examples:
• H = Cd : In this case, operators can conveniently be specified as matrices, which act
on column vectors in the usual way. For example, we will have ample opportunity
to work with the Pauli matrices:
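\[
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]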
The genius of this notation is that one doesn’t need to expend any thoughts on concepts
like “dual vectors” or “linear functionals” – the formalism almost forces one to use these
objects correctly.
Let’s play around with this. Equation (A.6) is the inner product between |ψ⟩ and |ϕ⟩.
One can combine two vectors also to form an outer product, namely the linear operator
H → H defined as
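\[
|\beta\rangle \mapsto \big( |\phi\rangle\langle\psi| \big) |\beta\rangle := |\phi\rangle \big( \langle\psi|\beta\rangle \big). \tag{A.7}
\]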
Definition (A.7) implies that composing bra’s and ket’s is associative: One can read the
expression
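\[
|\phi\rangle\langle\psi|\beta\rangle
\]
as either
\[
\big( |\phi\rangle\langle\psi| \big) |\beta\rangle \qquad \text{“operator acting on vector”}
\]
or as
\[
|\phi\rangle \big( \langle\psi|\beta\rangle \big) \qquad \text{“vector times inner product”},
\]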
getting the same result.
A.1.4 Bases
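Let H be a Hilbert space. A set {|ei ⟩}i ⊂ H is called ortho-normal if ⟨ei |ej ⟩ = δij .
An ortho-normal set that spans H is an ortho-normal basis (ONB), and every ONB fulfills
the completeness relation
\[
\sum_i |e_i\rangle\langle e_i| = \mathbb{1}. \tag{A.8}
\]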
The converse is not true: There are complete sets that are not ortho-normal bases.
Using just the completeness relation, the following important properties of ONBs can
be easily verified:
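1. Expansion coefficients are given by inner products
\[
|\psi\rangle = \mathbb{1}|\psi\rangle = \sum_i |e_i\rangle\langle e_i|\psi\rangle = \sum_i \underbrace{\langle e_i|\psi\rangle}_{\psi_i} |e_i\rangle.
\]
\[
\langle\psi| = \langle\psi|\mathbb{1} = \sum_i \underbrace{\langle\psi|e_i\rangle}_{\bar\psi_i} \langle e_i|.
\]
\[
\langle\psi|\phi\rangle = \langle\psi|\mathbb{1}|\phi\rangle = \sum_i \langle\psi|e_i\rangle\langle e_i|\phi\rangle = \sum_i \bar\psi_i \phi_i.
\]
\[
A = \mathbb{1} A \mathbb{1} = \sum_{i,j} |e_i\rangle\langle e_i|A|e_j\rangle\langle e_j| = \sum_{i,j} A_{i,j} |e_i\rangle\langle e_j|, \qquad A_{i,j} := \langle e_i|A|e_j\rangle. \tag{A.9}
\]
so that
\[
\langle\phi|A|\psi\rangle = \langle\phi|\mathbb{1} A \mathbb{1}|\psi\rangle = \sum_{ij} \bar\phi_i A_{ij} \psi_j.
\]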
The expression (A.9) also shows that for every basis {|ei ⟩}i of the Hilbert space, the
set {|ei ⟩⟨ej |}ij is a basis for the vector space of linear operators.
The Dirac notation allows one to save a bit of ink when working with one fixed ONB.
Say we have agreed to work with {|ei ⟩}i . Then quantum physicists (and no-one else. . . )
commonly drop the symbol e and just put the index into the ket:
|i⟩ := |ei ⟩.
Under this identification, the composition rules of bra’s, ket’s, and operators correspond to
the usual rules of matrix-vector multiplication. This representation is particularly useful
for computer implementations!
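To illustrate the remark about computer implementations, here is a minimal sketch in plain Python that represents kets as lists of amplitudes and operators as nested lists. The helper names (`ket`, `inner`, `outer`, `apply`) are chosen for this example only. It builds σx and the identity from outer products of the basis kets and checks the associativity rule (A.7) on a concrete example.

```python
def ket(*amplitudes):
    """Column vector of complex amplitudes in the fixed ONB."""
    return [complex(a) for a in amplitudes]


def inner(phi, psi):
    """<phi|psi>, anti-linear in the first entry, linear in the second."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def outer(phi, psi):
    """Matrix of the operator |phi><psi|."""
    return [[a * b.conjugate() for b in psi] for a in phi]


def apply(op, k):
    """Matrix-vector product: operator acting on a ket."""
    return [sum(x * y for x, y in zip(row, k)) for row in op]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


UP, DOWN = ket(1, 0), ket(0, 1)

# sigma_x = |up><down| + |down><up|
SIGMA_X = add(outer(UP, DOWN), outer(DOWN, UP))

# Completeness relation: |up><up| + |down><down| = 1
IDENTITY = add(outer(UP, UP), outer(DOWN, DOWN))
```

Reading `apply(outer(phi, psi), beta)` as "operator acting on vector" and `[a * inner(psi, beta) for a in phi]` as "vector times inner product" gives the same list of numbers.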
For example, a spin-1/2-degree of freedom is associated with the Hilbert space
H = {α|↑⟩ + β|↓⟩ | α, β ∈ C}
with basis {|↑⟩, |↓⟩}. Then one can introduce operators either abstractly or as matrices,
e.g.:
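\[
\sigma_x = |{\uparrow}\rangle\langle{\downarrow}| + |{\downarrow}\rangle\langle{\uparrow}| \;\text{“=”}\; \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_z = |{\uparrow}\rangle\langle{\uparrow}| - |{\downarrow}\rangle\langle{\downarrow}| \;\text{“=”}\; \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]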
The operator A† (pronounced “A dagger”) is called the adjoint of A. If one chooses a basis
of H and expands
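\[
A = \sum_{ij} A_{ij} |i\rangle\langle j|,
\]
then
\[
\langle i|A^\dagger|j\rangle = \overline{\langle j|A|i\rangle} = \bar{A}_{ji} \quad\Rightarrow\quad A^\dagger = \sum_{ij} \bar{A}_{ji} |i\rangle\langle j|.
\]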
One can think about the adjoint A† as acting on “bras” the way that A acts on “kets”.
More precisely, recall that we introduced ⟨ϕ| as “the functional projecting onto |ϕ⟩”. Writ-
ing the projection of |ψ⟩ onto A|ϕ⟩ as
we conclude that the functional that projects onto A|ϕ⟩ is given by ⟨ϕ|A† .
Examples: The Pauli matrices are self-adjoint. The momentum operator is self-adjoint:
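\[
\begin{aligned}
\langle\phi|P|\psi\rangle &= \int_{-\infty}^{\infty} \bar\phi(x) (-i\hbar) \psi'(x) \, dx \\
&= -\int_{-\infty}^{\infty} (\bar\phi)'(x) (-i\hbar) \psi(x) \, dx \qquad \text{(integration by parts)} \\
&= \overline{\int_{-\infty}^{\infty} \bar\psi(x) (-i\hbar) \phi'(x) \, dx} = \overline{\langle\psi|P|\phi\rangle},
\end{aligned}
\]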
where we have used that for square-integrable functions limx→±∞ ψ(x) = 0 so that no
boundary terms appear when integrating by parts.
A|ψi ⟩ = λi |ψi ⟩
Of course, the λi ’s are the eigenvalues and the |ψi ⟩’s the eigenvectors of A.
A spectral decomposition (or eigendecomposition) of A is a representation of the form
\[
A = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|, \qquad \mathbb{1} = \sum_i |\psi_i\rangle\langle\psi_i|. \tag{A.11}
\]
It follows that A has an eigendecomposition if and only if one can find an ONB comprised
of eigenvectors. In this case, one refers to it as A’s eigenbasis, and the λi ’s appearing in
the decomposition are exactly the eigenvalues of A.
Not every operator has an eigenbasis, e.g. the spin-1/2 raising operator
\[
\sigma_+ = \frac{1}{2} (\sigma_x + i\sigma_y) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
does not (why?). There’s a theorem in functional analysis that essentially says that A
has an eigendecomposition if and only if A commutes with its adjoint, i.e. AA† = A† A.
(Though the case when there is a continuum of eigenvalues needs more attention, see
section below).
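The normality criterion is easy to test in finite dimensions. The sketch below (plain Python; the function name `is_normal` is an illustrative choice) checks whether a matrix commutes with its adjoint. It confirms that the raising operator fails the criterion, while the self-adjoint Pauli matrices pass it.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(m):
    """Adjoint: transpose combined with complex conjugation."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def is_normal(m, tol=1e-12):
    """True if m commutes with its adjoint, i.e. m m^dagger = m^dagger m."""
    left, right = matmul(m, dagger(m)), matmul(dagger(m), m)
    return all(abs(x - y) < tol
               for rl, rr in zip(left, right) for x, y in zip(rl, rr))


SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_PLUS = [[0, 1], [0, 0]]  # (sigma_x + i sigma_y) / 2
```

As a hint for the "(why?)" above: `matmul(SIGMA_PLUS, SIGMA_PLUS)` is the zero matrix, so the only possible eigenvalue is 0, yet the operator itself is not zero.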
The most important class of operators for which this holds are, of course, the self-
adjoint ones A = A† . What is more, in this case, all eigenvalues are real. Indeed, A|ψ⟩ =
λ|ψ⟩ implies (taking |ψ⟩ to be normalized without loss of generality)
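\[
\lambda = \langle\psi|A|\psi\rangle = \langle\psi|A^\dagger|\psi\rangle = \overline{\langle\psi|A|\psi\rangle} = \bar\lambda.
\]
Thus the self-adjoint operators are exactly those of the form
\[
A = \sum_i \lambda_i |\phi_i\rangle\langle\phi_i|, \qquad \lambda_i \in \mathbb{R}, \quad \{|\phi_i\rangle\}_i \text{ an ONB}.
\]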
There are additional problems! For the position operator (Xψ)(x) = xψ(x), the eigen-
value equation
xψ(x) = λψ(x) ∀x
is solved by
\[
\psi(x) = \begin{cases} c & x = \lambda \\ 0 & \text{else} \end{cases},
\]
which has norm ∥ψ∥ = 0. So it seems like there are no eigendecompositions for the two
most important operators of QM.
To get around the problem, we widen our domain of discourse by allowing for more
general objects than just square-integrable functions. Let’s first see how this formally
solves our problem. Whether we are “allowed to do this”, i.e. whether the formal con-
struction will lead to inconsistencies is something we’ll worry about later.
Delta distributions
The distribution δy is a formal object whose inner product with a smooth function ψ is
defined to be
\[
\langle\delta_y|\psi\rangle = \int \bar\delta_y(x) \psi(x) \, dx := \psi(y).
\]
provides an eigendecomposition of the position operator X in the sense that for any pair
of smooth functions ϕ, ψ we get the correct result
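\[
\langle\phi| \left( \int_x x \, |\delta_x\rangle\langle\delta_x| \, dx \right) |\psi\rangle = \int x \, \langle\phi|\delta_x\rangle\langle\delta_x|\psi\rangle \, dx = \int x \, \bar\phi(x) \psi(x) \, dx = \langle\phi|X|\psi\rangle. \tag{A.12}
\]
So when integrated against smooth functions, the expressions above behave just like an
eigendecomposition should. We can work with that!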
Plane waves
We now turn to the eigendecomposition of the momentum operator. For k ∈ R, define the
non-normalizable eigenfunction
We claim that
\[
\int_k |\phi_k\rangle\langle\phi_k| \, dk = \mathbb{1}, \qquad \int_k \hbar k \, |\phi_k\rangle\langle\phi_k| \, dk = P,
\]
To see that this is true, note that the inner product with a function ψ
\[
\langle\phi_k|\psi\rangle = (2\pi)^{-1/2} \int e^{-ikx} \psi(x) \, dx = \tilde\psi(k)
\]
gives the Fourier transform ψ̃ of ψ evaluated at k. Recall that the inverse transform is
\[
(2\pi)^{-1/2} \int e^{ikx} \tilde\psi(k) \, dk = \psi(x).
\]
General eigendecompositions
We can now sketch the way in which a general Hermitian operator A has an eigendecom-
position. Consider all solutions to the eigenvalue equation
A|ψλ ⟩ = λ|ψλ ⟩,
We can unify the treatment of the discrete and the continuous part. Define
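\[
\rho = \sum_{\lambda \in D} \delta_\lambda + I_C \quad \text{in terms of the indicator function} \quad I_C(\lambda') = \begin{cases} 1 & \lambda' \in C \\ 0 & \text{else} \end{cases}.
\]
The delta functions allow us to incorporate the sums in (A.16) into the integral:
\[
A = \int \lambda \, |\psi_\lambda\rangle\langle\psi_\lambda| \, \rho(\lambda) \, d\lambda. \tag{A.17}
\]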
where PS projects onto the space spanned by {|ψλ ⟩ | λ ∈ S}. This looks somewhat like
the formula
\[
\int_S \rho(\lambda) \, d\lambda = \mu(S)
\]
for computing the measure of a set S given a density ρ. Therefore, the map $S \mapsto P_S$ is
called a projection-valued measure and ρ the density of states (with respect to dλ). The
interpretation of ρ is particularly clear when applied to sets S that do not intersect the
continuous part, S ∩ C = ∅. Then
\[
\int_S \rho(\lambda) \, d\lambda = |S \cap D|
\]
equals the number of eigenvalues of A in S.
See Chapter 1 of Quantum Mechanics by Ballentine for a more careful, but not too
technical exposition. A rigorous version is the spectral theorem of functional analysis.
and therefore
\[
P = i\hbar \int |\delta_x\rangle\langle\delta_x'| \, dx
\]
The first holds because shifting the derivative to the bra means that in (A.19), ψ instead of
ϕ gets differentiated, and to remedy that, we need to use integration by parts once more,
which causes the change in sign. The second one holds because ∂x δx (y) = ∂x δ(y − x) =
−δx′ (y), so differentiating the index rather than the argument of the delta function also
incurs a sign change. A similar argument verifies the third expression. This last one is
interesting, because it is a formal generalization of (A.9) to continuous bases. It expresses
P in terms of its “matrix elements”
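\[
\langle\delta_y|P|\delta_z\rangle = -i\hbar \int \delta_y(x) \delta_z'(x) \, dx = i\hbar \int \delta_y'(x) \delta_z(x) \, dx = i\hbar \, \delta_y'(z).
\]
\[
\frac{P^2}{2m} = -\frac{\hbar^2}{2m} \int |\delta_x\rangle \partial_x^2 \langle\delta_x| \, dx = \frac{\hbar^2}{2m} \int |\delta_x'\rangle\langle\delta_x'| \, dx.
\]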
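A plane wave $e^{ikx}$ complies with the boundary conditions if and only if every component
$k_i$ of the wave vector is an integer multiple of $\frac{2\pi}{L}$. Indeed, the discrete set of functions
\[
\phi_k(x) := \frac{1}{L^{n/2}} e^{ikx}, \qquad k \in \frac{2\pi}{L} \mathbb{Z}^n,
\]
forms an ONB for $L^2(B)$ and the formulas for the Fourier transform become
\[
\begin{aligned}
\tilde\psi(k) &= \frac{1}{L^{n/2}} \int_B e^{-ikx} \psi(x) \, d^n x, \\
\psi(x) &= \frac{1}{L^{n/2}} \sum_{k \in \frac{2\pi}{L}\mathbb{Z}^n} \tilde\psi(k) e^{ikx}.
\end{aligned} \tag{A.21}
\]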
Comparison with (A.20) shows that, formally, the transition between a finite and an un-
bounded volume Fourier transform is facilitated by the substitution
\[
\frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} d^n k \quad\leftrightarrow\quad \frac{1}{L^{n/2}} \sum_{k \in \frac{2\pi}{L}\mathbb{Z}^n} \tag{A.22}
\]
Note the asymmetry in (A.21): Fourier transformation takes the compact domain B to
the discrete domain $\frac{2\pi}{L}\mathbb{Z}^n$. We can of course reverse the interpretation of the two functions
in (A.21). The formula then says that functions ψ(x) defined on a lattice $\frac{2\pi}{L}\mathbb{Z}^n$ can be
expanded in terms of plane waves $\phi_k(x) = \frac{1}{L^{n/2}} e^{-ikx}$ with wave vectors k ∈ B. In
this context, B is sometimes called the Brillouin zone and k the crystal momentum or
quasi-momentum.
Of course, the universe isn’t actually a finite box with cyclic boundary conditions...
...but we may as well pretend it were! Physics is local, so we can assume that all phenom-
ena we are interested in take place in some box that is sufficiently large that the boundary
does not affect the predictions we extract from the theory.
Translation symmetry
Fourier transforms are intimately connected to translation symmetry. Let Ta be the trans-
lation operator that shifts functions along the vector a
It is the unique common eigenbasis of all Ta (why?). Therefore, if A is any operator that
commutes with translations
[Ta , A] = 0 ∀ a, (A.23)
then A must be diagonal in the Fourier basis, too. Explicitly, (A.23) implies that A is fully
specified by its “first column”
⟨δx |A|δy ⟩ = ⟨δx |ATy T−y |δy ⟩ = ⟨δx |Ty A|δ0 ⟩ = ⟨δx−y |A|δ0 ⟩ = f (x − y).
It then follows that the eigenvalues of A are proportional to the Fourier transform of f :
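\[
\begin{aligned}
\langle\delta_x|A|\phi_k\rangle &= (2\pi)^{-n/2} \int \langle\delta_x|A|\delta_y\rangle e^{iky} \, d^n y \\
&= e^{ikx} (2\pi)^{-n/2} \int f(x - y) e^{-ik(x-y)} \, d^n y = (2\pi)^{n/2} \tilde{f}(k) \, \langle\delta_x|\phi_k\rangle
\end{aligned}
\]
so that, summarizing,
\[
A = (2\pi)^{n/2} \int \tilde{f}(k) \, |\phi_k\rangle\langle\phi_k| \, d^n k. \tag{A.24}
\]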
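A finite analogue makes this statement concrete. The sketch below (plain Python) is an illustration under the assumption that the continuum is replaced by a ring of N sites with cyclic boundary conditions; the names `circulant`, `plane_wave` and `fourier_eigenvalue` are made up for this example. A translation-invariant operator then has matrix elements f(x − y mod N), and every discrete plane wave is an eigenvector whose eigenvalue is the (unnormalized) discrete Fourier sum of the first column f.

```python
import cmath


def circulant(f):
    """Matrix with entries A[x][y] = f[(x - y) mod N]."""
    n = len(f)
    return [[f[(x - y) % n] for y in range(n)] for x in range(n)]


def plane_wave(m, n):
    """Discrete plane wave with wave number k = 2*pi*m/n, unit norm."""
    return [cmath.exp(2j * cmath.pi * m * x / n) / n ** 0.5 for x in range(n)]


def fourier_eigenvalue(f, m):
    """sum_u f(u) exp(-i k u) with k = 2*pi*m/N."""
    n = len(f)
    return sum(f[u] * cmath.exp(-2j * cmath.pi * m * u / n) for u in range(n))


def apply(a, v):
    return [sum(a_xy * v_y for a_xy, v_y in zip(row, v)) for row in a]


def max_residual(f):
    """Largest entry of A phi_k - lambda_k phi_k over all k (should be ~0)."""
    n = len(f)
    a = circulant(f)
    worst = 0.0
    for m in range(n):
        phi = plane_wave(m, n)
        lam = fourier_eigenvalue(f, m)
        a_phi = apply(a, phi)
        worst = max(worst, max(abs(p - lam * q) for p, q in zip(a_phi, phi)))
    return worst
```

For example, `f = [2, -1, 0, 0, 0, -1]` is a discrete second-difference operator on six sites, and its eigenvalues come out as 2 − 2 cos(2πm/6).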
⟨p, x⟩ = ωt − kx,
which (at least in the case of n = 4) determines the space-time metric in relativity. The
commonly used basis of plane waves is
This convention extends to the case n = 1. That is, if a function ψ depends only on
time, then its FT is taken to be $\tilde\psi(\omega) = \frac{1}{2\pi} \int e^{i\omega t} \psi(t) \, dt$, whereas if the single parameter is
interpreted as a spatial coordinate or a generic parameter, then $\tilde\psi(k) = \frac{1}{2\pi} \int e^{-ikx} \psi(x) \, dx$.
The Fourier transform and its inverse thus take the form
\[
\tilde\psi(k) = \langle\phi_k|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x \in \mathbb{Z}_N} e^{-ikx} \psi(x), \qquad
\psi(x) = \langle\delta_x|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{k \in \frac{2\pi}{N}\mathbb{Z}_N} e^{ikx} \tilde\psi(k).
\]
The theory developed above can be easily translated to the finite case.
and likewise
\[
A^k = \cdots = \sum_i \lambda_i^k |\phi_i\rangle\langle\phi_i|.
\]
Thus, if $p(x) = \sum_k c_k x^k$ is a polynomial, then
\[
p(A) = \sum_k c_k A^k = \sum_i p(\lambda_i) |\phi_i\rangle\langle\phi_i|.
\]
For an arbitrary function f : C → C, one can thus consistently define its action on
operators with an eigendecomposition as
\[
f(A) := \sum_i f(\lambda_i) |\phi_i\rangle\langle\phi_i|.
\]
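As a worked instance of this definition, the sketch below (plain Python) applies $f(\lambda) = e^{i\theta\lambda}$ to σx, whose eigenvalues ±1 and eigenvectors $(1, \pm 1)/\sqrt{2}$ are hard-coded here as an assumption of the example. The helper name `function_of_operator` is illustrative. The result should reproduce the familiar closed form $e^{i\theta\sigma_x} = \cos\theta \, \mathbb{1} + i \sin\theta \, \sigma_x$.

```python
import cmath
import math


def outer(phi, psi):
    """Matrix of |phi><psi|."""
    return [[a * complex(b).conjugate() for b in psi] for a in phi]


def function_of_operator(f, eigenpairs):
    """f(A) = sum_i f(lambda_i) |phi_i><phi_i| for given (lambda_i, phi_i)."""
    dim = len(eigenpairs[0][1])
    result = [[0j] * dim for _ in range(dim)]
    for lam, vec in eigenpairs:
        proj = outer(vec, vec)
        for i in range(dim):
            for j in range(dim):
                result[i][j] += f(lam) * proj[i][j]
    return result


R = 1 / math.sqrt(2)
# Eigendecomposition of sigma_x (assumed known): eigenvalues +1 and -1.
SIGMA_X_EIGENPAIRS = [(1, [R, R]), (-1, [R, -R])]


def exp_i_theta_sigma_x(theta):
    return function_of_operator(lambda lam: cmath.exp(1j * theta * lam),
                                SIGMA_X_EIGENPAIRS)
```

Choosing `f` as the identity function returns σx itself, which is a quick sanity check on the eigenpairs.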
\[
U = \sum_i e^{i\phi_i} |\psi_i\rangle\langle\psi_i|, \qquad \phi_i \in \mathbb{R},
\]
A.1.12 Projections
Recall (see Fig. A.1) that in $\mathbb{R}^d$ with Euclidean scalar product $(u, v) = \sum_i u_i v_i$, there is a
one-one relation between
• Subspaces V ⊂ Rd , and
• orthogonal projections P , i.e. linear maps fulfilling P = P t , P 2 = P .
The Hilbert space analogue works like this: An operator P is a projector (or projection)
if
1. P = P † , and
2. P 2 = P .
The first property means that P has a spectral decomposition. The second property then
implies that the eigenvalues are elements of {0, 1}. Thus,
\[
P = \sum_i |\psi_i\rangle\langle\psi_i|,
\]
where the {|ψi ⟩} form an ONB for the subspace V ⊂ H onto which P projects.
Examples:
• For every normalized vector |ψ⟩ ∈ H, the outer product P = |ψ⟩⟨ψ| is the projec-
tion onto the one-dimensional subspace V = {z|ψ⟩ | z ∈ C}.
• Define the parity operator Π on H = L2 (R) by
Then it’s easy to see that $P_\pm = \frac{1}{2}(\mathbb{1} \pm \Pi)$ are projection operators onto the space of
even and odd functions respectively (why?).
Then
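\[
X = \sqrt{\frac{\hbar}{2m\omega}} \, (a + a^\dagger) = \sqrt{\frac{2\hbar}{m\omega}} \, \mathrm{Re}(a), \qquad
P = -i \sqrt{\frac{m\hbar\omega}{2}} \, (a - a^\dagger) = \sqrt{2m\hbar\omega} \, \mathrm{Im}(a).
\]
The Poisson bracket {X, P } = 1 implies
\[
\{a, a^\dagger\} = \frac{1}{2} \big( -i\{\tilde{X}, \tilde{P}\} + i\{\tilde{P}, \tilde{X}\} \big) = \frac{1}{i\hbar},
\]
so the coordinate change (X, P ) → (a, a† ) is canonical up to the factor 1/(iℏ). The
Hamilton function reads in complex coordinates
\[
H = \frac{1}{2} \hbar\omega (a a^\dagger + a^\dagger a) = \hbar\omega |a|^2. \tag{A.26}
\]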
and the equations of motion are (using standard properties of Poisson brackets)
Quantum mechanics
Now assume that the X, P are not classical phase space coordinates, but instead position
and momentum operators on $L^2(\mathbb{R})$. Replacing Poisson brackets {·, ·} by commutators
$\frac{1}{i\hbar}[\cdot, \cdot]$ and complex conjugates by Hermitian conjugates, the above derivation goes through
verbatim for the quantum case, up until Eq. (A.26). There, the fact that the $a_i^\dagger, a_i$ do not
commute means that we cannot simplify $\frac{1}{2}(a_i^\dagger a_i + a_i a_i^\dagger)$ as $|a_i|^2$. Using the commutation
relations of the ladder operators instead, the Hamiltonian becomes
\[
H = \hbar\omega \, \frac{1}{2} (a^\dagger a + a a^\dagger) = \hbar\omega \Big( a^\dagger a + \frac{1}{2} \Big).
\]
Momentarily switching back to the position-space representation of the operators, one can
easily see that there is a unique ground state |0⟩, with wave function $\langle\tilde{x}|0\rangle = \pi^{-1/4} e^{-\tilde{x}^2/2}$.
From the commutation relations of the ladder operators, it then follows that with
\[
|n\rangle := \frac{1}{\sqrt{n!}} (a^\dagger)^n |0\rangle,
\]
the set {|n⟩}n≥0 forms an ONB of the Hilbert space. It is indeed the eigenbasis of H:
\[
a|n\rangle = \sqrt{n} \, |n-1\rangle \quad\Rightarrow\quad a^\dagger|n\rangle = \sqrt{n+1} \, |n+1\rangle \quad\Rightarrow\quad H|n\rangle = \hbar\omega \Big( n + \frac{1}{2} \Big) |n\rangle.
\]
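The ladder relations translate directly into matrices. The following sketch (plain Python) represents a in the number basis truncated to the first `dim` levels, which is an approximation introduced for this example since the true Hilbert space is infinite-dimensional. It builds $H = \hbar\omega(a^\dagger a + \frac{1}{2})$ and exposes the diagonal of $[a, a^\dagger]$, where the truncation shows up only in the last entry.

```python
import math


def annihilation(dim):
    """Truncated matrix of a in the basis |0>, ..., |dim-1>: <n-1|a|n> = sqrt(n)."""
    return [[math.sqrt(j) if i == j - 1 else 0.0 for j in range(dim)]
            for i in range(dim)]


def transpose(m):
    """Adjoint of a real matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hamiltonian(dim, hbar_omega=1.0):
    """H = hbar*omega * (a^dagger a + 1/2) in the truncated number basis."""
    a = annihilation(dim)
    number = matmul(transpose(a), a)
    return [[hbar_omega * (number[i][j] + (0.5 if i == j else 0.0))
             for j in range(dim)] for i in range(dim)]


def commutator_diagonal(dim):
    """Diagonal of [a, a^dagger]; equals 1 except in the last (truncated) level."""
    a = annihilation(dim)
    adag = transpose(a)
    a_adag, adag_a = matmul(a, adag), matmul(adag, a)
    return [a_adag[i][i] - adag_a[i][i] for i in range(dim)]
```

The diagonal of `hamiltonian(dim)` lists the energies n + 1/2 in units of ℏω. The last entry of `commutator_diagonal(dim)` equals 1 − dim, a pure truncation artifact.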
where the potential is given in terms of some coupling matrix V = (Vkl ). Without loss of
generality (why?), we can assume that V is symmetric and thus there exists an orthogonal
O that diagonalizes V :
solved by ak (t) = ak (0)e−iωk t . The transformation back to the original coordinates reads
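\[
X_l(t) = \sqrt{\frac{\hbar}{m}} \sum_k \frac{1}{\sqrt{2\omega_k}} \big( a_k e^{-i\omega_k t} O_{kl} + a_k^\dagger e^{i\omega_k t} O_{kl} \big),
\]
\[
P_l(t) = -i \sqrt{m\hbar} \sum_k \sqrt{\frac{\omega_k}{2}} \big( a_k e^{-i\omega_k t} O_{kl} - a_k^\dagger e^{i\omega_k t} O_{kl} \big).
\]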
In words, the configuration X(t) of the particles is a linear combination of the normal
modes, with the coefficients oscillating with the eigenfrequencies ωk .
Some remarks:
Quantum mechanics
As was done in the n = 1 case of Sec. A.2.1, assume now that the Xi , Pi are position and
momentum operators on L2 (Rn ). Then, again, the above applies verbatim to the quantum
case, except that the Hamiltonian reads
\[
H = \sum_k \hbar\omega_k \Big( a_k^\dagger a_k + \frac{1}{2} \Big).
\]
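and the eigenbasis arises from laddering (don’t confuse the quantum numbers $n_i$ with the
number n of degrees of freedom)
\[
|n_1, \dots\rangle = \prod_i \frac{1}{\sqrt{n_i!}} (a_i^\dagger)^{n_i} |0\rangle \quad\Rightarrow\quad H|n_1, \dots\rangle = \sum_k \hbar\omega_k \Big( n_k + \frac{1}{2} \Big) |n_1, \dots\rangle. \tag{A.27}
\]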
In principle, it is possible to work out the wave function ⟨x|n1 , . . . ⟩ in terms of the original
coordinates. But this gets ugly pretty quickly, so one usually tries to extract physical
predictions without having to go there.
It follows that the Fermionic occupation numbers can only be 0 and 1. Likewise,
aa† = 1 − a† a = 1 − N (A.29)
Using (A.29)
\[
H' := H - \frac{\hbar\omega}{2} \mathbb{1} = \frac{\hbar\omega}{2} (a^\dagger a - a a^\dagger),
\]
so the right hand side generates the same time evolution.
The simplest representation of the Fermionic oscillator is on H = C2 , with
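\[
a^\dagger = \frac{1}{2} (\sigma_x + i\sigma_y) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\Rightarrow\quad
a = \frac{1}{2} (\sigma_x - i\sigma_y) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
\]
Then the occupation number operator and the occupation number basis are
\[
N = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\]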
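These 2 × 2 matrices are small enough to check by hand, but a quick numerical sanity check is also easy. The sketch below (plain Python; all names are illustrative) builds $a^\dagger$ and $a$ from the Pauli matrices as stated, and exposes the number operator $N = a^\dagger a$ and the anticommutator $a a^\dagger + a^\dagger a$ for inspection.

```python
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(c1, a, c2, b):
    """Linear combination c1*a + c2*b of two matrices."""
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(op, vec):
    return [sum(x * y for x, y in zip(row, vec)) for row in op]


A_DAG = combine(0.5, SIGMA_X, 0.5j, SIGMA_Y)    # creation operator
A = combine(0.5, SIGMA_X, -0.5j, SIGMA_Y)       # annihilation operator

NUMBER = matmul(A_DAG, A)                       # N = a^dagger a
ANTICOMMUTATOR = combine(1, matmul(A, A_DAG), 1, matmul(A_DAG, A))

EMPTY = [0, 1]       # |0>
OCCUPIED = [1, 0]    # |1>
```

With these conventions, `apply(A_DAG, EMPTY)` gives `OCCUPIED`, and applying `A_DAG` a second time gives the zero vector, which is the matrix form of the statement that occupation numbers are restricted to 0 and 1.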
of finding the system in one of the final states when measured at time t.
H0 |f ⟩ = Ef |f ⟩,
as a power series in λ and that low orders give meaningful answers. Separating the
Schrödinger equation
\[
i\hbar \partial_t \sum_s \lambda^s |\psi_s\rangle = (H_0 + \lambda V) \sum_s \lambda^s |\psi_s\rangle
\]
by degrees of λ gives
With initial condition |ψ(t = 0)⟩ = |i⟩, the zeroth-order equation is solved by
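\[
|\psi_0(t)\rangle = e^{\frac{t}{i\hbar} E_i} |i\rangle.
\]
Plugging this into the first-order one and projecting onto an eigenstate |f ⟩ gives
\[
i\hbar \partial_t \langle f|\psi_1(t)\rangle = E_f \langle f|\psi_1(t)\rangle + e^{\frac{t}{i\hbar} E_i} \langle f|V|i\rangle
\]
which is solved by
\[
\langle f|\psi_1(t)\rangle = \langle f|V|i\rangle \, \frac{1 - e^{\frac{1}{i\hbar}(E_i - E_f) t}}{E_f - E_i} \, e^{\frac{1}{i\hbar} E_f t} \qquad \text{for } E_f \neq E_i, \tag{A.30}
\]
\[
\langle f|\psi_1(t)\rangle = \langle f|V|i\rangle \, \frac{t}{i\hbar} \, e^{\frac{1}{i\hbar} E_f t} \qquad \text{for } E_f = E_i, \; f \neq i. \tag{A.31}
\]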
Using L’Hôspital’s rule, one verifies that (A.30) tends to (A.31) for Ef → Ei . In this
sense, it suffices to work with (A.30) alone. Its square is
\[
|\langle f|\psi(t)\rangle|^2 = 4 |\langle f|V|i\rangle|^2 \, \frac{\sin^2\!\big( (E_i - E_f) \frac{t}{2\hbar} \big)}{(E_i - E_f)^2} \qquad (i \neq f).
\]
With $\epsilon = (E_f - E_i)$, $\tau = \frac{t}{2\hbar}$, the fraction is $\sin^2(\epsilon\tau)/\epsilon^2$, the square of the “sinc
function” (Fig. A.2). It has a central peak of height $\tau^2$, zeroes at $\epsilon = \pm\frac{\pi}{\tau}$, and shows
oscillations of quadratically decreasing amplitude for ϵ → ±∞. It is known (by the
Dirichlet integral) that the area under the curve is τ π. Therefore, the family of functions
$f_\tau(\epsilon) := \frac{1}{\pi\tau} \sin^2(\epsilon\tau)/\epsilon^2$ converges to a δ-function centered at 0 as τ → ∞.
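The area statement can be checked numerically. The sketch below (plain Python) integrates $\sin^2(\epsilon\tau)/\epsilon^2$ with a midpoint rule over a large but finite window; the cutoff and step count are arbitrary choices for this illustration, and the neglected tails contribute roughly 1/cutoff. The result should be close to πτ, independent of how narrow the central peak is.

```python
import math


def sinc_squared(eps, tau):
    """sin^2(eps*tau) / eps^2, with the limiting value tau^2 at eps = 0."""
    if eps == 0.0:
        return tau * tau
    return math.sin(eps * tau) ** 2 / eps ** 2


def area(tau, cutoff=200.0, steps=80000):
    """Midpoint-rule estimate of the integral over [-cutoff, cutoff]."""
    h = 2 * cutoff / steps
    return h * sum(sinc_squared(-cutoff + (i + 0.5) * h, tau)
                   for i in range(steps))
```

Dividing by πτ gives the normalized family, whose total weight stays close to 1 while the peak at the origin grows like τ/π.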
Qualitatively, we can now describe which parameters enter the probability Pi→F (t).
By the above, only states |f ⟩ with energy $E_f$ in the range $E_i \pm \frac{2\pi\hbar}{t}$ pick up significant
weight. For such states, the modulus squared is proportional to t and the squared coupling
coefficient |⟨f |V |i⟩|2 .
Figure A.2: Squared sinc function $\sin^2(\epsilon\tau)/\epsilon^2$. x axis in units of τ , y axis in units of $\frac{1}{\tau}$.
To get a more quantitative statement, let ρ(f ) be a measure such that
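\[
P_F = \int_F |f\rangle\langle f| \, \rho(f) \, df.
\]
In other words, ρ(f ) is the “density of states”, in the sense of Sec. A.1.7. Then
\[
\langle\psi(t)|P_F|\psi(t)\rangle = \int_F |\langle f|\psi(t)\rangle|^2 \rho(f) \, df = \int_F 4 |\langle f|V|i\rangle|^2 \, \frac{\sin^2\!\big( (E_i - E_f) \frac{t}{2\hbar} \big)}{(E_i - E_f)^2} \, \rho(f) \, df.
\]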
Let’s suspend disbelief for a while and take (A.32) at face value. It is called Fermi’s Golden
Rule: The probability Pi→F (t) increases linearly, with slope Γ proportional to the squared
coupling and the density of states, integrated over all final states with the right energy.
The “≃”–step in (A.32) involved quite the leap of faith. The squared-sinc-construction
gives a delta function only in the limit of large times, but first-order perturbation theory is
valid, at most, at short times. It’s unclear whether there’s an intermediate regime where
both approximations simultaneously hold. Also, if the spectrum is discrete, the density of
states ρ(f ) is itself a sum of delta functions (Sec. A.1.7), so that the integral has no obvi-
ous meaning. The cleanest (but not only) way around this issue is to restrict attention to
energies Ei that lie in the continuous part of the spectrum of H0 . This frequently involves
letting the “quantization region” L3 go to ∞ (c.f. Sec. A.1.9). One could analyze the con-
ditions for (A.32) to hold more carefully – but this is rarely done in practice. Experience
has shown that the “golden” rule gives the right answer more often than one could have
hoped, hence the moniker.
Appendix B
Miscellaneous Integrals
The Gaussian integral (B.1) is taken over the entire real line x → ±∞, but in
fact, it is already close to its asymptotic value if the limits of the integral are large compared
to $1/\sqrt{|\alpha|}$. This is obvious if α has a large real part (because the absolute value of the
integrand is decaying with $e^{-\mathrm{Re}\,\alpha x^2}$). Imaginary parts of α also aid convergence, but for
a more subtle reason: They cause the integrand to oscillate rapidly for large arguments, so
that its contributions to the integral tend to cancel.
To visualize this effect, consider the non-asymptotic real Fresnel integrals
\[
C(x) := \int_0^x \cos(t^2) \, dt, \qquad S(x) := \int_0^x \sin(t^2) \, dt.
\]
Separating real and imaginary parts in (B.2) gives
\[
\lim_{x\to\infty} C(x) = \lim_{x\to\infty} S(x) = \sqrt{\frac{\pi}{8}}. \tag{B.3}
\]
Their convergence is shown in Fig. B.1.
Figure B.1: The Fresnel integrals C(x) (orange) and S(x) (blue). The integrals quickly
converge towards their asymptotic value $\sqrt{\pi/8}$ (black line), with contributions of larger
arguments canceling due to the oscillating behavior of the integrand.
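The convergence can be reproduced with a few lines of code. The sketch below (plain Python) approximates C(x) and S(x) with a midpoint rule; the step count is an arbitrary choice that resolves the oscillations for moderate x, and the function name `fresnel` is illustrative. For large x the deviation from $\sqrt{\pi/8}$ is expected to shrink roughly like 1/(2x).

```python
import math


def fresnel(x, steps=40000):
    """Midpoint-rule estimates of C(x) and S(x), the integrals of
    cos(t^2) and sin(t^2) from 0 to x."""
    h = x / steps
    c = s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        c += math.cos(t * t)
        s += math.sin(t * t)
    return c * h, s * h


LIMIT = math.sqrt(math.pi / 8)
```

Evaluating `fresnel(x)` for increasing x shows both values spiralling in on `LIMIT`, without any exponential damping of the integrand.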
r2 sin θ dr dθ dϕ = r2 dr dµ dϕ
In this chapter, we take a more pedantic look at the function spaces that occur in QM. For
simplicity of presentation, we’ll mainly restrict attention to the one-dimensional case.
so that
\[
\langle\psi|\phi\rangle := \int \psi(x)^* \phi(x) \, dx
\]
1 See any textbook on analysis, e.g. Folland’s Modern analysis, Chapter 2 for more details on integration
theory. Just two comments on terminology: (1) All integrals in the theory of function spaces are to be understood
in the sense of Lebesgue. (2) A function f is integrable if the integral $\int f$ exists and is finite. (So,
counter-intuitively, “f is integrable” and “the integral of f exists” are different statements!)
APPENDIX C. FUNCTION SPACES AND DISTRIBUTIONS 125
supported on a set of measure zero, i.e. iff [ψ] = [0]. The implication ∥[ψ]∥ = 0 ⇒
[ψ] = 0 is part of the mathematical definition of a norm. It is frequently invoked in
physics arguments: For example, in the algebraic treatment of the harmonic oscillator,
one typically shows that ∥a|0⟩∥ = 0 and concludes that a|0⟩ = 0, i.e. that the attempt to
construct a negative-energy eigenstate by laddering leads to the 0 function.
Too small: L2 (R) does not contain the eigenfunctions of some important operators.
The eigenfunctions of the momentum operator are plane waves, which have norm ∞, and
therefore do not belong to L2 . The eigenfunctions of the position operator are supported
only on one single point. As elements of L2 , they are therefore equivalent to the function
that is identically 0.
Too large: L2 (R) contains elements for which important operators are undefined. For
example, elements of L2 (R) can have discontinuities, in which case the action of the
momentum operator is not well-defined. For an example involving the position operator,
take the function
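\[
\psi(x) = \frac{1}{\sqrt{\pi} \, (x + i)}. \tag{C.2}
\]
Then
\[
\int |\psi(x)|^2 \, dx = \frac{1}{\pi} \int \frac{1}{x^2 + 1} \, dx = \frac{1}{\pi} \big[ \arctan(x) \big]_{-\infty}^{\infty} = 1,
\]
so ψ ∈ L2 (R). But (by comparison with $\int_a^\infty \frac{1}{x} \, dx = \infty$), one can easily see that the
integral $\langle\psi|X^k|\psi\rangle = \frac{1}{\pi} \int \frac{x^k}{x^2 + 1} \, dx$ is infinite for even k ∈ N and undefined for odd k ∈ N.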
Figure C.1: Rigged Hilbert spaces are “rigged” in the sense of “fully equipped” (like
Imperator Furiosa’s War Rig, pictured above), not in the sense of “manipulated with the
goal to deceive”, like a loaded die. (OK, maayybe I was just looking for an excuse to
include that picture in my lecture notes).
Discussion
Do these issues mean that L2 (R) is not an appropriate mathematical model for the space
of wave functions? Arguably not!
For the eigenfunction examples, note that infinitely extended or infinitely concentrated
states are unphysical, so we cannot complain that the space L2 (R), designed to model
physical wave functions, does not contain them.
Now let’s look at the function ψ defined in (C.2). The fact that Xψ ̸∈ L2 (R) does not
mean that position measurements aren’t well-defined. To the contrary, $p(x) = |\psi(x)|^2 = \frac{1}{\pi(x^2 + 1)}$
is a perfectly good probability density describing position measurement outcomes.
It’s just that none of the moments ⟨X k ⟩ (including the expectation value, k = 1) exist and
are finite. But nobody ever promised us that all probability distributions can be character-
ized via moments, so there is no fundamental issue with this. Likewise, any ψ ∈ L2 (R),
even if it exhibits discontinuities, has a Fourier transform ψ̃, and thus a probability density
p(ℏk) = |ψ̃(k)|2 over momentum measurement outcomes.
However, the discussion does suggest that for the purpose of doing calculations, it
would be good to identify a “sandwich of spaces”
Φ ⊂ L2 (R) ⊂ Φ′ , (C.3)
where Φ is “sufficiently small” that all relevant operators are well-defined on it, and Φ′ is
“large enough” that it contains a complete set of eigenvectors for all relevant operators.
As we’ll see, the spaces Φ and Φ′ are usually constructed together. Elements of Φ are
called test functions and those of Φ′ distributions. Constellations as in (C.3) are studied as
Gelfand triples or rigged Hilbert spaces (Fig. C.1).
Which spaces of functions are the best choice for Φ, Φ′ depends on the problem one
wants to solve. An important set for quantum mechanics is Schwartz space (after Laurent
Schwartz, not to be confused with Hermann Schwarz of Cauchy-Schwarz-inequality fame)
for Φ and the associated space of tempered distributions for Φ′ . We’ll look at this case next,
and briefly sketch the general theory in Sec. C.3.
C.2 Distributions
C.2.1 Schwartz space
The most important set of test functions Φ in QM is Schwartz space S, the “smooth func-
tions whose derivatives vanish rapidly”:
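\[ \mathcal{S} = \Big\{ \phi \in C^\infty(\mathbb{R}^n) \;\Big|\; \forall \alpha, \beta \in \mathbb{N}_0 : \sup_x |x^\alpha \partial_x^\beta \phi(x)| < \infty \Big\}. \tag{C.4} \]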
The condition ϕ ∈ C^∞(R) means that elements of Schwartz space are infinitely differentiable, while the second condition says that ϕ and all its derivatives decay faster than the inverse of any polynomial as |x| → ∞. It follows that S is invariant under P and X. It is also
easy to see that any square-integrable function can be arbitrarily-well approximated by
Schwartz-class functions, i.e. for every ψ ∈ L2 (R) and every ϵ > 0, there exists a ϕ ∈ S
such that ∥ψ − ϕ∥ ≤ ϵ. (Technically: S is dense in L2 (R) w.r.t. norm topology).
This already solves half of our problems: Because well-behaved functions are dense,
there is little loss of generality in assuming that any wave function of physical interest lies
in S. One can then apply X and P without any issue.
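As a quick sanity check of the defining condition (C.4), here is a sketch in Python for the Gaussian ϕ(x) = exp(−x²), restricted to β = 0 for brevity. The helper names are hypothetical. The grid search estimates sup |x^α ϕ(x)| and compares it with the closed form obtained by setting the derivative to zero, which puts the maximum at x² = α/2.

```python
import math

def seminorm(alpha, half_width=10.0, step=1e-3):
    """Grid estimate of sup_x |x^alpha * exp(-x^2)| on [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    best = 0.0
    for i in range(n + 1):
        x = -half_width + i * step
        best = max(best, abs(x ** alpha) * math.exp(-x * x))
    return best

def seminorm_exact(alpha):
    """Closed form (alpha / (2e))^(alpha / 2); the maximum sits at x^2 = alpha / 2."""
    if alpha == 0:
        return 1.0
    return (alpha / (2.0 * math.e)) ** (alpha / 2.0)

for alpha in range(0, 7):
    print(alpha, seminorm(alpha), seminorm_exact(alpha))
```

Every one of these suprema is finite, no matter how large α gets. Derivatives of the Gaussian are polynomials times the Gaussian, so the same reasoning covers β > 0.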
(The notation $D^l u$ will be explained below). Then $T_{D^l u}$ is well-defined as a linear functional S → C. That’s because ϕ ∈ S implies that $\partial_x^l \phi$ ∈ S as well; local integrability of u and continuity of $\partial_x^l \phi$ imply that the integrand is locally integrable; and finally the fact that $\partial_x^l \phi$ vanishes faster than any polynomial, together with the matching growth restriction on u, means that the integral remains finite as |x| → ∞. A functional of this form is called a tempered distribution, and the space of all tempered distributions is denoted by S′.
In contrast, note that $T_{D^l u}$ is rarely well-defined as a functional on L2 (R). For one,
elements ψ ∈ L2 (R) aren’t in general differentiable, and even if they are, they generally
vanish too slowly for the integral to converge. So we see that S, on account of being
smaller than L2 (R), allows for a larger set of linear functionals! Recall that we’re out to
find a set larger than L2 (R), so this seems like a promising direction to explore. Let’s look
at some examples.
Plane waves: Because $e^{ikx}$ is not normalizable, $T_{e^{ikx}}$ defines a linear functional on
Schwartz space, but not on L2 (R).
Delta functional: Let θ(x) be the step function that is 0 for x < 0 and 1 for x ≥ 0.
Then, using integration by parts,
\[ T_{D\theta}(\phi) = -\int \theta(x)\,\partial_x\phi(x)\,dx = -\int_0^\infty \partial_x\phi(x)\,dx = \phi(0). \tag{C.6} \]
The operation only makes sense for functions ϕ that are differentiable at 0 – so certainly
for elements of S, but not necessarily elements of L2 (R).
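Equation (C.6) is easy to check numerically. The following sketch (Python; the test function and the helper names are arbitrary choices made here, not taken from the notes) evaluates −∫₀^∞ ϕ′(x) dx with a midpoint rule for an off-centre Gaussian and compares the result with ϕ(0).

```python
import math

def phi(x):
    """An arbitrary Schwartz-class test function (shifted Gaussian)."""
    return math.exp(-(x - 0.3) ** 2)

def dphi(x):
    """Its derivative."""
    return -2.0 * (x - 0.3) * phi(x)

def T_Dtheta(dtest, upper=12.0, n=200_000):
    """Midpoint-rule estimate of -int_0^upper dtest(x) dx; the tail beyond `upper` is negligible here."""
    h = upper / n
    return -h * sum(dtest((i + 0.5) * h) for i in range(n))

print(T_Dtheta(dphi), phi(0.0))
```

Both numbers agree to within the quadrature error, as the fundamental theorem of calculus says they must.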
“Bra vectors”: For every ψ ∈ L2 (R), the “bra” ϕ ↦ ⟨ψ|ϕ⟩ = $T_{\psi^*}(\phi)$ defines a tempered
distribution. (Indeed, every square-integrable function is also locally integrable. That’s an
easy consequence of the Cauchy-Schwarz inequality).
The principal value is important in the theory of partial differential equations, where
one often wants to associate a distribution with the function u(x) = 1/x in some way
(c.f. Chap. D). Unfortunately, 1/x is not locally integrable, and indeed, $\int \frac{\phi(x)}{x}\,dx$ does not
in general exist. But as we’ll see, the principal value
\[ \mathrm{pv}\Big(\frac{1}{x}\Big)(\phi) := \lim_{\epsilon\to 0^+} \int_{\mathbb{R}\setminus(-\epsilon,\epsilon)} \frac{\phi(x)}{x}\,dx \tag{C.7} \]
is finite for all ϕ ∈ S and, what is more, is given by the tempered distribution $T_{D\log|x|}(\phi)$.
To see that this makes sense, we first need to convince ourselves that log |x|, even though
it diverges as x → 0, is locally integrable. This follows from the fact that the antiderivative
of log |x| is F (x) = x log |x| − x + C, which remains finite at the singularity:
$\lim_{x\to 0} F(x) = C$. Therefore, $T_{D\log|x|}$ is indeed a tempered distribution. It remains to
be shown that it evaluates to the principal value:
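\begin{align*}
T_{D\log|x|}(\phi) &= -\int \log|x|\,\phi'(x)\,dx \\
&= \lim_{\epsilon\to0^+}\Big( -\int_{-\infty}^{-\epsilon} \log(-x)\,\phi'(x)\,dx - \int_{\epsilon}^{\infty} \log x\,\phi'(x)\,dx \Big) \\
&= \lim_{\epsilon\to0^+}\Big( \int_{-\infty}^{-\epsilon} \frac{\phi(x)}{x}\,dx - \phi(-\epsilon)\log\epsilon + \int_{\epsilon}^{\infty} \frac{\phi(x)}{x}\,dx + \phi(\epsilon)\log\epsilon \Big) \\
&= \mathrm{pv}\Big(\frac1x\Big)(\phi) + \lim_{\epsilon\to0^+} \log(\epsilon)\,\big(\phi(\epsilon) - \phi(-\epsilon)\big) \\
&= \mathrm{pv}\Big(\frac1x\Big)(\phi) + 2\phi'(0)\underbrace{\lim_{\epsilon\to0^+} \epsilon\log(\epsilon)}_{=0} = \mathrm{pv}\Big(\frac1x\Big)(\phi).
\end{align*}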
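The identity just derived can also be checked numerically. The sketch below (Python; test function, cutoff and helper names are choices made for this illustration) computes both sides for ϕ(x) = exp(−(x−1)²). The principal value is rewritten as ∫₀^∞ (ϕ(x) − ϕ(−x))/x dx, which follows from the symmetric cutoff in (C.7) and has a regular integrand. The other side folds the negative half-line onto the positive one.

```python
import math

def phi(x):
    return math.exp(-(x - 1.0) ** 2)

def dphi(x):
    return -2.0 * (x - 1.0) * phi(x)

def pv_one_over_x(cutoff=12.0, n=200_000):
    """pv(1/x)(phi) as int_0^cutoff (phi(x) - phi(-x)) / x dx, midpoint rule."""
    h = cutoff / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (phi(x) - phi(-x)) / x
    return h * total

def T_Dlog(cutoff=12.0, n=200_000):
    """-int log|x| phi'(x) dx; midpoints never hit the (integrable) singularity at 0."""
    h = cutoff / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.log(x) * (dphi(x) + dphi(-x))
    return -h * total

print(pv_one_over_x(), T_Dlog())
```

The two printed values agree up to the quadrature error, which is dominated by the logarithmic singularity and shrinks roughly linearly with the step size.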
Regular distributions
Distributions of the form Tu (i.e. those that can be expressed without differentiating the
argument before integrating) are called regular. For regular distributions, it is common
to use the same symbol for both the distribution S → C and for the function R→C
defining it:
\[ T(\phi) = \int T(x)\phi(x)\,dx. \tag{C.8} \]
You might complain that such an overloading of notation is not a nice thing to do. And
you’d be right. But things are about to get worse. Such a convention is even used for
non-regular distributions!
Consider e.g. the delta distribution δ(ϕ) = ϕ(0) discussed above. It is not regular.
(Because a hypothetical function giving rise to it would have to be zero everywhere except
at x = 0 – but an integral over a function supported on only one point is zero). But, in
analogy to (C.8), one still writes
\[ \delta(\phi) = \int \delta(x)\phi(x)\,dx. \]
The r.h.s. is not an integral and δ(x) not a function – the entire r.h.s. is to be read as an
elaborate notation for δ(ϕ). Whether this convention is genius (because it allows practi-
tioners to work with distributions without having to learn the abstract theory) or horrific
(because the one job of mathematics is to be rigorous and not to pretend that objects exist
when in fact they don’t) is a question that may be controversially debated.
(This is the bilinear analogue of the definition of the adjoint for sesquilinear inner products).
It directly follows that for regular distributions with u ∈ S, $T_{Au}(\phi) = T_u(A^t \phi)$.
Using the notation in (C.8), this means
We take Eq. (C.9) as the general definition for the action of an operator on distributions.
In words: Operations on distributions are defined by shifting them onto the argument.
Derivatives of distributions
The most important application is the differentiation operator (Dϕ)(x) = ∂_x ϕ(x). By
partial integration, $D^t = -D$, from which we get
\[ (D T_u)(\phi) = \int u(x)\,(-\partial_x)\phi(x)\,dx = T_{Du}(\phi), \]
which immediately implies (C.10) by differentiating both sides in the sense of distribution.
Generalized eigenvectors
We say that a distribution T is a generalized eigenvector of an operator A : S → S if
A T = λ T.
Plane waves are therefore eigenvectors of the differentiation operator D or the momen-
tum operator P = −iD:
So, with all these preparations in the bag, it was pretty easy to identify the generalized
eigenvectors!
T̃ (ϕ) = T (ϕ̃).
that is, the FT of δ is a regular distribution, arising from the constant function
\[ \tilde\delta(k) = \frac{1}{\sqrt{2\pi}}. \tag{C.12} \]
One could be tempted to use the following formal calculation to arrive at the same conclu-
sion:
\[ \tilde\delta(k) = \text{“}\frac{1}{\sqrt{2\pi}} \int e^{-ikx}\,\delta(x)\,dx\text{”} = \frac{1}{\sqrt{2\pi}}. \]
But, unlike (C.11), this is not a rigorous argument given our development of the theory so
far! That’s because we have defined δ(ϕ) only for ϕ ∈ S, but e−ikx is most definitely not
an element of Schwartz space. The integral is therefore only heuristically defined. One
can sometimes make sense of products of distributions – but the issue is subtle and we will
not pursue it here.
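For comparison, the rigorous route only ever evaluates δ on Schwartz functions. Here is a sketch of that computation, assuming the convention ϕ̃(k) = (2π)^{−1/2} ∫ e^{−ikx} ϕ(x) dx suggested by the formal calculation above (the normalization is an assumption made for this illustration), together with the definition T̃(ϕ) = T(ϕ̃):

```latex
\begin{align*}
\tilde\delta(\phi) = \delta(\tilde\phi) = \tilde\phi(0)
  = \frac{1}{\sqrt{2\pi}} \int e^{-i \cdot 0 \cdot x}\, \phi(x)\, dx
  = \int \frac{1}{\sqrt{2\pi}}\, \phi(x)\, dx
  = T_{1/\sqrt{2\pi}}(\phi)
  \qquad \text{for all } \phi \in \mathcal{S}.
\end{align*}
```

At no point does one have to pair δ with the plane wave itself. The plane wave only ever appears under an honest integral against ϕ.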
Constant functions. The constant function 1(x) = 1 does not have a Fourier transform
in the ordinary sense. For one, the integral $(2\pi)^{-1/2}\int dx$ that would define $\tilde 1(0)$ is
infinite. However, because T1 defines a tempered distribution, it does have a FT. Slightly
abusing language once again, we call it the FT of 1 (in the sense of distribution).
We can find it by expressing F −1 in terms of F and applying it to (C.12). To this end,
let Π be the parity operator, which mirrors functions about the origin: (Πϕ)(x) = ϕ(−x).
Then it is easy to see that $F^\dagger = \Pi F^t$ and hence unitarity of F implies
The principal value. From an easy contour integration, the FT of 1/(x + iϵ) is
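\[ \frac{1}{\sqrt{2\pi}} \int \frac{1}{x+i\epsilon}\, e^{-ikx}\,dx = -i\sqrt{2\pi}\, e^{-\epsilon k}\,\theta(k) \;\to\; -i\sqrt{2\pi}\,\theta(k) \qquad (\epsilon \to 0^+). \]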
Using (C.10), we then find that the FT of the principal value is a regular distribution:
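\begin{align*}
\mathcal{F}\big[\mathrm{pv}(1/x)\big](k) &= \lim_{\epsilon\to0^+} \mathcal{F}\Big[\frac{1}{x+i\epsilon}\Big](k) + i\pi\,\mathcal{F}(\delta)(k) \\
&= i\sqrt{\frac{\pi}{2}}\,\big(-2\theta(k)+1\big) = -i\sqrt{\frac{\pi}{2}}\,\mathrm{sign}(k). \tag{C.14}
\end{align*}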
Combining this result with (C.13) gives further transforms of common distributions:
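\begin{align*}
(\mathcal{F}\,\mathrm{sign})(k) &= -i\sqrt{\frac{2}{\pi}}\,\mathrm{pv}(1/k) \tag{C.15} \\
(\mathcal{F}\theta)(k) &= \frac12\,\mathcal{F}(\mathrm{sign}+1)(k) = -\frac{i}{\sqrt{2\pi}}\,\big(\mathrm{pv}(1/k) + i\pi\delta\big). \tag{C.16}
\end{align*}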
This is valid, because the integral, interpreted as a function of s ∈ [0, ∞), is con-
tinuous at 0. In fact, it is even differentiable:
\[ -\partial_s\Big|_0 \int \frac{e^{-s\|k\|}}{\|k\|}\,\tilde\phi(k)\,d^3k = \int \tilde\phi(k)\,d^3k, \tag{C.20} \]
which is finite for ϕ̃ ∈ S. (Note that the same regularization does not work for the
integral in (C.17), which formally corresponds to the case ϕ̃(k) = 1. Of course,
the constant function is not an element of Schwartz space, and indeed, this choice
would cause (C.20) to diverge).
Topological spaces
Consider a set X. A topology on X is a rule that allows us to decide when a sequence
xk : N → X converges to an element x ∈ X.
As a first example, assume that X is a vector space equipped with a norm ∥ · ∥. This
covers an extremely wide range of spaces, from X = R, the real numbers with norm
∥x∥ = |x| the absolute value, to X = L2 (R) with norm $\|x\| = \sqrt{\langle x|x\rangle}$ derived from the
inner product. We say that a sequence xk converges in norm topology to x,
We’ll use these concepts to give very general definitions of continuity and complete-
ness.
Continuity
A function f between two topological spaces is continuous if it maps convergent sequences
to convergent sequences (Fig. C.2), i.e. if
xk → x ⇒ f (xk ) → f (x).
used ubiquitously, are actually well-defined. Another reason is that the equivalence of
“kets” and “bras” requires this property: The set of continuous linear functionals on a
Hilbert space H is denoted by H′ . If |ψ⟩ ∈ H, then we’ve shown above that ⟨ψ| is
continuous, i.e. an element of H′ . The Riesz representation theorem says that the converse
is also true: Every continuous linear functional of a Hilbert space is given by some “bra
vector”.
One can show that L2 (R) is complete, i.e. actually a Hilbert space.
Contrast this with Schwartz space S. It, too, is a complex vector space with the same
sesquilinear inner product as L2 (R). But it is not complete in norm topology and hence not a
Hilbert space. The argument works just like the √2-example above. Because S is dense
in L2 (R), for every ψ ∈ L2 (R), there exists a sequence ϕk : N → S converging to ψ in
norm. Thus, if ψ ̸∈ S, the sequence ϕk has no limit point in S.
Generalizations
The topological formulation above is the basis of generalizations. The common recipe
is to choose a test function space Φ (often norm-dense in L2 (R)), endow it with a finer
topology, and then consider the continuous dual Φ′ .
The most important choice is to take Φ to be the space of bump functions Cc∞ (R):
smooth functions with compact support. “Compact support” means that these functions
are identically zero for |x| large enough. (It is not obvious that one can define functions
that transition smoothly from being identically zero in some region to being non-zero in
other regions, but such functions do exist). In the context of distributions, the space of
bump functions is usually denoted by D.
Recall that Schwartz functions vanish faster than any polynomial, and thus integrals
against locally integrable functions u(x) that grow at most polynomially are finite. Be-
cause bump functions vanish identically for large x, integrals against any locally integrable
u are well-defined. This suggests, correctly, that the space of distributions D′ is even larger
than the space of tempered distributions S ′ .
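To see that such functions really exist, and why they can be paired with wildly growing u, here is a sketch using the textbook bump function exp(−1/(1−x²)) on (−1, 1). The notes do not single out a particular bump function; this choice and the helper names are assumptions made for the illustration.

```python
import math

def bump(x):
    """Smooth function equal to exp(-1 / (1 - x^2)) inside (-1, 1) and identically zero outside."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def pair_with(u, n=100_000):
    """Midpoint-rule estimate of int u(x) bump(x) dx; only [-1, 1] contributes."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += u(x) * bump(x)
    return h * total

# u(x) = exp(x^2) grows too fast to be integrated against every Schwartz
# function, yet its pairing with a bump function is a finite number.
print(pair_with(lambda x: math.exp(x * x)))
```

Because the integration region is effectively bounded, the growth of u at infinity never enters, which is the reason D′ can afford to be larger than S′.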
The structure of D′ is somewhat more complicated than was the case for S ′ . We will
not discuss it here, but, for completeness, give the topology from which it derives. It is
defined in terms of the (semi-)norms
Terminology
The word “distributions” used without qualification is most likely to refer to D′ , but can
also mean a general continuous dual space Φ′ , and may also refer to tempered distributions
S ′ , depending on context. Making matters worse, S is always called “Schwartz space”,
but the name “Schwartz” is also associated with the general mathematical theory of dis-
tributions and in particular also with D′ . “Tempered distributions” always means S ′ , at
least.
Lastly, if a science professor answers an inquiry about a questionable derivation by
claiming that it is to be understood “in the sense of distribution”, they likely mean neither
D′ nor Φ′ nor S ′ . Instead, they are probably vaguely aware of the fact that what they are
doing isn’t quite rigorous, but are optimistic that a smart mathematician could figure it
out, and in any case, want to get through their lecture with their dignity intact and have
found that “distribution” is a fully general incantation that reliably suppresses follow-up
questions.
Needless to say, I would never engage in such tactics.
Appendix D
Green’s functions
D.1 Introduction
In this chapter, we are interested in the affine equation
Lu = f (D.1)
for u, given L and f . We’ll restrict attention to the most important special case, where L
is a translation-invariant differential operator on Rn .
Example: The damped harmonic oscillator. Newton’s equation for the position
u(t) of a particle subject to a driving force mf (t), viscous damping coefficient
(mγ)/2, and undamped eigenfrequency ω0 is
The problem is to find u(t) given f (t) and the boundary condition u(−∞) = 0.
Now here’s the basic idea: Formally, $f(x) = \int f(x')\,\delta(x - x')\,dx'$ is a superposition
of “delta impulses”. Thus, if we could work out how the system reacts to a delta impulse,
we should be able to solve the general problem by linearity. Exploiting translational
invariance, we may even get away with treating just the case of f (x) = δ(x).
This indeed works out. Assume we can find a G such that
LG = δ. (D.2)
Some terminology: G is the Green’s function of L.1 The expression (D.3) for u is
known as the convolution G ⋆ f of G and f .
1 Yes, it’s “the Green’s function of L”, not “the Green function of L” as would be more in line with the
Elementary examples
The simplest example is L = ∂t . The general solution to Lu = f is, of course, the integral
\[ u(t) = \int_a^t f(t')\,dt'. \]
Because ∂t is not invertible on the space of all differentiable functions, there is an ambigu-
ity in the solution, represented by a. Fixing a amounts to choosing the boundary condition
u(a) = 0. In these notes, we only treat the translationally-invariant theory, so we will
restrict attention to the choices a = ±∞. For a = −∞,
\[ u(t) = \int_{-\infty}^{t} f(t')\,dt' = \int \theta(t-t')\,f(t')\,dt'. \]
The Green’s function $G^+$ of $\partial_t$ under the boundary condition u(−∞) = 0 is therefore the
step function θ. And indeed, by Sec. C.2.3, ∂t θ(t) = δ(t) holds in the sense of distribution,
so that the step function fulfills (D.2).
An analogous calculation for a = +∞ leads to $G^-(t) = -\theta(-t)$. Because the solutions
u(t) constructed using $G^+$ only depend on the f (t′ ) for t′ ≤ t, one usually calls $G^+$
a retarded Green’s function and, likewise, G− an advanced Green’s function. Their differ-
ence h(t) := G+ (t) − G− (t) = 1 is a solution of the homogeneous equation ∂t h(t) = 0,
consistent with our reasoning above.
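A small numerical sketch of the retarded solution (Python; the Gaussian source f and all helper names are arbitrary choices made here). It evaluates the convolution of the step function with f by a midpoint rule and compares it with the known running integral of a Gaussian, expressed through the error function.

```python
import math

def f(t):
    """An arbitrary, rapidly decaying source term."""
    return math.exp(-t * t)

def theta(t):
    return 1.0 if t >= 0 else 0.0

def u_retarded(t, lower=-10.0, upper=10.0, n=40_000):
    """(G+ * f)(t) = int theta(t - t') f(t') dt', midpoint rule on [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        tp = lower + (i + 0.5) * h
        total += theta(t - tp) * f(tp)
    return h * total

def u_exact(t):
    """int_{-inf}^t exp(-s^2) ds = sqrt(pi)/2 * (1 + erf(t))."""
    return 0.5 * math.sqrt(math.pi) * (1.0 + math.erf(t))

for t in (-1.0, 0.0, 0.7, 2.0):
    print(t, u_retarded(t), u_exact(t))
```

The step function simply cuts the integral off at t′ = t, so the convolution reproduces the boundary condition u(−∞) = 0 automatically.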
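For another example, take a quick look at $L = \partial_t^2$. Because $t\,\delta(t) = 0$ as distributions,
\[ \partial_t^2\big(t\theta(t)\big) = \partial_t\big(t\delta(t) + \theta(t)\big) = \partial_t\theta(t) = \delta(t), \]
which shows that $G^+(t) = t\theta(t)$ is a (retarded) Green’s function for $\partial_t^2$. Other solutions are
\[ G^-(t) = -t\theta(-t), \qquad G = \tfrac12 G^+ + \tfrac12 G^- = \tfrac12 |t|. \]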
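The claim that both tθ(t) and |t|/2 invert ∂_t² can be tested numerically. The sketch below (Python; the source f, step sizes and helper names are choices made for this illustration) convolves each candidate with a Gaussian and checks by a second finite difference that the result u satisfies u″ = f.

```python
import math

def f(t):
    return math.exp(-t * t)

def G_plus(t):
    """Retarded Green's function t * theta(t)."""
    return t if t > 0 else 0.0

def G_sym(t):
    """Symmetric combination |t| / 2."""
    return 0.5 * abs(t)

def convolve(G, t, lower=-10.0, upper=10.0, n=40_000):
    """Midpoint-rule estimate of int G(t - t') f(t') dt'."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        tp = lower + (i + 0.5) * h
        total += G(t - tp) * f(tp)
    return h * total

def second_difference(u, t, d=1e-2):
    """Central finite-difference approximation of u''(t)."""
    return (u(t + d) - 2.0 * u(t) + u(t - d)) / (d * d)

t0 = 0.3
for G in (G_plus, G_sym):
    print(G.__name__, second_difference(lambda s: convolve(G, s), t0), f(t0))
```

The two solutions u differ by a function linear in t, which the second derivative does not see.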
Likewise, the Fourier transform of the defining equation LG = δ for Green’s function is
P G̃ = (2π)−n/2 1, (D.6)
The trouble is, of course, that if P has real zeros, the integrals in Eqs. (D.5, D.7) might
not exist. In the next sections, we’ll go through a variety of methods for extracting
solutions anyway by modifying these equations.
The problem of characterizing all G̃ ∈ Φ′ that satisfy (D.6) is known as the “prob-
lem of division” in the theory of distributions. In the univariate case, n = 1, it is
fairly easy to solve (we’ll introduce all necessary ingredients in Sec. D.2.4). Essen-
tially, this case is simple because univariate polynomials have finitely many roots. If
P has continuous sets of zeros, the problem can become very complicated, though.
Thus (employing the sign convention for the FT of time variables, as in (A.25)),
\[ G(t) = -\frac{1}{2\pi} \int \frac{e^{-i\omega t}}{(\omega-\omega_+)(\omega-\omega_-)}\,d\omega. \]
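Since we’re done integrating in Fourier space, we can re-use the letter ω, defining it to be
$\sqrt{|\omega_0^2 - \gamma^2|}$. Then the above may be simplified (using l’Hôpital for the equality case) to
\[ G(t) = \theta(t)\, e^{-\gamma t} \begin{cases} \sin(\omega t)/\omega & \omega_0 > \gamma \\ t & \omega_0 = \gamma \\ \sinh(\omega t)/\omega & \omega_0 < \gamma \end{cases} \tag{D.8} \]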
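The three cases of (D.8) can be checked with a few lines of Python. This is a sketch under one explicit assumption: the equation of motion is taken in the normalization u″ + 2γu′ + ω₀²u = f, which is the one for which (D.8) has decay rate γ and frequency √|ω₀² − γ²|. The function names are made up. For t > 0 the Green's function must solve the homogeneous equation, and at t = 0 it must start from zero with unit slope.

```python
import math

def green(t, omega0, gamma):
    """Retarded Green's function (D.8), assuming u'' + 2*gamma*u' + omega0^2*u = f."""
    if t < 0:
        return 0.0
    w = math.sqrt(abs(omega0 ** 2 - gamma ** 2))
    if omega0 > gamma:      # underdamped
        shape = math.sin(w * t) / w
    elif omega0 < gamma:    # overdamped
        shape = math.sinh(w * t) / w
    else:                   # critically damped
        shape = t
    return math.exp(-gamma * t) * shape

def residual(t, omega0, gamma, d=1e-4):
    """Finite-difference value of G'' + 2*gamma*G' + omega0^2*G at some t > d."""
    g0 = green(t, omega0, gamma)
    gp = green(t + d, omega0, gamma)
    gm = green(t - d, omega0, gamma)
    return (gp - 2.0 * g0 + gm) / d ** 2 + 2.0 * gamma * (gp - gm) / (2.0 * d) + omega0 ** 2 * g0

for omega0, gamma in ((2.0, 1.0), (1.0, 1.0), (1.0, 2.0)):
    slope = (green(1e-6, omega0, gamma) - green(0.0, omega0, gamma)) / 1e-6
    print(omega0, gamma, residual(0.7, omega0, gamma), slope)
```

The printed residuals vanish up to discretization error and the initial slope is 1 in all three regimes, which is the jump in G′ that produces the delta function on the right-hand side.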
“Infinitesimal” deformations
There’s a variant of this construction that can be interpreted as “shifting the roots of P
away from the real axis” instead of “deforming the integration path to avoid the roots”.
Write the complex ω’s on the path γ as ω = u + iv(u). Then
\[ \frac{1}{2\pi}\int_\gamma \frac{e^{-i\omega t}}{P(\omega)}\,d\omega = \frac{1}{2\pi}\int \frac{e^{-iut}}{P(u+iv(u))}\,e^{v(u)t}\,du. \]
If there are no zeros between γ and the real line, the integral does not change under the
substitution v(u) ↦ ϵv(u) for 0 < ϵ ≤ 1. In particular, by continuity of the exponential,
\[ \frac{1}{2\pi}\lim_{\epsilon\to0^+}\int \frac{e^{-iut}}{P(u+i\epsilon v(u))}\,e^{\epsilon v(u) t}\,du = \frac{1}{2\pi}\lim_{\epsilon\to0^+}\int \frac{e^{-iut}}{P(u+i\epsilon v(u))}\,du. \]
In the common special case where v(u) is constant, the limit is typically written as
\[ G^\pm(t) = \frac{1}{2\pi}\int \frac{e^{-i\omega t}}{P(\omega \pm i0)}\,d\omega, \tag{D.10} \]
with the sign depending on sgn v. (This construction should remind you of the formula
(C.10), expressing the principal value as the “side limit” 1/(x ± i0)). Of course, ω ↦
P (ω ± iϵ) is a polynomial whose roots are shifted by ∓iϵ compared to the ones of P .
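The “side limit” can be made tangible numerically. The sketch below (Python; test function, ϵ, grid and helper names are choices made for this illustration) integrates ϕ(x)/(x + iϵ) for a small ϵ and compares the real part with the principal value and the imaginary part with −πϕ(0), as the relation 1/(x + i0) = pv(1/x) − iπδ used for (C.14) predicts.

```python
import math

def phi(x):
    return math.exp(-(x - 1.0) ** 2)

def smeared(eps, cutoff=12.0, n=240_000):
    """Midpoint estimate of int phi(x) / (x + i*eps) dx over [-cutoff, cutoff]."""
    h = 2.0 * cutoff / n
    total = 0j
    for i in range(n):
        x = -cutoff + (i + 0.5) * h
        total += phi(x) / complex(x, eps)
    return h * total

def pv(cutoff=12.0, n=120_000):
    """pv(1/x)(phi) as int_0^cutoff (phi(x) - phi(-x)) / x dx."""
    h = cutoff / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (phi(x) - phi(-x)) / x
    return h * total

value = smeared(1e-3)
print(value.real, pv())
print(value.imag, -math.pi * phi(0.0))
```

The agreement is only up to terms of order ϵ. The grid spacing has to stay well below ϵ, otherwise the narrow Lorentzian in the imaginary part is not resolved.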
Multivariate case
We can reduce the problem for arbitrary n to the n = 1-case. While the technique works
in general, it is particularly natural if one of the variables is distinguished in some way. In
physics applications, this is typically the time. With this in mind, we’ll use x = (t, x) for
the arguments of u, f, G, and k = (ω, k) for the arguments of Fourier transforms.
Define pk (ω) = P (ω, k). Then pk is a univariate polynomial, and we can just repeat
the n = 1-construction from above, but in a k-dependent way. That is to say, choose
deformations γk avoiding the zeros of pk and define
\[ G_\gamma(x) := \int \left( \int_{\gamma_k} \frac{e^{-i\omega t}}{p_k(\omega)}\,d\omega \right) e^{ikx}\, d^{n-1}k. \tag{D.11} \]
The proof that Gγ is a Green’s function works exactly as in the univariate case.
There are three natural choices for deformations γk avoiding the poles. The simplest ones
are γk± : straight lines parallel to the real axis with imaginary parts ±ϵ. From the discussion
above, the integral does not depend on the value of ϵ > 0. The third contour is γkF , which
moves around −ωk in the lower half-plane and around +ωk in the upper half-plane (the
superscript “F ” is for Feynman; see Fig. ??).
Let’s look at G+ (x) = Gγ + (x). The frequency integral can be evaluated exactly as in
the damped harmonic oscillator example above, leading to
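\[ 2\pi i\,\theta(t)\,\frac{e^{-i\omega_k t} - e^{i\omega_k t}}{2\omega_k}, \]
so that the full integral is
\[ G^+(x) = i\theta(t) \int \frac{1}{2\omega_k}\left(e^{-i\omega_k t} - e^{+i\omega_k t}\right) e^{-ikx}\,\frac{d^3k}{(2\pi)^3}. \]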
The k-integral can be expressed in terms of Bessel functions (but the result isn’t pretty).
In any case, the θ term means that the convolution u = $G^+$ ⋆ f at time t only depends on
f (t′ , x′ ) for t′ ≤ t. We have thus constructed a retarded Green’s function.
\[ R(z; L) := (L - z\mathbb{1})^{-1} \]
(if the inverse exists) is called the resolvent. It turns out that one can learn a lot about an
operator by studying its resolvent, and the concept is a central tool in functional analysis.
From the discussion in Sec. D.1, it is clear that if L is invertible, its Green’s function is
In the general case R(0; L) does not exist, but one might hope that suitable limits of
where, for definiteness, we take ωk,ϵ to be the principal square root, i.e. the one with
positive real part. Then $\omega_{k,\epsilon}$ sits below the positive real axis and $-\omega_{k,\epsilon}$ above the negative
real axis. By comparison with Fig. ??,
Thus we’re looking for distributions G̃ “proportional to 1/ω”. That was exactly our moti-
vation for introducing the principal value in Eq. (C.7). And indeed,
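\[ \omega\,\mathrm{pv}\Big(\frac1\omega\Big)(\tilde f) = \lim_{\epsilon\to0^+}\int_{|\omega|\ge\epsilon} \frac{1}{\omega}\,\omega \tilde f(\omega)\,d\omega = \int 1\,\tilde f(\omega)\,d\omega \quad\Rightarrow\quad \omega\,\mathrm{pv}\Big(\frac1\omega\Big) = 1. \]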
and one can show that these are all. An inverse Fourier transform gives
\[ G_\lambda(t) = \frac12\,\mathrm{sign}(t) + \frac{\lambda}{\sqrt{2\pi}} \]
where Q is a polynomial with only complex roots and the al are distinct real numbers.
Define
\[ \tilde G(\tilde f) = \frac{1}{\sqrt{2\pi}}\, \mathcal{P}\!\!\int \frac{\tilde f(\omega)}{P(\omega)}\,d\omega, \]
where the symbol $\mathcal{P}\!\int$ denotes the principal value integral that is computed by approaching
each of the singularities $a_l$ symmetrically, the same way integration around 0 is handled
in pv(1/ω). The proof that G̃ is indeed a Green’s function then works as the one for
pv(1/ω) given above. The ambiguity in defining G̃ corresponds to adding multiples of
delta distributions supported on the real zeros of P :
\[ \tilde G \mapsto \tilde G + \sum_{l} \lambda_l\, \delta_{a_l}. \]
It remains to treat roots with higher multiplicity. Here, we only discuss the case
P (ω) = −ω 2 ; the general case works similarly. The trick is to write
\[ -\frac{1}{\omega^2} = \partial_\omega \frac{1}{\omega}. \]
But we already know how to associate a distribution with 1/ω (the principal value) and
how to differentiate distributions (by differentiating minus the test function). Indeed, with
\[ \tilde G = \frac{1}{\sqrt{2\pi}}\, D\,\mathrm{pv}(1/\omega) \]
we get
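\begin{align*}
(-\omega^2)\tilde G(\phi) &= \frac{1}{\sqrt{2\pi}}\,\mathcal{P}\!\!\int \frac{1}{\omega}\,(-\partial_\omega)\big(-\omega^2\phi(\omega)\big)\,d\omega \\
&= \frac{1}{\sqrt{2\pi}}\,\mathcal{P}\!\!\int \frac{1}{\omega}\,\big(2\omega\phi(\omega) + \omega^2\partial_\omega\phi(\omega)\big)\,d\omega \\
&= \frac{1}{\sqrt{2\pi}}\Big(2\int\phi(\omega)\,d\omega - \int\phi(\omega)\,d\omega\Big) = \frac{1}{\sqrt{2\pi}}\int\phi(\omega)\,d\omega.
\end{align*}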
One may verify that the ambiguity is given by
\[ \tilde G \mapsto \tilde G + \lambda_1 \delta + \lambda_2 \delta'. \]