
Lecture Notes on

Foundations of Quantum Mechanics


Roderich Tumulka∗

Winter semester 2019/20

These notes will be updated as the course proceeds.


Date of this update: March 4, 2020


Department of Mathematics, Eberhard Karls University, Auf der Morgenstelle 10, 72076 Tübingen,
Germany. Email: [email protected]

1 Course Overview
Learning goals of this course: To understand the rules of quantum mechanics; to
understand several important views of how the quantum world works; to understand
what is controversial about the orthodox interpretation and why; to be familiar with
the surprising phenomena and paradoxes of quantum mechanics.

Quantum mechanics is the field of physics concerned with (or the post-1900 theory
of) the motion of electrons, photons, quarks, and other elementary particles, inside
atoms or otherwise. It is distinct from classical mechanics, the pre-1900 theory of the
motion of physical objects. Quantum mechanics forms the basis of modern physics and
covers most of the physics under the conditions on Earth (i.e., not-too-high temperatures
or speeds, not-too-strong gravitational fields). “Foundations of quantum mechanics” is
the topic concerned with what exactly quantum mechanics means and how to explain
the phenomena described by quantum mechanics. It is a controversial topic. Here are
some voices critical of the traditional, orthodox view:

“With very few exceptions (such as Einstein and Laue) [...] I was the
only sane person left [in theoretical physics].”
(E. Schrödinger in a 1959 letter)

“I think I can safely say that nobody understands quantum mechanics.”


(R. Feynman, 1965)

“I think that conventional formulations of quantum theory [...] are unprofessionally vague and ambiguous.”
(J. Bell, 1986)

In this course we will be concerned with what kinds of reasons people have for
criticizing the orthodox understanding of quantum mechanics, what the alternatives are,
and which kinds of arguments have been put forward for or against important views.
We will also discuss the rules of quantum mechanics for making empirical predictions;
they are uncontroversial. The aspects of quantum mechanics that we discuss also apply
to other fields of quantum physics, in particular to quantum field theory.

Topics of this course:

• The Schrödinger equation

• The Born rule

• Self-adjoint matrices, axioms of the quantum formalism, collapse of the wave func-
tion, decoherence

• The double-slit experiment and variants thereof, interference and superposition

• Spin, the Stern-Gerlach experiment, the Pauli equation, representations of the
rotation group

• The Einstein-Podolsky-Rosen argument, entanglement, non-locality, and Bell’s theorem

• The paradox of Schrödinger’s cat and the quantum measurement problem

• Heisenberg’s uncertainty relation

• Interpretations of quantum mechanics (Copenhagen, Bohm’s trajectories, Everett’s many worlds, spontaneous collapse theories, quantum logic, perhaps others)

• Views of Bohr and Einstein

• POVMs and density matrices

• No-hidden-variables theorems

• Identical particles and the non-trivial topology of their configuration space, bosons
and fermions

Mathematical tools that will be needed in this course:


• Complex numbers

• Vectors in n dimensions, inner product

• Matrices, their eigenvalues and eigenvectors

• Multivariable calculus

• Probability; continuous random variables, the Gaussian (normal) distribution


The course will involve advanced mathematics, as appropriate for a serious discussion of
quantum mechanics, but will not focus on technical methods of problem-solving (such as
methods for calculating the ground state energy of the hydrogen atom). Mathematical
topics we will discuss in this course:
• Differential operators (such as the Laplace operator) and their analogy to matrices

• Eigenvalues and eigenvectors of differential (and other) operators

• The Hilbert space of square-integrable functions, norm and inner product

• Projection operators

• Fourier transform of a function

• Positive operators and positive-operator-valued measures (POVMs)

• Tensor product of vector spaces

• Trace of a matrix or an operator, partial trace

• Special ordinary and partial differential equations, particularly the Schrödinger equation

• Exponential random variables and the Poisson process

Philosophical questions that will come up in this course:

• Is the world deterministic, or stochastic, or neither?

• Can and should logic be revised in response to empirical findings?

• Are there in principle limitations to what we can know about the world (its laws,
its state)?

• Which theories are meaningful as fundamental physical theories? In particular:

• If a statement cannot be tested empirically, can it be meaningful? (Positivism versus realism)

• Does a fundamental physical theory have to provide a coherent story of what happens?

• Does that story have to contain elements representing matter in 3-dimensional space in order to be meaningful?

Physicists usually take math classes but not philosophy classes. That doesn’t mean,
though, that one doesn’t use philosophy in physics. It rather means that physicists
learn the philosophy they need in physics classes. Philosophy classes are not among the
prerequisites of this course, but we will sometimes make connections with philosophy.

2 The Schrödinger Equation
One of the fundamental laws of quantum mechanics is the Schrödinger equation

i ℏ ∂ψ/∂t = − ∑_{i=1}^{N} (ℏ²/2m_i) ∇_i² ψ + V ψ .    (2.1)

It governs the time evolution of the wave function ψ = ψt = ψ(t, x1 , x2 , . . . , xN ). (It
can be expected to be valid only in the non-relativistic regime, i.e., when the speeds of
all particles are small compared to the speed of light. In the general case (the relativistic
case) it needs to be replaced by other equations, such as the Klein–Gordon equation and
the Dirac equation.) We focus first on spinless particles and discuss the phenomenon of
spin later. I use boldface symbols such as x for 3-dimensional (3d) vectors.
Eq. (2.1) applies to a system of N particles in R3 . The word “particle” is traditionally
used for electrons, photons, quarks, etc. Opinions diverge on whether electrons actually
are particles in the literal sense (i.e., point-shaped objects, or little grains). A system is
a subset of the set of all particles in the world. A configuration of N particles is a list
of their positions; configuration space is thus, for our purposes, the Cartesian product
of N copies of physical space, or R3N . The wave function of quantum mechanics, at any
fixed time, is a function on configuration space, either complex-valued or spinor-valued
(as we will explain later); for spinless particles, it is complex-valued, so

ψ : R_t × R^{3N}_q → C .    (2.2)

The subscript indicates the variable: t for time, q = (x1 , . . . , xN ) for the configuration.
Note that i in (2.1) either denotes √−1 or labels the particles, i = 1, . . . , N ; mi are
positive constants, called the masses of the particles; ℏ = h/2π is a constant of nature,
h is called Planck’s quantum of action or Planck’s constant, h = 6.63 × 10^−34 kg m² s^−1;

∇_i = ( ∂/∂x_i , ∂/∂y_i , ∂/∂z_i )    (2.3)

is the derivative operator with respect to the variable x_i, ∇_i² the corresponding Laplace
operator

∇_i² ψ = ∂²ψ/∂x_i² + ∂²ψ/∂y_i² + ∂²ψ/∂z_i² .    (2.4)
V is a given real-valued function on configuration space, called the potential energy or
just potential.
Fundamentally, the potential in non-relativistic physics is
V(x_1, . . . , x_N) = ∑_{1≤i<j≤N} (e_i e_j/4πε₀) / |x_i − x_j|  −  ∑_{1≤i<j≤N} G m_i m_j / |x_i − x_j| ,    (2.5)

where
|x| = √(x² + y² + z²)  for x = (x, y, z)    (2.6)

denotes the Euclidean norm in R3 , ei are constants called the electric charges of the
particles (which can be positive, negative, or zero); the first term is called the Coulomb
potential, the second term is called the Newtonian gravity potential, ε0 and G are con-
stants of nature called the electric constant and Newton’s constant of gravity (ε₀ =
8.85 × 10^−12 kg^−1 m^−3 s^4 A^2 and G = 6.67 × 10^−11 kg^−1 m^3 s^−2), and mi are again the
masses. However, when the Schrödinger equation is regarded as an effective equation
rather than as a fundamental law of nature then the potential V may contain terms aris-
ing from particles outside the system interacting with particles belonging to the system.
That is why the Schrödinger equation is often considered for rather arbitrary functions
V , also time-dependent ones. The operator
H = − ∑_{i=1}^{N} (ℏ²/2m_i) ∇_i² + V    (2.7)

is called the Hamiltonian operator, so the Schrödinger equation can be summarized in
the form
i ℏ ∂ψ/∂t = H ψ .    (2.8)
The Schrödinger equation is a partial differential equation (PDE). It determines the
time evolution of ψt in that for a given initial wave function ψ0 = ψ(t = 0) : R3N → C
it uniquely fixes ψt for any t ∈ R. The initial time could also be taken to be any t0 ∈ R
instead of 0.
So far I have not said anything about what this new physical object ψ has to do
with the particles. One such connection is

Born’s rule. If we measure the system’s configuration at time t then the outcome is
random with probability density
ρ(q) = |ψ_t(q)|² .    (2.9)

This rule refers to the concept of probability density, which means the following. The
probability that the random outcome X ∈ R3N is any particular point x ∈ R3N is zero.
However, the probability that X lies in a set B ⊆ R^{3N} is given by

P(X ∈ B) = ∫_B ρ(q) d^{3N}q    (2.10)

(a 3N -dimensional volume integral). Instead of d3N q, we will often just write dq. A
density function ρ must be non-negative and normalized,

ρ(q) ≥ 0 ,    ∫_{R^{3N}} ρ(q) dq = 1 .    (2.11)

A famous density function in 1 dimension is the Gaussian density


ρ(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)} .    (2.12)

A random variable with Gaussian density is also called a normal (or normally dis-
tributed ) random variable. It has mean µ ∈ R and standard deviation σ > 0. The mean
value or expectation value EX of a random variable X is its average value
EX = ∫_R x ρ(x) dx .    (2.13)

The standard deviation of X is defined to be √(E[(X − EX)²]).
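As a small numerical illustration (my own sketch, not part of the notes), the following Python snippet checks the normalization (2.11), the mean (2.13), and the standard deviation of the Gaussian density (2.12) by summing over a grid; the values of µ, σ, and the grid are arbitrary choices.

```python
import numpy as np

mu, sigma = 1.0, 0.5          # arbitrary example parameters
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = x[1] - x[0]

# Gaussian density (2.12)
rho = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

total = np.sum(rho) * dx                             # ~1, normalization (2.11)
mean = np.sum(x * rho) * dx                          # ~mu, cf. (2.13)
std = np.sqrt(np.sum((x - mean) ** 2 * rho) * dx)    # ~sigma

print(total, mean, std)
```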

For the Born rule to make sense, we need that

∫_{R^{3N}} |ψ_t(q)|² dq = 1 .    (2.14)

And indeed, the Schrödinger equation guarantees this relation: If it holds for t = 0 then
it holds for any t ∈ R. More generally, the Schrödinger equation implies that
∫ dq |ψ_t|² = ∫ dq |ψ_0|²    (2.15)

for any ψ0 . One says that ∫ dq |ψ_t|² satisfies a conservation law. Indeed, the Schrödinger
equation implies a local conservation law for |ψ|2 , as we will show below; this means not
only that the total amount of |ψ|2 is conserved, but also that amounts of |ψ|2 cannot
disappear in one place while the same amount appears in another place; that is, the
amount of |ψ|2 cannot be created or destroyed, only moved around, and in fact flows
with a current j.
In general, a local conservation law in R^d gets expressed by a continuity equation¹

∂ρ/∂t = − ∑_{α=1}^{d} ∂j_α/∂x_α ,    (2.16)

where ρ is a time-dependent scalar function on R^d called the density and j a time-
dependent vector field on R^d called the current. To understand why (2.16) expresses local
conservation of ρ, recall the Ostrogradski–Gauss integral theorem (divergence theorem),
which asserts that for a vector field F in Rn ,
∫_A div F(x) dⁿx = ∫_{∂A} F(x) · n(x) d^{n−1}x ,    (2.17)

where div F = ∂F1/∂x1 + . . . + ∂Fn/∂xn is called the divergence of F , A is an n-
dimensional region with (piecewise smooth) (n − 1)-dimensional boundary ∂A, the left-
hand side is a volume integral in n dimensions with volume element dn x, the right-hand
side is a surface integral (flux integral) of the vector field F , n(x) is the outward unit
normal vector on ∂A at x ∈ ∂A, and dn−1 x means the area of a surface element. The
¹I don’t know where this name comes from. It has nothing to do with being continuous. It should be called a conservation equation.

formula (2.17) implies in particular that if the vector field F has zero divergence, then
its flux integral across any closed surface ∂A vanishes. Now apply this to n = d + 1
and the vector field F = (ρ, j1 , . . . , jd ), which has zero divergence (in d + 1 dimensions!)
according to (2.16), and consider its flux across the surface of the d + 1-dimensional
cylinder A = [0, T ] × S, where S ⊆ Rd is a ball or any set with a piecewise smooth
boundary ∂S. Then the surface integral of F is
0 = − ∫_S ρ_0 + ∫_S ρ_T + ∫_0^T dt ∫_{∂S} d^{d−1}x  j · n_{∂S}    (2.18)

with n∂S the unit normal vector field in Rd on the boundary of S. That is, the amount
of ρ in S at time T differs from the initial amount of ρ in S by the flux of j across
the boundary of S during [0, T ]—a local conservation law. If (and this is indeed the
case with the Schrödinger equation) there is no flux to infinity, i.e., if the last integral
becomes arbitrarily small by taking S to be a sufficiently big ball, then the total amount
of ρ remains constant in time.
Now the Schrödinger equation implies the following continuity equation in configu-
ration space with d = 3N :
∂|ψ(t, q)|²/∂t = − ∑_{i=1}^{N} ∇_i · j_i(t, q)    (2.19)

with

j_i(t, q) = (ℏ/m_i) Im[ ψ*(t, q) ∇_i ψ(t, q) ] ,    (2.20)

where Im means imaginary part, because

∂(ψ*ψ)/∂t = 2 Re[ ψ* (−i/ℏ) Hψ ]    (2.21)
          = (2/ℏ) Im[ − ∑_{i=1}^{N} (ℏ²/2m_i) ψ* ∇_i² ψ + V(q)|ψ|² ]    (2.22)
          = − ∑_{i=1}^{N} (ℏ/m_i) Im[ ψ* ∇_i² ψ + (∇_i ψ*) · (∇_i ψ) ] = − ∑_{i=1}^{N} ∇_i · j_i .    (2.23)

In (2.22) the term V(q)|ψ|² is real and so does not contribute to the imaginary part; in (2.23) the
added term (∇_i ψ*) · (∇_i ψ) is real as well, and ψ* ∇_i² ψ + (∇_i ψ*) · (∇_i ψ) = ∇_i · (ψ* ∇_i ψ).

Thus, |ψ|2 is locally conserved, and in particular its integral over all of configuration
space does not change with time, as expressed in (2.15).
Since the quantity ∫ dq |ψ|² occurs frequently, it is useful to abbreviate it: The L²
norm is defined to be

‖ψ‖ = ( ∫_{R^{3N}} dq |ψ(q)|² )^{1/2} .    (2.24)

Thus, ‖ψ_t‖ = ‖ψ_0‖, and the Born rule is consistent with the Schrödinger equation,
provided the initial datum ψ0 has norm 1, which we will henceforth assume. The wave
function ψt will in particular be square-integrable, and this makes the space L²(R^{3N})
of square-integrable functions a natural arena. It is also called the Hilbert space, and is
the space of all wave functions (times finite factors).
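The conservation of the norm can also be observed numerically. The following Python sketch (my own illustration, not from the notes) evolves a free 1d Gaussian wave packet with the split-step Fourier method, a standard numerical scheme for the Schrödinger equation, and prints ‖ψ_t‖ at several times; units with ℏ = m = 1 and all grid parameters are arbitrary choices.

```python
import numpy as np

# 1d free Schroedinger evolution by the split-step Fourier method (hbar = m = 1)
L, n = 40.0, 1024
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)

sigma, k0 = 1.0, 2.0
psi = (2 * np.pi * sigma**2) ** (-0.25) * np.exp(1j * k0 * x - x**2 / (4 * sigma**2))

def norm(psi):
    return np.sqrt(np.sum(np.abs(psi) ** 2) * dx)    # discretized L2 norm (2.24)

dt, steps = 0.01, 500
V = np.zeros_like(x)                      # free particle: V = 0
for step in range(steps):
    psi *= np.exp(-0.5j * V * dt)         # half potential step
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))  # kinetic step
    psi *= np.exp(-0.5j * V * dt)         # half potential step
    if step % 100 == 0:
        print(step, norm(psi))            # stays ~1, illustrating (2.15)
```

The printed norm stays at 1 up to rounding, as (2.15) requires.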

3 Unitary Operators in Hilbert Space
In the following, we will often simply write L2 for L2 (R3N ). We will leave out many
mathematical details.

3.1 Existence and Uniqueness of Solutions of the Schrödinger Equation
The Schrödinger equation defines the time evolution of the wave function ψt . In math-
ematical terms, this means that for every choice of initial wave function ψ0 (q) there is
a unique solution ψ(t, q) of the Schrödinger equation. This leads to the question what
exactly is meant by “every” wave function. Remarkably, even when ψ0 is not differen-
tiable, there is still a natural sense in which a “weak solution” or “L2 solution” can be
defined. This sense allows for a particularly simple statement:
Theorem 3.1.² For a large class of potentials V (including Coulomb, Newton’s gravity,
every bounded measurable function, and linear combinations thereof ) and for every ψ0 ∈
L2 , there is a unique weak solution ψ(t, q) of the Schrödinger equation with potential V
and initial datum ψ0 . Moreover, at every time t, ψt lies again in L2 .

3.2 The Time Evolution Operators


Let Ut : L2 → L2 be the mapping defined by

Ut ψ0 = ψt . (3.1)

Ut is called the time evolution operator or propagator. Often, it is not possible to write
down an explicit closed formula for Ut , but it is nevertheless useful to consider Ut . It
has the following properties.
First, Ut is a linear operator, i.e.

Ut (ψ + φ) = (Ut ψ) + (Ut φ) (3.2)


Ut (zψ) = z (Ut ψ) (3.3)

for any ψ, φ ∈ L2 , z ∈ C. This follows from the fact that the Schrödinger equation
is a linear equation, or, equivalently, that H is a linear operator. It is common to say
operator for linear operator.
Second, Ut preserves norms:
‖U_t ψ‖ = ‖ψ‖ .    (3.4)
This is just another way of expressing Eq. (2.15). Operators with this property are
called isometric.
²This follows from Stone’s theorem and Kato’s theorem together. See, e.g., Theorem VIII.8 in M. Reed and B. Simon: Methods of Modern Mathematical Physics, Vol. 1 (revised edition), Academic Press (1980), and Theorem X.16 in M. Reed and B. Simon: Methods of Modern Mathematical Physics, Vol. 2, Academic Press (1975).

Third, they obey a composition law :
Us Ut = Ut+s , U0 = I , (3.5)
for all s, t ∈ R, where I denotes the identity operator
Iψ = ψ . (3.6)
It follows from (3.5) that Ut−1 = U−t . In particular, Ut is a bijection. An isometric
bijection is also called a unitary operator ; so Ut is unitary. A family of operators
satisfying (3.5) is called a one-parameter group of operators. Thus, the propagators form
a unitary 1-parameter group. (The composition law (3.5) is owed to the time translation
invariance of the Schrödinger equation, which depends on the time independence of the
potential. If one inserted a time-dependent potential, then (3.5) and (3.6) would have
to be replaced by U^{t3}_{t2} U^{t2}_{t1} = U^{t3}_{t1} and U^t_t = I, where U^t_s maps ψs to ψt .)
Fourth,
U_t = e^{−iHt/ℏ} .    (3.7)
The exponential of an operator A can be defined by the exponential series

e^A = ∑_{n=0}^{∞} Aⁿ/n!    (3.8)
if A is a so-called bounded operator ; in this case, the series converges. Unfortunately,
the Hamiltonian of the Schrödinger equation (2.1) is unbounded. But mathematicians
agree about how to define eA for unbounded operators (of the type that H is); we will
not worry about the details of this definition.
Eq. (3.7) is easy to understand: after defining
φ_t := e^{−iHt/ℏ} ψ_0 ,    (3.9)
one would naively compute as follows:
i ℏ (d/dt) φ_t = i ℏ (d/dt) e^{−iHt/ℏ} ψ_0    (3.10)
             = i ℏ (− iH/ℏ) e^{−iHt/ℏ} ψ_0    (3.11)
             = H φ_t ,    (3.12)
so φt is a solution of the Schrödinger equation with φ0 = e0 ψ0 = ψ0 , and thus φt = ψt .
The calculation (3.10)–(3.12) can actually be justified for all ψ0 in the domain of H, a
dense set in L2 ; we will not go into details here.
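As a finite-dimensional illustration of (3.7) (my own sketch, not from the notes), one can take a small Hermitian matrix as the Hamiltonian, compute U_t = e^{−iHt/ℏ} with a matrix exponential, and check unitarity and the composition law (3.5); SciPy’s expm is used here, and ℏ is set to 1.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2            # a Hermitian "Hamiltonian" (hbar = 1)

def U(t):
    return expm(-1j * H * t)        # propagator U_t = exp(-iHt), cf. (3.7)

s, t = 0.3, 0.7
print(np.allclose(U(t).conj().T @ U(t), np.eye(4)))   # unitarity: U^dagger U = I
print(np.allclose(U(s) @ U(t), U(s + t)))             # composition law (3.5)
```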

3.3 Unitary Matrices and Rotations


The space L2 is infinite-dimensional. As a finite-dimensional analog, consider the func-
tions on a finite set, ψ : {1, . . . , n} → C, and the norm
‖ψ‖ = ( ∑_{i=1}^{n} |ψ(i)|² )^{1/2}    (3.13)

instead of the L² norm

‖ψ‖ = ( ∫ |ψ(q)|² dq )^{1/2} .    (3.14)

A function on {1, . . . , n} is always square-summable (its norm cannot be infinite). It
can be written as an n-component vector

ψ(1), . . . , ψ(n) , (3.15)

and the space of these functions can be identified with Cn .


The linear operators on Cn are given by the complex n × n matrices. If a matrix
preserves the norm (3.13) as in (3.4), it is automatically bijective and thus unitary. A
matrix U_{ij} is unitary iff³
U† = U^{−1} ,    (3.16)
where U†, the adjoint matrix of U, is defined by

(U†)_{ij} = (U_{ji})* .    (3.17)

The norm (3.13) is analogous to the norm (= magnitude = length) of a vector in R3 ,

|u| = ( ∑_{i=1}^{3} u_i² )^{1/2} .    (3.18)

The norm-preserving operators in R3 are exactly the orthogonal matrices, i.e., those
matrices A with
A^t = A^{−1} ,    (3.19)
where A^t denotes the transposed matrix, (A^t)_{ij} = A_{ji}. They have a geometric meaning:
Each orthogonal matrix is either a rotation around some axis passing through the origin,
or a reflection across some plane through the origin, followed by a rotation. The set of
orthogonal 3 × 3 matrices is denoted O(3). The set of those orthogonal matrices which
do not involve a reflection is denoted SO(3) for “special orthogonal matrices”; they
correspond to rotations and can be characterized by the condition det A > 0 in addition
to (3.19).
In dimension d > 3, one can show that the special orthogonal matrices are still
compositions (i.e., products) of 2-dimensional rotation matrices such as (for d = 4)

[  cos α   sin α    0    0 ]
[ −sin α   cos α    0    0 ]
[    0       0      1    0 ]  .    (3.20)
[    0       0      0    1 ]

This rotation does not rotate around an axis, it rotates around a (d − 2)-dimensional
subspace (spanned by the 3rd and 4th axes). However, in d ≥ 4 dimensions, not every
³iff = if and only if

special orthogonal matrix is a rotation around a (d − 2)-dim. subspace through a certain
angle, but several such rotations can occur together, as the following example shows:

[  cos α   sin α     0       0    ]
[ −sin α   cos α     0       0    ]
[    0       0     cos β   sin β  ]  .    (3.21)
[    0       0    −sin β   cos β  ]

We will simply call every special orthogonal d × d matrix a “rotation.”


Since Cn can be regarded as R2n , and the norm (3.13) then coincides with the 2n-
dimensional version of (3.18), every unitary operator then corresponds to an orthogonal
operator, in fact a special orthogonal one. So if you can imagine 2n-dimensional space,
every unitary operator is geometrically a rotation. Also in L2 it is appropriate to think
of a unitary operator as a rotation.
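A quick numerical check of this correspondence (my own sketch, not from the notes): build a unitary U on Cⁿ as e^{iK} with K Hermitian, write it as a real 2n × 2n matrix acting on (Re ψ, Im ψ), and verify that the result is orthogonal with determinant +1.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
K = (A + A.conj().T) / 2
U = expm(1j * K)                      # a unitary n x n matrix

# real 2n x 2n matrix acting on (Re psi, Im psi)
R = np.block([[U.real, -U.imag],
              [U.imag,  U.real]])

print(np.allclose(R.T @ R, np.eye(2 * n)))   # orthogonal
print(np.isclose(np.linalg.det(R), 1.0))     # special: determinant +1, i.e. a rotation
```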

3.4 Inner Product


In analogy to the dot product in R3 ,
u · v = ∑_{i=1}^{3} u_i v_i    (3.22)

one defines the inner product of two functions ψ, φ ∈ L² to be

⟨ψ|φ⟩ = ∫_{R^{3N}} ψ(q)* φ(q) dq .    (3.23)

It has the following properties:

1. It is anti-linear (or semi-linear or conjugate-linear ) in the first argument,

⟨ψ + φ|χ⟩ = ⟨ψ|χ⟩ + ⟨φ|χ⟩ ,   ⟨zψ|φ⟩ = z* ⟨ψ|φ⟩    (3.24)

for all ψ, φ, χ ∈ L2 and z ∈ C.

2. It is linear in the second argument,

⟨ψ|φ + χ⟩ = ⟨ψ|φ⟩ + ⟨ψ|χ⟩ ,   ⟨ψ|zφ⟩ = z ⟨ψ|φ⟩    (3.25)

for all ψ, φ, χ ∈ L² and z ∈ C. Properties 1 and 2 together are called sesqui-linear
(from Latin sesqui = 1½).

3. It is conjugate-symmetric (or Hermitian),

⟨φ|ψ⟩ = ⟨ψ|φ⟩*    (3.26)

for all ψ, φ ∈ L2 .

4. It is positive definite,⁴
⟨ψ|ψ⟩ > 0 for ψ ≠ 0 .    (3.27)

Note that the dot product in R3 has the same properties, the properties of an inner
product, except that the scalars involved lie in R, not C. Another inner product with
these properties is defined on Cn by
⟨ψ|φ⟩ = ∑_{i=1}^{n} ψ(i)* φ(i) .    (3.28)

The norm can be expressed in terms of the inner product according to

‖ψ‖ = √⟨ψ|ψ⟩ .    (3.29)

Note that the radicand is ≥ 0. Conversely, the inner product can be expressed in terms
of the norm according to the polarization identity

⟨ψ|φ⟩ = ¼ ( ‖ψ + φ‖² − ‖ψ − φ‖² − i ‖ψ + iφ‖² + i ‖ψ − iφ‖² ) .    (3.30)

(Its proof can be a good exercise for the reader.) It follows from the polarization identity
that every unitary operator U preserves inner products,

⟨Uψ|Uφ⟩ = ⟨ψ|φ⟩ .    (3.31)

(Likewise, every A ∈ SO(3) preserves dot products, which has the geometrical meaning
that a rotation preserves the angle between any two vectors.)
In analogy to the dot product, two functions ψ, φ with ⟨ψ|φ⟩ = 0 are said to be
orthogonal.
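The polarization identity (3.30) can be verified numerically for random vectors in Cⁿ with the inner product (3.28); the following Python lines are my own sketch, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
phi = rng.normal(size=n) + 1j * rng.normal(size=n)

inner = np.vdot(psi, phi)            # <psi|phi> = sum_i psi(i)* phi(i), eq. (3.28)
norm = np.linalg.norm                # Euclidean norm, eq. (3.13)

polar = 0.25 * (norm(psi + phi)**2 - norm(psi - phi)**2
                - 1j * norm(psi + 1j * phi)**2 + 1j * norm(psi - 1j * phi)**2)

print(np.isclose(inner, polar))      # True: polarization identity (3.30)
```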

3.5 Abstract Hilbert Space


The general and abstract definition of a vector space (over R or over C) is that it is a set S
(whose elements are called vectors) together with a prescription for how to add elements
of S and a prescription for how to multiply an element of S by a scalar, such that the
usual algebraic rules of addition and scalar multiplication are satisfied. Similarly, a
Hilbert space is a vector space over C together with an inner product such that the
completeness property holds: every Cauchy sequence (with respect to the norm (3.29)) converges. One can then prove the

Theorem 3.2. L2 (Rd ) is a Hilbert space.

⁴Another math subtlety: This will be true only if we identify two functions ψ, φ whenever the set {q ∈ R^{3N} : ψ(q) ≠ φ(q)} has volume 0. It is part of the standard definition of L² to make these identifications.

4 Classical Mechanics
Classical physics means pre-quantum (pre-1900) physics. I describe one particular ver-
sion that could be called Newtonian mechanics (even though certain features were not
discovered until after Isaac Newton’s death). This version is over-simplified in that
it leaves out magnetism, electromagnetic fields (which play a role for electromagnetic
waves and thus the classical theory of light), and relativity theory.

4.1 Definition of Newtonian Mechanics


According to Newtonian mechanics, the world consists of a space, which is a 3-dimensional
Euclidean space, and particles moving around in space with time. Here, a particle means
a material point—a point-shaped physical object. Let us suppose there are N particles
in the world (say, N ≈ 10^80), and let us fix a Cartesian coordinate system in Euclidean
space. At every time t, particle number i (i = 1, . . . , N ) has a position Qi (t) ∈ R3 .
These positions are governed by the equation of motion

m_i d²Q_i/dt² = −∇_i V(Q_1, . . . , Q_N)    (4.1)
with V the fundamental potential function of the universe as given in Eq. (2.5). This
completes the definition of Newtonian mechanics.
The equation of motion (4.1) is an ordinary differential equation (ODE) of second
order (i.e., involving second time derivatives). Once we specify, as initial conditions,
the initial positions Qi (0) and velocities (dQi /dt)(0) of every particle, the equation of
motion (4.1) determines Qi (t) for every i and every t.
Written explicitly, (4.1) reads

m_i d²Q_i/dt² = − ∑_{j≠i} (e_i e_j/4πε₀) (Q_j − Q_i)/|Q_j − Q_i|³  +  ∑_{j≠i} G m_i m_j (Q_j − Q_i)/|Q_j − Q_i|³ .    (4.2)

The right hand side is called the force acting on particle i; the j-th term in the first
sum (with the minus sign in front) is called the Coulomb force exerted by particle j on
particle i; the j-th term in the second sum is called the gravitational force exerted by
particle j on particle i.
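To make the structure of (4.1)-(4.2) concrete, here is a minimal Python sketch (my own illustration, not from the notes) that integrates the equation of motion for two charged, massive point particles with a simple leapfrog scheme; the particle data and the time step are arbitrary illustration values, not a realistic simulation.

```python
import numpy as np

# constants (SI units)
EPS0 = 8.85e-12
G = 6.67e-11

def forces(q, m, e):
    """Right-hand side of (4.2): Coulomb + Newtonian gravity acting on each particle."""
    N = len(m)
    F = np.zeros_like(q)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            r = q[j] - q[i]
            d = np.linalg.norm(r)
            F[i] += -e[i] * e[j] / (4 * np.pi * EPS0) * r / d**3   # Coulomb force
            F[i] += G * m[i] * m[j] * r / d**3                      # gravitational force
    return F

# two particles, arbitrary illustration values
m = np.array([9.1e-31, 1.7e-27])
e = np.array([-1.6e-19, 1.6e-19])
q = np.array([[0.0, 0.0, 0.0], [1e-10, 0.0, 0.0]])
v = np.zeros_like(q)

dt = 1e-20
for _ in range(1000):                       # leapfrog (kick-drift-kick) integration of (4.1)
    v += 0.5 * dt * forces(q, m, e) / m[:, None]
    q += dt * v
    v += 0.5 * dt * forces(q, m, e) / m[:, None]

print(q)
```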
Newtonian mechanics is empirically wrong. For example, it entails the absence of
interference fringes in the double-slit experiment (and entails wrong predictions about
everything that is considered a quantum effect). Nevertheless, it is a coherent theory, a
“theory of everything,” and often useful to consider as a hypothetical world to compare
ours to.
Newtonian mechanics is to be understood in the following way: Physical objects such
as tables, baseballs, or dogs consist of huge numbers (such as 10^24) of particles, and they
must be regarded as just such an agglomerate of particles. Since Newtonian mechanics
governs unambiguously the behavior of each particle, it also completely dictates the
behavior of tables, baseballs, and dogs. Put differently, after (4.1) has been given, there

is no need to specify any further laws for tables, baseballs, or dogs. Any physical law
concerning tables, baseballs, or dogs, is a consequence of (4.1). This scheme is called
reductionism. It makes chemistry and biology sub-fields of physics. (This does not
mean, though, that it would be of practical use to try to solve (4.1) for 10^24 or 10^80
particles in order to study the behavior of dogs.) Can everything be reduced to (4.1)?
It seems that conscious experiences are an exception—presumably the only one.
When we consider a baseball, we are often particularly interested in the motion of
its center Q(t) because we are interested in the motion of the whole ball. It is often
possible to give an effective equation for the behavior of a variable like Q(t), for example

M d²Q/dt² = −γ dQ/dt − M g (0, 0, 1) ,    (4.3)

where M is the mass of the baseball, the first term on the right hand side is called the
friction force, the second the gravitational force of Earth, γ is the friction coefficient of
the baseball and g the gravitational field strength of Earth. The effective equation (4.3)
looks quite similar to the fundamental equation (4.1) but (i) it has a different status (it
is not a fundamental law), (ii) it is only approximately valid, (iii) it contains a term that
is not of the form −∇V (the friction term), (iv) forces that do obey the form −∇V (Q)
(such as the second force) can have other functions for V (such as V(x) = M g x₃) instead
of (2.5).
The theory I call Newtonian mechanics was never actually proposed to give the
correct and complete laws of physics (although we can imagine a hypothetical world
where it does); for example, it leaves out magnetism. An extension of this theory, which
we will not consider further but which is also considered “classical physics,” includes
electromagnetic fields (governed by Maxwell’s field equations) and gravitational fields
(governed by Einstein’s field equations, also known as the general theory of relativity).
The greatest contributions from a single person to the development of Eq. (4.1) came
from Isaac Newton (1643–1727), who suggested (in his Philosophiae Naturalis Principia
Mathematica 1687) considering ODEs, in fact of second order, suggested “forces” and
the form m d²Q/dt² = force, and introduced the form of the gravitational force, now known
as “Newton’s law of universal gravity.” Eq. (4.2) was first written down, without the
Coulomb term, by Leonhard Euler (1707–1783). The first term was proposed in 1784
by Charles Augustin de Coulomb (1736–1806). Nevertheless, we will call (4.1) and (4.2)
“Newton’s equation of motion.”

4.2 Properties of Newtonian Mechanics


If t ↦ q(t) = (Q1 (t), . . . , QN (t)) is a solution of Newton’s equation of motion (4.1),
then so is t ↦ q(−t), which is called the time reverse. This property is called time
reversal invariance or reversibility. It is a rather surprising property, in view of the irre-
versibility of many phenomena. But since it has been explained, particularly by Ludwig
Boltzmann, how reversibility of the microscopic laws and irreversibility of macroscopic

phenomena can be compatible,⁵ time reversal invariance has been widely accepted. This
was also because time reversal invariance also holds in other, more refined theories af-
ter Newtonian mechanics, such as Maxwell’s equations of classical electromagnetism,
general relativity, and the Schrödinger equation.
Definition 4.1. Let v i (t) = dQi /dt denote the velocity of particle i. The energy, the
momentum, and the angular momentum of the universe are defined to be, respectively,
E = ∑_{k=1}^{N} (m_k/2) v_k²  −  ∑_{1≤j<k≤N} ( G m_j m_k − e_j e_k/(4πε₀) ) · 1/|Q_j − Q_k|    (4.4)

p = ∑_{k=1}^{N} m_k v_k    (4.5)

L = ∑_{k=1}^{N} m_k Q_k × v_k ,    (4.6)

where v² = v · v = |v|², and × denotes the cross product in R³. The first term in (4.4)
is called kinetic energy, the second one potential energy.
Proposition 4.2. E, p, and L are conserved quantities, i.e., they are time independent.
The proof is a useful exercise.
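As a spot check of Proposition 4.2 (my own sketch, not from the notes), one can integrate a two-body problem numerically and compare E, p, and L before and after; for simplicity the charges are set to zero and toy units with G = 1 are used.

```python
import numpy as np

G = 1.0                                   # toy units
m = np.array([1.0, 2.0])
q = np.array([[1.0, 0.0, 0.0], [-0.5, 0.0, 0.0]])
v = np.array([[0.0, 0.8, 0.0], [0.0, -0.4, 0.0]])

def accel(q):
    a = np.zeros_like(q)
    for i in range(2):
        j = 1 - i
        r = q[j] - q[i]
        a[i] = G * m[j] * r / np.linalg.norm(r) ** 3   # gravity only (charges set to 0)
    return a

def E(q, v):   # kinetic + potential energy, as in (4.4) with zero charges
    kin = 0.5 * np.sum(m[:, None] * v**2)
    pot = -G * m[0] * m[1] / np.linalg.norm(q[0] - q[1])
    return kin + pot

def p(v):      # total momentum (4.5)
    return np.sum(m[:, None] * v, axis=0)

def L(q, v):   # total angular momentum (4.6)
    return np.sum(np.cross(q, m[:, None] * v), axis=0)

print(E(q, v), p(v), L(q, v))
dt = 1e-3
for _ in range(20000):                    # leapfrog integration of (4.1)
    v += 0.5 * dt * accel(q)
    q += dt * v
    v += 0.5 * dt * accel(q)
print(E(q, v), p(v), L(q, v))             # nearly unchanged, as Proposition 4.2 predicts
```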

4.3 Hamiltonian Systems


A dynamical system is another name for an ODE. A dynamical system in Rn can be
characterized by specifying the function F : Ω → Rn in
dX/dt = F(X, t) ,    (4.7)
with Ω ⊆ Rn × R. F can be called a time-dependent vector field on (a possibly time-
dependent domain in) Rn . (One often considers a more general concept of ODE, in
which F is a time-dependent vector field on a differentiable manifold M .)
Newtonian mechanics has a time evolution that belongs to the class of dynamical
systems, with n = 6N , X = (Q1 , . . . , QN , v 1 , . . . , v N ), and Ω (the phase space) = R6N
or rather Ω = {(Q1 , . . . , QN , v 1 , . . . , v N ) ∈ R^{6N} : Qi ≠ Qj ∀ i ≠ j}. The phase point
X(t) = (Q1 (t), . . . , QN (t), v 1 (t), . . . , v N (t)) is determined by the equation of motion
and the initial datum X(0). The mapping Tt that maps any X(0) ∈ Ω to X(t),

Tt (X(0)) = X(t) , (4.8)


⁵For discussion see, e.g., J. L. Lebowitz: From Time-symmetric Microscopic Dynamics to Time-asymmetric Macroscopic Behavior: An Overview. Pages 63–88 in G. Gallavotti, W. L. Reiter, J. Yngvason (editors): Boltzmann’s Legacy. Zürich: European Mathematical Society (2008) http://arxiv.org/abs/0709.0724

is called the flow map. It satisfies a composition law analogous to that of the unitary
time evolution operators for the Schrödinger equation (3.5),

Ts Tt = Ts+t and T0 = idΩ (4.9)

for all s, t ∈ R, where idΩ means the identity mapping on Ω, idΩ (x) = x. In general, Tt
is not a linear mapping but still a bijection.
Newtonian mechanics also belongs to a narrower class, called Hamiltonian systems.
Simply put, these are dynamical systems for which the vector field F is a certain type of
derivative of a scalar function H called the Hamiltonian function or simply the Hamil-
tonian. Namely, n is assumed to be even, n = 2r, and denoting the n components of x
by (q1 , . . . , qr , p1 , . . . , pr ), the ODE is of the form

dq_i/dt = ∂H/∂p_i    (4.10)
dp_i/dt = − ∂H/∂q_i .    (4.11)
Newtonian mechanics fits this definition with r = 3N , q1 , . . . , qr the 3N components
of q = (q 1 , . . . , q N ), p1 , . . . , pr the 3N components of p = (p1 , . . . , pN ) (the momenta
pk = mk v k ), and H = H(q, p) the energy (4.4) expressed as a function of q and p, that
is,
H(q, p) = ∑_{k=1}^{N} p_k²/(2 m_k)  −  ∑_{1≤j<k≤N} ( G m_j m_k − e_j e_k/(4πε₀) ) · 1/|q_j − q_k| .    (4.12)
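As a minimal illustration of the scheme (4.10)-(4.11) (my own sketch, not from the notes), here is a 1d harmonic oscillator with Hamiltonian H = p²/(2m) + ½ k q², integrated by stepping Hamilton's equations directly; the semi-implicit update order and all parameter values are arbitrary choices.

```python
import numpy as np

m, k = 1.0, 4.0
H = lambda q, p: p**2 / (2 * m) + 0.5 * k * q**2

def dH_dq(q, p): return k * q        # right-hand side of (4.11), up to sign
def dH_dp(q, p): return p / m        # right-hand side of (4.10)

q, p = 1.0, 0.0
dt = 1e-3
for _ in range(10000):
    # Hamilton's equations (4.10)-(4.11), semi-implicit (symplectic) Euler step
    p -= dt * dH_dq(q, p)
    q += dt * dH_dp(q, p)

print(q, p, H(q, p))     # H stays approximately at its initial value 2.0
```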

For readers familiar with manifolds I mention that the natural definition of a Hamil-
tonian system on a manifold M is as follows. M plays the role of phase space. Let the
dimension n of M be even, n = 2r, and suppose we are given a symplectic form ω on
M , i.e., a non-degenerate differential 2-form whose exterior derivative vanishes, dω = 0.
(Non-degenerate means that it has full rank n at every point.) The equation of motion
for t ↦ x(t) ∈ M reads

ω( dx/dt , · ) = dH ,    (4.13)
where dH means the exterior derivative of H. To make the connection with the case
M = R^n just described, dH is then the gradient of H and ω the n × n matrix

ω = [  0   I ]
    [ −I   0 ]    (4.14)

with I the r × r unit matrix and 0 the r × r zero matrix; ω(dx/dt, ·) becomes the
transpose of ω applied to the n-vector dx/dt, and (4.13) reduces to (4.10) and (4.11).

5 The Double-Slit Experiment
The double-slit experiment is a demonstration of fundamental features of quantum me-
chanics, in particular of the wave nature and the particle nature of electrons and other
elementary particles. The experiment has been carried out in many variants with elec-
trons, neutrons, photons, entire atoms, and even molecules. It involves interference,
i.e., the constructive or destructive cooperation (i.e., addition) of waves. The word
“diffraction” means more or less the same as interference.
In this experiment, an “electron gun” shoots electrons at a plate with two slits.
Electrons get reflected when they hit the plate, but they can pass through the slits.
Behind the plate with the slits, at a suitable distance, there is a screen capable of
detecting electrons. Every electron leaves a (basically point-shaped) spot on the screen
that we can see. The exact location of the next spot is unpredictable, but a probability
distribution governing the spots is visible from a large number of spots. Before I describe
the outcome of the experiment, let me discuss, for the purpose of contrast, the expected
outcome on the basis of Newtonian mechanics and on the basis of classical wave theory.

5.1 Classical Predictions for Particles and Waves


In Newtonian mechanics, a particle moves uniformly (i.e., along a straight line with
constant speed) if no forces act on it. The particles hitting the plate will be scattered
back and leave the setup, but let us focus on the particles that make it through the
slits. Since in this experiment, gravity can usually be neglected and other forces do
not matter much as the particles do not get very close to each other or to particles
belonging to the plate, the particles passing the slits move along straight lines. Not all
particles will arrive at the same spot, as they will be shot off at different positions and in
different directions. If the particle source (the electron gun) is small and far away, then
the possible spots of arrival will form two stripes corresponding to the two slits—the
complement of the shadow of the plate. A particle passing through the upper (lower)
slit arrives in the upper (lower) stripe. If the source is larger and less far away, then the
two stripes will be blurred and may overlap.
Waves behave very differently. We may think of water waves in a basin and, playing
the role of the plate, a wall with two gaps that will let waves pass into a second basin
behind it; at the far end of that second basin, playing the role of the screen, we may measure
the intensity of the arriving waves (say, their energy) as a function on the location
along the rim of the basin. The first difference from Newtonian particles is that the
energy does not arrive in a single spot, in a chunk carried by each particle, but arrives
continuously in time and continuously distributed along the rim. A second difference
shows up when the width of the slits is small enough so it becomes comparable to
the wave length. Then the wave emanating from each slit will spread out in different
directions; for an ideally small slit, the outcoming wave will be a semi-circular wave such
as cos(k√(x² + y²))/(x² + y²)^{1/4} (with wave number k = 2π/(wave length) and direction
of propagation orthogonal to the wave fronts, while the amplitude decreases like r^{−1/2}
because the energy of the wave is proportional to the square of the amplitude, and

the energy gets distributed over semi-circles of circumference πr). Thus, at each slit
the incoming wave propagates in the direction from the source, while the outgoing wave
propagates in many directions (hence the word “diffraction,” which is Latin for “breaking
up,” as the path of propagation suddenly changes direction).

Figure 5.1: Constructive and destructive interference of waves: Graph of the sum of two
semi-circular waves in the plane emanating from a_i, each given by cos(|x − a_i|)/|x − a_i|^{1/2},
with a_1 = (5, 0) and a_2 = (−5, 0).

The third difference is that, when waves emanate from both slits whose distance is not
much larger than the wave length, they will cancel each other in some places (destructive
interference) and add in others (constructive interference), as shown in Figure 5.1. As a
consequence, the energy arriving on the rim will vary from place to place; as a function
E(x) of the coordinate x along the rim it will show ups and downs (local maxima and
minima) known as an “interference pattern.” In double-slit experiments done with light,
these are visible as alternating bright and dark stripes known as “interference fringes.”

5.2 Actual Outcome of the Experiment


In the quantum mechanical experiment, the energy arrives, as for Newtonian particles, in
discrete localized chunks, each corresponding to one electron (or neutron etc.) and each
visible as a spot on the screen. The probability distribution ρ(x) of the spots, however,
features an interference pattern, indicative of the presence of waves. One speaks of
wave–particle duality, meaning that the experiment displays aspects of particle behavior
and aspects of wave behavior. But how, you may wonder, can the electron be particle
and wave at the same time?

We will discuss proposals for that in the following chapters; for now let us go into
more detail about the experiment. The experiment can be carried out in such a way
that, at any given time, only one electron (or neutron etc.) is passing the setup between
the source and the screen, so we can exclude interaction between many particles as the
cause of the interference pattern. Figure 5.2 shows the spots on the screen in a double-
slit experiment carried out by Tonomura et al. In this experiment, 70,000 electrons
were detected individually after passing through a double slit.⁶ Only one electron at a
time went through the setup. About 1,000 electrons per second went through, at nearly
half the speed of light. Each electron needed about 10^−8 seconds to travel from the
double slit to the screen.

Figure 5.2: A picture of actual results of a double-slit experiment taken from A. Tonomura
et al., American Journal of Physics 57(2): 117–120 (1989), after (a) 10, (b) 100,
(c) 3,000, (d) 20,000, (e) 70,000 electrons.

⁶More precisely, electrons could pass right or left of a positively charged wire of diameter 1 µm. Those passing on the right get deflected to the left, and vice versa. Thus, the arrangement leads to the superposition of waves travelling in slightly different directions—just what is needed for interference.

The sense of surprise (or even paradox) may be further enhanced. If we place the
screen directly behind the plate, the spots occur in two stripes located where the slits
are (like Newtonian particles with small source). From this it seems natural to conclude
that every particle passes through one of the slits. Now it would be of interest to see
how the particles passing through slit 1 behave later on. So we put the screen back at
the original distance from the plate, where it shows an interference pattern, but now
we close one of the slits, say slit 2. As expected, the number of particles that arrive on
the screen gets (approximately) halved. Perhaps less expectedly, the interference fringes
disappear. Instead of several local maxima and minima, the distribution function ρ1 (x)
has just one maximum in (approximately) the center and tends to 0 monotonically on
both sides. Let me explain what is strange about the disappearance of the fringes. If
we close slit 1 instead and keep only slit 2 open, then (if the slits are equal in size and
their centers are distance a apart) the distribution function should be (and indeed is)
ρ2 (x) = ρ1 (x − a). Arguing that every particle passes through one of the slits, and that
those passing through slit i ∈ {1, 2} end up with distribution ρi , we may expect that
the distribution with both slits open, ρ = ρ12 , is given by the sum of the ρi . But it is
not,
ρ_12(x) ≠ ρ_1(x) + ρ_2(x) .    (5.1)
While the right-hand side has no minima (except perhaps for a little valley of width
a in the middle, which is usually invisible since a is usually smaller than 10^−6 m), ρ12
features pronounced minima, some of which even have (at least ideally) ρ12 (x) = 0.
Such x are places where particles passing through slit 1 would arrive if slit 1 alone were
open, but where no particles arrive if both slits are open! How does a particle passing
through slit 1 even “know” whether slit 2 is open or not?
Moreover, there are detectors that can register a particle while it passes through and
moves on. If we place such a detector in each slit, while both slits are open, we will know
of each particle which slit it went through. In that case, the fringes disappear again,
and the observed distribution is ρ1 + ρ2 . In particular, the distribution on the screen
depends on whether we put detectors in the slits. It seems as if our mere knowledge of
which slit each particle went through had an effect on the locations of the spots on the
screen!
The same phenomena arise when using more than two slits, except that the details
of the interference pattern are different then. It is common to use dozens of slits or more
(called a diffraction grating).
Note that the observations in the double-slit experiment are in agreement with, and
in fact follow from, the Born rule and the Schrödinger equation: The relevant system
here consists of one electron, so ψt is a function in just 3 dimensions. The potential V
can be taken to be +∞ (or very large) at every point of the plate, except in the slits
themselves, where V = 0. Away from the plate, also V = 0. The Schrödinger equation
governs the behavior of ψt , with the initial wave function ψ0 being a wave packet, e.g.,
a Gaussian wave packet as in Exercise 4 of Assignment 1,
ψ_0(x) = (2πσ²)^{−3/4} e^{−i k·x} e^{−x²/(4σ²)} ,    (5.2)

moving toward the double slit. According to the Schrödinger equation, part of ψ will
be reflected from the plate, part of it will pass through the two slits.⁷ The two parts
of the wave emanating from the two slits, ψ1 and ψ2 , will overlap and thus interfere,
ψ = ψ1 + ψ2 .
When we detect the electron, its probability density is given, according to the Born
rule, by
|ψ|² = |ψ₁ + ψ₂|² = |ψ₁|² + |ψ₂|² + 2 Re(ψ₁* ψ₂) .    (5.3)
The third summand on the right is responsible for the minima of the interference pattern.
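Here is a small numerical sketch of Eq. (5.3) (my own illustration, not from the notes): superpose two waves spreading from the two slits and compare |ψ₁ + ψ₂|² with |ψ₁|² + |ψ₂|² along the screen. Each slit is idealized as a point source, and the wave number, slit separation, and screen distance are arbitrary illustration values.

```python
import numpy as np

k = 20.0                       # wave number, arbitrary units
a = 1.0                        # slit separation
D = 10.0                       # distance from the slits to the screen
x = np.linspace(-5, 5, 2001)   # coordinate along the screen

def slit_wave(x0):
    """Idealized circular wave from a point slit at height x0, evaluated on the screen."""
    r = np.sqrt(D**2 + (x - x0) ** 2)
    return np.exp(1j * k * r) / np.sqrt(r)

psi1, psi2 = slit_wave(a / 2), slit_wave(-a / 2)

rho12 = np.abs(psi1 + psi2) ** 2                  # includes the term 2 Re(psi1* psi2)
rho_sum = np.abs(psi1) ** 2 + np.abs(psi2) ** 2   # what (5.1) says rho12 is NOT

print(rho12.min(), rho_sum.min())   # rho12 has near-zero minima, rho_sum does not
```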
What if we include detectors in the slits? Then we detect the electron twice: once
in a slit and once at the screen. Thus, we either have to regard it as a many-particle
problem (involving the electron and the particles making up the detector), or we need a
version of the Born rule suitable for repeated detection. We will study both approaches
in later chapters.

5.3 Feynman’s Discussion


Richard Feynman, in his widely known book Feynman Lectures on Physics, Reading,
MA: Addison-Wesley (1964) (Volume 3, Chapter 1), provides a nice introduction to the
double slit experiment. I recommend that chapter as further reading and will add a few
remarks about it:

• Feynman’s statement on page 1,

[The double slit experiment] has in it the heart of quantum mechanics.
In reality, it contains the only mystery.

is a bit too strong. Other mysteries can claim to be on equal footing with this
one. Feynman weakened his statement later.

• Feynman’s statements

We cannot make the mystery go away by “explaining” how it works.
(page 1)
Many ideas have been concocted to try to explain the curve for P12 [...]
None of them has succeeded. (page 6)
No one has found any machinery behind the law. No one can “explain”
any more than we have just “explained.” No one will give you any
deeper representation of the situation. We have no idea about a more
basic mechanism from which these results can be deduced. (page 10)

are too strong. We will see in Chapters 6 and 12 that Bohmian mechanics and
other theories provide precisely such explanations of the double slit experiment.
⁷This is nicely illustrated by a movie created by B. Thaller showing a numerical simulation of the Schrödinger equation at a double-slit, available online at http://vqm.uni-graz.at/movies.html.

• Feynman’s presentation conveys a sense of mystery and a sense of paradox about
quantum mechanics. This will be a recurrent theme in this course, and one ques-
tion will be whether there is any genuine, irreducible mystery or paradox in quan-
tum mechanics.

• Feynman suggests that the mysterious character of quantum mechanics is not
surprising (“perfectly reasonable”) “because all of direct, human experience and
of human intuition applies to large objects.” This argument seems not quite on
target to me. After all, the troublesome paradoxes of the double slit are not like the
notions we often find hard to imagine (for example, how big the number 6 × 10^23
is, or what 4-dimensional geometry looks like, or how big a light year is) but which
are clearly sensible. They sound more like Alice in Wonderland, like they are not
sensible—well, like paradoxes.

6 Bohmian Mechanics
“[Bohmian mechanics] exercises the mind in a very salutary way.”
J. Bell, Speakable and Unspeakable in Quantum Mechanics, page 171

The situation in quantum mechanics is that we have a set of rules, known as the
quantum formalism, for computing the possible outcomes and their probabilities for
(more or less) any conceivable experiment, and everybody agrees (more or less) about
the formalism. What the formalism doesn’t tell us, and what is controversial, is what
exactly happens during these experiments, and how nature arrives at the outcomes
whose probabilities the formalism predicts. There are different theories answering these
questions, and Bohmian mechanics is one of them.
Let me elucidate my statements a bit. We have already encountered part of the
quantum formalism: the Schrödinger equation and the Born rule. These rules have
allowed us to predict the possible outcomes of the double-slit experiment with a single
electron (easy here: a spot anywhere on the screen) and their probability distribution
(here: a probability distribution corresponding to |ψ|2 featuring a sequence of maxima
and minima corresponding to interference fringes). What the rules didn’t tell us was
what exactly happens during this experiment (e.g., how the electron moves). Bohmian
mechanics fills this gap.
We have not seen all the rules of the quantum formalism yet. We will later, in Chap-
ters 8 and 10. So far, we have formulated the Born rule only for position measurements,
and we have not considered repeated detections.

6.1 Definition of Bohmian Mechanics


According to Bohmian mechanics, the world consists of a space, which is a 3-dimensional
Euclidean space, and particles (material points) moving around in space with time. Let
us suppose there are N particles in the world (say, N ≈ 10^80), and let us fix a Cartesian
coordinate system in Euclidean space. At every time t, particle number i (i = 1, . . . , N )
has a position Qi (t) ∈ R3 . These positions are governed by Bohm’s equation of motion

dQ_i/dt = (ℏ/m_i) Im[ ∇_i Ψ / Ψ ](t, Q(t)) .    (6.1)
Here, Q(t) = (Q1 (t), . . . , QN (t)) is the configuration at time t, and Ψ is a wave function
that is called the wave function of the universe and evolves according to the Schrödinger
equation
i ℏ ∂Ψ/∂t = − ∑_{i=1}^{N} (ℏ²/2m_i) ∇_i² Ψ + V Ψ    (6.2)
with V given by (2.5). The configuration Q(0) at the initial time of the universe (say,
right after the big bang) is chosen randomly by nature with probability density

ρ0 (q) = |Ψ0 (q)|2 . (6.3)

(We write capital Q for the configuration of particles and little q for the configuration
variable in either ρ or Ψ.) This completes the definition of Bohmian mechanics.
The central fact about Bohmian mechanics is that its predictions agree exactly with
those of the quantum formalism (which so far have always been confirmed in experi-
ment). We will understand later why this is so.
Eq. (6.1) is an ordinary differential equation of first order (specifying the velocity
rather than the acceleration). Thus, the initial configuration Q(0) determines Q(t) for
all t, so Bohmian mechanics is a deterministic theory. On the other hand, Q(t) is
random because Q(0) is. Note that this randomness does not conflict with determinism.
It is a theorem, the equivariance theorem, that the probability distribution of Q(t) is
given by |Ψt (q)|2 . We will prove the equivariance theorem later in this chapter. As
a consequence, it is consistent to assume the Born distribution for every t. Note that
due to the determinism, the Born distribution can be assumed only for one time (say
t = 0); for any other time t, then, the distribution of Q(t) is fixed by (6.1). The state
of the universe at any time t is given by the pair (Q(t), Ψt ). In particular, in Bohmian
mechanics, “wave–particle duality” means a very simple thing: there is a wave, and
there are particles.
Let us have a closer look at Bohm’s equation of motion (6.1). If we recall the formula
(2.20) for the probability current then we can rewrite Eq. (6.1) in the form

dQ_i/dt = j_i/|Ψ|² = probability current / probability density .    (6.4)

This is a very plausible relation because it is a mathematical fact about any particle
system with deterministic velocities that

probability current = velocity × probability density . (6.5)

We will come back to this relation when we prove equivariance.


Here is another way of re-writing (6.1). A complex number z can be characterized by
its modulus R ≥ 0 and its phase S ∈ R, z = R e^{iS}. It will be convenient in the following
to replace S by S/ℏ (but we will still call S the phase of z). Then a complex-valued
function Ψ(t, q) can be written in terms of the two real-valued functions R(t, q) and
S(t, q) according to
Ψ(t, q) = R(t, q) e^{iS(t,q)/ℏ} .    (6.6)
Let us plug this into (6.1): Since

∇_i Ψ = ∇_i (R e^{iS/ℏ})    (6.7)
      = (∇_i R) e^{iS/ℏ} + R ∇_i e^{iS/ℏ}    (6.8)
      = (∇_i R) e^{iS/ℏ} + R (i ∇_i S/ℏ) e^{iS/ℏ} ,    (6.9)

we have that

(ℏ/m_i) Im[ ∇_i Ψ / Ψ ] = (ℏ/m_i) Im[ ∇_i R / R + i ∇_i S/ℏ ]    (6.10)
                        = (ℏ/m_i) ∇_i S/ℏ = (1/m_i) ∇_i S ,    (6.11)

since ∇_i R / R is real. Thus, (6.1) can be rewritten as

dQ_i/dt = (1/m_i) ∇_i S(t, Q(t)) .    (6.12)
In words, the velocity is given (up to a constant factor involving the mass) by the
gradient of the phase of the wave function.
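For concreteness, here is a sketch (my own, not from the notes) of how one can evaluate Bohm's velocity (6.1)/(6.12) for a single particle from a wave function given on a 1d grid, and take an Euler step of a few trajectories; the superposition of two Gaussian packets and the choice ℏ = m = 1 are arbitrary.

```python
import numpy as np

# 1d grid and a superposition of two Gaussian packets (hbar = m = 1)
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]

def packet(x0, k0, sigma=1.0):
    return np.exp(1j * k0 * x - (x - x0) ** 2 / (4 * sigma**2))

psi = packet(-5.0, 2.0) + packet(5.0, -2.0)

def bohm_velocity(psi):
    """v = Im(psi' / psi), the 1d version of (6.1) with hbar = m = 1."""
    dpsi = np.gradient(psi, dx)
    return np.imag(dpsi / psi)        # only meaningful where |psi| is not negligible

v = bohm_velocity(psi)
Q = np.array([-5.0, -3.0, 3.0, 5.0])  # a few initial particle positions

# one Euler step of dQ/dt = v(Q), interpolating v on the grid
dt = 0.01
Q_next = Q + dt * np.interp(Q, x, v)
print(Q_next)
```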
A historical note. A few years before the development of the Schrödinger equation,
Louis de Broglie had suggested a quantitative rule-of-thumb for wave–particle duality:
A particle with momentum p = mv should “correspond” to a wave with wave vector k
according to the de Broglie relation

p = ℏ k .    (6.13)

The wave vector is defined by the relation ψ = e^{ik·x} (so it is defined only for plane waves);
it is orthogonal to the wave fronts (surfaces of constant phase), and its magnitude is
|k| = 2π/(wave length). Now, if the wave is not a plane wave then we can still define
a local wave vector k(x) that is orthogonal to the surface of constant phase and whose
magnitude is the rate of phase change per unit length. Some thought shows that k(x) = ∇S(x)/ℏ. If
we use this expression on the right hand side of (6.13) and interpret p as mass times
the velocity of the particle, we obtain exactly Eq. (6.12), that is, Bohm’s equation of
motion.

6.2 Historical Overview


The idea that the wave function might determine particle trajectories as a “guiding field”
was perhaps first expressed by Albert Einstein around 1923 and considered in detail by
John C. Slater in 1924. Bohmian mechanics was developed by Louis de Broglie in
1927 but then abandoned. It was rediscovered independently by Nathan Rosen (known
for the Einstein–Rosen bridge in general relativity and the Einstein–Podolsky–Rosen
argument) in 1945 and David Bohm in 1952. Bohm was the first to realize that it
actually makes the correct predictions, and the first to take it seriously as a physical
theory. Several physicists mistakenly believed that Bohmian mechanics makes wrong
predictions, including de Broglie, Rosen, and Einstein. Curiously, Bohm’s 1952 paper
provides a strange presentation of the theory, as Bohm insisted on writing the law
of motion as an equation for the acceleration d2 Qj /dt2 , obtained by taking the time
derivative of (6.1).

It is widespread to call any variables that are not functions of ψ “hidden variables”;
in Bohmian mechanics, the configuration Q is a variable that is not a function of ψ, so
it is often called a hidden variable although the particle positions are not hidden at all
in Bohmian mechanics, as they can be measured any time to any desired accuracy.

6.3 Equivariance
The term “equivariance” comes from the fact that the two relevant quantities, ρt and
|Ψt |2 , vary equally with t. (Here, ρt is the distribution arising from ρ0 by transport
along the Bohmian trajectories.) The equivariance theorem can be expressed by means
of the following diagram:
Ψ0 −→ ρ 0
Ut y (6.14)
 
y
Ψt −→ ρt
The horizontal arrows mean taking | · |2 , the left vertical arrow means the Schrödinger
evolution from time 0 to time t, and the right vertical arrow means the transport of
probability along the Bohmian trajectories. The statement about this diagram is that
both paths along the arrows lead to the same result.
As a preparation for the proof, we note that the equation of motion can be written
in the form
dQ/dt = v_t(Q(t)) ,    (6.15)
where vt : R3N → R3N is the vector field on configuration space vt = v = (v 1 , . . . , v N )
whose i-th component is
v_i = (ℏ/m_i) Im[ ∇_i Ψ / Ψ ] .    (6.16)
We now address the following question: If vt is known for all t, and the initial probability
distribution ρ0 is known, how can we compute the probability distribution ρt at other
times? The answer is the continuity equation
∂ρ_t/∂t = − div( ρ_t v_t ) .    (6.17)
This follows from the fact that the probability current is given by ρt vt . In fact, in any
dimension d (d = 3N or otherwise) and for any density (probability density or energy
density or nitrogen density or . . . ) it is true that
current = density × velocity (6.18)
(provided that the velocity vector field vt is not itself random).
We are now ready to prove the equivariance theorem. (This is not a rigorous proof,
but this argument contains the essence of the reason why the equivariance theorem is
true.) We first show that
if ρ_t = |Ψ_t|²  then  ∂ρ_t/∂t = ∂|Ψ_t|²/∂t    (6.19)

and then conclude that if ρ0 = |Ψ0 |2 then ρt = |Ψt |2 (which is the equivariance theorem).
By the continuity equation (6.17) for ρt and the continuity equation (2.19) for |Ψt |2 , the
right equation in (6.19) is equivalent to
− ∑_i ∇_i · ( ρ_t v_i ) = − ∑_i ∇_i · j_i .    (6.20)

As mentioned in (6.4), v i = j i /|Ψt |2 . Thus, if ρt = |Ψt |2 then Eq. (6.20) is true, which
completes the proof.
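The equivariance theorem can also be checked numerically. The following Python sketch (my own illustration, not from the notes) evolves a free 1d Gaussian packet with the split-step Fourier method, transports an ensemble of initially |ψ₀|²-distributed points along the Bohmian velocity field, and compares the spread of the ensemble with the spread of |ψ_t|²; ℏ = m = 1 and all parameters are illustration values.

```python
import numpy as np

# Free 1d Schroedinger evolution (hbar = m = 1) by the split-step Fourier method,
# together with an ensemble of Bohmian trajectories started |psi_0|^2-distributed.
L, n = 80.0, 2048
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)

sigma = 1.0
psi = (2 * np.pi * sigma**2) ** (-0.25) * np.exp(-x**2 / (4 * sigma**2))

rng = np.random.default_rng(3)
Q = rng.normal(0.0, sigma, size=5000)    # initial ensemble ~ |psi_0|^2 (a Gaussian)

dt, steps = 0.005, 1000
for _ in range(steps):
    safe = np.where(np.abs(psi) > 1e-12, psi, 1.0)   # avoid dividing by ~0 in the far tails
    v = np.imag(np.gradient(psi, dx) / safe)         # Bohmian velocity field, cf. (6.16)
    Q += dt * np.interp(Q, x, v)                     # Euler step of dQ/dt = v_t(Q), (6.15)
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))  # free Schroedinger step

rho = np.abs(psi) ** 2
mean_x = np.sum(x * rho) * dx
std_psi = np.sqrt(np.sum((x - mean_x) ** 2 * rho) * dx)
print(np.std(Q), std_psi)    # approximately equal, as equivariance predicts
```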

6.4 The Double-Slit Experiment in Bohmian Mechanics


Let us apply what we know about Bohmian mechanics to N = 1 and the wave function
of the double-slit experiment. We assume that the particle in the experiment moves as
if it was alone in the universe, with the potential V representing the wall with two slits.
We will justify that assumption in a later chapter. We know already what the wave
function ψ(t, x) looks like. Here is a picture of the possible trajectories of the particle.

Figure 6.1: Several alternative Bohmian trajectories of a particle in a double-slit experiment

We know from the equivariance theorem that the position will always have proba-
bility distribution |ψt |2 . Thus, if we detect the particle at time t we find its distribution

in agreement with the Born rule.
Note that the particle does not move along straight lines, as it would according to classical
mechanics. That is because Bohm’s equation of motion is different from Newton’s. Note
also that the wave passes through both slits, while the particle passes through one only.
Note further that the particle trajectories would be different if one slit were closed: then
no interference fringes would occur. How can the particle, after passing through one
slit, know whether the other slit is open? Because the wave passes through both slits if
both are open.

“Is it not clear from the smallness of the scintillation on the screen that we
have to do with a particle? And is it not clear, from the diffraction and
interference patterns, that the motion of the particle is directed by a wave?
De Broglie showed in detail how the motion of a particle, passing through
just one of two holes in screen, could be influenced by waves propagating
through both holes. And so influenced that the particle does not go where
the waves cancel out, but is attracted to where they cooperate. This idea
seems to me so natural and simple, to resolve the wave–particle dilemma in
such a clear and ordinary way, that it is a great mystery to me that it was
so generally ignored.” J. Bell, Speakable and Unspeakable in Quantum
Mechanics, page 191

Coming back to Feynman’s description of the double-slit experiment, we see that his
statement that its outcome “cannot be explained” is not quite accurate. It is true that
it cannot be explained in Newtonian mechanics, but it can in Bohmian mechanics.
Note also that we can find out which slit the particle went through without disturbing
the interference pattern: check whether the particle arrived in the upper or lower half
of the detection screen. This method takes for granted that Bohm’s equation of motion
is correct; in a Bohmian world, it would yield correct retrodictions.
The fact that trajectories beginning in the upper half stay in the upper half, visible
from Figure 6.1, can be understood mathematically as follows. Since the initial wave
function, as well as the arrangement of the two slits, is symmetric around the horizontal
middle axis, the wave function stays symmetric while evolving, ψt (x, y, z) = ψt (x, y, −z)
(with z the vertical axis in Figure 6.1), and so the Bohmian velocity field is mirror
symmetric,

v_x(x, y, z, t) =  v_x(x, y, −z, t) ,
v_y(x, y, z, t) =  v_y(x, y, −z, t) ,                    (6.21)
v_z(x, y, z, t) = −v_z(x, y, −z, t) .

As a consequence, on the symmetry plane z = 0, the velocity field is tangent to the
plane, and as a consequence of that, any trajectory with one point on the z = 0 plane
stays on that plane (towards the future and the past), so no trajectory can cross the
z = 0 plane. (We are using here the uniqueness of the solution of a first-order ODE for
a given initial point.)

Here is an alternative reasoning. Since the velocity component in the direction
perpendicular to the plate is, we may assume, constant, we can think of the horizontal
axis in Figure 6.1 as the time axis and simplify the math by pretending we are dealing
with 1-dimensional (1d) motion (along the z axis). Bohmian trajectories cannot cross
each other (this follows from the uniqueness of the solution of a first-order ODE for
a given initial point by taking the time of a hypothetical crossing as the initial time).
In 1 dimension, this has the consequence that alternative trajectories stay in the same
order along the axis. Since, by symmetry, Qz (t) = 0 is a solution, the other trajectories
cannot cross it. (For comparison, Newtonian trajectories in 1d can cross because the
equation of motion is of second order. The trajectories cannot cross in phase space.)
Another alternative reasoning is based on the observation (Exercise 10 from Assign-
ment 3) that, by equivariance, in 1d the α-quantile of |ψ0 |2 lies on the same trajectory
as the α-quantile of |ψt |2 (i.e., the trajectories are the quantile curves). By symmetry,
for α > 0.5, the α-quantile lies in the upper half axis {z > 0} at every t.
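In formulas (a restatement of that observation, not a new result): writing F_t(z) = ∫_{−∞}^{z} |ψ_t(z′)|² dz′ for the cumulative distribution function of |ψ_t|², the 1d trajectory starting at Q(0) = z_0 is

Q(t) = F_t^{−1}(F_0(z_0)) ,

i.e., each trajectory keeps its quantile level α = F_0(z_0) constant. By the symmetry of |ψ_t|² in z, the level α = 1/2 is the trajectory Q(t) = 0, and every trajectory with α > 1/2 stays in {z > 0}.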

6.5 Delayed Choice Experiments


John Archibald Wheeler proposed a variant of the double-slit experiment that may
increase further the sense of paradox.8 Since Wheeler’s variant, called the delayed-choice
experiment, uses no more than the Schrödinger equation and Born’s rule, and since we
know that Bohmian mechanics can account for that, it is clear that the paradox must
disappear in Bohmian mechanics. Let us have a look at what Wheeler’s paradox is and
how Bohmian mechanics resolves it.
Wheeler considered preparing, by means of a double-slit or in some other way, two
wave packets moving in different directions, so that they pass through each other. After
passing through each other, they continue moving in different directions and thus get
separated again. Wheeler gave the experimenter two choices: either put a screen in the
overlap region or put it further away, where the two wave packets have clearly separated.
If you put the screen in the overlap region, you will see an interference pattern, which
is taken to indicate that the electron is a wave and went through both slits. However,
if you put the screen further away, the detection occurs in one of two clusters. If the
detection occurs in the upper (lower) cluster, this is taken to indicate that the particle
went through the lower (upper) slit because a wave packet passing through the lower
(upper) slit will end up in the upper (lower) region on the screen. So, Wheeler argued,
we can choose whether the electron is particle or wave: if we put the screen far away,
it must be particle because we see which slit it went through; if we put the screen
in the overlap, it must be wave because we see the interference pattern. Even more,
we can force the electron to become wave or particle (and to go through both slits or
just one) even after it passed through the double-slit! So it seems like there must be
retrocausation, i.e., situations in which the cause lies in the future of the effect.
8
J. A. Wheeler: The ‘Past’ and the ‘Delayed-Choice Double-Slit Experiment.’ Pages 9–48 in
A. R. Marlow (editor): Mathematical Foundations of Quantum Theory, Academic Press (1978)

Bohmian mechanics illustrates that these conclusions don’t actually follow.9 To
begin with, there is no retrocausation in Bohmian mechanics, as any intervention of
observers will change ψ only in the future, not in the past, of the intervention, and the
particle trajectory will correspondingly be affected also only in the future. Another basic
observation is that with the literal wave-particle dualism of Bohmian mechanics (there
is a wave and there is a particle), there is nothing left of the idea that the electron is
sometimes a wave and sometimes a particle, and hence nothing of the idea that observers
could force an electron to become a wave or to become a particle. In detail: the wave
passes through both slits, the particle through one; in the overlap region, the two wave
packets interfere, and the particle’s |ψ|2 distribution features an interference pattern; if
there is no screen in the overlap region, then the particle moves on in such a way that
the interference pattern disappears and two separate clusters form.
After we understand the Bohmian picture of this experiment, some steps in Wheeler’s
reasoning appear strange: If one assumes that there are no particle trajectories in the
quantum world, as one usually does in orthodox quantum mechanics, then it would seem
natural to say that there is no fact about which slit the electron went through, given
that there was no attempt to detect the electron while it was passing a slit. It is surprising,
then, that Wheeler claims that the detection on the far-away screen reveals which slit
it took! How can anything reveal which slit the electron took if the electron didn’t take
a slit?
There is another interesting aspect to the story that I will call Wheeler’s fallacy.
When you analyze the Bohmian picture in the case of the far-away screen, it turns out that
the trajectories passing through the upper (lower) slit end up in the upper (lower) cluster.
So Wheeler made the wrong retrodiction of which slit the electron passed through! How
could this happen? Wheeler noticed that if the lower (upper) slit is closed, so only one
packet comes out, and it comes out of the upper (lower) slit, then only detection events
in the lower (upper) region occur. This is also true in Bohmian mechanics. Wheeler
concluded that when wave packets come out of both slits, and if a detection occurs in
the right region, then the particle must have passed through the left slit. This is wrong
in Bohmian mechanics, and once you realize this, it is obvious that Wheeler’s conclusion
is a non sequitur 10 —a fallacy.
Shahriar Afshar proposed and carried out a further variant of the experiment, known
as Afshar’s experiment.11 In this variant, one puts the screen in the far position, but
one adds obstacles (that would absorb or reflect electrons) in the overlap region, in fact
in those places where the interference is destructive. If an interference pattern occurs
in the overlap region, even if it is not observed, then almost no particles arrive at the
obstacles, and almost no particles get absorbed or reflected. Indeed, for the particular
wave function we are considering, the presence of the obstacles does not significantly
9
This was first discussed in J. Bell: De Broglie–Bohm, delayed-choice double-slit experiment, and
density matrix. International Journal of Quantum Chemistry 14: 155–159 (1980).
10
= it doesn’t follow (Latin)
11
S. S. Afshar: Violation of the principle of complementarity, and its implications. Proceedings of
SPIE 5866: 229–244 (2005) http://arxiv.org/abs/quant-ph/0701027

alter the time evolution according to the Schrödinger equation.12 As a consequence, if
all particles arrive on the far screen (in either the left or the right region), as in fact
observed in the experiment, then this indicates that no absorption or reflection occurred,
so there was an interference pattern in the overlap region even though no screen was put
there. Afshar argued that this experiment refutes Wheeler’s view that one can either
have an interference pattern or measure which slit the particle went through, but not
both. (In his article, Afshar committed Wheeler’s fallacy; but that does not make the
experiment less relevant.) Again, Bohmian mechanics, having particle and wave, easily
explains the outcome of this experiment.

12
A way of seeing this without running a numerical simulation goes as follows. The obstacles could
be represented as regions of infinite potential. In the Schrödinger equation, a region B ⊂ R3 of infinite
potential is equivalent to a Dirichlet boundary condition on the boundary ∂B of B, i.e., the condition
ψ(x, t) = 0 for all x ∈ ∂B and all t ∈ R. Imposing such a condition at places x where the solution ψ in
the absence of obstacles would vanish for all t anyway does not affect the solution.

7 Fourier Transform and Momentum
7.1 Fourier Transform
We know from Exercise 2 of Assignment 1 that the plane wave e^{ik·x} evolves according
to the free Schrödinger equation to

e^{ik·x} e^{−iℏk²t/2m} .                                 (7.1)

Since the Schrödinger equation is linear, any linear combination of plane waves with
different wave vectors k,

Σ_k c_k e^{ik·x}                                         (7.2)

with complex coefficients c_k, will evolve to

Σ_k c_k e^{ik·x} e^{−iℏk²t/2m} .                         (7.3)

Moreover, a “continuous linear combination”

∫_{R³} d³k c(k) e^{ik·x}                                 (7.4)

with arbitrary complex c(k) will evolve to

∫_{R³} d³k c(k) e^{ik·x} e^{−iℏk²t/2m} .                 (7.5)

Definition 7.1. For a given function ψ : R^d → C, the function

ψ̂(k) = (2π)^{−d/2} ∫_{R^d} ψ(x) e^{−ik·x} d^d x          (7.6)

is called the Fourier transform of ψ, ψ̂ = F(ψ).

Theorem 7.2. Inverse Fourier transformation:

ψ(x) = (2π)^{−d/2} ∫_{R^d} ψ̂(k) e^{ik·x} d^d k .          (7.7)

Note the different sign in the exponent (it is crucial). If we had not put the pre-factor
in (7.6) we would have obtained the pre-factor squared in (7.7).
We have been sloppy in the formulation of the definition and the theorem in that
we have not specified the class of functions to which these formulas apply. In fact, (7.6)
can be applied whenever ψ ∈ L¹ (the space of all integrable functions, i.e., those with
‖ψ‖_{L¹} = ∫ dx |ψ| < ∞) and then yields ψ̂ ∈ L^∞ (the space of all bounded functions)
because |ψ̂(k)| ≤ (2π)^{−d/2} ‖ψ‖_{L¹} by the triangle inequality |∫f| ≤ ∫|f|. Conversely, if
ψ̂ ∈ L¹, then (7.7) holds, and ψ ∈ L^∞. However, if ψ ∈ L¹ \ L^∞ then ψ̂ ∉ L¹, and (7.7)

is not literally applicable. For ψ ∈ L1 ∩ L∞ , both (7.6) and (7.7) are rigorously true.
Another space of interest in this context is the Schwartz space S of rapidly decaying
functions, which contains the smooth functions ψ : R^d → C such that for every n ∈ N
and every α ∈ N_0^d there is C_{n,α} > 0 such that |∂^α ψ(x)| < C_{n,α} |x|^{−n} for all x ∈ R^d,
where ∂^α := ∂_1^{α_1} · · · ∂_d^{α_d}. For example, every Gaussian wave packet lies in S; note
that S ⊂ L1 ∩ L∞ . It turns out that Fourier transformation maps S bijectively to
itself. Moreover, S is a dense subspace in L2 , and F can be extended in a unique
way to a bounded operator F : L2 → L2 , even though the integral (7.6) exists only for
ψ ∈ L1 ∩ L2 .
Going back to Eq. (7.5) and taking c(k) = (2π)^{−3/2} ψ̂_0(k), we can express the solution
of the free Schrödinger equation as

ψ_t(x) = (2π)^{−3/2} ∫_{R³} d³k e^{−iℏk²t/2m} ψ̂_0(k) e^{ik·x} .      (7.8)

In words, we can find ψ_t from ψ_0 by taking its Fourier transform ψ̂_0, multiplying by a
suitable function of k, viz., e^{−iℏk²t/2m}, and taking the inverse Fourier transform.
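This recipe is also how one solves the free Schrödinger equation numerically. Here is a minimal Python sketch of it, assuming units with ℏ = m = 1 and a periodic box large enough that boundary effects are negligible (both assumptions made only for this illustration).

    # Minimal sketch of the recipe of Eq. (7.8) with the discrete Fourier transform.
    import numpy as np

    hbar = m = 1.0
    L, n = 200.0, 4096
    x = np.linspace(-L/2, L/2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)

    sigma, k0 = 1.0, 2.0
    psi0 = np.exp(1j * k0 * x) * np.exp(-x**2 / (4 * sigma**2))
    psi0 /= np.sqrt(np.sum(np.abs(psi0)**2) * dx)              # normalize on the grid

    t = 5.0
    psi_t = np.fft.ifft(np.exp(-1j * hbar * k**2 * t / (2 * m)) * np.fft.fft(psi0))

    # the packet has moved by about (hbar*k0/m)*t = 10 and has spread out
    print(np.sum(x * np.abs(psi_t)**2) * dx)                   # approximately 10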
The same trick can be done for N particles. Then d = 3N, ψ = ψ(x_1, . . . , x_N),
ψ̂ = ψ̂(k_1, . . . , k_N), and the factor to multiply by is

exp(−i Σ_{j=1}^{N} (ℏ/2m_j) k_j² t)  instead of  exp(−i (ℏ/2m) k² t) .    (7.9)

Note that we take the Fourier transform only in the space variables, not in the time
variable. There are also applications in which it is useful to consider a Fourier transform
in t, but not here.

Example 7.3. The Fourier transform of a Gauss function. Let σ > 0 and
ψ(x) = C e^{−x²/4σ²}                                     (7.10)

with C a constant. Then, using the substitution y = x/(2σ),


ψ̂(k) = C (2π)^{−3/2} ∫_{R³} e^{−x²/4σ²} e^{−ik·x} d³x                    (7.11)
     = 2³Cσ³ (2π)^{−3/2} ∫_{R³} e^{−y²−2iσk·y} d³y    (prefactor =: C₂)   (7.12)
     = C₂ ∫_{R³} e^{−(y+iσk)²−σ²k²} d³y                                   (7.13)
     = C₂ e^{−σ²k²} ∫_{R³} e^{−(y+iσk)²} d³y                              (7.14)

The evaluation of the last integral involves the Cauchy integral theorem, varying the
path of integration and estimating errors. Here, I just report that the outcome is the
constant π^{3/2}, independently of σ and k. (In one dimension, shifting the contour of
integration by −iσk reduces the integral to ∫_R e^{−y²} dy = √π; in three dimensions the
integral factorizes, giving π^{3/2}.) Thus,13

ψ̂(k) = C₃ e^{−σ²k²}                                      (7.15)
with C₃ = C₂ π^{3/2}. In words, the Fourier transform of a Gaussian function is another
Gaussian function, but with width 1/(2σ) instead of σ. (We see here shadows of the
Heisenberg uncertainty relation, which we will discuss in the next chapter.)
For later use, I report further14 that the formula (7.15) remains valid for complex σ
with Re(σ²) > 0 (put differently, when we replace σ by σe^{iθ} with −π/4 < θ < π/4). That
is,

if ψ(x) = C exp(−e^{−2iθ} x²/(4σ²)) , then ψ̂(k) = C′ exp(−e^{2iθ} σ²k²)    (7.16)

with some constant C′ ∈ C.
Rule 7.4. (a)

F(∂ψ/∂x_j)(k) = ik_j ψ̂(k) .                              (7.17)

That is, differentiation of ψ corresponds to multiplication of ψ̂ by ik.

(b) Conversely,

F(−ix_j ψ)(k) = ∂ψ̂/∂k_j (k) .                             (7.18)

(c) If f(x) = e^{ik_0·x} g(x), then f̂(k) = ĝ(k − k_0).

(d) If f(x) = g(x − x_0), then f̂(k) = e^{−ik·x_0} ĝ(k).
Proof. (a) Indeed, using integration by parts (and assuming that the boundary terms
vanish),
F(∂ψ/∂x_j)(k) = (2π)^{−d/2} ∫_{R^d} d^d x (∂ψ/∂x_j)(x) e^{−ik·x}          (7.19)
             = −(2π)^{−d/2} ∫_{R^d} d^d x ψ(x) (∂/∂x_j) e^{−ik·x}         (7.20)
             = −(2π)^{−d/2} ∫_{R^d} d^d x ψ(x) (−ik_j) e^{−ik·x}          (7.21)
             = ik_j (2π)^{−d/2} ∫_{R^d} d^d x ψ(x) e^{−ik·x}              (7.22)
             = ik_j ψ̂(k) .                                                (7.23)
13
A different derivation of (7.15) is given on page 132 of D. Kammler: A First Course in Fourier
Analysis, 2nd ed., Cambridge University Press (2007).
14
See Formula 206 of http://en.wikipedia.org/wiki/Fourier_transform (accessed 10/31/2019),
or pages 562 and 588 of D. Kammler: A First Course in Fourier Analysis, 2nd ed., Cambridge University
Press (2007).

(This calculation is a rigorous proof in S .)

(b) Interchanging differentiation and integration (which again is rigorously justified in S),

∂ψ̂/∂k_j = (∂/∂k_j) (2π)^{−d/2} ∫_{R^d} ψ(x) e^{−ik·x} d^d x              (7.24)
        = (2π)^{−d/2} ∫_{R^d} (−ix_j ψ(x)) e^{−ik·x} d^d x .              (7.25)

(c) Indeed,
ĝ(k − k_0) = (2π)^{−d/2} ∫_{R^d} g(x) e^{−i(k−k_0)·x} d^d x               (7.26)
           = (2π)^{−d/2} ∫_{R^d} (e^{ik_0·x} g(x)) e^{−ik·x} d^d x .      (7.27)

(d) This follows in much the same way.

Example 7.5. A more general Gaussian packet of the form

ψ(x) = C e^{ik_0·x} e^{−(x−x_0)²/4σ²}                     (7.28)

has Fourier transform

ψ̂(k) = C₃ e^{ik_0·x_0} e^{−ik·x_0} e^{−σ²(k−k_0)²} ,      (7.29)

which is again a Gaussian packet with center k_0 and width 1/(2σ).


If we evolve (7.28) with the free Schrödinger equation up to time t, it is still of
Gaussian form but with the real constant σ² replaced by the complex constant σ² + iℏt/(2m)
(and the prefactor C changed in a t-dependent way). As in (7.16), (7.29) is still valid for
complex σ with Re(σ 2 ) > 0, so it covers the evolved Gaussian as well. The most general
Gauss packet is the exponential of a second-order polynomial in x for which the matrix
of second-order coefficients has negative-definite self-adjoint part. Its Fourier transform
is also again a Gauss packet.

∗∗∗
Fourier transformation defines a unitary operator F : L²(R^d) → L²(R^d), Fψ = ψ̂.
We verify that ‖Fψ‖_{L²} = ‖ψ‖_{L²} at least for nice ψ. Note first that, for f, g ∈ L¹ ∩ L²,

∫ (∫ e^{−ik·x} f(k) d^d k) g(x) d^d x = ∫ (∫ e^{−ik·x} g(x) d^d x) f(k) d^d k    (7.30)

by changing the order of integration (which integral is done first). The theorem saying
that we are allowed to change the order of integration (for an integrable integrand f g)
is called Fubini’s theorem. From Eq. (7.30) we can conclude ⟨g*|f̂⟩ = ⟨ĝ*|f⟩. Since

(Ff)*(k) = ((2π)^{−d/2} ∫ e^{−ik·x} f(x) d^d x)* = F^{−1}(f*)(k) ,              (7.31)

setting g = F^{−1}(f*) = (Ff)* yields ⟨f̂|f̂⟩ = ⟨f|f⟩, which completes the proof.

7.2 Momentum
“Position measurements” usually consist of detecting the particle. “Momentum mea-
surements” usually consist of letting the particle move freely for a while and then mea-
suring its position.15
We now analyze this experiment using Bohmian mechanics. We define the asymptotic
velocity u to be
u = lim_{t→∞} dQ/dt (t)                                  (7.32)

if this limit exists. It can also be expressed as

u = lim_{t→∞} Q(t)/t .                                   (7.33)
To understand this, note that (Q(t) − Q(0))/t is the average velocity during the time
interval [0, t]; if an asymptotic velocity exists (i.e., if the velocity approaches a constant
vector u) then the average velocity over a long time t will be close to u because for
most of the time the velocity will be close to u. The term Q(0)/t converges to zero as
t → ∞, so we obtain (7.33).
We want the momentum measurement to measure p := mu for a free particle (V =
0). So we measure Q(t) for large t, divide by t, and multiply by m. We can and will
also take this recipe as the definition of a momentum measurement, independently of
whether we want to use Bohmian mechanics.
How large do we need t to be? In practice, often not very. When thinking of a particle
emitted by a radioactive atom, or coming from a particle collision in an accelerator
experiment (such as the Large Hadron Collider LHC in Geneva), a millisecond is usually
enough for dQ/dt to become approximately constant.
According to the Born rule, the outcome p is random, and its distribution can be
characterized by saying that, for any set B ⊂ R3 ,
P(u ∈ B) = lim_{t→∞} P(Q(t)/t ∈ B)                       (7.34)
         = lim_{t→∞} P(Q(t) ∈ tB)                        (7.35)
         = lim_{t→∞} ∫_{tB} |ψ_t(x)|² d³x ,              (7.36)
15
Alternatively, one lets the particle collide with another particle, makes a “momentum measurement”
on the latter, and makes theoretical reasoning about what the momentum of the former must have been.

where
tB = {tx : x ∈ B} (7.37)
is the scaled set B.
Theorem 7.6. Let ψ(t, x) be a solution of the free Schrödinger equation and B ⊆ R3 .
Then

lim_{t→∞} ∫_{tB} |ψ(t, x)|² d³x = ∫_{mB/ℏ} |ψ̂_0(k)|² d³k .    (7.38)

As a consequence, the probability density of p is

(1/ℏ³) |ψ̂_0(p/ℏ)|² .                                          (7.39)
The theorem essentially says that when we think of ψ_0 as a linear combination of
plane waves e^{ik·x} as in Eq. (7.4) or (7.7), then the contribution from a particular value of
k will move at a velocity of ℏk/m (shadows of the de Broglie relation p = ℏk!), and in
the long run these contributions will tend to separate in space (i.e., overlap no longer),
leaving the contribution from k in the region around ℏkt/m. We see the de Broglie
relation again in (7.39) when we insert p/ℏ for k in ψ̂. The upshot of this analysis can
be formulated as

Born’s rule for momentum. If we measure the momentum of a particle with wave
function ψ then the outcome is random with probability density
ρ_mom(p) = (1/ℏ³) |ψ̂(p/ℏ)|² .                            (7.40)

Likewise, if we measure the momenta of N particles with joint wave function ψ(x_1, . . . , x_N),
then the outcomes are random with joint probability density

ρ_mom(p_1, . . . , p_N) = (1/ℏ^{3N}) |ψ̂(p_1/ℏ, . . . , p_N/ℏ)|² .    (7.41)

For this reason, the Fourier transform ψ̂ is also called the momentum representation
of ψ, while ψ itself is called the position representation of the wave function.
Example 7.7. The Gaussian wave packet (7.28), whose Born distribution in position
space is a Gaussian distribution with mean x0 and width σ, has momentum distribution
ρ_mom(p) = (const.) e^{−2(σ/ℏ)²(p−ℏk_0)²} ,              (7.42)

that is, a Gaussian distribution with mean ℏk_0 and width

σ_P = ℏ/(2σ) .                                           (7.43)

In particular, if we want a momentum distribution that is sharply peaked around some
value p_0 = ℏk_0, that is, if we want σ_P to be small, then σ must be large, so ψ must be
wide, “close to a plane wave.”
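A quick numerical cross-check of (7.42)–(7.43) can be done with the FFT; the following Python sketch assumes ℏ = 1, x_0 = 0, one dimension, and a finite grid (all assumptions made only for this illustration).

    # Sketch: momentum distribution of the Gaussian packet (7.28) via the FFT.
    import numpy as np

    hbar, sigma, k0 = 1.0, 0.7, 3.0
    n, L = 8192, 100.0
    x = np.linspace(-L/2, L/2, n, endpoint=False)
    dx = x[1] - x[0]

    psi = np.exp(1j * k0 * x) * np.exp(-x**2 / (4 * sigma**2))
    psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    psi_hat = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi)   # approximates (7.6) up to a phase
    p = hbar * k
    dp = p[1] - p[0]
    rho = np.abs(psi_hat)**2 / hbar                       # Eq. (7.40) in one dimension

    mean_p = np.sum(p * rho) * dp
    sigma_p = np.sqrt(np.sum((p - mean_p)**2 * rho) * dp)
    print(mean_p, hbar * k0)             # both approximately 3.0
    print(sigma_p, hbar / (2 * sigma))   # both approximately 0.71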

7.3 Momentum Operator
Let pj , j = 1, 2, 3, be the component of the vector p in the direction of the xj -axis. The
expectation value of pj is (using Eq. (7.17) in the fourth line and unitarity of F in the
sixth)
⟨p_j⟩ = ∫_{R³} p_j ρ_mom(p) d³p                          (7.44)
      = ∫ ℏk_j |ψ̂_0(k)|² d³k                             (7.45)
      = ⟨ψ̂_0 | ℏk_j ψ̂_0⟩                                 (7.46)
      = ⟨ψ̂_0 | (−iℏ) F(∂ψ_0/∂x_j)⟩                        (7.47)
      = −iℏ ⟨ψ̂_0 | F(∂ψ_0/∂x_j)⟩                          (7.48)
      = −iℏ ⟨ψ_0 | ∂ψ_0/∂x_j⟩                             (7.49)
      = ⟨ψ_0 | (−iℏ ∂/∂x_j) ψ_0⟩ .                        (7.50)

This relation motivates calling P_j = −iℏ ∂/∂x_j the momentum operator in the x_j-direction,
and (P1 , P2 , P3 ) the vector of momentum operators.
We note for later use that, by the same reasoning,
⟨p_j^n⟩ = ∫ (ℏk_j)^n |ψ̂_0(k)|² d³k = ⟨ψ_0 | (−iℏ ∂/∂x_j)^n ψ_0⟩          (7.51)
for every n ∈ N.

7.4 Tunnel Effect


The tunnel effect is another quantum effect that is widely perceived as paradoxical.
Consider the 1d Schrödinger equation with a potential V that has the shape of a potential
barrier of height V0 > 0. As an idealized example, suppose

V(x) = V₀ 1_{0≤x≤L}                                      (7.52)

or a smooth approximation thereof (see Figure 7.1).


Classically, the motion of a particle in the potential V (or any potential in 1 dimen-
sion) can easily be deduced from energy conservation: If the initial position is < 0 and
the initial momentum is p_0 > 0, then the initial energy is E = p_0²/2m, and whenever
the particle reaches location x, its momentum must be

p = ±√(2m(E − V(x))) .                                   (7.53)

Figure 7.1: Potential barriers in 1d. LEFT: Idealized “hard” barrier as in (7.52),
RIGHT: Smooth approximation thereof, or “soft” barrier.

In particular, the particle can never reach a region in which V (x) > E; so, if E < V0 ,
then the particle will turn around at the barrier and move back to the left.
That is different in quantum mechanics. Consider a Gaussian wave packet, initially
to the left of the barrier, with a rather sharp momentum distribution around a p0 > 0
with p_0²/2m < V_0. Then part of the packet will be reflected, and part of it will pass
through the barrier!16 (And the part that passes through is much larger than just the
tail of ρ_mom with p ≥ √(2mV_0).) As a consequence, the Born rule predicts a substantial
probability for the particle to show up on the other side of the barrier (“tunneling
probability”). Figure 7.2 shows the Bohmian trajectories for such a situation (with only
a small tunneling probability).
For computing the tunneling probability, an easy recipe is to assume that the initial
ψ is close to a plane wave and to consider only the interior part of it that actually looks like a
plane wave. One solves the Schrödinger equation for an arriving plane wave, computes
the amount of probability current through the barrier, and compares it to the current
associated with the arriving wave.17
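For the rectangular barrier (7.52) this recipe can be carried out in closed form. For orientation (this is the standard textbook result, quoted here without derivation), the transmission probability at energy E = p_0²/2m < V_0 is

T = [1 + V_0² sinh²(κL) / (4E(V_0 − E))]⁻¹ ,   κ = √(2m(V_0 − E))/ℏ ,

so for a thick or high barrier (κL ≫ 1) the tunneling probability is approximately 16E(V_0 − E)/V_0² · e^{−2κL}, i.e., exponentially small in the width L and in √(V_0 − E).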
What is paradoxical about tunneling? Perhaps not so much, once we give up New-
tonian mechanics and accept that the equation of motion can be non-classical, such as
Bohm’s. Then it is only to be expected that the trajectories are different, and not sur-
prising that some barriers which Newton’s trajectories cannot cross, Bohm’s trajectories
can. Part of the sense of paradox comes perhaps from a narrative that is often told when
the tunnel effect is introduced: that the particle can “borrow” some energy for a short
amount of time by virtue of an energy–time uncertainty relation. This narrative seems
not very helpful.
The tunnel effect plays a crucial role in radioactive α-decay (where the α-particle
leaves the nucleus by means of tunneling), beam splitters in optics (where the thickness
of the barrier is adjusted so that half of the incoming wave will be reflected and half
transmitted), and scanning tunneling electron microscopy (where the distance between
16
Another movie created by B. Thaller and available at http://vqm.uni-graz.at/movies.html
shows a numerical simulation of the Schrödinger equation with potential (7.52).
17
For further discussion of why that yields a reasonable result, see T. Norsen: The Pilot-Wave
Perspective on Quantum Scattering and Tunneling. American Journal of Physics 81: 258 (2013)
http://arxiv.org/abs/1210.7265.

Figure 7.2: Bohmian trajectories in a tunneling situation. Picture taken from D. Bohm
and B. J. Hiley: The Undivided Universe, London: Routledge (1993)

a needle and a surface is measured by means of measuring the tunneling probability).


There are further related effects: anti-tunneling means that a particle gets reflected
by a barrier so low that a classical particle with the same initial momentum would
be certain to pass it; this happens because a solution of the Schrödinger equation will
partly be reflected even at a low barrier. Another effect has been termed paradoxical
reflection:18 Consider a downward potential step as in

V(x) = −V₀ 1_{0≤x} .                                     (7.54)

Classically, a particle coming from the left has probability zero to be reflected back, but
according to the Schrödinger equation, wave packets will be partly reflected and partly
transmitted. Remarkably, in the limit V0 → ∞, the reflection probability converges to
1. “A quantum ball doesn’t roll off a cliff!” On a potential plateau, surrounded by deep
downward steps, a particle can be confined for a long time, although finally, in the limit
t → ∞, all of the wave function will leave the plateau region and propagate to spatial
infinity.

18
For detailed discussion, see P. L. Garrido, S. Goldstein, J. Lukkarinen, and R. Tumulka: Paradoxical
Reflection in Quantum Mechanics. American Journal of Physics 79(12): 1218–1231 (2011) http:
//arxiv.org/abs/0808.0610

8 Operators and Observables
8.1 Heisenberg’s Uncertainty Relation
As before, ⟨X⟩ denotes the expectation of the random variable X. The variance of the
momentum distribution for the initial wave function ψ ∈ L²(R) (in one dimension) is

σ_P² := ⟨(p − ⟨p⟩)²⟩                                      (8.1)
     = ⟨p² − 2p⟨p⟩ + ⟨p⟩²⟩                                (8.2)
     = ⟨p²⟩ − 2⟨p⟩² + ⟨p⟩²                                (8.3)
     = ⟨p²⟩ − ⟨p⟩²                                        (8.4)
     = ⟨ψ|P²ψ⟩ − ⟨ψ|Pψ⟩²                                  (8.5)
     = ⟨ψ|(P − ⟨ψ|Pψ⟩)²ψ⟩ .                               (8.6)

The position distribution |ψ(x)|² has expectation

⟨Q(0)⟩ = ∫ x |ψ(x)|² dx = ⟨ψ|Xψ⟩                          (8.7)

with the position operator Xψ(x) = xψ(x). Moreover,

⟨Q(0)²⟩ = ∫ x² |ψ(x)|² dx = ⟨ψ|X²ψ⟩ ,                     (8.8)

so the variance of the position distribution |ψ(x)|² is

σ_X² := ∫ (x − ⟨Q(0)⟩)² |ψ(x)|² dx = ⟨ψ|(X − ⟨ψ|Xψ⟩)²ψ⟩ . (8.9)

Theorem 8.1. (Heisenberg uncertainty relation) For any ψ ∈ L²(R) with ‖ψ‖ = 1,

σ_X σ_P ≥ ℏ/2 .                                           (8.10)
This means that any wave function that is very narrow must have a wide Fourier
transform. A generalized version will be proved later as Theorem 13.4.

Example 8.2. Consider the Gaussian wave packet (7.28), for simplicity in 1 dimension.
The standard deviation of the position distribution is σX = σ, and we computed the
width of the momentum distribution in (7.43). We thus obtain for this ψ that

σ_X σ_P = ℏ/2 ,                                           (8.11)
just the lowest value allowed by the Heisenberg uncertainty relation.

Example 8.3. Consider a wave packet passing through a slit. Let us ignore the part of
the wave packet that gets reflected because it did not arrive at the slit, and focus on just
the part that makes it through the slit. That is a narrow wave packet, and its standard
deviation in position, σX , is approximately the width of the slit. If that is very small
then, by the Heisenberg uncertainty relation, σP must be large, so the wave packet must
spread quickly after passing the slit. If the slit is wider, the spreading is weaker.

∗∗∗
In Bohmian mechanics, the Heisenberg uncertainty relation means that whenever
the wave function is such that we can know the position of a particle with (small)
inaccuracy σX then we are unable to know its asymptotic velocity better than with
inaccuracy ~/(2mσX ); thus, we are unable to predict its future position after a large
time t (for V = 0) better than with inaccuracy ~t/(2mσX ). This is a limitation to
knowledge in Bohmian mechanics.
The Heisenberg uncertainty relation is often understood as excluding the possibility
of particle trajectories. If the particle had a trajectory, the reasoning goes, then it would
have a precise position and a precise velocity (and thus a precise momentum) at any
time, so the position uncertainty would be zero and the momentum uncertainty would
be zero, so σX = 0 and σP = 0, in contradiction with (8.10). We know already from
Bohmian mechanics that this argument cannot be right. It goes wrong by assuming
that if the particle has a precise position and a precise velocity then they can also be
precisely known and precisely controlled. Rather, inhabitants of a Bohmian universe,
when they know a particle’s wave function to be ϕ(x), cannot know its position more
precisely than the |ϕ|2 distribution allows.
In the traditional, orthodox view of quantum mechanics, it is assumed that electrons
do not have trajectories. It is assumed that the wave function is the complete description
of the electron, in contrast to Bohmian mechanics, where the complete description is
given by the pair (Q, ψ), and ψ alone would only be partial information and thus an
incomplete description. By virtue of these assumptions, the electron does not have
a position before we attempt to detect it. Likewise, it does not have a momentum
before we attempt to measure it. Thus, in orthodox quantum mechanics the Heisenberg
uncertainty relation does not amount to a limitation of knowledge because there is
no fact in the world that we do not know about when we do not know its position.
Unfortunately, the uncertainty relation is often expressed by saying that it is impossible
to measure position and momentum at the same time with arbitrary accuracy; while
this would be appropriate to say in Bohmian mechanics, it is not in orthodox quantum
mechanics because this formulation presumes that position and momentum have values
that we could discover by measuring them.
The uncertainty relation is also involved in the double-slit experiment as follows. If
it did not hold, we could make the electron move exactly orthogonally to the screen after
passing through the narrow slits, and arrive very near the center of the screen. Thus, the
distribution on the detection screen could not have a second- or third-order maximum.

Since in orthodox quantum mechanics the double-slit experiment is understood as in-
dicative of a paradoxical nature of reality, the uncertainty relation is then understood
as “protecting” the paradox from becoming a visible contradiction.

8.2 Self-adjoint Operators


The following rule is part of the quantum formalism:
The most relevant experiments are measurements of certain quantities
called observables. Every observable is associated with a self-adjoint     (8.12)
operator on Hilbert space.
It is actually a mixture of fact and opinion, as it is formulated from the traditional or
orthodox point of view of quantum mechanics. I use this formulation because it is very
common. We need to dissect later which part of it is fact, and which is opinion. As Bell
wrote (Speakable and Unspeakable in Quantum Mechanics, page 215),

“On this list of bad words from good books, the worst of all is measurement.”

But first let us get acquainted with the mathematics of self-adjoint operators.

Theorem 8.4. Every bounded operator A : H → H on a Hilbert space H possesses
one and only one adjoint operator A†, defined by the property that for all ψ, φ ∈ H,

⟨ψ|Aφ⟩ = ⟨A†ψ|φ⟩ .                                        (8.13)

For an unbounded operator A : D(A) → H with dense domain D(A) ⊂ H, the adjoint
operator A† is uniquely defined by the property (8.13) for all ψ ∈ D(A†) and φ ∈ D(A)
on the domain

D(A†) = {ψ ∈ H : ∃χ ∈ H ∀φ ∈ D(A) : ⟨ψ|Aφ⟩ = ⟨χ|φ⟩} .     (8.14)

Definition 8.5. An operator A on a Hilbert space H is called self-adjoint or Hermitian
iff A = A†. Then

⟨ψ|Aφ⟩ = ⟨Aψ|φ⟩ .                                         (8.15)

Example 8.6.

• Let H = Cⁿ. Then every operator A is bounded and corresponds to a complex
  n×n matrix A_{ij}. The matrix of A† has entries (A†)_{ij} = (A_{ji})* (“the adjoint matrix
  is the conjugate transpose”). Indeed, if we define the matrix B_{ij} by B_{ij} = (A_{ji})*,
  then we obtain, for any ψ = (ψ_1, . . . , ψ_n) and φ = (φ_1, . . . , φ_n),

  ⟨ψ|Aφ⟩ = Σ_{i=1}^{n} ψ_i* (Aφ)_i                        (8.16)
         = Σ_i Σ_j ψ_i* A_{ij} φ_j                        (8.17)
         = Σ_j Σ_i (A_{ij}* ψ_i)* φ_j                     (8.18)
         = Σ_j Σ_i (B_{ji} ψ_i)* φ_j                      (8.19)
         = Σ_j (Bψ)_j* φ_j                                (8.20)
         = ⟨Bψ|φ⟩ .                                       (8.21)

  As a consequence, a matrix A is self-adjoint iff A_{ij} = A_{ji}*. (A small numerical
  check of this appears after this list of examples.)

• A unitary operator is usually not self-adjoint.

• Let H = L2 (Rd ), and let A be a multiplication operator,

Aψ(x) = f (x) ψ(x) , (8.22)

such as the potential in the Hamiltonian or the position operators. Then A† is the
multiplication operator that multiplies by f ∗ . Indeed,
  ⟨ψ|Aφ⟩ = ∫_{R^d} ψ(x)* f(x) φ(x) dx                     (8.23)
         = ∫_{R^d} (f*(x) ψ(x))* φ(x) dx                  (8.24)
         = ⟨f*ψ|φ⟩ .                                      (8.25)

  (This calculation is rigorous if f is bounded. If it is not, then some discussion of
  the domains of A and A† is needed.) Thus, A is self-adjoint iff f is real-valued.

• (AB)† = B † A† and exp(A)† = exp(A† ).

• On H = L²(R^d), the momentum operators P_j = −iℏ ∂/∂x_j are self-adjoint with the
  domain given by the first Sobolev space, i.e., the space of functions ψ ∈ L² whose
  Fourier transform ψ̂ has the property that k ↦ |k| ψ̂(k) is still square-integrable. The
  relation (8.15) can easily be verified on nice functions using integration by parts:

  ⟨ψ|P_j φ⟩ = ∫ ψ*(x) (−iℏ) ∂φ/∂x_j (x) dx                 (8.26)
           = −∫ ∂ψ*/∂x_j (x) (−iℏ) φ(x) dx                 (8.27)
           = ∫ (−iℏ ∂ψ/∂x_j (x))* φ(x) dx                  (8.28)
           = ⟨P_j ψ|φ⟩ .                                   (8.29)

• In H = L²(R^d), the Hamiltonian is self-adjoint for suitable potentials V on a
  suitable domain. By formal calculation (leaving aside questions of domains), since

  H = Σ_{j=1}^{d} (1/2m) P_j² + V ,                        (8.30)

  we have that

  ⟨ψ|Hφ⟩ = ⟨ψ | (Σ_j (1/2m) P_j² + V) φ⟩                   (8.31)
         = Σ_j (1/2m) ⟨ψ|P_j P_j φ⟩ + ⟨ψ|V φ⟩              (8.32)
         = Σ_j (1/2m) ⟨P_j ψ|P_j φ⟩ + ⟨V ψ|φ⟩              (8.33)
         = Σ_j (1/2m) ⟨P_j P_j ψ|φ⟩ + ⟨V ψ|φ⟩              (8.34)
         = ⟨(Σ_j P_j²/2m + V) ψ | φ⟩                       (8.35)
         = ⟨Hψ|φ⟩ .                                        (8.36)
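Here is the small numerical check promised above, as a minimal Python sketch (an illustration only): for a random complex matrix A on H = Cⁿ, the conjugate transpose B satisfies ⟨ψ|Aφ⟩ = ⟨Bψ|φ⟩, and a self-adjoint matrix has real eigenvalues.

    # Numerical illustration of the first item of Example 8.6 (sketch, not a proof).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    B = A.conj().T                       # B_ij = (A_ji)*

    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    phi = rng.normal(size=n) + 1j * rng.normal(size=n)

    inner = lambda u, v: np.vdot(u, v)   # <u|v> = sum_i u_i* v_i
    print(np.isclose(inner(psi, A @ phi), inner(B @ psi, phi)))    # True: Eq. (8.13)

    H = A + A.conj().T                   # a self-adjoint matrix
    print(np.allclose(np.linalg.eigvals(H).imag, 0, atol=1e-10))   # eigenvalues are real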

8.3 The Spectral Theorem


Before we can formulate Born’s rule for arbitrary observables, we need to learn about
the spectral theorem.
Definition 8.7. If
Aψ = αψ , (8.37)
where α is a (complex) number and ψ ∈ H with ψ 6= 0, then ψ is called an eigenvector
(or eigenfunction) of A with eigenvalue α. The number α is called an eigenvalue of A iff
there exists ψ 6= 0 satisfying (8.37). The set of all eigenvalues is called the spectrum of
A. For any eigenvalue α, the set of all eigenvectors with eigenvalue α together with the
zero vector forms a subspace of Hilbert space called the eigenspace of A with eigenvalue
α. The eigenvalue α is said to be degenerate iff the dimension of its eigenspace is > 1.

If A is self-adjoint then all eigenvalues must be real. Indeed, if ψ is an eigenvector
of A with eigenvalue α, then

α⟨ψ|ψ⟩ = ⟨ψ|αψ⟩ = ⟨ψ|Aψ⟩ = ⟨Aψ|ψ⟩ = ⟨αψ|ψ⟩ = α*⟨ψ|ψ⟩ ,    (8.38)

so α = α*, i.e., α ∈ R.
Theorem 8.8. (Spectral theorem) For every self-adjoint operator A in a Hilbert space
H there is a (generalized) orthonormal basis {φα,λ } consisting of eigenvectors of A,

Aφα,λ = αφα,λ . (8.39)

Such a basis is called an eigenbasis of A. (φα,λ has two indices because for every eigen-
value α there may be several eigenvectors, indexed by λ.)
An orthonormal basis (ONB) is a set {φ_n} of elements of the Hilbert space H such
that (a) ⟨φ_m|φ_n⟩ = δ_{mn} and (b) every ψ ∈ H can be written as a linear combination of
the φ_n,

ψ = Σ_n c_n φ_n .                                         (8.40)

A “generalized” orthonormal basis allows a continuous variable k instead of n,

ψ = ∫ dk c_k φ_k ,                                        (8.41)

as we have encountered with Fourier transformation, where k = k ∈ R^d, c_k = ψ̂(k), and

φ_k(x) = (2π)^{−d/2} e^{ik·x} .                           (8.42)

For a generalized ONB, we don’t require that the φ_k themselves be elements of H; e.g.,
the φ_k of Fourier transformation are not square-integrable. We will often write a
Σ sign even when we mean the integral over k. The precise definition of “generalized
ONB” is a unitary isomorphism U : H → L²(Ω) with Ω the set of possible k-values
indexing the generalized ONB and Uψ(k) = c_k. For example, for the generalized ONB
(8.42), U = F. A “non-generalized” ONB then corresponds to a unitary isomorphism
U : H → ℓ² = L²(N).
The big payoff of the spectral theorem is that in this ONB, it is very easy to carry
out the operator A: If

ψ = Σ_{α,λ} c_{α,λ} φ_{α,λ}                               (8.43)

then

Aψ = Σ_{α,λ} α c_{α,λ} φ_{α,λ} .                          (8.44)

Put differently, in this ONB, A is a multiplication operator, multiplying by the function


f (k) = f (α, λ) = α. For example, in the Fourier basis (8.42), the momentum operator
P_j is multiplication by ℏk_j.

Put differently again, the matrix associated with the operator A in the ONB φα,λ is
a diagonal matrix. That is why one says that this ONB diagonalizes A.

Born’s rule for arbitrary observables. If we measure the observable A on a system
with wave function ψ then the outcome is random with probability distribution

ρ_A(α) = Σ_λ |⟨φ_{α,λ}|ψ⟩|² = Σ_λ |Uψ(α, λ)|² ,           (8.45)

where φα,λ is an orthonormal basis diagonalizing A; ρA may mean either probability


density or just probability, depending on whether α is a discrete or continuous variable.

Note that the previous versions of Born’s rule are contained as special cases for the
position operators X (U the identity) and the momentum operator P = −i~∇ (U the
Fourier transformation).
We further note that the expectation value of the Born distribution is given by
the simple expression ⟨ψ|Aψ⟩. Indeed, since the unitary isomorphism U defining the
generalized ONB maps A to a multiplication operator M, UAU⁻¹ = M, the expectation
is given by

∫ dα α ρ_A(α) = ∫ d(α, λ) α |Uψ(α, λ)|² = ⟨Uψ|MUψ⟩_Ω = ⟨ψ|U⁻¹MUψ⟩ = ⟨ψ|Aψ⟩ .    (8.46)
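In a finite-dimensional H, (8.45) and (8.46) are easy to check numerically; here is a minimal Python sketch (an illustration only, using a random self-adjoint matrix as the "observable").

    # Sketch: Born's rule for a self-adjoint matrix A on C^4.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    A = M + M.conj().T                          # a self-adjoint matrix

    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    psi /= np.linalg.norm(psi)                  # ||psi|| = 1

    alpha, phi = np.linalg.eigh(A)              # eigenvalues and an ONB of eigenvectors
    prob = np.abs(phi.conj().T @ psi)**2        # Born probabilities |<phi_a|psi>|^2

    print(np.isclose(prob.sum(), 1.0))                    # probabilities sum to 1
    print(np.isclose(np.sum(alpha * prob),                # expectation value ...
                     np.vdot(psi, A @ psi).real))         # ... equals <psi|A psi>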

The spectral theorem also yields a useful perspective on the unitary time evolution
operators. Since the Hamiltonian is self-adjoint, by the spectral theorem it possesses an
eigenbasis diagonalizing it,
HφE,λ = EφE,λ . (8.47)
As H is also called the energy operator, its (generalized) eigenvalues E are called the
energy levels of H, and {φE,λ } is called the energy eigenbasis or simply the energy basis.
Expressing a given vector ψ in this ONB,
ψ = Σ_{E,λ} c_{E,λ} φ_{E,λ} ,                             (8.48)

one finds that

ψ_t = U_t ψ = e^{−iHt/ℏ} ψ = Σ_{E,λ} e^{−iEt/ℏ} c_{E,λ} φ_{E,λ} .    (8.49)

In words, the coefficients cE,λ of ψt in the energy basis change with time according to

c_{E,λ}(t) = exp(−iEt/ℏ) c_{E,λ}(0) ,                     (8.50)

which means they are rotating in the complex plane at different speeds proportional to
the eigenvalues E.

8.4 Conservation Laws in Quantum Mechanics
As a consequence of (8.50), |cE,λ (t)| is time independent for every E and λ, i.e., is
a conserved quantity. This conservation law has no classical analog. The other way
around, what are the quantum analogs of the classical conservation laws of energy,
momentum, and angular momentum?
The basic answer is that in quantum mechanics, energy, (the 3 components of)
momentum, and (the 3 components of) angular momentum are operators, not numbers;
they are conserved operators, not conserved quantities. Let me explain.
In the discussion of momentum measurements in Section 7.2, we defined the parti-
cle’s momentum as mass times its asymptotic velocity. However, it is common to call
the (generalized) eigenvalues of the momentum operator P_j = −iℏ∂_j (j = 1, 2, 3) the
momentum values in the x_j direction. Note that the eigenfunctions are just the plane
waves e^{ik·x}, and the eigenvalues of P_j are p_j = ℏk_j (another version of de Broglie’s rela-
tion). Now a wave function ψ is in general a superposition of plane waves with different
values of pj . As we let the wave function evolve freely, the contributions in the superpo-
sition get separated in space, and ultimately the particle is found in just one of them,
corresponding to the measurement outcome pj . So pj as a number is not conserved, in
the sense that the initial superposition may have involved also other pj values than the
outcome of the measurement.
Similarly, an energy measurement corresponds to H as an observable and yields just
one of the many energy levels E which may have had a significantly non-zero cE,λ ; so E
as a number cannot be said to be conserved.
But operators can be conserved, in the following sense. With respect to any ONB
{φi }, any operator S can be represented as a matrix (possibly with infinitely many rows
and columns) with entries S_{ij} = ⟨φ_i|Sφ_j⟩. If we let each of the basis vectors evolve with
U_t, then we obtain time-dependent matrix elements (setting, for convenience, ℏ = 1)

S_{ij}(t) = ⟨φ_i(t)|Sφ_j(t)⟩ = ⟨e^{−iHt}φ_i|S e^{−iHt}φ_j⟩ = ⟨φ_i|e^{iHt} S e^{−iHt} φ_j⟩ ,    (8.51)

which are the matrix elements of e^{iHt} S e^{−iHt}. If S commutes with H, i.e., SH = HS or
[S, H] := SH − HS = 0, then S commutes with e^{−iHt}, so

e^{iHt} S e^{−iHt} = S ,                                   (8.52)

and Sij (t) is actually time independent. One says that an operator S is conserved
iff (8.52) holds, and this happens iff S commutes with H.19 Examples of conserved
operators include: H itself, the momentum operators if V is translation invariant, and
the angular momentum operators −iℏ x × ∇ if V is rotationally invariant.
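For matrices, (8.52) is easy to verify numerically; the following Python sketch (an illustration only, with ℏ = 1 and a small Hermitian H chosen so that S is a function of H, hence [S, H] = 0) computes e^{±iHt} from the eigendecomposition of H.

    # Sketch: a conserved operator in the sense of (8.52).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    H = A + A.conj().T               # a Hermitian "Hamiltonian"
    S = H @ H + 3 * H                # a function of H, hence [S, H] = 0

    E, U = np.linalg.eigh(H)         # H = U diag(E) U^dagger
    t = 1.7
    expiHt = U @ np.diag(np.exp(1j * E * t)) @ U.conj().T
    print(np.allclose(expiHt @ S @ expiHt.conj().T, S))   # True: S is conserved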

19
At least for bounded operators. For unbounded operators, this is still true if we define carefully
what it means for them to commute.

9 Spin
The phenomenon known as spin does not mean that the particle is spinning around its
axis, though it is in some ways similar. The simplest description of the phenomenon
is to say that the wave function of an electron (at time t) is actually not of the form
ψ : R3 → C but instead ψ : R3 → C2 . The space C2 is called spin-space and its elements
spinors (short for spin-vectors). We will in the following write S for spin-space.

9.1 Spinors and Pauli Matrices


Apart from being a 2-dimensional Hilbert space, spin space has the further property
that with every spinor is associated a vector in physical space R3 . This relation can be
expressed as a function
ω : S → R3 , (9.1)
given explicitly by
ω(φ) = ( Σ_{r,s=1}^{2} φ_r*(σ₁)_{rs}φ_s , Σ_{r,s=1}^{2} φ_r*(σ₂)_{rs}φ_s , Σ_{r,s=1}^{2} φ_r*(σ₃)_{rs}φ_s ) ,    (9.2)

where σ_i are the three Pauli matrices

        ⎛ 0  1 ⎞         ⎛ 0  −i ⎞         ⎛ 1   0 ⎞
   σ₁ = ⎝ 1  0 ⎠ ,  σ₂ = ⎝ i   0 ⎠ ,  σ₃ = ⎝ 0  −1 ⎠ .    (9.3)

Obviously, they are self-adjoint complex 2 × 2 matrices. It is common to write σ =


(σ1 , σ2 , σ3 ) for the vector of Pauli matrices. With this notation, and writing
φ*χ = Σ_{s=1}^{2} (φ_s)* χ_s                              (9.4)

for the inner product in spin-space, Eq. (9.2) can be expressed more succinctly as

ω(φ) = φ∗ σφ . (9.5)

For example, the spinor φ = (1, 0) has ω(φ) = (0, 0, 1), which points in the +z-direction;
(1, 0) is therefore called a spin-up spinor. The spinor (0, 1) has ω(0, 1) = (0, 0, −1),
which points in the −z-direction; (0, 1) is therefore called a spin-down spinor. ω has
the properties
ω(zφ) = |z|2 ω(φ) (9.6)
and (homework problem)
|ω(φ)| = ‖φ‖_S² = φ*φ ,                                   (9.7)

so unit spinors are associated with unit vectors. (Here, ‖·‖_S means the norm in the
spin space S = C². This way of mapping unit elements of C² to unit vectors in R³ is
also sometimes called the Bloch sphere.)
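The map ω is easy to play with numerically; here is a minimal Python sketch (an illustration only; the spinors and the identity |ω(φ)| = ‖φ‖² are those of the text, the code itself is not).

    # Sketch: the map omega from spinors to vectors in R^3, Eqs. (9.2)/(9.5).
    import numpy as np

    sigma = [np.array([[0, 1], [1, 0]]),
             np.array([[0, -1j], [1j, 0]]),
             np.array([[1, 0], [0, -1]])]        # Pauli matrices (9.3)

    def omega(phi):
        phi = np.asarray(phi, dtype=complex)
        return np.array([np.vdot(phi, s @ phi).real for s in sigma])

    print(omega([1, 0]))                         # [0, 0, 1]  : spin-up
    print(omega([0, 1]))                         # [0, 0, -1] : spin-down
    phi = np.array([0.3 + 0.4j, 0.5 - 0.7j])
    print(np.isclose(np.linalg.norm(omega(phi)), np.vdot(phi, phi).real))   # Eq. (9.7)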

Spinors have the curious property that if we rotate a spinor φ in spin-space through
an angle θ, with angles in Hilbert space defined by the relation

cos θ = ⟨φ|χ⟩ / (‖φ‖ ‖χ‖) ,                               (9.8)
the corresponding direction ω(φ) in real space rotates through an angle 2θ. For example,
(0, 1) can be obtained from (1, 0) by rotating through 90◦ , while the corresponding vector
is rotated from the +z to the −z-direction, and thus through 180◦ . Expressed the other
way around, spinors rotate by half the angle of vectors. That is why one says that
electrons have spin one half. As a consequence, a rotation in real space by 360◦ will
correspond to one by 180◦ in spin space and carry φ to −φ, whereas a rotation in real
space by 720◦ will carry φ to itself.
There are also other types of spinors, other than spin-1/2: spin-1, spin-3/2, spin-2,
spin-5/2, etc. The space of spin-s spinors has complex dimension 2s + 1, and the analogs of
the Pauli matrices are (2s + 1) × (2s + 1) matrices. In this context, wave functions
ψ : R3 → C are said to have spin 0. Electrons, quarks, and all known species of matter
particles have spin 1/2; the photon has spin 1; all known species of force particles have
integer spin; the only elementary particle species with spin 0 in the standard model of
particle physics is the Higgs particle or Higgs boson, which was experimentally confirmed
in 2012 at the Large Hadron Collider (LHC) of CERN in Geneva, Switzerland.

9.2 The Pauli Equation


When spin is taken into account, the Schrödinger equation reads a little differently. The
appropriate version is known as the Pauli equation. We will not study this equation in
detail; we write it down mainly for the sake of completeness:
iℏ ∂ψ/∂t = (1/2m)(−iℏ∇ − A(x))² ψ(x) − (ℏ/2m) σ·B(x) ψ(x) + V(x) ψ(x)    (9.9)
with B the magnetic field, V the electric and gravitational potential, A the magnetic
vector potential defined by the property
 
B = ∇ × A = (∂₂A₃ − ∂₃A₂ , ∂₃A₁ − ∂₁A₃ , ∂₁A₂ − ∂₂A₁) .   (9.10)
(In words, B is the curl of A. The vector potential is, in fact, not uniquely defined by
this property, but different vector potentials satisfying (9.10) for the same magnetic field
can be translated into each other by gauge transformations, i.e., by different x-dependent
choices of the orthonormal basis in spin-space S.)
The Hilbert space of wave functions with spin is denoted by L2 (R3 , C2 ) and contains
the square-integrable functions R3 → C2 . The inner product is
⟨ψ|φ⟩ = ∫_{R³} d³x ψ*(x) φ(x) = ∫_{R³} d³x Σ_{s=1}^{2} ψ_s*(x) φ_s(x) .    (9.11)

Born rule for position, given a spinor-valued wave function.
ρ(x) = |ψ(x)|² := ψ*(x) ψ(x) = ‖ψ(x)‖_S² = Σ_{s=1}^{2} |ψ_s(x)|² .    (9.12)

Note that this is a special case of the general Born rule (8.45) for the position operators
X_j. In the following, we will simply write |·| instead of ‖·‖_S.

9.3 The Stern–Gerlach Experiment


Let us write

         ⎛ ψ₁(x) ⎞
ψ(x) =   ⎝ ψ₂(x) ⎠ .                                      (9.13)
In the first half of a Stern–Gerlach experiment (first done in 1922 with silver atoms),
a wave packet moves through a magnetic field that is carefully designed so as to deflect
ψ1 (x) in a different direction than ψ2 (x), and thus to separate the two components in
space (Figure 9.1). Put differently, if the initial wave function ψ(t = 0) has support in
the ball Br (y) of radius r around the center y then the final wave function ψ(t = 1) (i.e.,
the wave function after passing through the magnetic field) is such that ψ₁(x, t = 1)
has support in B₊ := B_r(y + (1, 0, d)) and ψ₂(x, t = 1) in B₋ := B_r(y + (1, 0, −d))
with deflection distance d > r (so that ψ₁ and ψ₂ do not overlap). The arrangement
creating this magnetic field is called a Stern–Gerlach magnet. In the second half of the
Stern–Gerlach experiment, one applies detectors to the regions B± . If the electron is
found in B+ then the outcome of the experiment is said to be up, if in B− then down.

Figure 9.1: Setup of the Stern-Gerlach experiment. (1) furnace, (2) beam of silver
atoms, (3) inhomogeneous magnetic field, (4) classically expected result, (5) observed
result. Picture credit: http://en.wikipedia.org/wiki/Stern-Gerlach_experiment

A case of particular interest is that the initial wave function satisfies

ψs (x) = φs χ(x) , (9.14)

where φ ∈ S, ‖φ‖_S = 1, and χ : R³ → C, ‖χ‖ = 1. One says that for such a ψ, the spin
degree of freedom is disentangled from the spatial degrees of freedom. (Before, we have
considered many-particle wave functions for which some particles were disentangled from
others. We may also consider a single particle and say that the x variable is disentangled
from the y and z variables iff ψ(x, y, z) = f (x) g(y, z).)
In the case (9.14), assuming that χ has support in Br (y), the wave function after
passing the magnet is

⎛ φ₁ χ(x − (1, 0, d))  ⎞
⎝ φ₂ χ(x − (1, 0, −d)) ⎠ ,                                (9.15)

and it follows from the Born rule (9.12) for position that the probability of outcome
“up” is |φ₁|² and that of “down” is |φ₂|².
These probabilities agree with what we would have obtained from the general Born
rule (8.45) for the observable A = σ3 and the vector φ in the Hilbert space H = S.
The spinors φ+1 = (1, 0) and φ−1 = (0, 1) form an orthonormal basis of S consisting
of eigenvectors of σ3 (with eigenvalues +1 and −1, respectively); φ plays the role of
ψ in (8.45); its coefficients in the ONB referred to in Eq. (8.45) are ⟨φ₊₁|φ⟩ = φ₁ and
⟨φ₋₁|φ⟩ = φ₂. That is why the Stern–Gerlach experiment is often called a “measurement
of σ3 ”, or a “measurement of the z component of spin.”
The Stern–Gerlach magnet can be rotated into any direction. For example, by
rotating by 90◦ around the x-axis (a rotation that will map the z-axis to the y-axis),
we obtain an arrangement that will deflect part of the initial wave packet ψ in the +y-
direction and another part in the −y-direction. However, these parts are not φ1 and φ2 .
Instead, they are the parts along a different ONB of S:
φ⁽⁺⁾ = (1/√2)(1, i) and φ⁽⁻⁾ = (1/√2)(1, −i) form an ONB of S with ω(φ⁽±⁾) = (0, ±1, 0).    (9.16)

That is, any ψ : R³ → S can be written as ψ(x) = c₊(x)φ⁽⁺⁾ + c₋(x)φ⁽⁻⁾, and these
two terms will get spatially separated (in the ±y direction, in fact). The probabilities
of outcomes “up” and “down” are then ∫dx |c±(x)|². In the special case (9.14), the
probabilities are just |c±|², where φ = c₊φ⁽⁺⁾ + c₋φ⁽⁻⁾. Equivalently, the probabilities
are |⟨φ⁽±⁾|φ⟩|². These values are in agreement with the general Born rule for A = σ₂
because φ⁽±⁾ are eigenvectors of σ₂ with eigenvalues ±1.
Generally, if the Stern–Gerlach magnet is rotated from the z-direction to direction
n, where n is any unit vector in R3 , then the probabilities of its outcomes are governed
by the Born rule (8.45) for A = n · σ, which for any n is a self-adjoint 2 × 2 matrix with
eigenvalues ±1.
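This last statement is also easy to illustrate numerically; the following Python sketch (an illustration only) builds n·σ for a unit vector n, finds its ±1 eigenvectors, and computes the Born probabilities for a given spinor φ.

    # Sketch: Born probabilities for a Stern-Gerlach experiment in direction n, A = n.sigma.
    import numpy as np

    sigma = [np.array([[0, 1], [1, 0]]),
             np.array([[0, -1j], [1j, 0]]),
             np.array([[1, 0], [0, -1]])]

    n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)          # direction of the magnet
    A = sum(n[i] * sigma[i] for i in range(3))          # n.sigma

    eigvals, eigvecs = np.linalg.eigh(A)
    print(eigvals)                                       # [-1., 1.]

    phi = np.array([1.0, 0.0])                           # spin-up spinor
    prob = np.abs(eigvecs.conj().T @ phi)**2             # Born rule (8.45)
    print(prob, prob.sum())                              # the two probabilities sum to 1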

9.4 Bohmian Mechanics with Spin


John Bell figured out in 1966 how to do Bohmian mechanics for particles with spin.
It is surprisingly simple. Here is the single-particle version. Replace the Schrödinger

equation by the Pauli equation and Bohm’s equation of motion (6.1) by
dQ/dt = (ℏ/m) Im [ψ*∇ψ / ψ*ψ] (t, Q(t)) .                (9.17)
Recall that ψ ∗ ψ means the inner product in spin-space, so the denominator means

ψ ∗ (x)ψ(x) = |ψ1 (x)|2 + |ψ2 (x)|2 . (9.18)

Likewise, the numerator means

ψ ∗ (x)∇ψ(x) = ψ1∗ (x)∇ψ1 (x) + ψ2∗ (x)∇ψ2 (x) . (9.19)

The initial position Q(0) is assumed to be random with probability density

ρ0 (x) = |ψ0 (x)|2 . (9.20)

It follows that Q(t) has probability density |ψt |2 at every t. This version of the
equivariance theorem can be obtained by a very similar computation as in the spinless
case, involving the following variant of the continuity equation:
∂|ψ(x, t)|²/∂t = −∇ · [(ℏ/m) Im(ψ*∇ψ)] .                  (9.21)
As a consequence of the equivariance theorem, Bohmian mechanics leads to the
correct probabilities for the Stern–Gerlach experiment.

9.5 Is an Electron a Spinning Ball?


If it were then the following paradox would arise. According to classical electrodynamics
(which of course is well confirmed for macroscopic objects), a spinning, electrically
charged object behaves like a magnet in two ways: it creates its own magnetic field, and
it reacts to an external magnetic field. Just as the strength of the electric charge can be
expressed by a number, the charge e, the strength of the magnet can be expressed by
a vector, the magnetic dipole moment or just magnetic moment µ. Its direction points
from the south pole to the north pole, and its magnitude is the strength of the magnet.
The magnetic moment of a charge e spinning at angular frequency ω around the axis
along the unit vector u is, according to classical electrodynamics,

µ = γeωu , (9.22)

where the factor γ depends on the size and shape of the object. Furthermore, if such an
object flies through a Stern–Gerlach magnet oriented in direction n then, still according
to classical electrodynamics, it gets deflected by an amount proportional to µ · n. Put
differently, the Stern–Gerlach experiment for a classical object measures µ·n, the
component of µ in the direction of n. The vector ωu is called the spin vector.
Where is the paradox? It is that different choices of n, when applied to objects
with the same µ, would lead to a continuous interval of deflections [−γ|e|ω, +γ|e|ω],

whereas the Stern–Gerlach experiment, for whichever choice of n, leads to a discrete set
{+d, −d} of two possible deflections.
The latter fact was called by Wolfgang Pauli the “non-classical two-valuedness of
spin.” This makes it hard to come up with a theory in which the outcome of a Stern–
Gerlach experiment has anything to do with a spinning motion. While Feynman went
too far when claiming that the double-slit experiment does not permit any deeper ex-
planation, it seems safe to say that the Stern–Gerlach experiment does not permit an
explanation in terms of spinning balls.

9.6 Is There an Actual Spin Vector?


Here is another perspective on the question whether the electron is a spinning ball, from
a Bohmian angle. We have seen that Bohmian mechanics does not involve any spinning
motion to account for (what has come to be called) spin; electrons have actual positions
but not an actual spin vector. Some authors felt they should have an actual spin vector,
and have made proposals in this direction; let me explain why the most natural proposal
in this direction, due to Bohm, Schiller, and Tiomno,20 is unconvincing.
Consider a single electron. Since ψt is a function from R3 to spin space C2 , ψt (Qt ) is
a vector in C2 and thus associated with a direction in R3 , i.e., that of ω(ψt (Qt )). The
proposal is to regard the real 3-vector

S_t := ω(ψ_t(Q_t)) / |ω(ψ_t(Q_t))| = [ψ*σψ / ψ*ψ](t, Q_t)    (9.23)

as a further fundamental variable representing the actual spin vector of the particle,
so that the full state is given by the triple (ψt , Qt , S t ). It is tempting to imagine the
electron as a little ball spinning at angular velocity proportional to S t (i.e., at a fixed
angular speed around the axis in the direction of S t ) while its center moves according
to Qt .
The problem with this picture, and with S t as a further fundamental variable, be-
comes visible when we consider a Stern–Gerlach experiment, say in the z direction, often
called a “measurement of z-spin.” One might expect that the outcome of the experi-
ment is the z-component of S τ , with τ the time at which the Stern–Gerlach experiment
begins. But that is not the case. Rather, the outcome of the experiment is read off from
the final position of the particle, and that position depends on the initial wave function
ψτ and the initial position Qτ , but the equation of motion for Qt does not depend on
S t , so the further fundamental variable actually has no influence on the outcome! It
turns out that by the end of the Stern–Gerlach experiment, the vector S t has turned
so as to point in the z-up direction if the outcome was z-up. (This is because if Qt lies
in a purely z-up wave packet then S t points in the z-up direction.) But this fact does
not change the situation that the variable S t is superfluous. In fact, we have already
discussed how the Stern–Gerlach experiment, and indeed all phenomena involving spin,
20
D. Bohm, R. Schiller, and J. Tiomno: A causal interpretation of the Pauli equation (A). Il Nuovo
Cimento Supplementi 1: 48–66 (1955)

are naturally explained with just ψt and Qt as fundamental variables, so there is no
phenomenon whose explanation would require the introduction of S t , or would merely
be made simpler by the introduction of S t . The upshot is that an actual spin vector is
neither useful nor needed in Bohmian mechanics.

9.7 Many-Particle Systems


The wave function of N electrons is of the form

\psi_{s_1,s_2,\ldots,s_N}(x_1, x_2, \ldots, x_N) \,, \qquad (9.24)

where each x_j varies in R^3 and each index s_j in {1, 2}. Thus, at any configuration, ψ has 2^N complex components, or ψ : R^{3N} → C^{2^N}. Note that while R^{3N} is the Cartesian product of N copies of R^3, C^{2^N} is not the Cartesian product of N copies of C^2 (which would have dimension 2N) but the tensor product of N copies of C^2. Equivalently, we could write ψ as a function R^{3N} × {1, 2}^N → C, where the set {1, 2}^N of possible index values (s_1, . . . , s_N) is a Cartesian product of N copies of {1, 2}; but in the following it will be convenient to write ψ as a function R^{3N} → C^{2^N}.
The Pauli equation then reads
i\hbar \frac{\partial\psi}{\partial t} = \frac{1}{2m} \sum_{k=1}^{N} \bigl(-i\hbar\nabla_k - A(x_k)\bigr)^2 \psi \;-\; \frac{\hbar}{2m} \sum_{k=1}^{N} \sigma^{(k)} \cdot B(x_k)\, \psi \;+\; V\psi \,, \qquad (9.25)

where σ (k) means σ acting on the index sk of ψ. Change the definition (9.4) of the
spin inner product φ∗ ψ, and Born’s rule (9.12), so as to sum over all spin indices sj .
Moreover, in Bohm’s equation of motion (9.17), replace Q ∈ R3 by Q ∈ R3N .
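As a small illustration (not part of the notes) of what σ^{(k)} means concretely, one can realize C^{2^N} as an N-fold Kronecker (tensor) product and build σ^{(k)} as a Kronecker product of a Pauli matrix with identity matrices, here for N = 2:

import numpy as np

sigma_z = np.array([[1, 0], [0, -1]])
I2 = np.eye(2)

# sigma_z acting on the first resp. second spin index of a two-particle spinor in C^4:
sigma_z_1 = np.kron(sigma_z, I2)
sigma_z_2 = np.kron(I2, sigma_z)

# They act on different indices, so they commute:
print(np.allclose(sigma_z_1 @ sigma_z_2, sigma_z_2 @ sigma_z_1))  # True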

9.8 Representations of SO(3)


A deeper understanding of spinors comes from group representations.21 Let us start
easily. Consider the wave function of a single particle. Suppose it were, instead of
a complex scalar field, a vector field, so ψ : R3 → R3 . Well, it should be complex,
so we complexify the vector field, ψ : R3 → C3 . Now rotate your coordinate system
according to R ∈ SO(3). Then in the new coordinates, the same physical wave function
is represented by a different mathematical function,

ψ̃(x) = Rψ(R−1 x) . (9.26)

Instead of real-valued potentials, the Schrödinger equation could then include matrix-
valued potentials, provided the matrices are always self-adjoint:
i\hbar \frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta\psi + V\psi \,. \qquad (9.27)
21 More details about the topic of this section can be found in R. U. Sexl and H. K. Urbantke: Relativity, Groups, Particles, Springer-Verlag (2001).
Now consider another possibility: that the wave function is tensor-valued, ψab with
a, b = 1, 2, 3. Then in a rotated coordinate system,
\tilde\psi_{ab}(x) = \sum_{c,d=1}^{3} R_{ac} R_{bd}\, \psi_{cd}(R^{-1}x) \,. \qquad (9.28)

What the two examples have in common is that the components of the wave function
get transformed as well according to the scheme, for ψ : R3 → Cd ,
\tilde\psi_r(x) = \sum_{s=1}^{d} M_{rs}(R)\, \psi_s(R^{-1}x) \,. \qquad (9.29)

The matrices M (R) satisfy the composition law

M (R1 ) M (R2 ) = M (R1 R2 ) and M (I) = I , (9.30)

which means that they form a representation of the group SO(3) of rotations—in other
words, a group homomorphism from SO(3) to GL(Cd ), the “general linear group” com-
prising all invertible operators on Cd . Further representations of SO(3) provide further
possible value spaces for wave functions ψ.
Spin space S for spin-1/2 is almost of this kind, but there is one more complication:
SO(3) is represented, not by linear mappings S → S, but by mappings P (S) → P (S)
consistent with linear mappings, where P (S) is the set of all 1-dimensional subspaces
of S (called the projective space of S). This seems fitting as two wave functions that
differ only by a phase factor, φ(x) = eiθ ψ(x), are usually regarded as representing the
same physical quantum state (they yield the same Born distribution, at all times and
for all observables, and the same Bohmian trajectories for all times). That is, one can
say that a wave function is really an element of P (H ) rather than H because every
normalized element of Cψ is as good as ψ.
By a mapping F : P (S) → P (S) consistent with a linear mapping, I mean an F such
that there is a linear mapping M : S → S with F (Cψ) = CM ψ. While M determines
F uniquely, F does not determine M , as zM with any z ∈ C \ {0} leads to the same F .
In particular, if we are given F (R) and want an M (R), then −M (R) is always another
possible candidate. For spin-1/2, it turns out that while F (R1 ) F (R2 ) = F (R1 R2 ) as it
should, M (R) can at best be found in such a way that

M (R1 ) M (R2 ) = ±M (R1 R2 ) . (9.31)

This sign mismatch has something to do with the halved angles. The M are elements of
SU (2) (the group of unitary 2 × 2 matrices with determinant 1), and with every element
R of SO(3) are associated two elements of SU (2) that differ by a sign.
This association can actually be regarded as a mapping

ϕ : SU (2) → SO(3) , M 7→ R . (9.32)

This mapping ϕ is a group homomorphism (i.e., ϕ(M1 )ϕ(M2 ) = ϕ(M1 M2 ) and ϕ(I) =
I), is smooth, two-to-one [ϕ(−M ) = ϕ(M )], and locally a diffeomorphism. The situation
is similar to the group homomorphism χ : R → U (1), θ 7→ eiθ , which is also smooth,
many-to-one, and locally a diffeomorphism; just like R is what you get from the circle
U(1) when you unfold it, SU(2) is what you get from SO(3) when you “unfold” it. (The unfolding of a manifold Q is called the covering space Q̂; so the covering space of SO(3) is SU(2).) For every
continuous curve γ in SO(3) starting in I, there is a unique continuous curve γ̂ in SU (2)
with ϕ ◦ γ̂ = γ, called the lift of γ. Thus, continuous rotations in R3 can be translated
uniquely into continuous rotations in S.
The upshot of all this is that spinors are one of the various types of mathematical
objects (besides vectors and tensors) that react to rotations in a well-defined way, and
that is why they qualify as possible values of a wave function.
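For readers who like to see this concretely, here is a small numerical sketch (not part of the notes) of the covering map ϕ. A standard formula for it is ϕ(M)_{ij} = ½ Tr(σ_i M σ_j M†); the sketch checks that ϕ respects composition and that M and −M give the same rotation:

import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def su2_element(axis, angle):
    """M = exp(-i (angle/2) axis.sigma), a generic element of SU(2)."""
    n = np.asarray(axis, dtype=float); n = n / np.linalg.norm(n)
    H = sum(nk * sk for nk, sk in zip(n, sigma))
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * H

def rotation_of(M):
    """The SO(3) matrix phi(M) associated with M in SU(2)."""
    return np.array([[0.5 * np.trace(sigma[i] @ M @ sigma[j] @ M.conj().T).real
                      for j in range(3)] for i in range(3)])

M1 = su2_element([0, 0, 1], 0.7)
M2 = su2_element([1, 1, 0], 1.3)
print(np.allclose(rotation_of(M1) @ rotation_of(M2), rotation_of(M1 @ M2)))  # homomorphism
print(np.allclose(rotation_of(-M1), rotation_of(M1)))                        # two-to-one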

9.9 Inverted Stern–Gerlach Magnet and Contextuality


Consider again the Stern–Gerlach experiment in the +z direction on an initial wave
function of the form ψs (x) = φs χ(x) as in (9.14), with χ a fixed scalar packet, while
we consider different φ ∈ S. As mentioned, Bohmian mechanics leads to outcome Z =
“up” with probability |φ1 |2 and Z = “down” with |φ2 |2 , in agreement with the Born
rule for σ3 .


Figure 9.2: Outcome of the experiment as a function of the yz components of the initial
position Q(τ ) = (x, y, z) of the Bohmian particle. The curve separating the two regions
indicates the critical surface mentioned in the text, the outer circle encloses the support
of χ.

Moreover, the outcome Z is determined by φ and the initial position Q(τ ) of the
Bohmian particle at the time τ at which the experiment begins. In fact, suppose for
simplicity that the approximate time evolution of ψ described in Section 9.3 including
(9.15) is valid and all the Bohmian trajectories have equal x-velocities and vanishing
y-velocity; then those Q(τ) above a certain critical surface, as in Figure 9.2, will end up in the “up” packet in B+, and those below the critical surface in the “down” packet in B−.
Now comes a subtle point: The outcome Z is not determined by Q(τ ), ψ(τ ), and σ3
alone. To understand this statement, let us consider an example (due to David Albert22 )
consisting of a modified Stern–Gerlach experiment in which the polarity of the magnet has been changed as in Figure 9.3.

22 D. Z. Albert: Quantum Mechanics and Experience. Cambridge, MA: Harvard University Press (1992)

Figure 9.3: LEFT: Schematic picture of a Stern–Gerlach magnet. RIGHT: Modified Stern–Gerlach magnet with inverted polarity (north and south exchanged roles) while keeping the shape.

It follows that the spin-up part of the wave function, ψ1 , gets deflected downward
and the spin-down part, ψ2 , deflected upward. For this reason, let us decide that if
the particle gets detected in the upward deflected location, B+ , then we say that the
outcome Z is “down,” and if detected in B− , then the outcome Z is “up.” With this
convention, the probability of “up” is |φ1 |2 and that of “down” is |φ2 |2 , in agreement
with the Born rule for σ3 . That is, the modified experiment (flipped in two ways) is
again a quantum measurement of the observable σ3 .

Figure 9.4: Outcome of the modified experiment as a function of the initial position in the y–z plane: the region above the critical surface now corresponds to outcome “down,” the region below to “up.”

In the modified experiment, the Bohmian particle will again end up in B+ if the
initial position Q(τ ) lies above a certain critical surface (possibly different from before)
and in B− if initially below the surface, see Figure 9.4. But now, ending up in B+ means
outcome “down.” Thus, if |φ1 |2 is neither 0 nor 1, then an initial position Q(τ ) near the
top will lead to outcome “up” in the original experiment but “down” in the modified
experiment.
The upshot of this example is that two different experiments, both of which are
“quantum measurements of σ3 ,” will sometimes yield different outcomes when applied
to a particle in the same state (Q(τ ), ψ(τ )). This is what I meant when saying that
the outcome is not determined by Q(τ ), ψ(τ ) and σ3 alone—it depends on which of the
two experiments we carry out. This example shows that “quantum measurements” are
not necessarily measurements in the ordinary meaning of the word. It also shows that
Bohmian mechanics does not define an actual value of σ3 .

The fact that different ways of “measuring σ3 ” can yield different outcomes is called
contextuality. In the literature, contextuality is sometimes presented as the weird, mys-
terious trait (of the quantum world or of Bohmian mechanics) that a measurement
outcome may depend on the “context” of the measurement. But really the sense of
paradox arises only from taking the word “measurement” too literally, and the trivial
essence of the perceived mystery has been nicely formulated by Detlef Dürr, Sheldon
Goldstein, and Nino Zanghì (2004):23

“The result of an experiment depends upon the experiment.”

The idea that there should be an actual value of σ3 has led to a lot of discussion
in the literature associated with the key words “non-contextual hidden variables.” It
turns out that they are mathematically impossible, as we will see later in the section on
no-hidden-variables theorems, Section 24.

23 Section 8.4 in D. Dürr, S. Goldstein, and N. Zanghì: Quantum Equilibrium and the Role of Operators as Observables in Quantum Theory. Journal of Statistical Physics 116: 959–1055 (2004) http://arxiv.org/abs/quant-ph/0308038
10 The Projection Postulate
10.1 Notation
In the Dirac notation one writes |ψ⟩ for ψ. This may seem like a waste of symbols at first, but often it is the opposite, as it allows us to replace a notation such as φ1, φ2, . . . by |1⟩, |2⟩, . . .. Of course, a definition is needed for what |n⟩ means, just as one would be needed for φn. It is also convenient when using long subscripts, such as replacing ψ_{left slit} by |left slit⟩. In spin space S, one commonly writes

|z\text{-up}\rangle = {\uparrow} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \qquad |z\text{-down}\rangle = {\downarrow} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad (10.1)

|y\text{-up}\rangle = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \end{pmatrix} , \qquad |y\text{-down}\rangle = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -i \end{pmatrix} \qquad (10.2)

|x\text{-up}\rangle = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} , \qquad |x\text{-down}\rangle = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \qquad (10.3)

(Compare to Eq. (9.16) and Exercise 16 in Assignment 4, and to Maudlin’s article.)
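As a quick numerical sanity check (not part of the notes), the spinors (10.1)–(10.3) are exactly the eigenvectors of σ3, σ2, and σ1 with eigenvalues ±1; for instance:

import numpy as np

sigma1 = np.array([[0, 1], [1, 0]])
sigma2 = np.array([[0, -1j], [1j, 0]])
sigma3 = np.array([[1, 0], [0, -1]])

z_up = np.array([1, 0])
y_up = np.array([1, 1j]) / np.sqrt(2)
x_up = np.array([1, 1]) / np.sqrt(2)

print(np.allclose(sigma3 @ z_up, z_up),   # |z-up> is the +1 eigenvector of sigma_3
      np.allclose(sigma2 @ y_up, y_up),   # |y-up> is the +1 eigenvector of sigma_2
      np.allclose(sigma1 @ x_up, x_up))   # |x-up> is the +1 eigenvector of sigma_1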


Furthermore, in the Dirac notation one writes ⟨φ| for the mapping H → C given by ψ ↦ ⟨φ|ψ⟩. Obviously, ⟨φ| applied to |ψ⟩ gives ⟨φ|ψ⟩, which suggested the notation. Paul Dirac called ⟨φ| a bra and |ψ⟩ a ket. Obviously, ⟨φ|A|ψ⟩ means the same as ⟨φ|Aψ⟩. Dirac suggested that for self-adjoint A, the notation ⟨φ|A|ψ⟩ conveys better that A can be applied equally well to either φ or ψ. |φ⟩⟨φ| is an operator that maps ψ to |φ⟩⟨φ|ψ⟩ = ⟨φ|ψ⟩φ. If φ is a unit vector then this is the part of ψ parallel to φ, or the projection of ψ to φ.
Another common and useful notation is ⊗, called the tensor product. For

Ψ(x, y) = ψ(x) φ(y) (10.4)

one writes
Ψ = ψ ⊗ φ. (10.5)
Likewise, for Eq. (9.14) one writes ψ = φ ⊗ χ.
The symbol ⊗ also has meaning when applied to Hilbert spaces.

L2 (x, y) = L2 (x) ⊗ L2 (y) , (10.6)

where L2 (x) means the square-integrable functions of x, etc. Note that not all elements
of L2 (x) ⊗ L2 (y) are of the form ψ ⊗ φ—only a minority are. A general element of
L2 (x)⊗L2 (y) is an infinite linear combination of tensor products such as ψ⊗φ. Likewise,
when we replace the continuous variable y by the discrete index s for spin, the tensor
product of the Hilbert space C2 of vectors φs and the Hilbert space L2 (R3 , C) of wave
functions χ(x) is the Hilbert space L2 (R3 , C2 ) of wave functions ψs (x):

C2 ⊗ L2 (R3 , C) = L2 (R3 , C2 ) . (10.7)
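Here is a small finite-dimensional sketch (not part of the notes) of the point that most elements of a tensor product are not of the form ψ ⊗ φ: in C² ⊗ C², a vector is a product exactly when its 2×2 coefficient matrix has rank 1, and (|↑↑⟩ + |↓↓⟩)/√2 has rank 2.

import numpy as np

psi = np.array([1, 2j]); phi = np.array([3, -1])
product = np.kron(psi, phi)                        # the vector psi (x) phi in C^4
entangled = np.array([1, 0, 0, 1]) / np.sqrt(2)    # (|up,up> + |down,down>)/sqrt(2)

print(np.linalg.matrix_rank(product.reshape(2, 2)))    # 1 -> of product form
print(np.linalg.matrix_rank(entangled.reshape(2, 2)))  # 2 -> not a product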

Another notation we use is

f(t-) = \lim_{s \nearrow t} f(s) \,, \qquad f(t+) = \lim_{s \searrow t} f(s) \qquad (10.8)

for the left and right limits of a function f at a jump.

10.2 The Projection Postulate


Here is the last rule of the quantum formalism:

Projection postulate. If we measure the observable A at time t on a system with wave function ψ_{t−} and obtain the outcome α, then the system’s wave function ψ_{t+} right after the measurement is the eigenfunction of A with eigenvalue α. If there are several mutually orthogonal eigenfunctions with eigenvalue α, then

\psi_{t+} = C \sum_\lambda |\phi_{\alpha,\lambda}\rangle \langle\phi_{\alpha,\lambda}|\psi_{t-}\rangle \,, \qquad (10.9)

where C > 0 is the normalizing constant.


If λ is a continuous variable, then Σ_λ should be ∫ dλ. The value of C is, explicitly,

C = \Bigl\| \sum_\lambda |\phi_{\alpha,\lambda}\rangle \langle\phi_{\alpha,\lambda}|\psi_{t-}\rangle \Bigr\|^{-1} . \qquad (10.10)

10.3 Projection and Eigenspace


To get a better feeling for what the expression on the RHS of (10.9) means, consider a
vector ψ = ψt− and an ONB φn = φα,λ , and expand ψ in that basis:
\psi = \sum_n c_n \phi_n \,. \qquad (10.11)

The coefficients are then given by

c_m = \langle\phi_m|\psi\rangle \qquad (10.12)

because
\langle\phi_m|\psi\rangle = \Bigl\langle \phi_m \Bigm| \sum_n c_n \phi_n \Bigr\rangle = \sum_n c_n \langle\phi_m|\phi_n\rangle = \sum_n c_n \delta_{mn} = c_m \,. \qquad (10.13)

Now change ψ by replacing some of the coefficients cn by zero while retaining the others
unchanged:
\tilde\psi = \sum_{n\in J} c_n \phi_n \,, \qquad (10.14)

where J is the set of those indices retained. This procedure is called projection to the
subspace spanned by {φn : n ∈ J}, and the projection operator is
P = \sum_{n\in J} |\phi_n\rangle\langle\phi_n| \,. \qquad (10.15)

(The only projections we consider are orthogonal projections.) An operator P is a projection iff it is self-adjoint [P = P†] and idempotent [P² = P]; equivalently, iff it is self-adjoint and its spectrum (set of generalized eigenvalues) is contained in {0, 1}.
In Eq. (10.9), the index n numbers the index pairs (α, λ), and the subset J corre-
sponds to those pairs that have a given α and arbitrary λ. Except for the factor C, the
RHS of (10.9) is the corresponding projection of ψt− , which gives the projection postu-
late its name. The subspace of Hilbert space spanned by the φα,λ with given α is the
eigenspace of A with eigenvalue α. Thus, the projection postulate can be equivalently
rewritten as
\psi_{t+} = \frac{P_\alpha \psi_{t-}}{\|P_\alpha \psi_{t-}\|} \,, \qquad (10.16)
where Pα denotes the projection to the eigenspace of A with eigenvalue α.
For every closed subspace, there is a projection operator that projects to this sub-
space. For example, for any region B ⊆ R3N in configuration space, the functions whose
support lies in B (i.e., which vanish outside B) form an ∞-dimensional closed subspace
of L2 (R3N ). The projection to this subspace is
(P_B\psi)(q) = \begin{cases} \psi(q) & q \in B \\ 0 & q \notin B \,, \end{cases} \qquad (10.17)

that is, multiplication by the characteristic function 1B of B.
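As a finite-dimensional sketch (not part of the notes), one can build the projection (10.15) onto the span of a few orthonormal vectors in C⁴ and check the characterization P = P† = P², together with the collapse formula (10.16):

import numpy as np
rng = np.random.default_rng(0)

# A random orthonormal basis of C^4 (the columns of Q):
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
J = [0, 1]                                              # indices of the retained basis vectors
P = sum(np.outer(Q[:, n], Q[:, n].conj()) for n in J)   # P = sum_{n in J} |phi_n><phi_n|

print(np.allclose(P, P.conj().T), np.allclose(P @ P, P))   # self-adjoint and idempotent

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = psi / np.linalg.norm(psi)
psi_collapsed = P @ psi / np.linalg.norm(P @ psi)          # Eq. (10.16)
print(np.linalg.norm(psi_collapsed))                       # 1.0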

10.4 Remarks
According to the projection postulate (also known as the measurement postulate or
the collapse postulate), the wave function changes dramatically in a measurement. The
change is known as the reduction of the wave packet or the collapse of the wave function.
For example, in a spin-z (or σ3 -) measurement, the wave function before the mea-
surement is an arbitrary spinor (φ1 , φ2 ) ∈ S with |φ1 |2 + |φ2 |2 = 1 (assuming Eq. (9.14)
and ignoring the space dependence). With probability |φ1 |2 , we obtain outcome “up”
and the collapsed spinor (φ1 /|φ1 |, 0) after the measurement. The term φ1 /|φ1 | is just
the phase of φ1 . With probability |φ2 |2 , we obtain “down” and the collapsed spinor
(0, φ2 /|φ2 |).
With the projection postulate, the formalism provides a prediction of probabilities
for any sequence of measurements. If we prepare the initial wave function ψ0 and make
a measurement of A1 at time t1 then the Schrödinger equation determines what ψt1 −
is, the general Born rule (8.45) determines the probabilities of the outcome α1 , and the
projection postulate the wave function after the measurement. The latter is the initial

wave function for the Schrödinger equation, which governs the evolution of ψ until the
time t2 at which the second measurement, of observable A2 , occurs. The probability
distribution of the outcome α2 is given by the Born rule again and depends on α1 because
the initial wave function in the Schrödinger equation, ψt1 + , did. And so on. This scheme
is the quantum formalism. Note that the observer can choose t2 and A2 after the first
measurement and thus make this choice depend on the first outcome α1 .
The projection postulate implies that if we make another measurement of A right
after the first one, we will with probability 1 obtain the same outcome α.
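To see the whole scheme at work, here is a toy sketch (not part of the notes; all names are made up) of a sequence of ideal measurements on a single spin, with the Born rule for the probabilities and the projection postulate for the collapse. It also exhibits the repeatability just mentioned: an immediate second measurement of σ3 always reproduces the first outcome, while a subsequent measurement of σ1 does not.

import numpy as np
rng = np.random.default_rng(1)

def measure(psi, A):
    """Ideal measurement of the self-adjoint matrix A on the state psi:
    returns (outcome, collapsed state)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    probs = np.abs(eigvecs.conj().T @ psi) ** 2             # Born rule
    k = rng.choice(len(eigvals), p=probs / probs.sum())
    phi = eigvecs[:, k]
    return eigvals[k], phi * np.vdot(phi, psi) / abs(np.vdot(phi, psi))  # projection postulate

sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)
sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)

psi = np.array([0.6, 0.8j])         # |phi_1|^2 = 0.36, |phi_2|^2 = 0.64
a1, psi = measure(psi, sigma3)      # "up" (+1) with prob. 0.36, "down" (-1) with prob. 0.64
a2, psi = measure(psi, sigma3)      # repeats a1 with probability 1
a3, psi = measure(psi, sigma1)      # +1 or -1 with probability 1/2 each
print(a1, a2, a3)

(For simplicity the sketch lets the state sit still between measurements; in the scheme above one would evolve it with the Schrödinger equation between t1 and t2.)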
For a position measurement, the projection postulate implies that the wave function
collapses to a delta function. This is not realistic; it is over-idealized. A delta function
is not a square-integrable function, and it contains in a sense an infinite amount of
energy. More realistically, a position measurement has a finite inaccuracy ε and could
be expected to collapse the wave function to one of width ε, such as
\psi_{t+}(x) = C\, e^{-\frac{(x-\alpha)^2}{4\varepsilon^2}}\, \psi_{t-}(x) \,. \qquad (10.18)

However, this operator (multiplication by a Gaussian) is not a projection because its


spectrum is more than just 0 and 1.
Another simple model of position measurement, still highly idealized but less so than
collapse to δ(x − α), considers a region B ⊂ R3 and assumes that a detector either finds
the particle in B or not. The corresponding observable is A = PB as defined in (10.17),
and the probability of outcome 1 is
\int_B d^3x\, |\psi_{t-}(x)|^2 \,. \qquad (10.19)

In case of outcome 1, ψ_{t−} collapses to

\psi_{t+} = \frac{P_B\,\psi_{t-}}{\|P_B\,\psi_{t-}\|} \,. \qquad (10.20)

You may feel a sense of paradox about the two different laws for how ψ changes with
time: the unitary Schrödinger evolution and the collapse rule. Already at first sight,
the two seem rather incompatible: the former is deterministic, the latter stochastic; the
former is continuous, the latter not; the former is linear, the latter not. It seems strange
that time evolution is governed not by a single law but by two. And even stranger that
the criterion for when the collapse rule takes over is something as vague as an observer
making a measurement. Upon scrutiny, the sense of paradox will persist and even deepen
in the form of what is known as the measurement problem of quantum mechanics.

11 The Measurement Problem
11.1 What the Problem Is
This is a problem about orthodox quantum mechanics. It is solved in Bohmian mechan-
ics and several other theories. Because of this problem, the orthodox view is in trouble
when it comes to analyzing the process of measurement.
Consider a “quantum measurement of the observable A.” Realistically, there are
only finitely many possible outcomes, so A should have finite spectrum. Consider the
system formed by the object together with the apparatus. Since the apparatus consists
of electrons and quarks, too, it should itself be governed by quantum mechanics. (That
is reductionism at work.) So I write Ψ for the wave function of the system (object
and apparatus). Suppose for simplicity that the system is isolated (i.e., there is no
interaction with the rest of the universe), so Ψ evolves according to the Schrödinger
equation during the experiment (recall Exercise 13 of Assignment 3), which begins (say)
at t1 and ends at t2 . It is reasonable to assume that

Ψ(t1 ) = ψ(t1 ) ⊗ φ (11.1)

with ψ = ψ(t1 ) the wave function of the object before the experiment and φ a wave
function representing a “ready” state of the apparatus. By the spectral theorem, ψ can
be written as a linear combination (superposition) of eigenfunctions of A,
\psi = \sum_\alpha c_\alpha \psi_\alpha \quad\text{with}\quad A\psi_\alpha = \alpha\,\psi_\alpha \quad\text{and}\quad \|\psi_\alpha\| = 1 \,. \qquad (11.2)

If the object’s wave function is an eigenfunction ψα , then, by Born’s rule (8.45), the
outcome is certain to be α. Set Ψα (t1 ) = ψα ⊗ φ. Then Ψα (t2 ) must represent a state
in which the apparatus displays the outcome α (for example, by a pointer pointing to
the appropriate position on a scale).
Now consider again a general ψ as in Eq. (11.2). Since the Schrödinger equation is
linear, the wave function of object and apparatus together at t2 is
\Psi(t_2) = \sum_\alpha c_\alpha \Psi_\alpha(t_2) \,, \qquad (11.3)

a superposition of states corresponding to different outcomes—and not a random state


corresponding to a unique outcome, as one might have expected from the projection
postulate. This is the measurement problem. The upshot is that there is a conflict
between the following assumptions:
• In each run of the experiment, there is a unique outcome.

• The wave function is a complete description of a system’s physical state.

• The evolution of the wave function of an isolated system is always given by the
Schrödinger equation.

Thus, we have to drop one of these assumptions. The first is dropped in the many-
worlds picture, in which all outcomes are realized, albeit in parallel worlds. If we drop
the second, we opt for additional variables as in Bohmian mechanics, where the state
at time t is described by the pair (Qt , ψt ). If we drop the third, we opt for replacing
the Schrödinger equation by a non-linear evolution (as in the GRW = Ghirardi–Rimini–
Weber approach). Of course, a theory might also drop several of these assumptions.
Orthodox quantum mechanics insists on all three assumptions, and that is why it has a
problem.
We took for granted that the system was isolated and had a wave function. We may
wonder whether that was asking too much. However, we could just take the system to
consist of the entire universe, so it is disentangled and isolated for sure. More basically,
if we cannot solve the measurement problem for an isolated system with a wave function
then we have no chance of solving it for a system entangled with outside particles.

11.2 How Bohmian Mechanics Solves the Problem


Since it is assumed that the Schrödinger equation is valid for a closed system, the after-
measurement wave function of object and apparatus together is
\Psi = \sum_\alpha c_\alpha \Psi_\alpha \,. \qquad (11.4)

Since the Ψα have disjoint supports in the configuration space (of object and apparatus
together), and since the particle configuration Q has distribution |Ψ|2 , the probability
that Q lies in the support of Ψα is
\mathbb{P}\bigl(Q \in \mathrm{support}(\Psi_\alpha)\bigr) = \int_{\mathrm{support}(\Psi_\alpha)} d^{3N}q\, |\Psi(q)|^2 = \int_{\mathbb{R}^{3N}} d^{3N}q\, |c_\alpha \Psi_\alpha(q)|^2 = |c_\alpha|^2 \,, \qquad (11.5)

which agrees with the prediction of the quantum formalism for the probability of the
outcome α. And indeed, when Q ∈ support(Ψα ), then the particle positions (including
the particles of both the object and the apparatus!) are such that the pointer of the
apparatus points to the value α. Thus, the way out of the measurement problem is
that although the wave function is a superposition of terms corresponding to different
outcomes, the actual particle positions define the actual outcome.
As a consequence of the above consideration, we also see that the predictions of
Bohmian mechanics for the probabilities of the outcomes of experiments agree with
those of standard quantum mechanics. In particular, there is no experiment that could
empirically distinguish between Bohmian mechanics and standard quantum mechanics,
while there are (in principle) experiments that distinguish the two from a GRW world.
If Bohmian mechanics and standard quantum mechanics agree about all probabili-
ties, then where do we find the collapse of the wave function in Bohmian mechanics?
There are two parts to the answer, depending on which wave function we are talking
about.

The first part of the answer is, if the Ψα are macroscopically different then they
will never overlap again (until the time when the universe reaches thermal equilibrium,
perhaps in 10^{10^{10}} years). This fact is independent of Bohmian mechanics; it is a trait
of the Schrödinger equation called decoherence.24 If Q lies in the support of one among
several disjoint packets then only the packet containing Q is relevant, by Bohm’s law
of motion (6.1), to determining dQ/dt. Thus, as long as the packets stay disjoint, only
the packet containing Q is relevant to the trajectories of the particles, and all other
packets could be replaced by zero without affecting the trajectories. That is why we can
replace Ψ by cα Ψα , with α the actual outcome. Furthermore, the factor cα cancels out
in Bohm’s law of motion (6.1) and thus can be dropped as well.
The second part of the answer is, the quantum formalism does not, in fact, talk
about the wave function Ψ of object and apparatus but about the wave function ψ of
the object alone. This leads us to the question what is meant by the wave function of
a subsystem. If
Ψ(x, y) = ψ(x)φ(y) (11.6)
then it is appropriate to call ψ the wave function of the x-system, but in general Ψ does
not factorize as in (11.6). In Bohmian mechanics, a natural general definition for the
wave function of a subsystem is the conditional wave function
ψ(x) = N Ψ(x, Y ) , (11.7)
where Y is the actual configuration of the y-system (while x is not the actual configu-
ration X but any configuration of the x-system) and
N = \left( \int |\Psi(x, Y)|^2\, dx \right)^{-1/2} \qquad (11.8)

is the normalizing factor. The conditional wave function does not, in general, evolve
according to a Schrödinger equation, but in a complicated way depending on Ψ, Y ,
and X. There are special situations in which the conditional wave function does evolve
according to a Schrödinger equation, in particular when the x-system and the y-system
do not interact and the wave packet in Ψ containing Q = (X, Y ) is of a product form such
as (11.6). Indeed, this is the case for the object before, but not during the measurement;
as a consequence, the wave function of the object (i.e., its conditional wave function)
evolves according to the Schrödinger equation before, but not during the measurement—
in agreement with the quantum formalism. To determine the conditional wave function
after the quantum measurement, suppose that Ψα is of the form
Ψα = ψα ⊗ φα (11.9)
with φα a wave function of the apparatus with the pointer pointing to the value α.
Let α be the actual outcome, i.e., Q ∈ support(Ψα ). Then Y ∈ support(φα ) and the
conditional wave function is indeed
ψ = ψα . (11.10)
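As a toy illustration (not part of the notes) of the conditional wave function (11.7)–(11.8): discretize a two-variable wave function Ψ(x, y) on a grid, fix the actual configuration Y of the y-system, and normalize the slice.

import numpy as np

x = np.linspace(-5, 5, 400); dx = x[1] - x[0]
y = np.linspace(-5, 5, 400); dy = y[1] - y[0]
X, Ygrid = np.meshgrid(x, y, indexing="ij")

# An entangled (non-product) wave function, normalized on the grid:
Psi = np.exp(-((X - Ygrid) ** 2) / 2) * np.exp(-(X ** 2 + Ygrid ** 2) / 8)
Psi = Psi / np.sqrt(np.sum(np.abs(Psi) ** 2) * dx * dy)

Y_actual = 1.3                                   # the actual configuration of the y-system
j = np.argmin(np.abs(y - Y_actual))              # nearest grid point
psi_cond = Psi[:, j]
psi_cond = psi_cond / np.sqrt(np.sum(np.abs(psi_cond) ** 2) * dx)   # Eq. (11.8)

print(np.sum(np.abs(psi_cond) ** 2) * dx)        # 1.0: normalized conditional wave function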
24 “Coherence” originally meant the ability to interfere, “decoherence” the loss thereof. Another, related but inequivalent, widespread meaning of “decoherence” is that the reduced density matrix is (approximately) diagonal in the eigenbasis of A, see Section 22.4.
11.3 Decoherence
People sometimes say that decoherence solves the measurement problem. We have seen
that decoherence (i.e., the fact that the Ψα stay disjoint practically forever) plays a role
in how Bohmian mechanics solves the measurement problem. But it is clear that mere
disjointness of the packets Ψα does not make any of the packets go away. So the problem
remains unless we drop one of the three assumptions mentioned in Section 11.1.
It is striking that Σ_α c_α Ψ_α is the kind of wave function for which, if we applied the Born rule to a quantum measurement of the pointer position, we would get ‖c_α Ψ_α‖² = |c_α|² as the probability that the pointer points to α, and that is exactly the value we
wanted. So a “super-measurement” of the pointer position would seem to help. But if
we apply the reasoning of the measurement problem to the “super-apparatus” used for
the super-measurement, we obtain again a nontrivial superposition of terms associated
with different outcomes α. So which idea wins? When we push this thought further
and if necessary iterate it further, we have to stop at the point where the system is
the whole universe and includes all types of apparatus used. Thus, we end up with a
superposition, and the measurement problem remains.
It is also striking that the super-observer, applying her super-apparatus to measure
the pointer position of the first apparatus, cannot distinguish between decoherence and
collapse; that is, she cannot decide whether the wave function of the system was Σ_α c_α Ψ_α
(a superposition) or one of the Ψα with probability |cα |2 (a “mixture”). This means
two things: first, that both will yield the result α with probability |cα |2 ; and second,
that even if she tried to manipulate the system (i.e., act on it with external forces,
etc.), she would not be able to find out whether it is a superposition or a mixture.
That is because the crucial difference between a superposition and a mixture is that a
superposition is capable of interference. However, if the packets Ψα are macroscopically
disjoint (i.e., if decoherence has occurred), then it becomes so extraordinarily difficult
as to be practically impossible to make these packets overlap again, which would be a
necessary condition for interference.
In particular, the probability of the outcome obtained by the super-observer does
not depend on whether we treat the first apparatus as a quantum mechanical system
(as we did in the measurement problem) or simply as triggering a collapse of the wave
function. This fact is a consistency property of the rules for making predictions. But
this fact does not mean the measurement problem did not exist.
That is because the measurement problem is not about what the outcome will be, it
is about what happens in reality. If the three assumptions about reality are true, then it
follows that reality will not agree with the prediction of the quantum formalism. That
is the problem.
This point also makes clear that right from the start, the super-apparatus did not
actually help. If we are talking about what happens in reality, we expect that already
the first apparatus produces an actual outcome, not merely a superposition. But it did
not without the super-apparatus, and that is why there is a problem.

11.4 Schrödinger’s Cat
Often referred to in the literature, this is Schrödinger’s25 1935 formulation of the mea-
surement problem:

“One can even set up quite ridiculous cases. A cat is penned up in a steel
chamber, along with the following diabolical device (which must be secured
against direct interference by the cat): in a Geiger counter there is a tiny
bit of radioactive substance, so small, that perhaps in the course of one hour
one of the atoms decays, but also, with equal probability, perhaps none; if it
happens, the counter tube discharges and through a relay releases a hammer
which shatters a small flask of hydrocyanic acid. If one has left this entire
system to itself for an hour, one would say that the cat still lives if meanwhile
no atom has decayed. The first atomic decay would have poisoned it. The
ψ-function of the entire system would express this by having in it the living
and dead cat (pardon the expression) mixed or smeared out in equal parts.
It is typical of these cases that an indeterminacy originally restricted to the
atomic domain becomes transformed into macroscopic indeterminacy, which
can then be resolved by direct observation. That prevents us from so naively
accepting as valid a “blurred model” for representing reality. In itself it
would not embody anything unclear or contradictory. There is a difference
between a shaky or out-of-focus photograph and a snapshot of clouds and
fog banks.”

11.5 Positivism and Realism


Positivism is the view that a statement which cannot be tested in experiment is meaning-
less or unscientific. For example, the statement in Bohmian mechanics that an electron
went through the upper slit of a double-slit if and only if it arrived in the upper half
of the screen, cannot be tested in experiment. After all, if you try to check which slit
the electron went through by detecting every electron at the slit then the statement is
no longer true in Bohmian mechanics (and in fact, no correlation with the location of
arrival is found). So a positivist thinks that Bohmian mechanics is unscientific. Good
statements for a positivist are operational statements, i.e., statements of the form “if we
set up an experiment in this way, the outcome has such-and-such a probability distri-
bution.” Positivists think that the quantum formalism (thought of as a summary of all
true operational statements of quantum mechanics) is the only scientific formulation of
quantum mechanics. They also tend to think that ψ is the complete description of a
system, as it is the only information about the system that can be found experimentally
without disturbing ψ. They tend not to understand the measurement problem, or not to take it seriously, because it requires thinking about reality.

25 From E. Schrödinger: Die gegenwärtige Situation in der Quantenmechanik, Naturwissenschaften 23: 807–812, 823–828, 844–849 (1935). English translation by J. D. Trimmer: The Present Situation in Quantum Mechanics, Proceedings of the American Philosophical Society 124: 323–338 (1980). Reprinted in J. A. Wheeler, W. H. Zurek (ed.s): Quantum Theory and Measurement, Princeton University Press (1983), pages 152–167.
Realism is the view that a fundamental physical theory needs to provide a coher-
ent story of what happens. Bohmian mechanics, GRW theory, and many-worlds are
examples of realist theories. For a realist, the quantum formalism by itself does not
qualify as a fundamental physical theory. The story provided by Bohmian mechanics,
for example, is that particles have trajectories, that there is a physical object that is
mathematically represented by the wave function, and that the two evolve according to
certain equations. For a realist, the measurement problem is serious and can only be
solved by denying one of the 3 conflicting premises.
Feynman had a nice example for expressing his reservations about positivism:26

“For those people who insist that the only thing that is important is
that the theory agrees with experiment, I would like to imagine a discussion
between a Mayan astronomer and his student. The Mayans were able to
calculate with great precision predictions, for example, for eclipses and for
the position of the moon in the sky, the position of Venus, etc. It was all
done by arithmetic. They counted a certain number and subtracted some
numbers, and so on. There was no discussion of what the moon was. There
was no discussion even of the idea that it went around. They just calculated
the time when there would be an eclipse, or when the moon would rise at the
full, and so on. Suppose that a young man went to the astronomer and said,
‘I have an idea. Maybe those things are going around, and they are balls
of something like rocks out there, and we could calculate how they move in
a completely different way from just calculating what time they appear in
the sky.’ ‘Yes,’ says the astronomer, ‘and how accurately can you predict
the eclipses?’ He says, ‘I haven’t developed the thing very far yet.’ Then
says the astronomer, ‘Well, we can calculate eclipses more accurately than
you can with your model, so you must not pay any attention to your idea
because obviously the mathematical scheme is better.’ ”

The point is that positivism, if taken too far (as the imaginary ancient astronomer did),
will stifle efforts to understand the world around us. People often say that the goal of
physics is to make predictions that can be compared to experiment. I would not say
that. I think that the goals of physics include to understand how the world works and
to figure out what its fundamental laws are. In fact, often we do not make theories to
compute predictions for experiments but make experiments to investigate our theories.
(Physics also has further goals, such as making use of the laws of nature for technical
applications, or to study remarkable behavior of special physical systems.)
Positivism may appear as particularly safe and modest. After all, operational state-
ments may appear as safe statements, and it may seem modest to refrain from spec-
ulation about the nature of things and the explanation of the phenomena we observe.
However, often this appearance is an illusion, and Feynman’s example suggests why.
26 Page 169 in R. P. Feynman: The Character of Physical Law. Cambridge, MA: MIT Press (1967)
Someone who refrains too much from speculation may miss out on understanding the
nature of things and explanation of phenomena. Here is another example: Wheeler’s
fallacy (see Section 6.5). Wheeler took for granted that a particle will reach the lower
detector if and only if it went through the upper slit. Positivists like to call this an
“operational definition” of what it means for a particle to have gone through the upper
slit in a situation in which no attempt was made at detection during the passage through
the slits. But such a “definition” is actually neither safe nor modest: it goes far beyond
what is within our choice to define and, as described in Section 6.5, it conflicts with
where the particle actually went in that theory in which it makes sense to ask which slit
the particle went through—i.e., in Bohmian mechanics.
Positivism and realism play a role not only in the foundations of quantum mechan-
ics but widely in philosophy. As a side remark, I mention that they also play a role in
mathematics. According to the positivist view of mathematics (also known as formal-
ism), a mathematical statement that can neither be proved nor disproved is meaningless,
whereas a realist about mathematics (also called a Platonist) would object that if we can
understand the content of a mathematical statement then it must be meaningful and
have a truth value (i.e., be either true or false), regardless of whether it can be proven.
Perhaps the most prominent positivist in mathematics was David Hilbert, and perhaps
the most prominent realist Kurt Gödel who, in his famous incompleteness theorem,27
gave an explicit example of an obviously meaningful mathematical statement that can
neither be proven nor disproven from standard axioms using standard rules28 (but, for
curious reasons I will not discuss here, can actually be known to be true).29

27 K. Gödel: Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik 38: 173–198 (1931)
28 such as B. Russell and A. N. Whitehead: Principia Mathematica. Cambridge University Press (1913)
29 Another illuminating example of (presumably) unprovable mathematical statements is due to Tim Maudlin and involves “reasonless truths”: Define that any two real numbers x, y match if their decimal expansions have equal digit 1, or digits 2 and 3, or digits 4–6, or digits 7–10, etc. Random numbers match with probability 1/9 (geometric series). Let P be the statement “541/18 and π^78 do not match.” If P is false, it can be disproven, but if true then presumably it cannot be proven from standard axiom systems of mathematics (such as Russell and Whitehead’s Principia Mathematica) because there is no deeper reason behind it; it happens to be true “by coincidence.” Now generate many similar statements; 8/9 of them should be true and unprovable. Let P′ be the statement “For all n ∈ N, √2 matches cos n.” It is presumably unprovable no matter if true or false, and presumably false because it has probability 0.
12 The GRW Theory
Bohmian mechanics is not the only possible explanation of quantum mechanics. Another
one is provided by the GRW theory, named after GianCarlo Ghirardi, Alberto Rimini,
and Tullio Weber, who proposed it in 1986. A similar theory, CSL (for continuous
spontaneous localization), was proposed by Philip Pearle in 1989. In both theories, Ψt
does not evolve according to the Schrödinger equation, but according to a modified
evolution law. This evolution law is stochastic, as opposed to deterministic. That is, for
any fixed Ψ0 , it is random what Ψt is, and the theory provides a probability distribution
over Hilbert space. A family of random variables Xt , with one variable for every time t,
is called a stochastic process. Thus, the family (Ψt )t>0 is a stochastic process in Hilbert
space. We leave CSL aside and focus on the GRW process. In it, periods governed by
the Schrödinger equation are interrupted by random jumps. Such a jump occurs, within
any infinitesimal time interval dt, with probability λ dt, where λ is a constant called
the jump rate. Let us call the random jump times T1 , T2 , . . .; the sequence T1 , T2 , . . . is
known as the Poisson process with rate λ; it has widespread applications in probability
theory. Let us have a closer look.

12.1 The Poisson Process


Think of T1 , T2 , . . . as the times at which a certain type of random event occurs; standard
examples include the times when an earthquake (of a certain strength) occurs, or when
the phone rings, or when the price of a certain share falls below a certain value. We
take for granted that the ordering is chosen such that 0 < T1 < T2 < . . ..
Let us figure out the probability density function of T1 . The probability that T1
occurs between 0 and dt is λ dt. Thus, the probability that it does not occur is 1 − λ dt.
Suppose that it did not occur between 0 and dt. Then the probability that it doesn’t
occur between dt and 2 dt is again 1 − λ dt. Thus, the total probability that no event
occurs between 0 and 2 dt is (1−λ dt)2 . Proceeding in the same way, the total probability
that no event occurs between 0 and n dt is (1 − λ dt)n . Thus, the total probability that
no event occurs between 0 and t, P(T1 > t), can be approximated by setting dt = t/n
and letting n → ∞. That is,
\mathbb{P}(T_1 > t) = \lim_{n\to\infty} \left(1 - \frac{\lambda t}{n}\right)^n = e^{-\lambda t} \,. \qquad (12.1)

Let us write ρ(t) for the probability density function of T1 . By definition,

ρ(t) dt = P(t < T1 < t + dt) . (12.2)

To compute this quantity, we reason as follows. If T1 has not occurred until t, then the
probability that it will occur within the next dt is λ dt. Thus, (12.2) differs from (12.1)
by a factor λ dt, or, as the factor dt cancels out,

\rho(t) = 1_{t>0}\, e^{-\lambda t}\, \lambda \,, \qquad (12.3)

where the expression 1C is 1 whenever the condition C is satisfied, and 0 otherwise. The
distribution (12.3) is known as the exponential distribution with parameter λ, Exp(λ).
We have thus found that the waiting time for the first event has distribution Exp(λ).
After T1 , the next dt has again probability λ dt for the next event to occur. The
above reasoning can be repeated, with the upshot that the waiting time T2 − T1 for
the next event has distribution Exp(λ) and is independent of what happened up to time
T1 . The same applies to the other waiting times Tn+1 − Tn . In fact, at any time t0 the
waiting time until the next event has distribution Exp(λ).
The exponential distribution has expectation value
\int_0^\infty t\, \rho(t)\, dt = \frac{1}{\lambda} \,. \qquad (12.4)
This fact is very plausible if you think of it this way: If in every second the probability of
an earthquake is, say, 10−8 , then you would guess that an earthquake occurs on average
every 108 seconds. The constant λ, whose dimension is 1/time, is thus the average
frequency of the earthquakes (or whichever events).
Another way of representing the Poisson process is by means of the random variables

Xt = #{i ∈ N : Ti < t} , (12.5)

the number of earthquakes up to time t.
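A small simulation sketch (not part of the notes) generates the jump times by summing independent Exp(λ) waiting times and checks the two facts derived above: E(T1) = 1/λ, and on average λt events occur up to time t.

import numpy as np
rng = np.random.default_rng(2)

lam, t_max, runs = 2.0, 10.0, 5000
first_times, counts = [], []
for _ in range(runs):
    waiting = rng.exponential(scale=1.0 / lam, size=100)   # Exp(lam) waiting times
    times = np.cumsum(waiting)                             # T_1 < T_2 < ...
    first_times.append(times[0])
    counts.append(np.sum(times < t_max))                   # X_{t_max}

print(np.mean(first_times), 1 / lam)      # both approx. 0.5
print(np.mean(counts), lam * t_max)       # both approx. 20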

Theorem 12.1. If T = {T1 , T2 , . . .} is a Poisson process with rate λ and T 0 is an


independent Poisson process with rate λ0 , then T ∪ T 0 is a Poisson process with rate
λ + λ0 .

For example, suppose earthquakes in Australia occur with rate λ and are independent
of those in Africa, which occur with rate λ0 ; then the earthquakes in Africa and Australia
together occur with rate λ + λ0 .

Theorem 12.2. If we choose n points at random in the interval [0, n/λ], independently
with uniform distribution, then the joint distribution of these points converges, as n →
∞, to the Poisson process with parameter λ.

12.2 Definition of the GRW Process


Now let us get back to the definition of the GRW process. To begin with, set the particle
number N = 1, so that Ψt : R3 → C. The random events are, instead of earthquakes,
spontaneous collapses of the wave function. That is, suppose that the random variables
T1 , T2 , T3 , . . ., are governed by a Poisson process with parameter λ; suppose that between
Tk−1 and Tk , the wave function Ψt evolves according to the Schrödinger equation (where
T0 = 0); at every Tk , the wave function changes discontinuously (“collapses”) as if an
outside observer made an unsharp position measurement with inaccuracy σ > 0. I will
give the formula below.

The constants λ and σ are thought of as new constants of nature, for which GRW
suggested the values
λ ≈ 10^{−16} sec^{−1} , σ ≈ 10^{−7} m . (12.6)
Alternatively, Stephen Adler suggested

λ ≈ 3 × 10^{−8} sec^{−1} , σ ≈ 10^{−6} m . (12.7)

This completes the definition of the GRW process for N = 1.


Now consider arbitrary N ∈ N, and let Ψ0 be (what is normally called) an N -particle
wave function Ψ0 = Ψ0 (x1 , . . . , xN ). Consider N independent Poisson processes with
rate λ, Ti,1 , Ti,2 , . . . for every i ∈ {1, . . . , N }. Let T1 be the smallest of all these random
times, T2 the second smallest etc., and let I1 be the index associated with T1 and I2 the
index associated with T2 etc. Equivalently, T1 , T2 , . . . is a Poisson process with rate N λ,
and along with every Tk we choose a random index Ik from {1, . . . , N } with uniform
distribution (i.e., each i has probability 1/N ), independently of each other and of the Tk .
Equivalently, a collapse with index i occurs with rate λ for each i ∈ {1, . . . , N }. Between
Tk−1 and Tk , Ψt evolves according to the Schrödinger equation. At Tk , Ψ changes as
if an observer outside of the system30 made an unsharp position measurement with
inaccuracy σ on particle number Ik .

12.3 Definition of the GRW Process in Formulas


Let us begin with N = 1.
\Psi_{T_k+} = \frac{C(X_k)\,\Psi_{T_k-}}{\|C(X_k)\,\Psi_{T_k-}\|} \,, \qquad (12.8)
where the collapse operator C(X) is a multiplication operator multiplying by the square
root of a 3-d Gaussian function centered at X:
C(X)\Psi(x) = \sqrt{g_{X,\sigma}(x)}\;\Psi(x) \qquad (12.9)

with
g_{X,\sigma}(x) = \frac{1}{(2\pi\sigma^2)^{3/2}}\, e^{-(X-x)^2/2\sigma^2} \,. \qquad (12.10)
The point X k ∈ R3 is chosen at random with probability density

\rho(X_k = y \mid T_1, \ldots, T_k, X_1, \ldots, X_{k-1}) = \|C(y)\,\Psi_{T_k-}\|^2 \,, \qquad (12.11)

where ρ(· · · | · · · ) means the probability density, given the values of T1 , . . . , Tk , X 1 , . . . , X k−1 .
The right hand side of (12.11) is indeed a probability density because it is nonnegative
and
\int d^3y\, \rho(X_k = y \mid \cdots) = \int d^3y\, \|C(y)\Psi\|^2 = \int d^3y \int d^3x\, |C(y)\Psi(x)|^2 = \qquad (12.12)

= \int d^3x \int d^3y\, g_{y,\sigma}(x)\, |\Psi(x)|^2 = \int d^3x\, |\Psi(x)|^2 = 1 \,. \qquad (12.13)

30 Or rather, outside of the universe, as the idea is that the entire universe is governed by GRW theory.

For arbitrary N ∈ N and Ψt = Ψt (x1 , . . . , xN ),


\Psi_{T_k+} = \frac{C_{I_k}(X_k)\,\Psi_{T_k-}}{\|C_{I_k}(X_k)\,\Psi_{T_k-}\|} \qquad (12.14)
where the collapse operator CI (X) is the following multiplication operator:
C_I(X)\Psi(x_1, \ldots, x_N) = \sqrt{g_{X,\sigma}(x_I)}\;\Psi(x_1, \ldots, x_N) \,. \qquad (12.15)

The random point X k is chosen at random with probability density

\rho(X_k = y \mid T_1, \ldots, T_k, I_1, \ldots, I_k, X_1, \ldots, X_{k-1}) = \|C_{I_k}(y)\,\Psi_{T_k-}\|^2 \,. \qquad (12.16)

Let us examine the probability distribution (12.11) of the center X of a collapse. For
a one-particle wave function Ψ, it is essentially |Ψ|2 ; more precisely, it is the quantum
distribution |Ψ|2 convolved with gσ , that is, smeared out (or blurred, or coarse-grained)
over a distance σ that is smaller than the macroscopic scale. For an N -particle wave
function Ψ, ρ(X = y) is essentially the marginal of |Ψ|2 connected to the xI -variable,
i.e., the distribution on 3-space obtained from the |Ψ|2 distribution on 3N -space by
integrating out 3N − 3 variables. (More precisely, smeared over width σ.) Thus, again,
on the macroscopic scale, the distribution of X is the same as the quantum mechanical
probability distribution for the position of the I-th particle. For many purposes, it
suffices to think of X as |Ψ|2 -distributed; the reason why GRW chose its distribution
not as exactly |Ψ|2 is that the definition (12.11), (12.16) above will lead to a no-signaling
theorem, i.e., to the property of the theory that the observable behavior of one system
cannot be influenced faster than light by collapses acting on another system, as we will
show in Section 22.5 using density matrices.
This completes the definition of the GRW process. But not yet the definition of the
GRW theory.
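Here is a one-dimensional, discretized sketch (not part of the notes) of a single GRW collapse for N = 1: the collapse center X is drawn from the density ‖C(y)Ψ‖² of (12.11), and the wave function is then multiplied by the square root of the Gaussian and normalized as in (12.8). Starting from a superposition of two distant packets, one packet survives and the other is suppressed.

import numpy as np
rng = np.random.default_rng(3)

x = np.linspace(-20, 20, 4000); dx = x[1] - x[0]
sigma = 1.0                                              # toy value of the collapse width

packet = lambda c: np.exp(-(x - c) ** 2 / 2)             # two packets, 16 units apart
Psi = packet(-8) + packet(8)
Psi = Psi / np.sqrt(np.sum(np.abs(Psi) ** 2) * dx)

g = lambda X: np.exp(-(X - x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Density of the collapse center, rho(y) = ||C(y) Psi||^2, cf. (12.11):
rho = np.array([np.sum(g(y) * np.abs(Psi) ** 2) * dx for y in x])
X_collapse = rng.choice(x, p=rho / rho.sum())

Psi_after = np.sqrt(g(X_collapse)) * Psi                 # collapse, cf. (12.8)
Psi_after = Psi_after / np.sqrt(np.sum(np.abs(Psi_after) ** 2) * dx)

weight_left = np.sum(np.abs(Psi_after[x < 0]) ** 2) * dx
print(X_collapse, weight_left)   # center near -8 or +8; weight_left close to 1 or 0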

12.4 Primitive Ontology


There is a further law in GRW theory, concerning matter in 3-space. There are two
different versions of this law and, accordingly, two different versions of the GRW theory,
abbreviated as GRWm (m for matter density ontology) and GRWf (f for flash ontology).
For comparison, in Bohmian mechanics the matter in 3-space consists of the particles
(with trajectories).

In GRWm it is a law that, at every time t, matter is continuously distributed in space with density function m(x, t) for every location x ∈ R³, given by

m(x, t) = \sum_{i=1}^{N} m_i \int_{\mathbb{R}^{3N}} d^3x_1 \cdots d^3x_N \, \delta^3(x_i - x)\, \bigl|\psi_t(x_1, \ldots, x_N)\bigr|^2 \,. \qquad (12.17)

In words, one starts with the |ψ|2 distribution in configuration space R3N , then obtains
the marginal distribution of the i-th degree of freedom xi ∈ R3 by integrating out all
other variables x_j, j ≠ i, multiplies by the mass associated with x_i, and sums over i.
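A discretized sketch (not part of the notes) of (12.17) for N = 2 "particles" in one dimension each: m(x, t) is the sum over i of m_i times the marginal of |ψ|² in the i-th variable.

import numpy as np

x = np.linspace(-10, 10, 500); dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")

psi = np.exp(-(X1 + 3) ** 2 / 2) * np.exp(-(X2 - 3) ** 2 / 2)   # a simple product wave function
psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx * dx)

m1, m2 = 1.0, 2.0                                 # toy masses
density = np.abs(psi) ** 2
m_of_x = m1 * np.sum(density, axis=1) * dx + m2 * np.sum(density, axis=0) * dx

print(np.sum(m_of_x) * dx)                        # total mass m1 + m2 = 3.0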

In GRWf it is a law that matter consists of material points in space-time called


flashes. That is, matter is neither made of particles following world lines, nor of a
continuous distribution of matter such as in GRWm, but rather of discrete points in
space-time. According to GRWf, the space-time locations of the flashes can be read off
from the history of the wave function: every flash corresponds to one of the spontaneous
collapses of the wave function, and its space-time location is just the space-time location
of that collapse. The flashes form the set

F = {(X 1 , T1 , I1 ), . . . , (X k , Tk , Ik ), . . .} . (12.18)

Note that if the number N of the degrees of freedom in the wave function is large,
as in the case of a macroscopic object, the number of flashes is also large (if λ = 10^{−16} s^{−1} and N = 10^{23}, we obtain 10^{7} flashes per second). Therefore, for a reasonable choice of the parameters of the GRWf theory, a cubic centimeter of solid matter contains more than 10^{7} flashes per second. That is to say that large numbers of flashes can form
macroscopic shapes, such as tables and chairs. “A piece of matter then is a galaxy of
[flashes].” (Bell, page 205) That is how we find an image of our world in GRWf.

A few remarks. The m function of GRWm and the flashes of GRWf are called the
primitive ontology of the theory. Ontology means what exists according to a theory; for
example, in Bohmian mechanics ψ and Q, in GRWm ψ and m, in GRWf ψ and F . The
“primitive” ontology is the part of the ontology representing matter in 3-d space (or 4-d
space-time): Q in Bohmian mechanics, m in GRWm, and F in GRWf.
Bell coined the word beables (pronounced bee-abbles) for variables representing the
ontology. The word is a counterpart to “observables”; in contrast to the observed
outcomes of experiments, the beables represent what is real. The suffix “able” can be
understood as reflecting the fact that the ontology can be different for different theories.
It may seem that a continuous distribution of matter should conflict with the ev-
idence for the existence of atoms, electrons and quarks, and should thus make wrong
predictions. We will see below why that is not the case—why GRWm makes nearly the
same predictions as the quantum formalism.

12.5 The GRW Solution to the Measurement Problem


We will now look at why the GRW process succeeds in solving the measurement problem,
specifically in collapsing macroscopic (but not microscopic) superpositions, and why the
deviations from quantum mechanics are in a sense small.
First, the collapses are supposed to occur spontaneously, just at random, without
the intervention of an outside observer, indeed without any physical cause described by
the theory; GRW is a stochastic theory. Let us look at the number of collapses. The
average waiting time between two collapses is 1/N λ. For a single particle, N = 1, this

time is ≈ 10^{16} sec ≈ 10^{8} years. That is, for a single particle the wave function collapses
only every 100 million years. So we should not expect to see any of these spontaneous
collapses when doing an experiment with a single particle, or even with hundreds of
particles. If, however, we consider a macroscopic system, consisting perhaps of 10^{23} particles, then the average waiting time is 10^{−7} sec, so we have a rather dense shower of
collapses.
A collapse amounts to multiplication by a Gaussian with width σ ≈ 10^{−7} m, which is large on the atomic scale (recall that the size of an atom is about one Angstrom = 10^{−10} m) but small on the macroscopic scale. So, if an electron is in a superposition of being in Paris and being in Tokyo, and if the center X of the collapse lies in Paris, then the collapse operator has the effect of damping the wave function in Tokyo (which is roughly 10^{7} m away from Paris) by a factor of exp(10^{28}). Thus, after the collapse,
the wave function in Tokyo is very near zero. On the other hand, if a collapse hits an
electron in a bound state in an atom, the collapse will not much affect the electron’s
wave function.
A wave function like the one we encountered in the measurement problem,
\Psi = \sum_\alpha c_\alpha \Psi_\alpha \,, \qquad (12.19)

where Ψα is a wave function corresponding to the pointer pointing to the value α, would
behave in the following way. Assuming the pointer contains 1023 particles, then every
10−7 sec a collapse would occur connected to one of the pointer particles. Since Ψα is
concentrated in a region in configuration space where all of the pointer particles are at
some location y α , and assuming that the y α are sufficiently distant for different values of
α (namely much more than σ), a single collapse connected to any of the pointer particles
will suffice for essentially removing all contributions Ψα except one. Indeed, suppose
the collapse is connected to the particle xi , which is one of the pointer particles. Then
the random center X of the collapse will be distributed according to a coarse-grained
version of the i-th marginal of |Ψ|2 ; since the separation between the y α is greater than
σ, we can neglect the coarse graining, and we can just take the i-th marginal of the
|Ψ|2 distribution. Thus, X will be close to one of the y α , and the probability that
X is close to y α0 is |cα0 |2 . Then, the multiplication by a Gaussian centered at X will
shrink all other packets Ψα by big factors, of the order exp(−(y α −y α0 )2 /2σ 2 ), effectively
collapsing them away.
Thus, within a fraction of a second, a superposition such as (12.19) would decay
into one of the packets Ψα (times a normalization factor), and indeed into Ψα0 with
probability |cα0 |2 , the same probability as attributed by quantum mechanics to the
outcome α0 .
Let us make explicit how GRW succeeded in setting up the laws in such a way
that they are effectively different laws for microscopic and macroscopic objects: (i) We
realize that a few collapses (or even a single collapse) acting on a few (or one) of the
pointer particles will collapse the entire wave function Ψ of object and apparatus together
to essentially just one of the contributions Ψα . (ii) The frequency of the collapses
is proportional to the number of particles (which serves as a quantitative measure of

“being macroscopic”). (iii) We can’t ensure that microscopic systems experience no
collapses at all, but we can ensure the collapses are very infrequent. (iv) We can’t ensure that macroscopic superpositions such as Ψ = Σ_α c_α Ψ_α collapse immediately, but
we can ensure they collapse within a fraction of a second.

12.6 Empirical Tests

[Figure 12.1 (two log-log panels): collapse rate λ [s⁻¹] on the vertical axis against collapse width σ [m] on the horizontal axis, with the regions ERR and PUR and the GRW and Adler parameter choices marked.]

Figure 12.1: Parameter diagram (log-log scale) of the GRW theory with the primitive ontology given by (a) flashes, (b) the matter density function. ERR = empirically refuted region as of 2012 (equal in (a) and (b)), PUR = philosophically unsatisfactory region. GRW’s and Adler’s choice of parameters are marked. Figure taken from W. Feldmann and R. Tumulka: Parameter Diagrams of the GRW and CSL Theories of Wave Function Collapse. Journal of Physics A: Mathematical and Theoretical 45: 065304 (2012) http://arxiv.org/abs/1109.6579

I have pointed out why GRW theory leads to essentially the same probabilities as
prescribed by the quantum formalism. Yet, it is obvious that there are some experiments
for which GRW theory predicts different outcomes than the quantum formalism. Here is
an example. GRW theory predicts that if we keep a particle isolated it will spontaneously
collapse after about 100 million years, and quantum mechanics predicts it will not
collapse. So let’s take 10^{4} electrons, for each of them prepare its wave function to be a
superposition of a packet in Paris and a packet in Tokyo; let’s keep each electron isolated

for 100 million years; according to GRW, a fraction of
    ∫_0^{1/λ} λ e^{−λt} dt = ∫_0^1 e^{−s} ds = 1 − e^{−1} = 63.2%                        (12.20)

of the 104 wave functions will have collapsed; according to quantum mechanics, none
will have collapsed; now let’s bring the packets from Paris and Tokyo together, let
them overlap and observe the interference pattern; according to quantum mechanics, we
should observe a clear interference pattern; if all of the wave functions had collapsed, we
should observe no interference pattern at all; according to GRW, we should observe only
a faint interference pattern, damped (relative to the quantum prediction) by a factor
of e. Ten thousand points should be enough to decide whether the damping factor is
there or not. This example illustrates two things: that in principle GRW makes different
predictions, and that in practice these differences may be difficult to observe (because
of the need to wait for 100 million years, and because of the difficulty with keeping the
electrons isolated for a long time, in particular avoiding decoherence).
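For the numbers in this example, a two-line check (a sketch, not part of the notes; it takes the waiting time to be exactly 1/λ):

    import math
    lam = 1e-16                         # assumed GRW collapse rate per particle, in 1/s
    t = 1.0 / lam                       # keep each electron isolated for a time 1/lambda
    collapsed = 1 - math.exp(-lam * t)  # fraction of the wave functions that have collapsed, as in (12.20)
    visibility = math.exp(-lam * t)     # surviving interference contrast relative to the quantum prediction
    print(collapsed, visibility)        # about 0.632 and 0.368 = 1/e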
Another testable consequence of the GRW process is universal warming. Since the
GRW collapse usually makes wave packets narrower, their Fourier transforms (momen-
tum representation) become wider, by the Heisenberg uncertainty relation. As a ten-
dency, this leads to a long-run increase in energy. This effect amounts to a spontaneous
warming at a rate of the order of 10^-15 K per year.
No empirical test of GRW theory against the quantum formalism can presently be
carried out, but experimental techniques are progressing; see Figure 12.1. Adler’s pa-
rameters have in the meantime been empirically refuted as a byproduct of the LIGO
experiment that detects gravitational waves. A test of GRW’s parameters seems fea-
sible using a planned interferometer on a satellite in outer space. Interferometers are
disturbed by the presence of air, temperatures far from absolute zero, vibrations of the
apparatus, and the presence of gravity; that is why being in outer space is an advantage
for an interferometer and allows for heavier objects shot through the double slit and
longer flight times. Such an interferometer is being considered by the European Space
Agency ESA.

12.7 The Need for a Primitive Ontology


Primitive ontology is a subtle philosophical topic.
We may wonder whether, instead of GRWf or GRWm, we could assume that only ψ
exists, and no primitive ontology; let us call this view GRW∅. To illustrate the difference
between GRWf/GRWm and GRW∅, let me make up a creation myth (as a metaphorical
way of speaking): Suppose God wants to create a universe governed by GRW theory.
He creates a wave function ψ of the universe that starts out as a particular ψ0 that
he chose and evolves stochastically according to a particular version of the GRW time
evolution law. According to GRW∅, God is now done. According to GRWf or GRWm,
however, a second act of creation is necessary, in which he creates the matter, i.e., either
the flashes or continously distributed matter with density m, in both cases coupled to
ψ by the appropriate laws.

There are several motivations for considering GRW∅. First, it seems more parsimo-
nious than GRWm or GRWf. Second, it was part of the motivation behind GRW theory
to avoid introducing an ontology in addition to ψ. In fact, much of the motivation came
from the measurement problem, which requires that we either modify the Schrödinger
equation or introduce additional ontology (such as Q in Bohmian mechanics), and GRW
theory was intended to choose the first option, not the second.
Furthermore, there is a sense in which GRW∅ clearly works: The GRW wave function
ψt is, at almost all times, concentrated, except for tiny tails, on a set of configurations
that are macroscopically equivalent to each other. So we can read off from the post-
measurement wave function, e.g., what the actual outcome of a quantum measurement
was.
On the other hand, there is a logical gap between saying

“ψ is the wave function of a live cat” (12.21)

and saying
“there is a live cat.” (12.22)
After all, in Bohmian mechanics, (12.22) follows from (12.21) by virtue of a law of the
theory, which asserts that the configuration Q(t) is |ψt |2 distributed at every time t.
Thus, Bohmian mechanics suggests that (12.22) would not follow from (12.21) if there
was not a law connecting the two by means of the primitive ontology. If that is so, then
it does not follow in GRW∅ either. Another indication in this direction is the fact that
the region “PUR” in Figure 12.1 depends on the primitive ontology we consider, GRWf
or GRWm.
Other aspects of the question whether GRW∅ is a satisfactory theory have to do
with a number of paradoxes that arise in GRW∅ but evaporate in GRWf and GRWm.31
For the sake of simplicity, I will focus on GRWm and leave aside GRWf.

Paradox: Here is a reason one might think that the GRW theory fails to solve
the measurement problem. Consider a quantum state like Schrödinger’s cat, namely a
superposition
ψ = c1 ψ1 + c2 ψ2 (12.23)
of two macroscopically distinct states ψi with kψ1 k = 1 = kψ2 k, such that both contri-
butions have nonzero coefficients ci . Given that there is a problem—the measurement
problem—in the case in which the coefficients are equal, one should also think that there
is a problem in the case in which the coefficients are not exactly equal, but roughly of
the same size. One might say that the reason there is a problem is that, according to
quantum mechanics, there is a superposition whereas according to our intuition there
should be a definite state. But then it is hard to see how this problem should go away
just because c2 is much smaller than c1 . How small would c2 have to be for the problem
31
The following discussion is adapted from R. Tumulka: Paradoxes and Primitive Ontology in Col-
lapse Theories of Quantum Mechanics. Pages 139–159 in S. Gao (editor), Collapse of the Wave Func-
tion, Cambridge University Press (2018) http://arxiv.org/abs/1102.5767.

to disappear? No matter if c2 = c1 or c2 = c1 /100 or c2 = 10−100 c1 , in each case both
contributions are there. But the only relevant effect of the GRW process replacing the
unitary evolution, as far as Schrödinger’s cat is concerned, is to randomly make one of
the coefficients much smaller than the other (although it also affects the shape of the
suppressed contribution).
Answer: From the point of view of GRWm, the reasoning misses the primitive
ontology. Yes, the wave function is still a superposition, but the definite facts that our
intuition wants can be found in the primitive ontology. The cat is made of m, not of
ψ. If ψ is close to |deadi, then m equals m|deadi up to a small perturbation, and that
can reasonably be accepted as the m function of a dead cat. While the wave function
is a superposition of two packets ψ1 , ψ2 that correspond to two very different kinds
of (particle) configurations in ordinary QM or Bohmian mechanics, there is only one
configuration of the matter density m—the definite fact that our intuition wants.

Paradox: As a variant of the first paradox, one might say that even after the GRW
collapses have pushed |c1 |2 near 1 and |c2 |2 near 0 in the state vector (12.23), there is
still a positive probability |c2 |2 that if we make a quantum measurement of the macro-
state—of whether the cat is dead or alive—we will find the state ψ2 , even though the
GRW state vector has collapsed to a state vector near ψ1 , a state vector that might be
taken to indicate that the cat is really dead (assuming ψ1 = |deadi). Thus, it seems not
justified to say that, when ψ is close to |deadi, the cat is really dead.
Answer: In GRWm, what we mean when saying that the cat is dead is that the m
function looks and behaves like a dead cat. In orthodox QM, one might mean instead
that a quantum measurement of the macro-state would yield |deadi with probability 1.
These two meanings are not exactly equivalent in GRWm: that is because, if m ≈ m|deadi
(so we should say that the cat is dead) and if ψ is close but not exactly equal to |deadi,
then there is still a tiny but non-zero probability that within the next millisecond the
collapses occur in such a way that the cat is suddenly alive! But that does not contradict
the claim that a millisecond before the cat was dead; it only means that GRWm allows
resurrections to occur—with tiny probability! In particular, if we observe the cat after
that millisecond, there is a positive probability that we find it alive (simply because it
is alive) even though before the millisecond it actually was dead.

Paradox: Let ψ1 be the state “the marble is inside the box” and ψ2 the state
“the marble is outside the box”; these wave functions have disjoint supports S1 , S2 in
configuration space (i.e., wherever one is nonzero the other is zero). Let ψ be given
by (12.23) with 0 < |c2 |2 ≪ |c1 |2 < 1; finally, consider a system of n (non-interacting)
marbles at time t0 , each with wave function ψ, so that the wave function of the system
is ψ ⊗n . Then for each of the marbles, we would feel entitled to say that it is inside the
box, but on the other hand, the probability that all marbles be found inside the box is
|c1 |2n , which can be made arbitrarily small by making n sufficiently large.
Answer: According to the m function, each of the marbles is inside the box at
the initial time t0 . However, it is known that, if we assume H = 0 for simplicity, a

superposition like (12.23) of macroscopically distinct states ψi will converge as t → ∞
under the GRW evolution with probability |c1 |2 to a function ψ1 (∞) concentrated in S1
and with probability |c2 |2 to a function ψ2 (∞) concentrated in S2 .32 Thus, as t → ∞
the initial wave function ψ ⊗n will evolve towards one consisting of approximately n|c1 |2
factors ψ1 (∞) and n|c2 |2 factors ψ2 (∞) for large n, so that ultimately about n|c1 |2 of
the marbles will be inside and about n|c2 |2 outside the box—independently of whether
anybody observes them or not. The occurrence of some factors ψ2 (∞) at a later time
provides another example of the resurrection-type events mentioned earlier; they are
unlikely but do occur, of course, if we make n large enough.
The act of observation plays no role in the argument and can be taken to merely
record pre-existing macroscopic facts. To be sure, the physical interaction involved
in the act of observation may have an effect on the system, such as speeding up the
evolution from ψ towards either ψ1 (∞) or ψ2 (∞); but GRWm provides unambiguous
facts about the marbles also in the absence of observers.

This concludes my discussion of these paradoxes. As a final remark concerning the
primitive ontology, I want to mention an example of an unreasonable choice of primitive
ontology:33 We set up a theory GRWp combining the GRW wave function ψt with a
particle ontology governed by Bohm’s equation of motion. Nobody seriously proposed
this theory, and it makes completely wrong predictions. For example, suppose that ψt− is
the wave function of Schrödinger’s cat, the Bohmian configuration Q lies in the support
of |alivei, and a GRW collapse occurs; since the collapse center is chosen randomly (and
independently of Q), it may well collapse the wave function to near |deadi. Should we
say then that the cat is really alive, as suggested by Q, or really dead, as suggested
by ψt+ ? If we take the primitive ontology seriously, then we should conclude the cat
is alive. However, the collapse has deformed the wave packet of the live cat due to the
slopes of the tails of the Gaussian, and the packet will from now on evolve in a way very
different from a usual live cat. Despite its wrong predictions, this theory is useful to
consider because it illustrates the role of the primitive ontology and that of laws linking
the wave function to the matter.

32
As an idealization, consider instead of Gaussian factors the characteristic functions of S1 and S2 ,
so that the coefficients of the superposition will change with every collapse but not the shape of the
two contributions, ψ1 (∞) = ψ1 and ψ2 (∞) = ψ2 . Although both coefficients will still be nonzero after
any finite number of collapses, one of them will tend to zero in the limit t → ∞.
33
from V. Allori, S. Goldstein, R. Tumulka, and N. Zanghì: Predictions and Primitive Ontology
in Quantum Foundations: A Study of Examples. British Journal for the Philosophy of Science 65:
323–352 (2014) http://arxiv.org/abs/1206.0019

13 The Copenhagen Interpretation
A very influential view, almost synonymous with the orthodox view of quantum me-
chanics, is the Copenhagen interpretation (CI), named after the research group headed
by Niels Bohr, who was the director of the Institute for Theoretical Physics at the Uni-
versity of Copenhagen, Denmark. Further famous defenders of this view and members
of Bohr’s group (temporarily also working in Copenhagen) include Werner Heisenberg,
Wolfgang Pauli, and Leon Rosenfeld. Bohr and Einstein were antagonists in a debate
about the foundations of quantum mechanics that began around 1925 and continued
until Einstein’s death in 1955. Here is a description of the main elements of CI.

13.1 Two Realms


In CI, the world is separated into two realms: macroscopic and microscopic. In the
macroscopic realm, there are no superpositions. Pointers always point in definite direc-
tions. The macroscopic realm is described by the classical positions and momenta of
objects. In the microscopic realm, there are no definite facts. For example, an electron
does not have a definite position. The microscopic realm is described by wave functions.
One could say that the primitive ontology of CI consists of the macroscopic matter
(described by its classical positions and momenta). In CI terminology, the macroscopic
realm is called classical and the microscopic realm quantum.34 Instead of classical and
quantum, Bell called them speakable and unspeakable. (The macroscopic realm hosts
the objects with definite properties, of which one can speak. Since in ordinary English,
something “unspeakable” is not something nice, you may have gotten the sense that
Bell is not a supporter of the idea of two separate realms.)
The microscopic realm, when isolated, is governed by the Schrödinger equation.
The macroscopic realm, when isolated, is governed by classical mechanics. The two
realms interact whenever a measurement is made; then the macro realm records the
measurement outcome, and the micro realm undergoes a collapse of the wave function.
I see a number of problems with the concept of two separate realms.

• It is not precisely defined where the border between micro and macro lies. That
lies in the nature of the word “macroscopic.” Clearly, an atom is micro and a
table is macro, but what is the exact number of particles required for an object
to be “macroscopic”? The vagueness inherent in the concept of “macroscopic” is
unproblematical in Bohmian mechanics, GRW theory, or classical mechanics, but
it is problematical here because it is involved in the formulation of the laws of
nature. Laws of nature should not be vague.
34
This is a somewhat unfortunate terminology because the word classical suggests not only definite
positions but also particular laws (say, Newton’s equation of motion) which may actually not apply.
The word quantum is somewhat unfortunate as well because in a reductionist view, all laws (also those
governing macroscopic objects) should be consequences of the quantum laws applying to the individual
electrons, quarks, etc.

• Likewise, what counts as a measurement and what does not? This ambiguity is
unproblematical when we only want to compute the probabilities of outcomes of
a given experiment because it will not affect the computed probabilities. But an
ambiguity is problematical when it enters the laws of nature.
• The special role played by measurements in the laws according to CI is also implau-
sible and artificial. Even if a precise definition of what counts as a measurement
were given, it would not seem believable that during measurement other laws than
normal are in place.
• The separation of the two realms, without the formulation of laws that apply to
both, is against reductionism. If we think that macro objects are made out of
micro objects, then the separation is problematical.

13.2 Positivism
CI leans towards positivism. In the words of Werner Heisenberg (1958):35
“We can no longer speak of the behavior of the particle independently of the
process of observation.”
Feynman (1962) did not like that:36
“Does this mean that my observations become real only when I observe an
observer observing something as it happens? This is a horrible viewpoint.
Do you seriously entertain the thought that without observer there is no
reality? Which observer? Any observer? Is a fly an observer? Is a star an
observer? Was there no reality before 10^9 B.C. before life began? Or are
you the observer? Then there is no reality to the world after you are dead?
I know a number of otherwise respectable physicists who have bought life
insurance.”

13.3 Impossibility of Non-Paradoxical Theories


Another traditional part of CI is the claim that it is impossible to provide any coherent
(non-paradoxical) realist theory of what happens in the micro realm. Heisenberg (1958)
again:
“The idea of an objective real world whose smallest parts exist objectively
in the same sense as stones or trees exist, independently of whether or not
we observe them [...], is impossible.”
We know from Bohmian mechanics that this claim is, in fact, wrong.
35
W. Heisenberg: Physics and Philosophy. New York: Harper (1958)
36
Page 14 in R.P. Feynman, F.B. Morinigo, and W.G. Wagner: Feynman Lectures on Gravitation.
Edited by Brian Hatfield. Addison-Wesley (1995). Although printed only in 1995, the lecture was given
in 1962.

13.4 Completeness of the Wave Function
In CI, a microscopic system is completely described by its wave function. That is, there
are no further variables (such as Bohm’s particle positions) whose values nature knows
and we do not. For this reason, the wave function is also called the quantum state or
the state vector.

13.5 Language of Measurement


CI introduced (and established) the words “measurement” and “observable,” and em-
phasized the analogy suggested by these words: E.g., that the momentum operator is
analogous to the momentum variable in classical mechanics, and that the spin observable
σ = (σ1 , σ2 , σ3 ) is analogous to the spin vector of classical mechanics (which points along
the axis of spinning, and whose magnitude is proportional to the angular frequency).
I have already mentioned that these two words are quite inappropriate because they
suggest that there was a pre-existing value of the observable A that was merely discov-
ered (i.e., made known to us) in the experiment, whereas in fact the outcome is often
only created during the experiment. Think, for example, of a Stern–Gerlach experiment
in Bohmian mechanics: The particle does not have a value of z-spin before we carry out
the experiment. And in CI, since it insists that wave functions are complete, it is true
in spades that A does not have a pre-existing, well-defined value before the experiment.
So this terminology is even less appropriate in CI—and yet, it is a cornerstone of CI!
Well, CI leans towards paradoxes.

13.6 Complementarity
Another idea of CI, called complementarity, is that in the micro realm, reality is para-
doxical (contradictory) but the contradictions can never be seen (and are therefore not
problematical) because of the Heisenberg uncertainty relation. (Recall Feynman’s dis-
cussion of how the uncertainty relation keeps some things invisible.) Here is Bohr’s
definition of complementarity:

“Any given application of classical concepts precludes the simultaneous use of
other classical concepts which in a different connection are equally necessary
for the elucidation of the phenomena.”

I would describe the idea as follows. In order to compute a quantity of interest (e.g.,
the wave length of light scattered off an electron), we use both Theory A (e.g., classical
theory of billiard balls) and Theory B (e.g., classical theory of waves) although A and
B contradict each other.37 It is impossible to find one Theory C that replaces both A
37
In fact, before 1926 many successful theoretical considerations for predicting the results of exper-
iments proceeded in this way. For example, people made a calculation about the collision between an
electron and a photon as if they were classical billiard balls, then converted the momenta into wave
lengths using de Broglie’s relation p = ~k, then made another calculation about waves with wave
number k.

and B and explains the entire physical process. (Here we meet again the impossibility
claim mentioned in Section 13.3.) Instead, we should leave the conflict between A and
B unresolved and accept the idea that reality is paradoxical.
Bell (Speakable and Unspeakable in Quantum Mechanics, page 190) wrote the fol-
lowing about complementarity:
“It seems to me that Bohr used this word with the reverse of its usual
meaning. Consider for example the elephant. From the front she is head,
trunk and two legs. From the back she is bottom, tail, and two legs. From
the sides she is otherwise, and from the top and bottom different again.
These various views are complementary in the usual sense of the word. They
supplement one another, they are consistent with one another, and they are
all entailed by the unifying concept ‘elephant.’ It is my impression that to
suppose Bohr used the word ‘complementary’ in this ordinary way would
have been regarded by him as missing his point and trivializing his thought.
He seems to insist rather that we must use in our analysis elements which
contradict one another, which do not add up to, or derive from, a whole. By
‘complementarity’ he meant, it seems to me, the reverse: contradictoriness.”
Einstein (1949):
“Despite much effort which I have expended on it, I have been unable to
achieve a sharp formulation of Bohr’s principle of complementarity.”
Bell commented (1986):
“What hope then for the rest of us?”

13.7 Complementarity and Non-Commuting Operators


Another version of complementarity concerns observables that cannot be simultaneously
measured. We have encountered this situation in a homework exercise. Compare two
experiments, each consisting of two measurements: (a) first measure σ2 and then σ3 ,
(b) first measure σ3 and then σ2 . We have seen that the joint probability distribution
of the outcomes depends on the order. Some observables, though, can be measured
simultaneously, i.e., the joint distribution does not depend on the order. Examples: X2
and X3 , the y-component of position and the z-component; or σ2 of particle 1 and σ3 of
particle 2.
Theorem 13.1. (Extension of the spectral theorem to several commuting operators) A
and B commute if and only if there exists a generalized ONB {φn } whose elements are
eigenvectors of both operators A and B, Aφn = αn φn and Bφn = βn φn .
Sketch of proof. If A commutes with B then B maps the eigenspace of A with eigenvalue
α to itself, so B is block diagonal in any eigen ONB of A; now diagonalize each block.
(Alternative strategy: If A and B commute then also any polynomial(A) with any
polynomial(B), and by continuity any function(A) with any function(B); consider the
characteristic function of an interval containing only one eigenvalue.)

Theorem 13.2. Of two observables A and B with discrete spectrum, one is measured
after the other. The joint probability distribution of the outcomes (α, β) is independent
of the order of the two measurements for every wave function if and only if the operators
A and B commute, AB = BA.
Proof. Let A have spectral decomposition A = Σα α Pα , likewise B = Σβ β Qβ with Qβ
the projection to the eigenspace of B with eigenvalue β. The joint distribution, if A is
measured first and B thereafter, is kQβ Pα ψk2 .
“if”: Suppose AB = BA. By Theorem 13.1, they can be simultaneously diagonal-
ized, so Pα Qβ = Qβ Pα for all α, β, leading to the same probability of (α, β).
“only if”: Fix α, β. We have that kP Qψk = kQP ψk for all ψ and show that
k(QP − P Q)ψk = 0. Since every ψ can be decomposed as ψ = u + v with Qu = u and
Qv = 0,
k(QP − P Q)(u + v)k2 = hu + v|(QP − P Q)(P Q − QP )|u + vi
= hu + v|(QP Q − QP QP − P QP Q + P QP )|u + vi
= hu|(P − P QP )|ui
+ hu|(−P QP + P QP )|vi
+ hv|(−P QP + P QP )|ui (13.1)
+ hv|P QP |vi
= kP uk2 − kQP uk2 + kQP vk2
= kP uk2 − kP Quk2 + kP Qvk2
= kP uk2 − kP uk2 + 0 = 0.

Example 13.3.
    σ2 σ3 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} ,    σ3 σ2 = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix} .        (13.2)
Any two multiplication operators commute. In particular, the position operators Xi ,
Xj commute with each other. The momentum operators Pj = −i~∂/∂xj commute with
each other. Xi commutes with Pj for i 6= j, but
[Xj , Pj ] = i~I , (13.3)
with I the identity operator. Eq. (13.3) is called Heisenberg’s canonical commutation
relation. To verify it, it suffices to consider a function ψ of a 1-dimensional variable x.
Using the product rule,
    [X, P ]ψ(x) = XP ψ(x) − P Xψ(x)                                                        (13.4)
                = x(−i~) ∂ψ/∂x − (−i~) ∂(xψ(x))/∂x                                         (13.5)
                = −i~ x ∂ψ/∂x + i~ ψ(x) + i~ x ∂ψ/∂x                                       (13.6)
                = i~ ψ(x) .                                                                 (13.7)
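A small numerical check of Theorem 13.2 (a sketch, not part of the notes; the helper functions are mine): for the non-commuting pair σ2 , σ3 and the initial state |z-upi, the joint distribution of the two outcomes depends on which operator is measured first.

    import numpy as np

    def eigenprojections(A):
        # spectral decomposition A = sum_a a P_a (nondegenerate eigenvalues assumed here)
        vals, vecs = np.linalg.eigh(A)
        return [(round(float(v), 6), np.outer(vecs[:, i], vecs[:, i].conj()))
                for i, v in enumerate(vals)]

    def joint_dist(first, second, psi):
        # P(first = a, second = b) = ||Q_b P_a psi||^2 when "first" is measured first
        return {(a, b): np.linalg.norm(Q @ (P @ psi))**2
                for a, P in eigenprojections(first)
                for b, Q in eigenprojections(second)}

    sigma2 = np.array([[0, -1j], [1j, 0]])
    sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)
    psi = np.array([1.0, 0.0], dtype=complex)                 # |z-up>

    d23 = joint_dist(sigma2, sigma3, psi)                     # sigma2 measured first
    d32 = {(a, b): p for (b, a), p in joint_dist(sigma3, sigma2, psi).items()}   # sigma3 measured first, relabeled
    print(d23)   # all four outcome pairs have probability 1/4
    print(d32)   # probability 1/2 on (+1, +1) and (-1, +1), zero otherwise -- the order matters since [sigma2, sigma3] != 0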

So, for two commuting observables, the quantum formalism provides a joint proba-
bility distribution. For non-commuting observables, it does not. That is, it provides two
joint probability distributions, one for each order, but that means it does not provide
an unambiguous joint probability distribution. Moreover,

    two non-commuting observables typically do not both have sharp values
    at the same time.                                                                       (13.8)
Also this fact is often called complementarity. For example, there is no quantum state
that is an eigenvector to both σ2 and σ3 . In CI, this fact is understood as a paradox-
ical trait of the micro-realm that we are forced to accept. That this paradoxical trait
is connected to non-commutativity fits nicely with the analogy between operators in
quantum mechanics and quantities in classical mechanics (as described in Section 13.5):
In classical mechanics, which is free of paradoxes, all physical quantities (e.g., positions,
momenta, spin vectors) are just numbers and therefore commute.
As a further consequence of (13.8), a measurement of B must disturb the value of
A if AB 6= BA. (Think of the exercise in which |z-upi underwent a σ2 - and then a σ3 -
measurement: After the σ2 -measurement, the particle was not certain any more to yield
“up” in the σ3 -measurement.) Also the Heisenberg uncertainty relation is connected to
(13.8), as it expresses that position and momentum cannot both have sharp values (i.e.,
σX = 0 and σP = 0) at the same time. In fact, the following generalized version of
Heisenberg’s uncertainty relation applies to observables A and B instead of X and P :

Theorem 13.4. (Robertson–Schrödinger inequality)38 For any bounded self-adjoint
operators A, B and any ψ ∈ H with kψk = 1,

    σA σB ≥ (1/2) |hψ|[A, B]|ψi| .                                                          (13.9)
Note that the inequality becomes stronger the bigger the commutator [A, B] is, and it
becomes vacuous when [A, B] = 0.
Proof. Recall that the distribution over the spectrum of A defined by ψ has expectation
value hAi := hψ|A|ψi and variance

σA2 = hψ|(A − hAi)2 |ψi = kφA k2 (13.10)

with
φA := (A − hAi)ψ , (13.11)
where we simply wrote hAi for hAiI. By the Cauchy-Schwarz inequality,

    σA2 σB2 = kφA k2 kφB k2 ≥ |hφA |φB i|2 .                                                (13.12)
38
H.P. Robertson: The Uncertainty Principle. Physical Review 34: 163–164 (1929)
E. Schrödinger: Zum Heisenbergschen Unschärfeprinzip. Sitzungsberichte der Preussischen Akademie
der Wissenschaften, physikalisch-mathematische Klasse 14: 296–303 (1930)

Since

hφA |φB i = hψ|(A − hAi)(B − hBi)|ψi (13.13)


= hψ|(AB − hAiB − AhBi + hAihBi)|ψi (13.14)
= hABi − hAihBi , (13.15)

we obtain that
    |hφA |φB i|2 ≥ (Im hφA |φB i)2                                                          (13.16)
                 = |(hφA |φB i − hφB |φA i)/(2i)|2                                          (13.17)
                 = |(hABi − hAihBi − hBAi + hBihAi)/2|2                                     (13.18)
                 = (1/4) |hψ|[A, B]|ψi|2 .                                                  (13.19)
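The inequality (13.9) is easy to test numerically (a sketch, not part of the notes; random matrices and a random state):

    import numpy as np

    rng = np.random.default_rng(0)

    def rand_hermitian(d):
        M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        return (M + M.conj().T) / 2

    d = 5
    A, B = rand_hermitian(d), rand_hermitian(d)
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    psi /= np.linalg.norm(psi)                       # ||psi|| = 1

    def sigma(Op):
        mean = np.vdot(psi, Op @ psi).real           # <A> = <psi|A|psi>
        phi = (Op - mean * np.eye(d)) @ psi          # phi_A = (A - <A>) psi as in (13.11)
        return np.linalg.norm(phi)                   # sigma_A = ||phi_A||

    lhs = sigma(A) * sigma(B)
    rhs = 0.5 * abs(np.vdot(psi, (A @ B - B @ A) @ psi))
    print(lhs, rhs, lhs >= rhs)                      # the bound (13.9) holds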

Now let me try to summarize the concept of complementarity. According to key
elements of the Copenhagen view, reality itself is contradictory. That is why there is
no Theory C, no single picture that completely describes reality. At the same time,
we can never observe a contradiction in experiment (e.g., because we can only observe
one of two non-commuting observables). And since we cannot observe contradictions,
the contradictions are somehow not a problem. (Again, this is my understanding of
Bohr.) That is, according to Copenhagen, the situation is like in the cartoon shown in
Figure 13.1.

13.8 Reactions to the Measurement Problem


While Bohmian mechanics, GRW theory, and many-worlds theories have clear answers
to the measurement problem, this is not so with Copenhagen. I report some answers
that I heard Copenhagenists give (with some comments in brackets); I must admit that
I do not see how these answers would make the problem go away.

• Nobody can actually solve the Schrödinger equation for 10^23 interacting particles.
(Sure, and we do not need to. If Ψα looks like a state including a pointer pointing
to α then we know by linearity that Ψt1 evolves to Ψt2 = Σα cα Ψα , a superposition
of macroscopically different states.)

• Systems are never isolated. (If we cannot solve the problem for an isolated system,
what hope can we have to treat a non-isolated one? The way you usually treat a
non-isolated system is by regarding it as a subsystem of a bigger, isolated system,
maybe the entire universe.)

Figure 13.1: According to the Copenhagen view, we never see the paradoxical thing
happen. But we see traces showing that it must have happened. Drawing by Charles
Addams

• Maybe there is no wave function of the universe. (It is up to Copenhagenists to


propose a formulation that applies to the entire universe. Bohm, GRW, and many-
worlds can do that. If, according to Copenhagen, there is nothing other than the
wave function, and if even the wave function does not exist for the universe, then
what is the complete description of the universe?)

• Who knows whether the initial wave function is really a product as in Ψt1 = ψ ⊗ φ.
(It is not so important that it is precisely a product, as long as we can perform a
quantum measurement on ψs that are non-trivial superpositions of eigenvectors.
Note that if Ψ(t1 ) is approximately but not exactly equal to ψ ⊗ φ with ψ =
Σα cα ψα , then Ψ(t2 ) is still approximately equal to Σα cα Ψα (t2 ), so it is still a
non-trivial superposition of contributions corresponding to different outcomes.)

• The collapse of the wave function is like the collapse of a probability distribution:
as soon as I have more information, such as X ∈ B, I have to update my probability
distribution ρt− for X accordingly, namely to

ρt+ (x) = 1x∈B ρt− (x) . (13.20)

(The parallel is indeed striking. However, if we insist that the wave function is
complete, then there never is any new information, as there is nothing that we are
ignorant of.)

• Decoherence makes sure that you can replace the superposition Ψ = Σα cα Ψα by
a mixture [i.e., a random one of the Ψα ]. (A super-observer cannot distinguish
between the superposition and the mixture, but we are asking whether in reality
it is a superposition or a mixture; see Section 11.3.)

14 Many Worlds
Put very briefly, Everett’s many-worlds theory is GRW∅ with λ = 0, and Schrödinger’s
many-worlds theory is GRWm with λ = 0.
The motivation for the many-worlds view comes from the wave function (11.3) of
object and apparatus together after a quantum measurement. It is a superposition of
macroscopically different terms. If we insist that the Schrödinger equation is correct
(and thus reject non-linear modifications such as GRW), and if we insist that the wave
function is complete, then we must conclude that there are different parts of reality,
each looking like our world but with a different measurement outcome, and without
any interaction between the different parts. They are parallel worlds. This view was
suggested by Hugh Everett III in 1957.39
Everett’s is not the only many-worlds theory, though. It is less well known that also
Schrödinger had a many-worlds theory in 1926, and it is useful to compare the two.40
Schrödinger, however, did not realize that his proposal was a many-worlds theory. He
thought of it as a single-world theory. He came to the conclusion that it was empirically
inadequate and abandoned it. Let us first try to get a good understanding of this theory.

14.1 Schrödinger’s Many-Worlds Theory


According to Schrödinger’s 1926 theory, matter is distributed continuously in space with
density
    m(x, t) = Σ_{i=1}^{N} mi ∫_{R^3N} d^3x1 · · · d^3xN δ^3 (xi − x) |ψt (x1 , . . . , xN )|^2 ,        (14.1)

and ψt evolves according to the Schrödinger equation. The equation for m is exactly
the same as in GRWm, except that ψ is not the same wave function. (Actually, Schrö-
dinger replaced the mass factor mi by the electric charge ei , but this difference is not
crucial. It amounts to a different choice of weights in the weighted average over i. In
fact, Schrödinger’s choice has the disadvantage that the different signs of charges will
lead to partial cancellations and thus to an m function that looks less plausible as the
density of matter. Nevertheless, the two choices turn out to be empirically equivalent,
i.e., lead to the same predictions.)
39
H. Everett: The Theory of the Universal Wavefunction. Ph. D. thesis, Department of Physics,
Princeton University (1955). Reprinted on page 3–140 in B. DeWitt and R.N. Graham (editors): The
Many-Worlds Interpretation of Quantum Mechanics. Princeton: University Press (1973)
H. Everett: Relative State Formulation of Quantum Mechanics. Reviews of Modern Physics 29:
454–462 (1957)
40
E. Schrödinger: Quantisierung als Eigenwertproblem (Vierte Mitteilung). Annalen der Physik 81:
109–139 (1926). English translation by J.F. Shearer in E. Schrödinger: Collected Papers on Wave
Mechanics. New York: Chelsea (1927).
See also V. Allori, S. Goldstein, R. Tumulka, and N. Zanghì: Many-Worlds and Schrödinger’s First
Quantum Theory. British Journal for the Philosophy of Science 62(1): 1–27 (2011) http://arxiv.
org/abs/0903.2211

In analogy to GRWm, we may call this theory Sm (where S is for the Schrödin-
ger equation). Consider a double-slit experiment in this theory. Before the arrival at
the detection screen, the contribution to the m function coming from the electron sent
through the double slit (which is the only contribution in the region of space between
the double-slit and the detection screen) is a lump of matter smeared out over rather
large distances (as large as the interference pattern). This lump is not homogeneous, it
has interference fringes. And the overall amount of matter in this lump is tiny: If you
integrate m(x, t) over x in the region between the double-slit and the detection screen,
the result is 10^-30 kg, the mass of an electron. But focus now on the fact that the
matter is spread out. Schrödinger incorrectly thought that this fact must lead to the
wrong prediction that the entire detection screen should glow faintly instead of yielding
one bright spot, and that was why he thought Sm was empirically inadequate.
To understand why this reasoning was incorrect, consider a post-measurement situ-
ation (e.g., Schrödinger’s cat). The wave function is a superposition of macroscopically
different terms, Ψ = Σα cα Ψα . The Ψα do not overlap; i.e., where one Ψα is significantly
nonzero, the others are near zero. Thus, when we compute |Ψ|2 there are no (significant)
cross terms; that is, for each q there is at most one α contributing, so

|Ψ(q)|2 = |cα |2 |Ψα (q)|2 . (14.2)

Define mα (x) as what m would be according to (14.1) with ψ = Ψα . Then we obtain
(to an excellent degree of approximation)

    m(x) = Σα |cα |2 mα (x) .                                                               (14.3)

In words, the m function is a linear combination of m functions corresponding to the
macroscopically different terms in Ψ. So, for Schrödinger’s cat in Sm, there is a dead
cat and there is a live cat, each with half the mass. However, they do not notice they
have only half the mass, and they do not notice the presence of the other cat. That is
because, if we let the time evolve, then each Ψα (t) evolves in a way that corresponds
to a reasonable story of just one cat; after all, it is how the wave function would evolve
according to the projection postulate after a measurement of the cat had collapsed the
superposition to one of the Ψα . Furthermore, Ψ(t) = Σα cα Ψα (t) by linearity, and since
the Ψα (t) remain non-overlapping, we have that (14.3) applies to every t from now on,
that is
    m(x, t) = Σα |cα |2 mα (x, t) .                                                         (14.4)

Each mα (t) looks like the reasonable story of just one cat that Ψα (t) corresponds to.
Thus, the two cats do not interact with each other; they are causally disconnected. After
all, the two contributions mα come from Ψα that are normally thought of as alternative
outcomes of the experiment. So the two cats are like ghosts to each other: they can see
and walk through each other.
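Here is a minimal numerical illustration of (14.3) (a sketch, not part of the notes), for a single particle in one dimension whose wave function is a superposition of two well-separated packets:

    import numpy as np

    x, dx = np.linspace(-20, 20, 4001, retstep=True)

    def packet(center, width=1.0):
        g = np.exp(-(x - center)**2 / (4 * width**2)) + 0j
        return g / np.sqrt((np.abs(g)**2).sum() * dx)        # normalized so that the integral of |psi|^2 is 1

    c1, c2 = np.sqrt(0.7), np.sqrt(0.3)
    psi = c1 * packet(-10) + c2 * packet(+10)                 # two non-overlapping packets

    m_e = 9.1e-31                                             # electron mass in kg
    m = m_e * np.abs(psi)**2                                  # the m function (14.1) for N = 1 in one dimension
    left = m[x < 0].sum() * dx                                # matter in the lump around x = -10
    right = m[x > 0].sum() * dx                               # matter in the lump around x = +10
    print(left / m_e, right / m_e)                            # about 0.7 and 0.3, as in (14.3)

Each lump carries the fraction |cα |2 of the particle's mass, in line with (14.3).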
And not just the cat has split in two. If a camera takes a photograph of the cat
then Ψ must be taken to be a wave function of the cat and the camera together (among

other things). Ψ1 may then correspond to a dead cat and a photo of a dead cat, Ψ2 to
a live cat and a photo of a live cat. If a human being interacts with the cat (say, looks
at it), then Ψ1 will correspond to a brain state of seeing a dead cat and Ψ2 to one of
seeing a live cat. That is, there are two copies of the cat, two copies of the photo, two
copies of the human being, two copies of the entire world. That is why I said that Sm
has a many-worlds character. In each world, though, things seem rather ordinary: Like
a single cat in an ordinary (though possibly pitiful) state, and all records and memories
are consistent with each other and in agreement with the state of the cat.

14.2 Everett’s Many-Worlds Theory


Everett’s many-worlds theory, which could be called S∅ (S for the Schrödinger equation
and ∅ for the empty primitive ontology) is based on the idea that the same picture
would arise if we dispense with the m function. Frankly, I do not see how it would; I
actually cannot make sense of S∅ as a physical theory. Some authors argue that it has
a problem with how to obtain probabilities, but I would say the more basic problem is
how to obtain things such as cats, chairs, pointers. For S∅, we would have to assume a
re-interpretation of ordinary language, that the statement “there is a live cat” does not
actually refer to a thing called a cat but is really a statement about the wave function,
really expressing that there is the kind of wave packet that a live cat would have. In
short, the primitive ontology is missing in S∅. And that problem is solved in Sm. Note,
though, that for a person who believes that S∅ makes sense, this theory would seem like
the simplest possible coherent theory that would account for quantum mechanics. To
such a person it would seem that the existence of many worlds is a necessary consequence
of the Schrödinger equation, which, after all, leads to macroscopic superpositions such as
the Ψ = Σα cα Ψα above. In contrast, a person who believes that S∅ does not make sense
while Sm does, will not have such a sense of necessity, as the many-worlds character
of the theory does not come from Ψ but from m, and if we had postulated a different
primitive ontology (say, Bohmian particles instead of (14.1)), then no many-worlds
character would have arisen.
While there is disagreement in the literature about the relevance of a primitive
ontology, many authors argue that S∅ has a preferred basis problem: If there exists
nothing more than Ψ, and if Ψ is just a vector in Hilbert space H , then how do we
know which basis to choose in H to obtain the different worlds? For example, if

    Ψ = (1/√2)|deadi + (1/√2)|alivei ,                                                      (14.5)

then we could also write

    Ψ = (e^{−iπ/4}/√2)|+i + (e^{iπ/4}/√2)|−i ,                                              (14.6)

where
    |+i = (1/√2)|deadi + (i/√2)|alivei ,    |−i = (1/√2)|deadi − (i/√2)|alivei              (14.7)

form another ONB of the subspace spanned by |deadi and |alivei. So how do we know
that the two worlds correspond to |deadi and |alivei rather than to |+i and |−i? Ob-

viously, in Sm there is no such problem because a preferred basis (the position basis) is
built into the law (14.1) for m.
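The change of basis in (14.5)–(14.7) can be checked in a few lines (a sketch, not part of the notes):

    import numpy as np

    dead = np.array([1, 0], dtype=complex)
    alive = np.array([0, 1], dtype=complex)
    plus = (dead + 1j * alive) / np.sqrt(2)                   # |+> from (14.7)
    minus = (dead - 1j * alive) / np.sqrt(2)                  # |-> from (14.7)

    Psi = (dead + alive) / np.sqrt(2)                         # (14.5)
    Psi_alt = (np.exp(-1j*np.pi/4) * plus + np.exp(1j*np.pi/4) * minus) / np.sqrt(2)   # (14.6)
    print(np.allclose(Psi, Psi_alt))                          # True: the same vector, written in two bases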
It is sometimes objected against many-worlds hypotheses that one cannot observe
the other worlds. I must admit I do not see why that could be an objection. Sm
correctly predicts (and so does S∅, if it works at all) that any inhabitant of one world
cannot observe the other worlds, so it is not a question of making a wrong prediction.
The existence of other worlds, whether we can see them or not, is a mathematical
consequence of the Schrödinger equation and the law (14.1) for the matter density.
In fact, the existence of wave packets cα Ψα in Ψ that in some sense look like these
other worlds is a consequence of the Schrödinger equation alone, so it seems inevitable
unless we modify the Schrödinger equation as in collapse theories. By decoherence,
macroscopically disjoint packets stay macroscopically disjoint, so it seems we will have
to accept that the wave function of the universe or of macroscopic systems tends to
split more and more, resulting in a tree-like shape of its significant support in time ×
configuration space as shown schematically in Figure 14.1. The macroscopically different
packets Ψα are therefore also often called branches of Ψ.

R3N

Figure 14.1: Qualitative picture of the tree-like structure typically featured by wave
functions of macroscopic objects such as measurement apparatus. Of the 3N dimensions
of configuration space, only one is shown. The shaded area consists of those (q, t) where
Ψ(q, t) is significantly non-zero.

14.3 Bell’s First Many-Worlds Theory


Bell also made a proposal (first formulated in 1971, published41 in 1981) adding a prim-
itive ontology to Everett’s S∅; Bell did not seriously propose or defend the resulting
theory, he just regarded it as an ontological clarification of Everett’s theory. According
to this theory, at every time t there exists an uncountably infinite collection of universes,
each of which consists of N material points in Euclidean 3-space. Thus, each world has
its own configuration Q, but some configurations are more frequent in the ensemble of
41
J.S. Bell: Quantum Mechanics for Cosmologists. Pages 611–637 in C. Isham, R. Penrose and
D. Sciama (editors), Quantum Gravity 2, Oxford: Clarendon Press (1981). Reprinted as chapter 15 of
J.S. Bell: Speakable and Unspeakable in Quantum Mechanics. Cambridge: University Press (1987)

worlds than others, with |Ψt |2 distribution across the ensemble. At every other time t0 ,
there is again an infinite collection of worlds, but there is no fact about which world at
t0 is the same as which world at t.

14.4 Bell’s Second Many-Worlds Theory


Another variant of this theory, considered by Bell in 1976,42 supposes that there is really
a single world at every time t consisting of N material points in Euclidean 3-space. The
configuration Qt is chosen with |Ψt |2 distribution independently at every time. Although
this theory has a definite Qt at every t, it also has a many-worlds character because in
every arbitrarily short time interval, configurations from all over configuration space are
realized, in fact with distribution roughly equal to |Ψt |2 (if the interval is short enough
and Ψt depends continuously on t) across the ensemble of worlds existing at different
times. This theory seems rather implausible compared to Bohmian mechanics, as it
implies that our memories are completely wrong: after all, it implies that one minute
ago the world was not at all like what we remember it to be like a minute ago. Given that
all of our reasons for believing in the Schrödinger equation and the Born rule are based
on memories of reported outcomes of experiments, it seems that this theory undercuts
itself: if we believe it is true then we should conclude that our belief is not justified.
It is not very clear to me whether the same objection applies to Bell’s first many-
worlds theory. But certainly, both theories have, due to their radically unusual idea of
what reality is like, a flavor of skeptical scenarios (such as the brain in the vat), in fact
a stronger such flavor than Sm.

14.5 Probabilities in Many-Worlds Theories


Maudlin expressed in his article on the measurement problem a rather negative opinion
about many-worlds theories; I think a bit too negative. His objection was, if every
outcome α of an experiment is realized, what could it mean to say that outcome α has
probability |cα |2 to occur? If, as in Sm and in S∅, all the equations are deterministic,
then there is nothing random; and in the situation of the measurement problem, there
is nothing that we are ignorant of. So what could talk of probability mean?
Here is what it could mean in Sm: Suppose we have a way of counting worlds.
And suppose we repeat a quantum experiment (say, a Stern–Gerlach experiment with
|cup |2 = |cdown |2 = 1/2) many times (say, a thousand times). Then we obtain in each
world a sequence of 1000 ups and downs such as
↑↓↑↑↓↑↓↓↓ . . . . (14.8)
Note that there are 2^1000 ≈ 10^300 such sequences. The statement that the fraction of
ups lies between 47% and 53% is true in some worlds and false in others. Now count
42
J.S. Bell: The Measurement Theory of Everett and de Broglie’s Pilot Wave. Pages 11–17 in M. Flato
et al. (editors): Quantum Mechanics, Determinism, Causality, and Particles, Dordrecht: Reidel (1976).
Reprinted as chapter 11 of J.S. Bell: Speakable and Unspeakable in Quantum Mechanics. Cambridge:
University Press (1987)

the worlds in which the statement is true. Suppose that the statement is true in the
overwhelming majority of worlds. Then that would explain why we find ourselves in
such a world. And that, in turn, would explain why we observe a relative frequency
of ups of about 50%. And that is what we needed to explain for justifying the use of
probabilities.
Now consider |cup |2 = 1/3, |cdown |2 = 2/3. Then the argument might seem to break
down, because it is then still true that in the overwhelming majority of sequences such
as (14.8) the frequency of ups is about 50%. But consider the following

Rule for counting worlds. The “fraction of worlds” f (P ) with property P in the
splitting given by Ψ = Σα cα Ψα and m(x) = Σα |cα |2 mα (x) is

    f (P ) = Σα∈M |cα |2 ,                                                                  (14.9)

where M is the set of worlds α with property P .

Note that f (P ) lies between 0 and 1 because Σα |cα |2 = 1. It is not so clear whether
this rule makes sense—whether there is room in physics for such a law. But let us
accept it for the moment and see what follows. Consider the property P that the
relative frequency of ups lies between 30% and 36%. Then f (P ) is actually the same
value as the probability of obtaining a frequency of ups between 30% and 36% in 1000
consecutive independent random tosses of a biased coin with P(up) = 1/3 and P(down)
= 2/3. And in fact, this value is very close to 1. Thus, the above rule for counting worlds
implies the frequency of ups lies between 30% and 36% in the overwhelming majority
of worlds. This reasoning was essentially developed by Everett.
A comparison with Bohmian mechanics is useful. The initial configuration of the
lab determines the precise sequence such as (14.8). If the initial configuration is chosen
with |Ψ0 |2 distribution, then with overwhelming probability the sequence will have a
fraction of ups between 30% and 36%. That is, if we count initial conditions with
the |Ψ0 |2 distribution,R that is, if we say that the fraction of initial conditions lying
in a set B ⊆ R3N is B |Ψ0 |2 , then we can say that for the overwhelming majority of
Bohmian worlds, the observed frequency is about 33%. Now to make the connection
with many-worlds, note that the reasoning does not depend, in fact, on whether all of
the worlds are realized or just one. That is, imagine many Bohmian worlds with the
same initial wave function Ψ0 but different initial configurations, distributed across the
ensemble according to |Ψ0 |2 . Then there is an explanation for why inhabitants should
see a frequency of about 33%.
The problem that remains is whether there is room for a rule for counting worlds.
In terms of a creation myth, suppose God created the wave function Ψ and made it a
law that Ψ evolves according to the Schrödinger equation; then he created matter in
3-space distributed with density m(x, t) and made it a law that m is given by (14.1).
Now what would God need to do in order to make the rule for counting worlds a law?
He does not create anything further, so in which way would two universes with equal Ψ

and m but different rules for counting worlds differ? That is a reason for thinking that
ultimately, Sm fails to work (though in quite a subtle way).
Various authors have proposed other reasonings for justifying probabilities in many-
worlds theories; they seem less relevant to me, but let me mention a few. David
Deutsch43 proposed that it is rational for inhabitants of a universe governed by a many-
worlds theory (a “multiverse,” as it is often called) to behave as if the events they
perceive were random with probabilities given by the Born rule; he proposed certain
principles of rational behavior from which he derived this. (Of course, this reasoning
does not provide an explanation of why we observe frequencies in agreement with Born’s
rule.) Lev Vaidman44 proposed that in a many-worlds scenario, I can be ignorant of
which world I am in: before the measurement, I know that there will be a copy of me
in each post-measurement world, and afterwards, I do not know which world I am in
until I look at the pointer position. And I could try to express my ignorance through
a probability distribution, although it is not clear why (and in what sense) the Born
distribution would be “correct” and other distributions would not.
For comparison, in Bell’s many-worlds theories it is not hard to make sense of prob-
abilities. In Bell’s first theory, there is an ensemble of worlds at every time t, and clearly
most of the worlds have configurations that look as if randomly chosen with |Ψ|2 distri-
bution, in particular with a frequency of ups near 33% in the example described earlier.
In Bell’s second theory, Qt is actually random with |Ψt |2 distribution, and although the
recorded sequence of outcomes fluctuates within every fraction of a second, the sequence
in our memories and records at time t has, with probability near 1, a frequency of ups
near 33%.

43
D. Deutsch: Quantum theory of probability and decisions. Proceedings of the Royal Society of
London A 455: 3129–3137 (1999) http://arxiv.org/abs/quant-ph/9906015
44
L. Vaidman: On Schizophrenic Experiences of the Neutron or Why We should Believe in the
Many-Worlds Interpretation of Quantum Theory. International Studies in the Philosophy of Science
12: 245–261 (1998) http://arxiv.org/abs/quant-ph/9609006

15 Special Topics
15.1 The Mach–Zehnder Interferometer
A variant of the double-slit experiment is the Mach–Zehnder interferometer, developed
around 1892, an arrangement used for experiments with photons, although in principle
it could also be set up for use with electrons. Like the double slit, it involves splitting
the wave packet in two, having the two packets travel along different paths, and then
making them overlap again and interfere, see Figure 15.1; together with the double slit,
such experiments are called two-way or which-way experiments. In practice, the paths
are usually between millimeters and meters long.


Figure 15.1: Design of a Mach–Zehnder experiment. Wave packets travel along grey
paths in the direction indicated. S = source, BS = beam splitter, M = mirror, D =
detector.
A beam splitter is a potential barrier with height and width so adjusted that, by
the tunnel effect (see Section 7.4), an incoming wave packet with approximate wave
number k will be half reflected and half transmitted; that is, a normalized incoming
wave packet ψin evolves to crefl ψrefl + ctransm ψtransm , and the reflection coefficient crefl
and the transmission coefficient ctransm both have modulus 1/√2. For photons, beam
splitters are realized as thin layers of metal (“half-silvered mirror”) or resin, usually on
top of a glass plate or in between two glass plates (the potential is higher inside the
metal than in glass or air).
The following further properties of a beam splitter, which can be regarded as con-
sequences of the Schrödinger equation for a 1d potential barrier, are relevant to the
Mach–Zehnder experiment. First, the reflected and transmitted packet have, up to the
sign, the same pretty sharp wave number k as the incoming packet. Second, by symme-
try of the potential V (−x) = V (x) (taking the middle of the barrier as the origin), the
behavior is symmetric under the parity transformation x → −x represented by the parity
operator P ψ(x) = ψ(−x): specifically, P ψin evolves to crefl P ψrefl + ctransm P ψtransm . Since
ψrefl and ψtransm have the same wave number up to the sign, move at the same speed, and
were generated during the same time interval, ψrefl = P ψtransm to a good degree of
approximation; likewise, ψin = ψrefl , provided the shape of ψin is symmetric, and provided
we consider the right instant of time. Third, by time reversal, c∗refl ψ∗refl + c∗transm ψ∗transm
evolves to ψ∗in . That is, if we send in two packets, one from each side, that have the same
absolute wave number and arrive at the same time, then only one packet comes out,
leaving to the left—at least, if the phases of the coefficients ci are prepared in the right
way. After all, c∗refl P ψ∗refl + c∗transm P ψ∗transm evolves to P ψ∗in , a packet leaving to the right.
So, depending on the phase difference of two wave packets arriving at a beam splitter,
either only one packet comes out towards the left, or only one packet to the right, or
two packets (one to the left and one to the right). And this effect can be regarded as
(constructive or destructive) interference of the transmitted part coming from the left
with the reflected part coming from the right.
The phase difference can be influenced in a practical way by shifting a wave packet
slightly; after all, eik(x+∆x) = eik∆x eikx , corresponding to a change of phase by k∆x.
Thus, if the two paths (via M1 or M2 ) have the same length, only one packet will come
out of BS2 leaving towards D1 . If one of the paths is longer than the other by half a
wave length (phase change π), only one packet will come out of BS2 leaving towards
D2 . If one of the paths is longer than the other by a quarter wave length (phase change
π/2), then two packets of equal magnitude will come out of BS2 , one towards D1 and
one towards D2 . In this way, the setup can be used for detecting small changes in the
path lengths (similar principles are used for detecting gravitational waves) or changes
in potentials (or, for photons, refraction index) somewhere along one of the paths.
If one of the paths is blocked, then only one packet arrives at BS2 , and two packets
of equal magnitude come out of it. The situation is analogous to the double slit, with
D1 corresponding to a maximum and D2 to a minimum of the interference pattern. If
none of the paths is blocked (and they have the same length and no further potentials),
then D2 never clicks.
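The interference at BS2 can be sketched with 2×2 matrices (a toy model, not part of the notes; the particular beam-splitter matrix is one common convention):

    import numpy as np

    BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)          # a symmetric 50/50 beam splitter

    def output_probs(phase):
        # one packet enters BS1; one arm picks up a relative phase; the packets recombine at BS2
        amp = BS @ np.diag([np.exp(1j * phase), 1.0]) @ BS @ np.array([1.0, 0.0])
        return np.abs(amp)**2                               # probabilities at the two output ports

    for phase in [0.0, np.pi/2, np.pi]:                     # path difference of 0, a quarter, and half a wave length
        print(phase, output_probs(phase))
    # phase 0:    all probability in one port (one detector always clicks, the other never)
    # phase pi/2: 1/2 and 1/2
    # phase pi:   all probability in the other port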

15.2 Path Integrals


Path integrals are a way of computing the time evolution operators Ut ; they arise as
follows. Consider a unitary 1-parameter group Ut = exp(−iHt/~), for simplicity on a
finite-dimensional Hilbert space H = Cd and just at times t that are multiples of a
(small) time step τ > 0, t = nτ with n ∈ Z. Set U := Uτ . Then Ut = Unτ = U n , and the
matrix elements of this power can be expressed through repeated matrix multiplication
as in (A^2 )ik = Σj Aij Ajk :

    (U^n )in i0 = Σ_{i1 ,...,in−1 =1}^{d} Uin in−1 · · · Ui2 i1 Ui1 i0 .                    (15.1)

Now think of the sequence (i0 , i1 , . . . , in−1 , in ) as a path in the set {1, . . . , d} as in
Figure 15.2.
To think of it as a path will be particularly natural if Uij is small unless i and j are
“close” to each other (in a sense to be determined). Basically, Eq. (15.1) is already a
path integral, except that the integral is in this case a sum: it is a sum over all possible
paths connecting a given value of i0 to a given value of in . Now, we want to let τ → 0


Figure 15.2: Example of a path in {1, . . . , d}

while keeping t fixed, and we want to let H approach L²(R^{3N}), say H = L²(Ω) with Ω
a finite set approaching R^{3N}, for example Ω = εZ^{3N} ∩ B_R(0) in the limit ε → 0, R → ∞.
Then i_0 and i_n get replaced by two points q_1, q_2 ∈ R^{3N}, (U^n)_{i_n i_0} becomes ⟨q_2|U_t|q_1⟩, the
sum becomes an integral over all smooth functions q : [0, t] → R^{3N} with q(0) = q_1
and q(t) = q_2; let us write ∫ Dq for this kind of integral. I report that, although less
obviously, the integrand also converges to a simple expression, as discovered by Feynman
in 1942:⁴⁵ to exp(iS[q]/ℏ) with the so-called classical action functional

S[q] := ∫_0^t dt' [ (m/2) q̇(t')² − V(q(t')) ] ,   (15.2)

where q̇ means the derivative of t ↦ q(t). Thus,

⟨q_2|U_t|q_1⟩ = ∫ Dq e^{iS[q]/ℏ} .   (15.3)

People sometimes say that this formula shows or suggests that “the particle takes all
paths from q1 to q2 ,” but I find this statement incomprehensible and without basis. To
begin with, the path t 7→ q(t) is not the path of the particle configuration, as would be
the t 7→ Q(t) in Bohmian mechanics. In fact, since the formula expresses Ut and thus
⁴⁵ R. P. Feynman: Space-time approach to non-relativistic quantum mechanics. Reviews of Modern Physics 20: 367–387 (1948)
the time evolution of ψ, it is the wave ψ, not the particle, that follows q(t). And for a
wave it is not strange to follow many paths. Note also that one can just as well re-write
Maxwell’s equations of classical electrodynamics, as in fact any linear field equation, in
terms of path integrals, but nobody would claim that in classical electrodynamics, any
particle “takes all paths.” People sometimes also talk as if path integrals meant that the
particle took a random path. Of course, Bohmian mechanics involves a random path,
but its distribution is concentrated on a 3N -dimensional set of paths, while here people
mean a distribution spread out over all paths. However, the expression Dq exp(iS[q]/~)
is a complex measure, not a probability measure, and it does not appear here in the role
of a probability measure but in an expression for Ut .
I should also mention that Eq. (15.3) is not rigorously true because, strictly speaking,
there is no volume measure such as Dq in infinite-dimensional spaces (such as the space
of all smooth paths from q1 to q2 ). Nevertheless, various computations have successfully
used (15.3), and mathematicians have come up with several techniques to get around
the difficulty.
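As a toy illustration of Eq. (15.1) — not of the delicate continuum formula (15.3) — one can check numerically that a matrix element of U^n equals the sum over all discrete paths. The Hamiltonian and the values d = 4, n = 3 below are arbitrary choices, not taken from the notes.

    # Numerical check of Eq. (15.1): (U^n)_{i_n i_0} as a sum over paths in {1,...,d}.
    # Toy assumptions: hbar = 1, d = 4, a random Hermitian Hamiltonian.
    import itertools
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    d, n, tau = 4, 3, 0.1
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    H = (A + A.conj().T) / 2                   # Hermitian Hamiltonian
    U = expm(-1j * H * tau)                    # U = U_tau

    i0, i_n = 0, 2                             # fixed endpoints (0-based indices)
    path_sum = 0.0 + 0.0j
    for path in itertools.product(range(d), repeat=n - 1):   # intermediate indices i_1..i_{n-1}
        idx = (i0,) + path + (i_n,)
        amp = 1.0 + 0.0j
        for a, b in zip(idx[:-1], idx[1:]):
            amp *= U[b, a]                     # factor U_{i_{k+1} i_k}
        path_sum += amp

    print(np.allclose(path_sum, np.linalg.matrix_power(U, n)[i_n, i0]))   # True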

15.3 Point Interaction


A potential given by a Dirac delta function, for example

V(x) = λ δ^d(x)   (15.4)

for a single particle in d dimensions with real (positive or negative) prefactor λ, would
represent an interaction, of the quantum particle with another particle fixed at the ori-
gin, that occurs only at contact between the two particles. It is called point interaction
or zero-range interaction. It is not obvious that a potential like that makes mathe-
matical sense, i.e., that a self-adjoint operator H exists in L2 (Rd ) that corresponds to
−(~2 /2m)∆ + V with (15.4). It turns out46 that no such operator exists in dimension
d ≥ 4.
In dimension d = 1, we can reason as follows. Suppose ψ is an eigenfunction of H,
Hψ = Eψ, that is,

−(ℏ²/2m) ∆ψ(x) + λ δ(x) ψ(x) = E ψ(x) .   (15.5)

Integrate this relation over x from −ε to +ε for small ε > 0 to obtain

−(ℏ²/2m) [ψ'(ε) − ψ'(−ε)] + λ ψ(0) = E ∫_{−ε}^{+ε} dx ψ(x)   (15.6)

with ψ' the derivative of ψ. Taking the limit ε → 0 and assuming that ψ is bounded
near 0, the right-hand side vanishes, so

ψ'(0+) − ψ'(0−) = (2mλ/ℏ²) ψ(0) .   (15.7)
⁴⁶ See, e.g., S. Albeverio, F. Gesztesy, R. Høegh-Krohn, and H. Holden: Solvable models in quantum mechanics. Berlin: Springer-Verlag (1988).
That is, ψ' has a jump discontinuity at 0 of height given by the right-hand side. Conversely,
assuming (15.7) while ψ'' exists everywhere except at 0, −(ℏ²/2m)∆ψ consists of
a Dirac delta peak −(ℏ²/2m)[ψ'(0+) − ψ'(0−)]δ(x) = −λψ(0) δ(x) at the origin and a regular
function everywhere else, with the consequence that Hψ(x) = −(ℏ²/2m)∆ψ(x) + λ δ(x) ψ(x)
is a regular function (for suitable ψ, a square-integrable one). Mathematicians say that
the domain D ⊂ L2 (R) of the Hamiltonian consists of functions obeying (15.7), and
H maps D to L2 (R); D is a dense subspace. Condition (15.7) is called a boundary
condition (regarding 0 as the common boundary of the positive and negative half-axis).
Without giving details, I report that in d = 3 dimensions, the potential (15.4) makes
sense (i.e., admits a self-adjoint Hamiltonian) for λ of the form λ = η + αη² with
infinitesimal η and α ∈ R. The domain of H then consists of functions satisfying the
Bethe–Peierls boundary condition at the origin,

lim_{r↘0} [ ∂_r (rψ(rω)) + α rψ(rω) ] = 0   (15.8)

for all unit vectors ω ∈ R³. Put differently, if we can expand ψ in powers of r = |x| as

ψ(rω) = c_{−1}(ω) r^{−1} + Σ_{n=0}^{∞} c_n(ω) r^n ,   (15.9)

then (15.8) demands that

c_0(ω) + α c_{−1}(ω) = 0 .   (15.10)
In particular, for α ≠ 0 and c_0 ≠ 0, it follows that c_{−1} ≠ 0, so ψ is forced to diverge at
the origin like 1/r.
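For the attractive case λ < 0 in d = 1, the boundary condition (15.7) leads to a single bound state with energy E = −mλ²/(2ℏ²), a standard result not derived in these notes. The following numerical sketch, with ℏ = m = 1 and the delta function approximated by a narrow Gaussian on a grid (all of these are assumptions made only for the illustration), reproduces this value approximately.

    # Numerical check: ground state energy of V(x) = lambda*delta(x) in 1d, lambda < 0,
    # compared with the exact value E = -m*lambda^2/(2*hbar^2).  Units: hbar = m = 1.
    import numpy as np

    lam = -1.0                       # coupling constant (attractive)
    L, dx = 10.0, 0.01               # half-width of the grid and grid spacing
    x = np.arange(-L, L + dx, dx)
    sigma = 0.05                     # width of the Gaussian approximating delta(x)
    delta_approx = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

    # H = -(1/2) d^2/dx^2 + lam * delta_approx(x), by finite differences
    N = len(x)
    H = (np.diag(np.full(N, 1.0 / dx**2))
         - np.diag(np.full(N - 1, 0.5 / dx**2), 1)
         - np.diag(np.full(N - 1, 0.5 / dx**2), -1)
         + np.diag(lam * delta_approx))

    E0 = np.linalg.eigvalsh(H)[0]
    print(E0, -lam**2 / 2)           # numerical ground state energy vs. exact -1/2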

16 The Einstein–Podolsky–Rosen Argument
In the literature, the “EPR paradox” is often mentioned. It is clear from EPR’s article
that they did not intend to describe a paradox (as did, e.g., Wheeler when describing
the delayed-choice experiment), but rather to describe an argument. The argument
supports the following

Claim: There are additional variables beyond the wave function.

I now explain their reasoning in my own words, partly in preparation for Bell’s 1964
argument, which builds on EPR’s argument.

16.1 The EPR Argument


EPR considered 2 particles in 1 dimension with entangled wave function

Ψ(x1 , x2 ) = δ(x1 − x2 + x0 ) , (16.1)

with x0 a constant. (We ignore the fact that this wave function is unphysical because it
does not lie in Hilbert space; the same argument could be made with square-integrable
functions but would become less transparent.) An observer, let us call her Alice, mea-
sures the position of particle 1. The outcome X1 is uniformly distributed, and the wave
function collapses to

Ψ0 (x1 , x2 ) = δ(x1 − X1 )δ(x2 − X1 − x0 ) , (16.2)

so that another observer, Bob, measuring the position of particle 2, is certain to obtain
X2 = X1 + x0 . It follows that particle 2 had a position even before Bob made his
experiment. Now EPR make the assumption that
“no real change can take place in the second system in
(16.3)
consequence of [a measurement on] the first system.”
This assumption is a special case of locality. They took it as obviously true, but it is wor-
thy of a closer examination; we will come back to it in the next chapter. It then follows
that particle 2 had a definite position even before Alice made her experiment, despite
the fact that Ψ is not an eigenfunction of x2 -position. Quod erat demonstrandum.

16.2 Further Conclusions


EPR draw further conclusions from their example by considering also momentum. Note
that the Fourier transform of Ψ is

Ψ̂(k_1, k_2) = e^{−ik_1 x_0} δ(k_1 + k_2) .   (16.4)

Alice could measure either the position or the momentum of particle 1, and Bob either
the position or the momentum of particle 2. If Alice measures position then, as seen

above, the outcome X1 is uniformly distributed and Bob, if he chooses to measure
position, finds X2 = X1 + x0 with certainty. If, alternatively, Alice measures momentum
then the outcome K1 will be uniformly distributed and the wave function in momentum
representation collapses from Ψ̂ to

Ψ̂''(k_1, k_2) = e^{−iK_1 x_0} δ(k_1 − K_1) δ(k_2 + K_1)   (16.5)

so that Bob, if he chooses to measure momentum, is certain to find K2 = −K1 . In


the same way as above, it follows that Bob’s particle had a position before any of the
experiments, and that it had a momentum!
There even arises a way of simultaneously measuring the position and momentum of
particle 2: Alice measures position X1 and Bob momentum K2 . Since particle 2 has, as
just proved, a well-defined position and a well-defined momentum, and since, by (16.3),
Alice’s measurement did not influence particle 2, K2 must be the original momentum
of particle 2. Likewise, if Bob had chosen to measure position, his result would have
agreed with the original position, and since it would have obeyed X2 = X1 + x0 , we can
infer from Alice’s result what the original position must have been.

16.3 Bohm’s Version of the EPR Argument Using Spin


In 1951, before he discovered Bohmian mechanics, Bohm wrote a textbook about quan-
tum mechanics in which he followed the orthodox view. In it, he also described the
following useful variant of the EPR argument, sometimes called the EPRB experiment
(B for Bohm).
Consider two spin-1/2 particles with joint spinor in C⁴ given by the singlet state

φ = (1/√2) ( |z-up⟩|z-down⟩ − |z-down⟩|z-up⟩ ) .   (16.6)
Alice measures σ3 on particle 1. The outcome Z1 is ±1, each with probability 1/2. If
Z1 = +1 then the wave function collapses to

φ0+ = |z-upi|z-downi , (16.7)

and Bob, measuring σ3 on particle 2, is certain to obtain Z2 = −1. If, however, Z1 = −1


then the wave function collapses to

φ0− = |z-downi|z-upi , (16.8)

and Bob is certain to obtain Z2 = +1. Thus, always Z2 = −Z1 ; one speaks of perfect
anti-correlation. As a consequence, particle 2 had a definite value of z-spin even before
Bob’s experiment. Now, from the assumption (16.3) it follows that it had that value
even before Alice’s experiment. Likewise, particle 1 had a definite value of z-spin before
any attempt to measure it.
Again as in EPR’s reasoning, we can consider other observables, say σ1 and σ2 . In
homework Exercise 30 of Assignment 7, we checked that the singlet state has the same

form relative to the x-spin basis or the y-spin basis. It follows that if Alice and Bob both
measure x-spin then their outcomes are also perfectly anti-correlated, and likewise for
y-spin. It can be inferred that each spin component, for each particle, has a well-defined
value before any experiment.
Moreover, Alice and Bob together can measure σ1 and σ3 of particle 2: Alice measures
σ1 of particle 1 and Bob σ3 of particle 2. By (16.3) and the perfect anti-correlation, the
negative of Alice’s outcome is what Bob would have obtained had he measured σ1 ; and
by (16.3), Bob’s outcome is not affected by Alice’s experiment.

16.4 Einstein’s Boxes Argument


We have seen that EPR’s argument yields more than just the incompleteness of the
wave function. It also yields that particles have well-defined positions and momenta.
If we only want to establish the incompleteness of the wave function, which seems like
a worthwhile goal for a proof, a simpler argument will do. Einstein developed such an
argument already in 1927 (before the EPR paper), presented it at a conference but never
published it.47
Consider a single particle whose wave function ψ(x) is confined to a box B with
impermeable walls and (more or less) uniform in B. Now split B (e.g., by inserting a
partition) into two boxes B1 and B2 , move one box to Tokyo and the other to Paris.
There is some nonzero amount of the particle’s wave function in Paris and some in
Tokyo. Carry out a detection in Paris. Let us assume that
no real change can take place in Tokyo in consequence of a measurement in Paris.   (16.9)
(Again a special case of locality.) If we believed that the wave function was a complete
description of reality, then there would be no fact of the matter, before the detection
experiment, about whether the particle is in Paris or Tokyo, but afterwards there would
be. This contradicts (16.9), so the wave function cannot be complete.
The assumption (16.9) is intended as allowing changes in Tokyo after a while, such
as the while it would take a signal to travel from Paris to Tokyo at the speed of light.
That is, (16.9) (and similarly (16.3)) is particularly motivated by the theory of relativity,
which strongly suggests that signals cannot propagate faster than at the speed of light.
On one occasion, Einstein wrote that the faster-than-light effect entailed by insisting
on completeness of the wave function was “spukhafte Fernwirkung” (spooky action-at-
a-distance).

16.5 Too Good To Be True


EPR’s argument is, in fact, correct. Nevertheless, it may strike you that its conclusion,
the incompleteness of the wave function, is very strong—maybe too strong to be true.
⁴⁷ It has been reported by, e.g., L. de Broglie: The Current Interpretation of Wave Mechanics: A Critical Study. Elsevier (1964). A more detailed discussion is given by T. Norsen: Einstein's Boxes, American Journal of Physics 73(2): 164–176 (2005) http://arxiv.org/abs/quant-ph/0404016
After all, it is not true in GRW or many-worlds! How can this be: that EPR proved
something that is not true?
This can happen only because the assumption (16.3) is actually not true in these
theories. And in Bohmian mechanics, where the wave function is in fact incomplete,
it is not true that all spin observables have pre-existing actual values, as would follow
from EPR’s reasoning. Thus, also in Bohmian mechanics (16.3) is not true. We will see
in the next chapter that (16.3) is problematical in any version of quantum mechanics.
This fact was discovered 30 years after EPR’s paper by John Bell.

17 Proof of Nonlocality
Two space-time points x = (s, x) and y = (t, y) are called spacelike separated iff no
signal propagating at the speed of light can reach x from y or y from x. This occurs iff

|x − y| > c|s − t| , (17.1)

with c = 3 × 108 m/s the speed of light. Einstein’s theory of relativity strongly suggests
that signals cannot propagate faster than at the speed of light (superluminally). That
is, if x and y are spacelike separated then no signal can be sent from x to y or from y
to x. This in turn suggests that
If x and y are spacelike separated then events at x do not influence events at y.   (17.2)
This statement is called locality. It is true in relativistic versions of classical physics
(mechanics, electrodynamics, and also in Einstein’s relativistic theory of gravity he
called the general theory of relativity). Bell proved in 1964 a result often called Bell’s
theorem:⁴⁸

Locality is sometimes false if certain empirical predictions of the quantum formalism are correct.   (17.3)
The relevant predictions have since been experimentally confirmed; the first convincing
tests were carried out by Alain Aspect in 1982.49 Thus, locality is false in our world; this
fact is often called quantum nonlocality. Our main goal in this chapter is to understand
Bell’s proof.
Some remarks.

• Einstein believed in locality until his death in 1955. The EPR assumption (16.3)
is a special case of locality: If Alice’s measurement takes place at x and Bob’s
at y, and if x and y are spacelike separated, then locality implies that Alice’s
measurement on particle 1 at x cannot affect particle 2 at y. Conversely, the
only situation in which we can be certain that the two particles cannot interact
occurs if Alice’s and Bob’s experiments are spacelike separated and locality holds
true. Ironically, EPR were wrong even though their argument was correct: The
premise (16.3) is false. They took locality for granted. Likewise in Einstein’s boxes
argument, the assumption (16.9) is a special case of locality: The point of talking
about Tokyo and Paris is that these two places are distant, and since there clearly
can be influences if we allow more time than distance/c, the assumption is that
there cannot be an influence between spacelike separated events.
⁴⁸ J. S. Bell: On the Einstein-Podolsky-Rosen Paradox. Physics 1: 195–200 (1964). Reprinted as chapter 2 of J. S. Bell: Speakable and unspeakable in quantum mechanics. Cambridge University Press (1987)
⁴⁹ A. Aspect, J. Dalibard, G. Roger: Experimental Test of Bell's Inequalities using Time-Varying Analyzers. Physical Review Letters 49: 1804–1807 (1982)
• Despite nonlocality, it is not possible to send messages faster than light, according
to the appropriate relativistic version of the quantum formalism; this fact is often
called the no-signaling theorem. We will prove it in great generality in Section 22.5.
Put differently, the superluminal influences cannot be used by agents for sending
messages.

• Does nonlocality prove relativity wrong? That statement would be too strong.
Nonlocality proves a certain understanding of relativity wrong. Much of relativity
theory, however, remains untouched by nonlocality.

• If x and y are spacelike separated then relativistic Hamiltonians contain no inter-


action term between x and y.
Let me explain this statement. The Schrödinger equation is non-relativistic and
needs to be replaced, in a relativistic theory, by a relativistic equation. The latter
is different from the non-relativistic Schrödinger equation in two ways: (i) Instead
of interaction potentials, interaction arises from the creation and annihilation of
particles. For example, an electron can create a photon, which travels to another
electron and is annihilated there. Potentials can only be used as an approximation.
(ii) Even leaving interaction aside, relativity requires a modification of the Schrö-
dinger equation. The best known such modification is the Dirac equation for
electrons. It entails that the wave function can propagate no faster than at the
speed of light c. Since also photon wave functions propagate no faster than at c,
and since potentials are absent, there is no interaction term in the Hamiltonian
between particles at x and at y.
So there are two meanings to the word “interaction”: first, an interaction term in
the Hamiltonian; second, any influence. Bell’s proof shows that in the absence of
the first type of interaction, the second type can still be present.

• Bell’s proof shows for a certain experiment that either events at x must have
influenced events at y or vice versa, but does not tell us who influenced whom.

17.1 Bell’s Experiment


As in Bohm's version of the EPR example, consider two spin-1/2 particles in the singlet state

φ = (1/√2) ( |z-up⟩|z-down⟩ − |z-down⟩|z-up⟩ ) .   (17.4)
While keeping their spinor constant, the two particles are brought to distant places.
Alice makes an experiment on particle 1 at (or near) space-time point x and Bob one on
particle 2 at y; x and y are spacelike separated. Each experimenter chooses a direction
in space, corresponding to a unit vector n ∈ R3 , and carries out a Stern–Gerlach exper-
iment in that direction, i.e., a quantum measurement of n · σ. The difference to Bohm’s
example is that Alice and Bob can choose different directions. I write α for Alice’s unit

vector, β for Bob’s, Z 1 for the random outcome ±1 of Alice’s experiment, and Z 2 for
that of Bob’s. Let us compute the joint distribution µα,β of Z 1 and Z 2 .

Fact 1. For any unit vector n ∈ R³,

φ ∝ |n-up⟩|n-down⟩ − |n-down⟩|n-up⟩ .   (17.5)

Proof. There is a unique operator Π : C2 ⊗ C2 → C2 ⊗ C2 such that Π(φ ⊗ χ) = χ ⊗ φ


for all φ, χ ∈ C2 ; in fact,

Π ↑↑ = ↑↑ , Π ↑↓ = ↓↑ , Π ↓↑ = ↑↓ , Π ↓↓ = ↓↓ , (17.6)

where ↑ means z-up etc. Let us call Π the permutation operator. An element ψ of
C2 ⊗ C2 is called anti-symmetric iff Πψ = −ψ. The anti-symmetric elements form a
subspace A of C2 ⊗C2 . It has dimension 1 because ψ is anti-symmetric iff its components
ψ↑↑ etc. relative to the basis mentioned in (17.6) satisfy

ψ↑↑ = −ψ↑↑ , ψ↑↓ = −ψ↓↑ , ψ↓↑ = −ψ↑↓ , ψ↓↓ = −ψ↓↓ , (17.7)

and the solutions of these equations are exactly (ψ↑↑ , ψ↑↓ , ψ↓↑ , ψ↓↓ ) = (0, c, −c, 0) with
arbitrary c ∈ C.
Now the vectors on both sides of (17.5) are clearly anti-symmetric, so they must
both lie in A , so they can only differ by a scalar factor.
Fact 2. Independently of whether Alice’s or Bob’s experiment occurs first, the joint
distribution of Z 1 , Z 2 is
 
µ_{α,β} := ( P(up,up)      P(up,down)   )
           ( P(down,up)    P(down,down) )                            (17.8)

         = ( 1/4 − (1/4) α·β    1/4 + (1/4) α·β )
           ( 1/4 + (1/4) α·β    1/4 − (1/4) α·β )                    (17.9)

         = ( (1/2) sin²(θ/2)    (1/2) cos²(θ/2) )
           ( (1/2) cos²(θ/2)    (1/2) sin²(θ/2) ) ,                  (17.10)

with θ the angle between α and β.

Proof. Assume that Alice's experiment occurs first and write the initial spinor as

φ = c |α-up⟩|α-down⟩ − c |α-down⟩|α-up⟩   (17.11)

with c a complex constant with |c| = 1/√2. According to Born's rule, Alice obtains +1
or −1, each with probability 1/2. In case Z¹ = +1, φ collapses to

φ'_+ = |α-up⟩|α-down⟩ .   (17.12)

According to Born's rule, the probability that Bob obtains Z² = +1 is

P(Z² = +1 | Z¹ = +1) = |⟨β-up|α-down⟩|² = 1 − |⟨β-up|α-up⟩|² .   (17.13)

Since the angle in Hilbert space between |β-up⟩ and |α-up⟩ is half the angle between β
and α, and since they are unit vectors in Hilbert space, we have that

|⟨β-up|α-up⟩| = cos(θ/2)   (17.14)

and thus

P(Z² = +1 | Z¹ = +1) = 1 − cos²(θ/2) = sin²(θ/2)   (17.15)

and

P(Z¹ = +1, Z² = +1) = (1/2) sin²(θ/2) .   (17.16)

Since cos²x = 1/2 + (1/2) cos(2x), this value can be rewritten as

P(Z¹ = +1, Z² = +1) = 1/2 − (1/2) cos²(θ/2) = 1/2 − 1/4 − (1/4) cos θ = 1/4 − (1/4) α·β .   (17.17)
The other three matrix elements can be computed in the same way. Assuming that
Bob’s experiment occurs first leads to the same matrix.
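Fact 2 can also be checked numerically. The following sketch (not part of the notes) computes the joint probabilities directly from the Born rule, using the projections (I ± n·σ)/2 onto the spin-up and spin-down states along a unit vector n, and compares them with (17.9).

    # Joint outcome probabilities for spin measurements along alpha and beta
    # on the singlet state, compared with 1/4 - s1*s2*(alpha.beta)/4, cf. (17.9).
    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    def proj(n, sign):
        # projection onto the eigenspace of n.sigma with eigenvalue sign = +1 or -1
        n_sigma = n[0] * sx + n[1] * sy + n[2] * sz
        return (np.eye(2) + sign * n_sigma) / 2

    up = np.array([1, 0], dtype=complex)
    down = np.array([0, 1], dtype=complex)
    singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

    alpha = np.array([0.0, 0.0, 1.0])
    theta = np.deg2rad(60)
    beta = np.array([np.sin(theta), 0.0, np.cos(theta)])

    for s1 in (+1, -1):
        for s2 in (+1, -1):
            P = np.kron(proj(alpha, s1), proj(beta, s2))
            prob = np.vdot(singlet, P @ singlet).real
            print(s1, s2, prob, 0.25 - s1 * s2 * np.dot(alpha, beta) / 4)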

Remarks.
• Note that the four entries in µα,β are nonnegative and add up to 1, as they should.

• In the case α = β corresponding to Bohm’s version of the EPR example,


µ_{α,α} = (  0    1/2 )
          ( 1/2    0  ) ,   (17.18)

implying the perfect anti-correlation Z 2 = −Z 1 .

• The marginal distribution is the distribution of Z 1 alone, irrespective of Z 2 . It


is 1/2, 1/2. Likewise for Z 2 . Let us assume that Alice’s experiment occurs first.
Then the fact that the marginal distribution for Z 2 is 1/2, 1/2 amounts to a no-
signalling theorem for Bell’s experiment: Bob cannot infer from Z 2 any information
about Alice’s choice α because the distribution of Z 2 does not depend on α. (The
general no-signaling theorem that we will prove in Section 22.5 covers all possible
experiments.)

• The fact that the joint distribution of the outcomes does not depend on the order
of experiments means that the observables measured by Alice and Bob can be
simultaneously measured. What are these observables, actually? Alice’s is the
matrix σ_α ⊗ I with components (σ_α)_{s_1 s'_1} δ_{s_2 s'_2}, and Bob's is I ⊗ σ_β with components
δ_{s_1 s'_1} (σ_β)_{s_2 s'_2}.
17.2 Bell’s 1964 Proof of Nonlocality
Let us recapitulate what needs to be shown in Bell’s theorem. The claim is that the
joint distribution µα,β of Z 1 and Z 2 , as a function of α and β, is such that it cannot
be created in a local way (i.e., in the absence of influences) if no information about α
and β is available beforehand. We can also put it this way: it is impossible for two
computers A and B to be set up in such a way that, upon input of α into A and β into
B, A produces a random number Z 1 and B Z 2 with joint distribution µα,β if A and B
cannot communicate (while they can use prepared random bits that both have copies
of).50 To put this yet differently, two suspects interrogated separately by police cannot
provide answers Z 1 and Z 2 with distribution µα,β when asked the questions α and β,
no matter which prior agreement they took.
Bell’s proof involves two parts. The first part is the EPR argument (in Bohm’s ver-
sion), applied to all directions α; it shows that if locality is true then the values of Z 1
and Z 2 must have been determined in advance. Thus, in every run of the experiment,
there exist well-defined values Zα1 for every α and Zα2 = −Zα1 even before any measure-
ment. Moreover, Alice’s outcome will be Zα1 for the α she chooses; also Bob’s outcome
will be Zβ2 = −Zβ1 for the β he chooses, also if β 6= α and independently of whether
Alice’s or Bob’s experiment occurs first. (Put differently, the two suspects must have
agreed in advance on the answer to every possible question.)
In other words, locality implies the existence of random variables Zαi , i = 1, 2 and
|α| = 1, such that Alice’s outcome is Zα1 and Bob’s is Zβ2 . In particular, focusing on
components in only 3 directions a, b and c, locality implies the existence of 6 random
variables Zαi , i = 1, 2, α = a, b, c such that
Zαi = ±1 (17.19)
Zα1 = −Zα2 (17.20)
and, more generally,
P( Z¹_α ≠ Z²_β ) = q_{αβ} ,   (17.21)

where the q_{αβ} = µ_{α,β}(+−) + µ_{α,β}(−+) = (1 + α·β)/2 = cos²(θ/2) are the corresponding
quantum mechanical probabilities.
The second part of the proof involves only very elementary mathematics. Clearly,
P( {Z¹_a = Z¹_b} ∪ {Z¹_b = Z¹_c} ∪ {Z¹_c = Z¹_a} ) = 1 ,   (17.22)

since at least two of the three (2-valued) variables Z¹_α must have the same value. Hence,
by elementary probability theory,

P( Z¹_a = Z¹_b ) + P( Z¹_b = Z¹_c ) + P( Z¹_c = Z¹_a ) ≥ 1 ,   (17.23)

and using the perfect anti-correlations (17.20) we have that

P( Z¹_a = −Z²_b ) + P( Z¹_b = −Z²_c ) + P( Z¹_c = −Z²_a ) ≥ 1 .   (17.24)
⁵⁰ This statement is perhaps a bit less general than Bell's theorem because computers always work in either a deterministic or a stochastic way, while Bell's theorem would apply even to a theory, if it exists, that is neither deterministic nor stochastic.
(17.24) is equivalent to the celebrated Bell inequality. It is incompatible with (17.21).
For example, when the angles between a, b and c are 120◦ , the 3 relevant qαβ are all
1/4, implying a value of 3/4 for the left hand side of (17.24).
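The two steps just described — that pre-existing ±1 values always satisfy (17.23), while the quantum probabilities (17.21) at 120° give only 3/4 — can be spelled out in a few lines. The following check is an illustration, not part of the notes.

    # Step 1: for ANY assignment of pre-existing values Z_a, Z_b, Z_c in {+1,-1},
    # at least two of them agree, which is the content of (17.22)/(17.23).
    import itertools
    import numpy as np

    for Za, Zb, Zc in itertools.product((+1, -1), repeat=3):
        assert (Za == Zb) + (Zb == Zc) + (Zc == Za) >= 1

    # Step 2: the quantum prediction (17.21) for three coplanar directions at 120 degrees
    theta = np.deg2rad(120)
    q = np.cos(theta / 2) ** 2          # q_{alpha beta} = cos^2(theta/2) = 1/4
    print(3 * q)                        # 0.75 < 1, in conflict with (17.24)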

17.3 Bell’s 1976 Proof of Nonlocality


Here is a different proof of nonlocality, first published by Bell in 1976;51 it is also
described in Bell’s article “Bertlmann’s socks.” It was designed for the purpose of
allowing small experimental errors in all probabilities, so that the perfect anti-correlation
in the case θ = 0 becomes merely a near-perfect anti-correlation, and the conclusion of
pre-existing values cannot be drawn.52
Suppose that two computers produce outcomes Z 1 , Z 2 , each either +1 or −1, with
joint distribution P(Z 1 , Z 2 |α, β) when given the input α respectively β. Let λ be the
information given in advance to both computers, such as an algorithm and random bits,
and let ρ be the probability distribution of λ. Then
P(Z¹, Z²|α, β) = ∫ dλ ρ(λ) P(Z¹, Z²|α, β, λ) ,   (17.25)

where the last factor is the conditional distribution of the outcomes, given λ.
What is the condition on P that characterizes the absence of communication? Sup-
pose computer 1 makes its decision about Z 1 first. In the absence of communication,
it has only λ and α as the basis of its decision (which may still be random); thus, the
(marginal) distribution of Z 1 does not depend on β:

P(Z 1 |α, β, λ) = P(Z 1 |α, λ) . (17.26)

Computer 2 has only λ and β as the basis of its decision; thus, the (conditional) distri-
bution of Z 2 does not depend on α or Z 1 :

P(Z 2 |Z 1 , α, β, λ) = P(Z 2 |β, λ) . (17.27)

From these two equations together, we obtain

P(Z 1 , Z 2 |α, β, λ) = P(Z 1 |α, λ) P(Z 2 |β, λ) (17.28)

as the characterization of locality (i.e., the absence of communication). Note that Z 1


and Z 2 can very well be dependent (correlated), like Bertlmann’s socks or the glove
left at home and the glove in my pocket, if the mutual dependence is based on their
dependence on the common cause λ.
⁵¹ J. S. Bell: The theory of local beables. Epistemological Letters 9: 11 (1976)
⁵² The advantage of robustness of the argument under small errors comes at the price that the argument needs to assume that the true theory of quantum mechanics is either deterministic or stochastic. I am unable to provide an example of a theory that is neither, but some authors (e.g., John H. Conway and Simon Kochen) have conjectured that the true laws of nature be neither; and Bell's original nonlocality proof, presented in Section 17.2 above, would apply also in that case.
Now we want to know how the locality condition (17.28) restricts which functions can
occur as P(Z¹, Z²|α, β). To this end, we introduce the correlation coefficient defined by

κ(α, β) = Σ_{z_1=±1} Σ_{z_2=±1} z_1 z_2 P(Z¹ = z_1, Z² = z_2|α, β) .   (17.29)

Proposition 17.1. Locality implies the following version of Bell’s inequality known as
the CHSH inequality53 :

|κ(α, β) + κ(α, β') + κ(α', β) − κ(α', β')| ≤ 2 .   (17.30)

Proof. Locality (17.28) implies that

κ(α, β) = ∫ dλ ρ(λ) Σ_{z_1=±1} Σ_{z_2=±1} z_1 z_2 P(Z¹ = z_1, Z² = z_2|α, β, λ)   (17.31)
        = ∫ dλ ρ(λ) Σ_{z_1=±1} Σ_{z_2=±1} z_1 z_2 P(Z¹ = z_1|α, λ) P(Z² = z_2|β, λ)   (17.32)
        = ∫ dλ ρ(λ) E(Z¹|α, λ) E(Z²|β, λ) .   (17.33)

Since Z^i ∈ {1, −1}, we have that

|E(Z¹|α, λ)| ≤ 1   and   |E(Z²|β, λ)| ≤ 1 .   (17.34)

So,

κ(α, β) ± κ(α, β') = ∫ dλ ρ(λ) E(Z¹|α, λ) [ E(Z²|β, λ) ± E(Z²|β', λ) ] ,   (17.35)

and therefore

|κ(α, β) ± κ(α, β')| ≤ ∫ dλ ρ(λ) | E(Z²|β, λ) ± E(Z²|β', λ) | .   (17.36)

Now for any u, v ∈ [−1, 1],


|u + v| + |u − v| ≤ 2 (17.37)
because

(u + v) + (u − v) = 2u ≤ 2 (−u − v) + (u − v) = −2v ≤ 2 (17.38)


(u + v) + (v − u) = 2v ≤ 2 (−u − v) + (v − u) = −2u ≤ 2 . (17.39)
⁵³ This version (though with a different derivation making stronger assumptions) first appeared in J. F. Clauser, R. A. Holt, M. A. Horne, A. Shimony: Proposed Experiment to Test Local Hidden-Variable Theories. Physical Review Letters 23: 880–884 (1969)
Hence, setting u = E(Z²|β, λ) and v = E(Z²|β', λ),

|κ(α, β) + κ(α, β') + κ(α', β) − κ(α', β')|
   ≤ |κ(α, β) + κ(α, β')| + |κ(α', β) − κ(α', β')|   (17.40)
   ≤ ∫ dλ ρ(λ) [ |E(Z²|β, λ) + E(Z²|β', λ)| + |E(Z²|β, λ) − E(Z²|β', λ)| ]   by (17.36)   (17.41)
   ≤ 2   by (17.37).   (17.42)

Since the quantum mechanical prediction µα,β for the Bell experiment has

κ(α, β) = µα,β (++) − µα,β (+−) − µα,β (−+) + µα,β (−−) = −α · β = − cos θ , (17.43)

setting (in some plane)

α = 0°, α' = 90°, β = 45°, β' = −45°   (17.44)

leads to

κ(α, β) + κ(α, β') + κ(α', β) − κ(α', β') = −2√2 ,   (17.45)

violating (17.30).
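For a quick numerical confirmation (not part of the notes) of (17.43)–(17.45), one can compute κ from the distribution (17.10) for the four settings (17.44):

    # kappa(alpha, beta) = -cos(theta) for the singlet, and the CHSH combination
    # for the settings (17.44) equals -2*sqrt(2), violating |...| <= 2.
    import numpy as np

    def kappa(a_deg, b_deg):
        theta = np.deg2rad(a_deg - b_deg)
        # from (17.10): mu(++) = mu(--) = sin^2(theta/2)/2, mu(+-) = mu(-+) = cos^2(theta/2)/2
        s, c = np.sin(theta / 2) ** 2 / 2, np.cos(theta / 2) ** 2 / 2
        return s - c - c + s                # = -cos(theta)

    a, a2, b, b2 = 0.0, 90.0, 45.0, -45.0
    chsh = kappa(a, b) + kappa(a, b2) + kappa(a2, b) - kappa(a2, b2)
    print(chsh, -2 * np.sqrt(2))            # both approximately -2.828, so |chsh| > 2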
Now if the values of P(Z 1 = z1 , Z 2 = z2 |α, β) are known only with some inaccuracy
(because they were obtained experimentally, not from the quantum formalism) then also
the κ(α, β) are subject to some inaccuracy. But if (17.30) is violated by more than the
inaccuracy, then locality is refuted.

17.4 Photons
Experimental tests of Bell’s inequality are usually done with photons instead of electrons.
For photons, spin is usually called polarization, and the Stern–Gerlach magnets are
replaced with polarization analyzers (also known as polarizers), i.e., crystals that are
transparent to the |z-upi part of the wave but reflect (or absorb) the |z-downi part.
Like the Stern–Gerlach magnets, the analyzers can be rotated into any direction. Since
photons have spin 1, θ/2 needs to be replaced by θ.

18 Discussion of Nonlocality
18.1 Nonlocality in Bohmian Mechanics, GRW, Copenhagen,
Many-Worlds
Since we have considered only non-relativistic formulations of these theories, we cannot
directly analyze spacelike separated events, but instead we can analyze the case of two
systems (e.g., Alice’s lab and Bob’s lab) without interaction (i.e., without an interaction
term between them in the Hamiltonian).

• Bohmian mechanics is explicitly nonlocal, as the velocity of particle 2 depends


on the position of particle 1, no matter how distant and no matter whether there is
interaction. That is where the superluminal influence occurs. (Historically, Bell’s
nonlocality analysis was inspired by the examination of Bohmian mechanics.)
This influence depends on entanglement: In the absence of entanglement, the ve-
locity of particle 2 is independent of the position of particle 1. The fact that
Bohmian mechanics is local for disentangled wave functions shows that it was nec-
essary for proving non-locality to consider at least two particles and an entangled
wave function (such as the singlet state). It can be shown that any entangled wave
function violates Bell’s inequality for some observables.
Furthermore, the position of particle 1 will depend on the external fields at work
near particle 1. That is, for any given initial position of particle 1, its later position
will depend on the external fields. An example of an external field is the field of
the Stern–Gerlach magnet. To a large extent, we can control external fields at our
whim; e.g., we can rotate the Stern–Gerlach magnet. Bohm’s equation of motion
implies that these fields have an instantaneous influence on the motion of particle
2.

• In GRW theory, nonlocality comes in at the point when the wave function
collapses, as then it does so instantaneously over arbitrary distances.
At least, this trait of the theory suggests that GRW is nonlocal, and in fact that is
the ultimate source of the nonlocality. Strictly speaking, however, the definition of
nonlocality, i.e., the negation of (17.2), requires that events at x and at y influence
each other, and the value of the wave function ψt (x1 , x2 ) is linked to several space-
time points, (t, x1 ) and (t, x2 ), and thus is not an example of an “event at x.”
So we need to formulate the proof that GRW theory is nonlocal more carefully;
of course, Bell’s proof achieves this, but we can give a more direct proof. Since
the “events at x” are not given by the wave function itself but by the primitive
ontology, we need to consider GRWf and GRWm separately.
In GRWf, consider Einstein’s boxes example. The wave function of a particle
is half in a box in Paris and half in a box in Tokyo. Let us apply detectors to
both boxes at time t, and consider the macroscopic superposition of the detectors
arising from the Schrödinger equation. It is random whether the first flash (in

any detector) after t occurs in Paris or in Tokyo. Suppose it occurs in Tokyo, and
suppose it can occur in one of two places in Tokyo, corresponding to the outcomes
0 or 1. If it was 1, then after the collapse the wave function of the particle is
100% in Tokyo, and later flashes in Paris are certain to occur in a place where
they indicate the outcome 0—that is a nonlocal influence of a flash in Tokyo on
the flashes in Paris.
Likewise in GRWm: If, after the first collapse, the pointer of the detector in Tokyo,
according to the m function, points to 1 then the pointer in Paris immediately
points to 0. (You might object that the Tokyo pointer position according to
the m function was not the cause of the Paris pointer position, but rather both
pointer positions were caused by the collapse of the wave function. However, this
distinction is not relevant to whether the theory is nonlocal.)
Note that while Bell’s proof shows that any version of quantum mechanics must
be nonlocal, for proving that GRWf and GRWm are nonlocal it is sufficient to
consider a simpler situation, that of Einstein’s boxes.
Both GRWf and GRWm are already nonlocal when governing a universe containing
only one particle; thus, their nonlocality does not depend on the existence of a
macroscopic number of particles, and they are even nonlocal in a case (one particle)
in which Bohmian mechanics is local. For example, consider a particle with wave
function

ψ = (1/√2) ( |here⟩ + |there⟩ )   (18.1)
at time t, as in Einstein’s boxes example. Suppose that |herei and |therei are
two narrow wave packets separated by a distance of 500 million light years. The
distance is so large that the first collapse is likely to occur before a light signal can
travel between the two places. For GRWf, a flash here precludes a flash there—
that is a nonlocal influence. For GRWm, if the wave function collapses to |herei
then m(here) doubles and m(there) instantaneously goes to zero—that is a nonlo-
cal influence. (There is a relativistic version of GRWm54 in which m(there) goes to
zero only after a delay of distance/c, or when a collapse centered “there” occurs.
Nevertheless, also this theory is nonlocal even for one particle because when a col-
lapse centered “there” occurs, which can happen any time, then m(there) cannot
double (as it could in a local theory) but must go to zero.)

• That orthodox quantum mechanics (OQM) is nonlocal can also be seen from
Einstein’s boxes argument: OQM says the outcomes of the detectors are not pre-
determined. (That is, there is no fact about where the particle really is before
any detectors are applied.) Thus, the outcome of the Tokyo detector must have
influenced the Paris detector, or vice versa.
⁵⁴ D. Bedingham, D. Dürr, G.C. Ghirardi, S. Goldstein, R. Tumulka, and N. Zanghì: Matter Density and Relativistic Models of Wave Function Collapse. Journal of Statistical Physics 154: 623–631 (2014) http://arxiv.org/abs/1111.1425
This, of course, was the point of Einstein’s boxes argument: He objected to OQM
because it is nonlocal.

• Many-worlds is nonlocal, too. This is not obvious from Bell’s argument because
the latter is formulated in a single-world framework. Here is why Sm is nonlocal.55
After Alice carries out her Stern–Gerlach experiment, there are two pointers in her
lab, one pointing to +1 and the other to −1. Then Bob carries out his experiment,
and there are two pointers in his lab. Suppose Bob chose the same direction as
Alice. Then the world in which Alice’s pointer points to +1 is the same world as
the one in which Bob’s pointer points to −1, and this nonlocal fact was created
in a nonlocal way by Bob’s experiment. The same kind of nonlocality occurs in
Sm already in Einstein’s boxes experiment: The world in which a particle was
detected in Paris is the same as the one in which no particle was detected in
Tokyo—a nonlocal fact that arises as soon as both experiments are completed,
without the need to wait for the time it takes light to travel from Paris to Tokyo.
How about Bell’s many-worlds theories? The second theory, involving a random
configuration selected independently at every time, is very clearly nonlocal, for
example in Einstein’s boxes experiment: At every time t, nature makes a random
decision about whether the particle is in Paris, and if it is, nature ensures imme-
diately that there is no particle in Tokyo. A local theory would require that the
particle has a continuous history of traveling, at a speed less than that of light,
to either Paris or Tokyo, and this history is missing in Bell’s second many-worlds
theory. Bell’s first many-worlds theory is even more radical, in fact in such a way
that the concept of locality is not even applicable. The concept of locality requires
that at every point in space, there are local variables whose changes propagate at
most at the speed of light. Since in Bell’s first many-worlds theory, no association
is made between worlds at different times, one cannot even ask how any local
variables would change with time. Thus, this theory is nonlocal as well.

Another remark concerns the connection between Bell’s 1976 nonlocality proof and
the theories mentioned above. In physical theories, λ represents the information located
at all space-time points from which light signals can reach both x and y. In orthodox
quantum mechanics and GRW theory, λ is the wave function ψ; in Bohmian mechanics,
λ is ψ together with the initial configuration of the two particles.

18.2 Popular Myths About Bell’s Proof


Let P be the hypothesis that, prior to any experiment, there exist values Zni (for all
i = 1, 2 and n ∈ R3 with |n| = 1) such that Alice and Bob obtain as outcomes Zα1 and
Zβ2 . These values are often called hidden variables. Then Bell’s nonlocality argument,
⁵⁵ The argument is taken from V. Allori, S. Goldstein, R. Tumulka, and N. Zanghì: Many-Worlds and Schrödinger's First Quantum Theory. British Journal for the Philosophy of Science 62(1): 1–27 (2011) http://arxiv.org/abs/0903.2211
described in Section 17.2, has the following structure:

Part 1 (EPR): quantum mechanics + locality ⇒ P (18.2)


Part 2: quantum mechanics ⇒ not P (18.3)
Conclusion: quantum mechanics ⇒ not locality (18.4)

For this argument what is relevant about “quantum mechanics” is merely the predictions
concerning experimental outcomes corresponding to (17.19)–(17.21) (with part 1 using
in fact only (17.20)).
Certain popular myths about Bell’s proof arise from missing part 1 and noticing only
part 2 of the argument. (In Bell’s 1964 paper, part 1 is formulated in 3 lines, part 2 in
2.5 pages.) Bell, Speakable and unspeakable, p. 143:
It is important to note that to the limited degree to which determinism plays
a role in the EPR argument, it is not assumed but inferred. What is held
sacred is the principle of ‘local causality’ – or ‘no action at a distance’. [. . . ]
It is remarkably difficult to get this point across, that determinism is not a
presupposition of the analysis.
Here, “determinism” means P. What Bell writes about the EPR argument is true in
spades about his own nonlocality argument: P plays a “limited role” because it is only
an auxiliary statement, and non-P is not the upshot of the argument.
The mistake of missing part 1 leads to the impression that Bell proved that

hidden variables are impossible, (18.5)

or that
hidden variables, while perhaps possible, must be nonlocal. (18.6)
These claims are still widespread, and were even more common in the 20th century.56
They are convenient for Copenhagenists, who tend to think that coherent theories of
the microscopic realm are impossible (see Section 13.3). Let me explain what is wrong
about (18.5) and (18.6).
Statement (18.5) is plainly wrong, since a deterministic hidden-variables theory exists
and works, namely Bohmian mechanics. The hidden variables that Bohmian mechanics
provides57 for the Bell experiment are of the form Zα,β i
, as the outcome according to
Bohmian mechanics depends on both parameter choices (at least for one i, namely for the
second Stern–Gerlach experiment). Considering the three directions relevant to Bell’s
i
inequality, the Zα,β are 18 random variables instead of 6 Zαi , and the dependence on
both α and β reflects the nonlocality of Bohmian mechanics. Bell did not establish the
impossibility of a deterministic reformulation of quantum theory, nor did he ever claim
to have done so.
⁵⁶ For example, recall the title of Clauser et al.'s paper: Proposed Experiment to Test Local Hidden-Variable Theories. Other authors claimed that Bell's argument excludes "local realism."
⁵⁷ We assume a fixed temporal order of the two spin measurements, and that each is carried out as a Stern–Gerlach experiment.
Statement (18.6) is true and non-trivial but nonetheless rather misleading. It follows
from (18.2) and (18.3) that any (single-world) account of quantum phenomena must be
nonlocal, not just any hidden-variables account. Bell’s argument shows that nonlocality
is implied by the predictions of standard quantum theory itself. Thus, if nature is
governed by these predictions (as has been confirmed in experiment), then nature is
nonlocal.

18.3 Bohr’s Reply to EPR


Let us go back once more to EPR. Bohr wrote a reply to EPR, which was published the
same year in the same journal under the same title as EPR’s paper.58 Before we look
at it, let us pause for a moment and think about what kind of reply would be possible.
EPR argued, assuming locality, that the wave function is incomplete. Given that Bohr
insisted that the wave function was complete, he had two options: either argue that
there is a mistake in EPR’s argument, or deny locality, the premise of EPR’s argument.
It is hard to make sense of Bohr’s reply. It is even hard to say which of the two
options he chose. Here is what he wrote. He referred to the following sentence of EPR
that expresses the version of locality that they used in their argument:

“If, without in any way disturbing a system, we can predict with certainty
(i.e., with probability equal to unity) the value of a physical quantity, then
there exists an element of physical reality corresponding to this physical
quantity.”

About this, Bohr wrote:

“the wording of the above-mentioned criterion of physical reality proposed by


Einstein, Podolsky and Rosen contains an ambiguity as regards the meaning
of the expression “without in any way disturbing a system.” Of course there
is in a case like that just considered no question of a mechanical disturbance
of the system under investigation during the last critical stage of the measur-
ing procedure. But even at this stage there is essentially the question of an
influence on the very conditions which define the possible types of predictions
regarding the future behavior of the system.” [emphasis in the original]

Clearly, an ambiguity in terminology can easily lead to a mistake in an argument: if you


show that A implies B, if you then change the meaning of a word in B so B becomes B′,
and if you then show that B′ implies C, then you have not shown that A implies C. On
the other hand, somebody could point to an ambiguity to express that the hypothesis is
less plausible than EPR thought. In any case, what is the ambiguity? Bohr offered two
readings of the assumption of locality. Reading 1, which Bohr agreed with, says that
there is no “mechanical disturbance of the system,” which sounds very much like saying
that there is no change in the physical state of particle 2 as a consequence of Alice’s
⁵⁸ N. Bohr: Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? Physical Review 48: 696–702 (1935)
interaction with particle 1. That is, Bohr appears to have agreed with the hypothesis
of locality. Reading 2, which he disagreed with, is about another kind of influence, on
“conditions . . . of predictions.” I am not sure what that means. Here is a possibility:59
Elsewhere in his article, Bohr emphasized that measurements of different observables
(such as position and momentum) require different experimental setups. Maybe these
setups are the “conditions of predictions.” However, although EPR talked about position
and momentum measurements later on, their basic argument concerns only positions.
So no different setups are involved. Did Bohr miss that?
After the passage I quoted, Bohr continued,

“Since these conditions constitute an inherent element of the description


of any phenomenon to which the term “physical reality” can be properly
attached, we see that the argumentation of the mentioned authors does not
justify their conclusion that quantum-mechanical description is essentially
incomplete.”

This sounds positivistic, as if Bohr was unwilling to consider physical reality itself but
kept sliding instead into considering what observers know or how they would describe
phenomena. However, one cannot understand EPR’s reasoning without thinking about
how reality is affected by Alice’s actions.
It seems that reading 1 is what EPR assumed, and I do not see them switch the
meaning in the middle of the argument. After all, their argument is very short, and
there is only one step that makes use of the locality assumption. I am left wondering
whether Bohr understood EPR’s argument.

⁵⁹ Pointed out on pages 129–131 in T. Maudlin: Quantum Non-Locality and Relativity, 3rd edition. Oxford: Wiley-Blackwell (2011)
19 POVMs: Generalized Observables
19.1 Definition
An observable is mathematically represented by a self-adjoint operator. A generalized ob-
servable is mathematically represented by a positive-operator-valued measure (POVM).
Definition 19.1. An operator is called positive iff it is self-adjoint and all (generalized)
eigenvalues are greater than or equal to zero. (In linear algebra, a positive operator is
commonly called “positive semi-definite.”) Equivalently, a bounded operator A : H →
H is positive iff
hψ|A|ψi ≥ 0 for every ψ ∈ H . (19.1)
The sum of two positive operators is again a positive operator, whereas the product
of two positive operators is in general not even self-adjoint. Note that every projection
is a positive operator.
As a first, rough definition, we can say the following: A POVM is a family of positive
operators E_z such that

Σ_z E_z = I .   (19.2)

(Refined definition later.)


   
Example 19.2.  1.  E_1 = ( 1/2    0  )      E_2 = ( 1/2    0  )
                         (  0    1/3 ) ,           (  0    2/3 ) .
In fact, all (generalized) eigenvalues of E_z must lie in [0, 1] because if E_ζ ψ = ηψ,
then

⟨ψ|ψ⟩ = ⟨ψ|I|ψ⟩ = ⟨ψ|E_ζ|ψ⟩ + Σ_{z≠ζ} ⟨ψ|E_z|ψ⟩ ≥ ⟨ψ|E_ζ|ψ⟩ = η⟨ψ|ψ⟩   by (19.2),   (19.3)

so η ≤ 1.
   
2.  E_1 = ( 1  0 )      E_2 = ( 0  0 )
          ( 0  0 ) ,          ( 0  1 ) .
In the special case in which all operators E_z are projection operators, E is called a
projection-valued measure (PVM). In this case, the subspaces to which E_z and E_{z'}
(z ≠ z') project must be mutually orthogonal (homework problem).

3. Every self-adjoint matrix defines a PVM: Let z = α run through the eigenvalues
of A and let Eα be the projection to the eigenspace of A with eigenvalue α,
E_α = Σ_λ |φ_{α,λ}⟩⟨φ_{α,λ}| .   (19.4)

Then their sum is I, as easily seen from the point of view of an orthonormal basis
of eigenvectors of A. So E is a PVM, the spectral PVM of A. Example 2 above is
of this form for A = σ3 .

4. A POVM E and a vector ψ ∈ H with kψk = 1 together define a probability
distribution over z as follows:

Pψ (z) = hψ|Ez |ψi . (19.5)

To see this, note that hψ|Ez |ψi is a nonnegative real number since Ez is a positive
operator, and
Σ_z P_ψ(z) = Σ_z ⟨ψ|E_z|ψ⟩ = ⟨ψ|I|ψ⟩ = ‖ψ‖² = 1 .   (19.6)

5. Fuzzy position observable:

E_z ψ(x) = (1/√(2πσ²)) e^{−(x−z)²/(2σ²)} ψ(x) .   (19.7)

Each E_z is a positive operator (but not a projection) because

⟨ψ|E_z|ψ⟩ = ∫ dx ψ*(x) (1/√(2πσ²)) e^{−(x−z)²/(2σ²)} ψ(x) ≥ 0 .   (19.8)

The E_z add to unity in the continuous sense:

∫ E_z dz = I .   (19.9)

Indeed,

∫ dz E_z ψ(x) = (1/√(2πσ²)) ψ(x) ∫ dz e^{−(x−z)²/(2σ²)} = ψ(x) .   (19.10)
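Here is a small numerical illustration (not part of the notes) of the discrete definition (19.2) and of the probabilities (19.5), using a discretized version of the fuzzy position observable: the operators E_z are diagonal, positive, and sum to the identity on a grid, and ⟨ψ|E_z|ψ⟩ is a probability distribution over z.

    # Discretized fuzzy position POVM: E_z = multiplication by a Gaussian bump,
    # normalized so that sum_z E_z = I; <psi|E_z|psi> then defines probabilities (19.5).
    import numpy as np

    x = np.linspace(-10, 10, 400)           # position grid
    z = np.linspace(-12, 12, 240)           # outcome values
    sigma = 0.7
    bumps = np.exp(-(x[None, :] - z[:, None])**2 / (2 * sigma**2))
    bumps /= bumps.sum(axis=0)              # enforce sum_z E_z = I on the grid

    print(np.allclose(bumps.sum(axis=0), 1.0))       # diagonal entries of sum_z E_z

    psi = np.exp(-(x - 2.0)**2)             # some wave function
    psi /= np.linalg.norm(psi)
    probs = bumps @ np.abs(psi)**2          # P_psi(z) = <psi|E_z|psi>, since E_z is diagonal
    print(probs.min() >= 0.0, np.isclose(probs.sum(), 1.0))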
The case of a continuous variable z brings us to the general definition of a POVM,
which I will formulate rigorously although we do not aim at rigor in general. The defini-
tion is, in fact, quite analogous to the rigorous definition of a probability distribution in
measure theory: A measure associates a value (i.e., a number or an operator) not with
a point but with a set: E(B) instead of Ez , where B ⊆ Z and Z is the set of all z’s.
More precisely, let Z be a set and B a σ-algebra of subsets of Z ,60 the family of the
“measurable sets.” A probability measure is a mapping µ : B → [0, 1] such that for any
B_1, B_2, . . . ∈ B with B_i ∩ B_j = ∅ for i ≠ j,

µ( ∪_{n=1}^{∞} B_n ) = Σ_{n=1}^{∞} µ(B_n) .   (19.11)
⁶⁰ A σ-algebra is a family B of subsets of Z such that ∅ ∈ B and, for every B_1, B_2, B_3, . . . in B, also B_1^c := Z \ B_1 ∈ B and B_1 ∪ B_2 ∪ . . . ∈ B. It follows that Z ∈ B and B_1 ∩ B_2 ∩ . . . ∈ B. A set Z equipped with a σ-algebra is also called a measurable space. The σ-algebra usually considered on R^n consists of the "Borel sets" and is called the "Borel σ-algebra."
Definition 19.3. A POVM on the measurable space (Z , B) acting on the Hilbert
space H is a mapping E from B to the set of bounded operators on H such that each
E(B) is positive, E(Z) = I, and for any B_1, B_2, . . . ∈ B with B_i ∩ B_j = ∅ for i ≠ j,

E( ∪_{n=1}^{∞} B_n ) = Σ_{n=1}^{∞} E(B_n) ,   (19.12)

where the series on the right-hand side converges in the operator norm.61

It follows that a POVM E and a vector ψ ∈ H with kψk = 1 together define a


probability measure on Z as follows:

µψ (B) = hψ|E(B)|ψi . (19.13)

(Verify the definition of a probability measure.) Again, one defines a PVM to be a


POVM such that every E(B) is a projection. In the special case in which Z is a
countable set and B consists of all subsets, any POVM satisfies
E(B) = Σ_{z∈B} E_z   (19.14)

with Ez = E({z}), so in that case Definition 19.3 boils down to the earlier definition
around (19.2). The fuzzy position observable of Example 5 corresponds to Z = R, B
the Borel sets, and E(B) the multiplication operator
E(B)ψ(x) = ∫_B dz (1/√(2πσ²)) e^{−(x−z)²/(2σ²)} ψ(x) ,   (19.15)
which multiplies by the function 1B ∗ g, where 1B is the characteristic function of B, g
is the Gaussian density function, and ∗ means convolution.
It turns out that every observable is a generalized observable; that is, every self-
adjoint operator A defines a PVM E with E(B) the projection to the so-called spectral
subspace of B. If there is an ONB of eigenvectors of A, then the spectral subspace of B
is the closed span of all eigenspaces with eigenvalues in B; that is, in that case E({z}) is
the projection to the eigenspace of eigenvalue z (and 0 if z is not an eigenvalue). In the
case of a general self-adjoint operator A, the following is a reformulation of the spectral
theorem:

Theorem 19.4. For every self-adjoint operator A there is a uniquely defined PVM E
on the real line with the Borel σ-algebra (the “spectral PVM” of A) such that
A = ∫_R α E(dα) .   (19.16)
⁶¹ It is equivalent to merely demand that the series on the right-hand side converges weakly, i.e., that Σ_n ⟨ψ|E(B_n)|ψ⟩ converges for every ψ ∈ H .
To explain the last equation: In the same way as one can define the integral ∫_Z f(z) µ(dz)
of a measurable function f : Z → R relative to a measure µ, one can define an operator-
valued integral ∫_Z f(z) E(dz) relative to a POVM E. Eq. (19.16) is a generalization of
the relation

A = Σ_α α E_α   (19.17)
for self-adjoint matrices A. In the literature, the spectral PVM is also sometimes called
the spectral measure. Another way of characterizing the spectral PVM is through the
equation
E(B) = 1B (A) (19.18)
for any set B ⊆ R; here, 1B is the characteristic function of the set B, and we apply
this function to the operator A. (For example, in the representation in which A is
a multiplication operator multiplying by the function f , 1B (A) is the multiplication
operator multiplying by 1B ◦ f .)
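For a self-adjoint matrix, the spectral PVM can be written down explicitly. The following sketch (not part of the notes) builds the eigenprojections of a random Hermitian 4×4 matrix and verifies (19.17) and, for one Borel set B, the relation E(B) = 1_B(A) of (19.18).

    # Spectral PVM of a self-adjoint matrix: eigenprojections E_alpha as in (19.4),
    # A = sum_alpha alpha * E_alpha as in (19.17), and E(B) = 1_B(A) as in (19.18).
    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = (M + M.conj().T) / 2                       # a self-adjoint matrix

    evals, evecs = np.linalg.eigh(A)
    projectors = {}
    for alpha, v in zip(evals, evecs.T):
        P = np.outer(v, v.conj())                  # |phi_alpha><phi_alpha|
        projectors[alpha] = projectors.get(alpha, 0) + P

    print(np.allclose(sum(projectors.values()), np.eye(4)))              # sum of E_alpha = I
    print(np.allclose(sum(a * P for a, P in projectors.items()), A))     # Eq. (19.17)

    E_B = np.zeros((4, 4), dtype=complex)          # E(B) for the Borel set B = (0, infinity)
    for a, P in projectors.items():
        if a > 0:
            E_B += P
    print(np.allclose(E_B @ E_B, E_B))             # E(B) is a projection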
If several self-adjoint operators A1 , . . . , An commute pairwise, then they can be diag-
onalized simultaneously, i.e., there is a PVM E on Rn such that for every k ∈ {1, . . . , n},
A_k = ∫_{R^n} α_k E(dα) .   (19.19)

Example 19.5. The PVM diagonalizing the three position operators X_1, X_2, X_3 on
L²(R³) is

E(B)ψ(x) = ψ(x) if x ∈ B,  and  E(B)ψ(x) = 0 if x ∉ B ,   (19.20)
mentioned before in (10.17). Equivalently, E(B) is the multiplication by the character-
istic function of B.
Example 19.6. It follows from the quantum formalism that if we make consecutive
ideal quantum measurements of observables A1 , . . . , An (which need not commute with
each other) at times 0 < t1 < . . . < tn respectively on a system with initial wave function
ψ0 ∈ H with kψ0 k = 1, then the joint distribution of the outcomes Z1 , . . . , Zn is of the
form

P( (Z_1, . . . , Z_n) ∈ B ) = ⟨ψ_0|E(B)|ψ_0⟩   (19.21)
for all (Borel) subsets B ⊆ Rn , where E is a POVM on Rn . The precise version of this
statement requires that each Ak has purely discrete spectrum (or, equivalently, an ONB
of eigenvectors in H ). The derivation is a homework exercise.
Example 19.7. In GRWf, the joint distribution of all flashes is of the form
P(F ∈ B) = hΨ0 |G(B)|Ψ0 i (19.22)
for all sets B ⊆ Z , with Ψ0 the initial wave function and G a POVM on the history
space Z of flashes,
Z = { ((t_1, x_1, i_1), (t_2, x_2, i_2), . . .) ∈ (R⁴ × {1...N})^∞ : 0 < t_1 < t_2 < . . . } .   (19.23)


Derivation: Consider first the joint distribution of the first two flashes for N = 1
particle: The probability of T1 ∈ [t1 , t1 + dt1 ] is 1t1 >0 e−λt1 λ dt1 ; given T1 , the probability
of X 1 ∈ d3 x1 is, according to (12.11), kC(x1 )ΨT1 − k2 with ΨT1 − = e−iHT1 Ψ0 and C(x1 )
the collapse operator defined in (12.9). Given T1 and X 1 , the probability of T2 ∈
[t2 , t2 + dt2 ] is 1t2 >t1 e−λ(t2 −t1 ) λ dt2 ; given T1 , X 1 , and T2 , the probability of X 2 ∈ d3 x2
is kC(x2 )e−iH(T2 −T1 ) ΨT1 + k2 with ΨT1 + = C(X 1 )ΨT1 − . Putting these formulas together,
the joint distribution of T_1, X_1, T_2, and X_2 is given by

P( T_1 ∈ [t_1, t_1 + dt_1], X_1 ∈ d³x_1, T_2 ∈ [t_2, t_2 + dt_2], X_2 ∈ d³x_2 )
   = 1_{0<t_1<t_2} e^{−λt_2} λ² ‖ C(x_2) e^{−iH(t_2−t_1)} C(x_1) e^{−iHt_1} Ψ_0 ‖² dt_1 d³x_1 dt_2 d³x_2   (19.24)
   = ⟨Ψ_0| G(dt_1 × d³x_1 × dt_2 × d³x_2) |Ψ_0⟩   (19.25)

with

G(dt_1 × d³x_1 × dt_2 × d³x_2) = 1_{0<t_1<t_2} e^{−λt_2} λ² ×
   × e^{iHt_1} C(x_1) e^{iH(t_2−t_1)} C(x_2)² e^{−iH(t_2−t_1)} C(x_1) e^{−iHt_1} dt_1 d³x_1 dt_2 d³x_2 ,   (19.26)

which is self-adjoint and positive because (19.25) is always real and ≥ 0. It follows
that also G(B), obtained by summing (that is, integrating) over all infinitesimal vol-
ume elements in B, is self-adjoint and positive. Additivity holds by construction, and
G(Z ) = I because (19.25) is a probability distribution (so hΨ0 |G(Z )|Ψ0 i = 1 for ev-
ery Ψ0 with kΨ0 k = 1). Thus, G is a POVM. For the joint distribution of more than
two flashes or more than one particle, the reasoning proceeds in a similar way. For the
joint distribution of all (infinitely many) flashes, the rigorous proof requires some more
technical steps62 but bears no surprises.

19.2 The Main Theorem about POVMs


It says: For every quantum physical experiment E on a quantum system S whose possible
outcomes lie in a space Z , there exists a POVM E on Z such that, whenever S has
wave function ψ at the beginning of E , the random outcome Z has probability distribution
given by
P(Z ∈ B) = hψ|E(B)|ψi . (19.27)
We will prove this statement in Bohmian mechanics and GRWf. It plays the role of
Born’s rule for POVMs. The experiment E consists of coupling S to an apparatus A at
some initial time ti , letting S ∪ A evolve up to some final time tf , and then reading off
the result Z from A. It is assumed that S and A are not entangled at the beginning of
E:
ΨS∪A (ti ) = ψS (ti ) ⊗ φA (ti ) (19.28)
62
carried out in R. Tumulka: A Kolmogorov Extension Theorem for POVMs. Letters in Mathematical
Physics 84: 41–46 (2008) http://arxiv.org/abs/0710.3605

with φA the ready state of A. (The main theorem of POVMs can also be proven for the
case in which tf is itself chosen by the experiment; e.g., the experiment might wait for a
detector to click, and the outcome Z may be the time of the click. I give the proof only
for the simpler case in which tf is fixed in advance.) I will further assume that E has
only finitely many possible outcomes Z; actually, this assumption is not needed for the
proof, but it simplifies the consideration a bit and is satisfied in every realistic scenario.

Proof from Bohmian mechanics. Since the outcome is read off from the pointer
position, 
Z = ζ Q(tf ) , (19.29)
where Q is the Bohmian configuration and ζ is called the calibration function. (In
practice, the function ζ depends only on the configuration of the apparatus, in fact only
on its macroscopic features, not on microscopic details. However, the arguments that
follow apply to arbitrary calibration functions.) Let

U = e−iHS∪A (tf −ti ) (19.30)

and
Bz = {q ∈ R3N : ζ(q) = z} . (19.31)
Then, using the projection operator PB defined in (10.17),

P(Z = z) = P Q(tf ) ∈ Bz (19.32)
Z
= |Ψ(q, tf )|2 dq (19.33)
Bz
= hΨ(tf )|PBz |Ψ(tf )i (19.34)

= hψ ⊗ φ|U † PBz U |ψ ⊗ φi (19.35)


= hψ|Ez |ψiS , (19.36)

where h·|·iS denotes the inner product in the Hilbert space of the system S alone (as
opposed to the Hilbert space of S ∪ A), and Ez is defined as follows: For given ψ, form
ψ ⊗ φ, then apply the operator U † PBz U , and finally take the partial inner product with
φ. The partial inner product of a function Ψ(x, y) with the function φ(y) is a function
of x defined as Z
hφ|Ψiy (x) = dy φ∗ (y) Ψ(x, y) . (19.37)

Thus,
Ez ψ = hφ|U † PBz U (ψ ⊗ φ)iy . (19.38)
We now verify that E is a POVM. First, Ez is a positive operator because

hψ|Ez |ψi = hΨ(tf )|PBz |Ψ(tf )i ≥ 0 (19.39)

for every ψ. Second, ∑z Ez = I because

∑z Ez ψ = ∑z hφ|U † PBz U (ψ ⊗ φ)iy (19.40)
= hφ|U † (∑z PBz )U (ψ ⊗ φ)iy (19.41)
= hφ|U † IU (ψ ⊗ φ)iy (19.42)
= hφ|I(ψ ⊗ φ)iy = ψ . (19.43)

Here, we have used that

∑z PBz = I , (19.44)

that U † U = I, and that the partial inner product of ψ ⊗ φ with φ returns ψ. Eq. (19.44)
follows from the fact that the sets Bz form a partition of configuration space R3N (i.e.,
they are mutually disjoint and together cover the entire configuration space, ∪z Bz =
R3N ). This, in turn, follows from the assumption that the calibration function ζ is
defined everywhere in R3N .63 Thus, the proof is complete. 
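The construction just given can be checked numerically in a small toy model (my own illustration, with invented dimensions): the configuration space of S ∪ A is replaced by a finite product basis, the calibration function ζ depends only on the apparatus index, U is a random unitary, and Ez is built via the partial inner product (19.38). The checks at the end confirm that each Ez is positive and that ∑z Ez = I.

```python
import numpy as np
rng = np.random.default_rng(1)

dS, dA = 3, 8                       # toy dimensions of system and apparatus
M = rng.normal(size=(dS*dA, dS*dA)) + 1j*rng.normal(size=(dS*dA, dS*dA))
U, _ = np.linalg.qr(M)              # random unitary on H_S (x) H_A

phi = np.zeros(dA); phi[0] = 1.0    # "ready state" of the apparatus
zeta = rng.integers(0, 3, size=dA)  # calibration function: apparatus basis index -> outcome

# isometry V: psi -> psi (x) phi, as a (dS*dA) x dS matrix
V = np.kron(np.eye(dS), phi.reshape(dA, 1))

E = {}
for z in range(3):
    # projection P_{B_z} onto product basis vectors whose apparatus index has zeta = z
    P = np.diag(np.kron(np.ones(dS), (zeta == z).astype(float)))
    # E_z psi = <phi | U^dagger P_{B_z} U (psi (x) phi)>_y, the partial inner product (19.38)
    E[z] = V.conj().T @ U.conj().T @ P @ U @ V

for Ez in E.values():
    assert np.allclose(Ez, Ez.conj().T)
    assert np.linalg.eigvalsh(Ez).min() > -1e-12       # positivity
assert np.allclose(sum(E.values()), np.eye(dS))        # sum_z E_z = I
print("E is a POVM on {0,1,2} acting on C^%d" % dS)
```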

Proof from GRWf. Let F = {(T1 , X 1 , I1 ), (T2 , X 2 , I2 ), . . .} be the set of flashes (of
both S and A) from ti onwards. We know from Example 19.7 that the distribution of F
(i.e., the joint distribution of all flashes after ti ) is given by Ψ(ti ) and some POVM G:

P(F ∈ B) = hΨ(ti )|G(B)|Ψ(ti )i . (19.45)

Since the outcome Z of the experiment is read off from A after ti , it is a function of F ,

Z = ζ(F ) . (19.46)

(Z is a function of F because the flashes define where the pointers point, and what the
shape of the ink on a sheet of paper is. It would even be realistic to assume that Z
depends only on the flashes of the apparatus, but this restriction is not needed for the
further argument.)
Let Bz = {f : ζ(f ) = z}, the set of flash patterns having outcome z. Then,

P(Z = z) = P F ∈ Bz (19.47)
= hΨ(ti )|G(Bz )|Ψ(ti )i (19.48)
= hψ|EzGRW |ψi (19.49)

with
EzGRW ψ = hφ|G(Bz )|ψ ⊗ φiy . (19.50)
63
The physical meaning of this assumption is that the experiment always has some outcome. You
may worry about the possibility that the experiment could not be completed as planned due to power
outage, asteroid impact, or whatever. This possibility can be taken into account by introducing a
further element f for “failed” into the set Z of possible outcomes.

In fact, EzGRW may be different from Ez obtained from Bohmian mechanics as in (19.38),
in agreement with the fact that the same experiment (using the same initial wave func-
tion of the apparatus, etc.) may yield different outcomes in GRW than in Bohmian
mechanics. (However, since we know the two theories make very very similar predic-
tions, EzGRW will usually be very very close to Ez .) To see that EzGRW is a POVM, we
note that
hψ|EzGRW |ψi = hΨ(ti )|G(Bz )|Ψ(ti )i ≥ 0 (19.51)
and
∑z EzGRW ψ = hφ| ∑z G(Bz )|ψ ⊗ φiy (19.52)
= hφ|G(∪z Bz )|ψ ⊗ φiy (19.53)
= hφ|I|ψ ⊗ φiy = ψ (19.54)


using ∪z Bz = Z . This completes the proof. 

The main theorem about POVMs is equally valid in orthodox quantum mechanics
(OQM). However, since OQM does not permit a coherent analysis of measurement
processes (as it suffers from the measurement problem), we cannot give a complete
proof of the main theorem from OQM, but the same reasoning as given in the proof
from Bohmian mechanics would be regarded as compelling in OQM. At the same time,
the main theorem undercuts the spirit of OQM, which is to leave the measurement
process unanalyzed and to introduce observables by postulate. Put differently, the main
theorem about POVMs makes it harder to ignore the measurement problem.

19.3 Limitations to Knowledge


Corollary 19.8. There is no experiment with Z = ψ or Z = Cψ. That is, one cannot
measure the wave function of a given system, not even up to a global phase.
Proof. Suppose there was an experiment with Z = ψ. Then, for any given ψ, Z is
deterministic, i.e., its probability distribution is concentrated on a single point, P(Z =
φ) = δ(φ − ψ). The dependence of this distribution on ψ is not quadratic, and thus not
of the form hψ|Eφ |ψi for any POVM E. The argument remains valid when we replace
ψ by Cψ.
This fact amounts to a limitation to knowledge in any version of quantum mechanics
in which wave functions are part of the ontology, which includes all interpretations of
quantum mechanics that we have talked about: Suppose Alice chooses a direction in
space n, prepares a spin-1/2 particle in the state |n-upi, and hands that particle over to
Bob. Then, by Corollary 19.8, Bob has no way of discovering n if Alice does not give the
information away. The best thing Bob can do is, in fact, a Stern–Gerlach experiment in
any direction he likes, say in the z-direction; then he obtains one bit of information, up
or down; if the result was “up” then it is more likely that n lies on the upper hemisphere
than on the lower.
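This last claim is easy to check with a quick Monte Carlo sketch (my own illustration, not part of the notes): draw n uniformly on the sphere, simulate a σ3 measurement on |n-upi via the Born rule, and estimate the conditional probability that n lies in the upper hemisphere given the outcome "up"; it comes out close to 3/4.

```python
import numpy as np
rng = np.random.default_rng(0)

N = 200_000
u = rng.uniform(-1.0, 1.0, N)            # u = cos(theta): uniform for a uniform direction n
p_up = (1.0 + u) / 2.0                   # Born rule: P(sigma_3 = +1 | n) = cos^2(theta/2)
result_up = rng.uniform(size=N) < p_up   # simulated Stern-Gerlach outcomes
upper = u > 0                            # is n in the upper hemisphere?

print("P(n in upper hemisphere | result up) ≈", np.mean(upper[result_up]))  # close to 0.75
```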

Corollary 19.9. There is no experiment in Bohmian mechanics that can measure the
instantaneous velocity of a particle with unknown wave function.

Proof. Again, the distribution of the velocity (~/m) Im(∇ψ/ψ)(Q) with Q ∼ |ψ|2 is not
quadratic in ψ.
In contrast, the asymptotic velocity can be measured, and its probability distribution
is in fact quadratic in ψ: recall from (7.40) that it is given by (m/~)3 |ψ̂(mu/~)|2 .
The impossibility of measuring instantaneous velocity goes along with the impossibility
of measuring the entire trajectory without disturbing it. If we wanted to measure
the trajectory, for example by repeatedly measuring the position every ∆t with inaccuracy
∆x, then the measurements would collapse the wave function, with the consequence
that the observed trajectory is very different from what the trajectory would have been
had we not intervened. Some authors regard this as an argument against Bohmian me-
chanics. Bell disagreed (Speakable and unspeakable in quantum mechanics, page 202):

“To admit things not visible to the gross creatures that we are is, in my
opinion, to show a decent humility, and not just a lamentable addiction to
metaphysics.”

So, Bell criticized the positivistic idea that anything real can always be measured. In-
deed, this idea seems rather dubious in view of Corollary 19.8. We will sharpen this
consideration in Section 21.3.

19.4 The Concept of Observable


The main theorem about POVMs suggests that POVMs form the natural generalization
of the notion of observables. It also allows us to explain what an observable ultimately
is. Here is the natural general definition:

Definition 19.10. Two experiments (that can be carried out on arbitrary wave func-
tions ψ ∈ H with norm 1) are equivalent in law iff for every ψ ∈ H with kψk = 1,
they have the same distribution of the outcome. (Thus, they are equivalent in law iff
they have the same POVM.) A corresponding equivalence class of experiments is called
an observable.

If E1 and E2 are equivalent in law and a particular run of E1 has yielded the outcome
z1 , it cannot be concluded that E2 would have yielded z1 as well. The counterfactual
question, “what would z2 have been if we had run E2 ?” cannot be tested empirically, but
it can be analyzed in Bohmian mechanics; there, one sometimes finds z2 6= z1 (for the
same QS and ψ in both experiments, but different QA and φ). For example, let E1 be a
Stern–Gerlach experiment in the z direction and E2 the Stern–Gerlach experiment with
inverted polarity as depicted in Figure 9.3 and described in Section 9.9. Then E1 and E2
are equivalent in law, although in Bohmian mechanics, the two experiments will often
yield different results when applied to the same 1-particle wave function and position.

This situation illustrates why the term “observable” can be rather misleading: It is
intended to suggest “observable quantity,” but an observable is not even a well-defined
quantity to begin with (as the outcome Z depends on QA and φ), it is a class of
experiments with equal probability distributions.
This point is connected to Wheeler’s fallacy. Recall the delayed choice experiment,
but now consider detecting the particle either directly at the slits or far away, ignoring
the interference region. As E1 , we put detectors directly at the slits and say that the
outcome is Z1 = +1 if the particle was detected in the upper slit and Z1 = −1 if in the
lower one. This is a kind of position measurement that can be represented in the 2d
Hilbert space formed by wave functions of the form

ψ = c1 |upper sliti + c2 |lower sliti , (19.55)

so P(Z1 = +1) = |c1 |2 . Relative to the basis {|upper sliti, |lower sliti}, the POVM is
the spectral PVM of σ3 . As E2 , we put the detectors far away and say that Z2 = +1 if
the particle was detected in the lower cluster and Z2 = −1 if in the upper cluster. ψ
evolves to
ψ 0 = c1 |lower clusteri + c2 |upper clusteri , (19.56)
so P(Z2 = +1) = |c1 |2 . So, Z1 and Z2 have the same distribution, E1 and E2 have the
same POVM, and the two experiments are equivalent in law, although we know that
the Bohmian particle often passes through the lower slit and still ends up in the lower
cluster.
Now comes the point that has confused a number of authors64 : Since E1 measures the
“position observable,” and since E1 and E2 “measure” the same observable, it is clear
that E2 also measures the position observable. People concluded that E2 “measures
through which slit the particle went”—Wheeler’s fallacy! People concluded further
that since the Bohmian trajectory may pass through the upper slit while Z2 = −1,
Bohmian mechanics must somehow disagree with measured facts about which slit the
particle went through. Bad, bad Bohm, they concluded. (Some authors called Bohm’s
trajectories “surrealistic,” perhaps alluding to the dreamlike, absurd content of Salvador
Dalí's paintings, to brand the realist view as absurd.) Of course, it is the other way
around: the “measurement” did not at all measure which slit the particle went through.
Here is a variant of the example, due to Englert et al. In a double-slit experiment
with a spin-1/2 particle, arrange that a pure spin-up wave emanates from the upper slit,
and a pure spin-down wave from the lower slit,

ψ = c1 | ↑i|upper sliti + c2 | ↓i|lower sliti . (19.57)

At the screen, after measuring the position, also measure the z-spin. (Practically, we
could make a hole in the screen, so that only the part of the wave function arriving
at a certain position can move on, and place a Stern–Gerlach experiment behind the
hole.) There will be no interference pattern in the position results because the partial
64
For example (using a different but similar setup), B.-G. Englert, M.O. Scully, G. Süssmann, and
H. Walther: Surrealistic Bohm Trajectories. Zeitschrift für Naturforschung A 47: 1175–1186 (1992)

waves from each slit do not interfere with each other. Now let Z be the result of the
spin measurement. In the 2d Hilbert space formed by the wave functions of the form
(19.57), the POVM is again the spectral PVM of σ3 , and this motivated people to
say that the spin measurement was really a measurement of the position observable at
the moment of passing the slits, and that if Z = +1 then the particle went through the
upper slit—which is not true according to Bohm's equation of motion (9.17) for a spin-1/2
particle.

20 Time of Detection
20.1 The Problem
Suppose we set up a detector, wait for the arrival of the particle at the detector, and
measure the time T at which the detector clicks. What is the probability distribution
of T ? This is a natural question not covered by the usual quantum formalism because
there is no self-adjoint operator for time. But from the main theorem about POVMs it
is clear that there must be a POVM E such that

P(T ∈ B) = hψ0 |E(B)|ψ0 i . (20.1)

That is, time of detection is a generalized observable. In this section we take a look at
this POVM E.


Figure 20.1: A quantum particle in a region Ω surrounded by a surface Σ = ∂Ω made
out of detectors (symbolized by ⊥'s), each of which is connected to a pointer. In part,
the figure depicts the situation before the experiment, as the initial wave function ψ0 is
symbolized by a wave, and in part the situation after the experiment, as the location
of detection is indicated by one pointer in the triggered position. Figure adapted from
page 347 of D. Dürr and S. Teufel: Bohmian mechanics, Springer-Verlag (2009)

Suppose that we form a surface Σ ⊂ R3 out of little detectors so we can measure the
time and the location at which the quantum particle first crosses Σ. Suppose further
that, as depicted in Figure 20.1, Σ divides physical space R3 into two regions, Ω and
its complement, and the particle’s initial wave function ψ0 is concentrated in Ω. The
outcome of the experiment is the pair Z = (T, X) of the time T ∈ [0, ∞) of detection

and the location X ∈ Σ of detection; should no detection ever occur, then we write
Z = ∞. So the value space of E is Z = [0, ∞) × Σ ∪ {∞}, and E acts on L2 (Ω). We
want to compute the distribution of Z from ψ0 .
Let us compare the problem to Born’s rule. In Born’s rule, we choose a time t0
and measure the three position coordinates at time t0 ; here, if we take Ω to be the half
space {(x, y, z) : x > x0 } and Σ its boundary plane {(x, y, z) : x = x0 }, then we choose
the value of one position coordinate (x0 ) and measure the time as well as the other
two position coordinates when the particle reaches that value. Put differently in terms
of space-time R4 = {(t, x, y, z)}, Born’s rule concerns measuring where the particle
intersects the spacelike hypersurface {t = t0 }, and our problem concerns measuring
where the particle intersects the timelike hypersurface {x = x0 }. We could say that we
need a Born rule for timelike hypersurfaces.
I should make three caveats, though.

• I have used language such as “particle arriving at a surface” that presupposes the
existence of trajectories although we know that some theories of quantum me-
chanics (GRWm and GRWf) claim that there are no trajectories, and still these
theories are approximately empirically equivalent to Bohmian mechanics, so the
time and location of the detector click would have approximately the same dis-
tribution as in Bohmian mechanics. Our problem really concerns the distribution
of the detection events, and we should keep in mind that in some theories the
trajectory language cannot be taken seriously.

• Even in Bohmian mechanics, there is a crucial difference between the case with
the spacelike hypersurface and the one with the timelike hypersurface: The point
where the particle arrives on the timelike hypersurface {x = x0 } may depend on
whether or not detectors are present on that hypersurface. A detector that does
not click may still affect ψ and thus the future particle trajectory. That is why
I avoid the expression “time of arrival” (which is often used in the literature) in
favor of “time of detection.” In contrast, the point where the particle arrives at
the spacelike hypersurface {t = t0 } does not depend on whether or not detectors
are placed along {t = t0 }.

• The exact POVM E is given by (19.38) (with tf some late time at which we read
off the values of T and X recorded by the apparatus) and will depend on the exact
wave function of the detectors, so different detectors will lead to slightly different
POVMs. Of course, we expect that these differences are negligible. What we want
is a simple rule defining the POVM for an ideal detector, Eideal . That, of course,
involves making a definition of what counts as an ideal detector. So the formula
for Eideal is in part a matter of definition, as long as it fits well with the POVMs
E of real detectors.

20.2 The Absorbing Boundary Rule
The question of what Eideal is is not fully settled; I will describe the most plausible
proposal, the absorbing boundary rule.65 Such a rule was for a long time believed to be
impossible because of the quantum Zeno effect and Allcock’s paradox (see homework
exercises). Henceforth I will write E instead of Eideal . Let Σ = ∂Ω, ψ0 be concentrated
in Ω, kψ0 k = 1, and let κ > 0 be a constant of dimension 1/length (it will be a parameter
of the detector). Here is the rule:

Absorbing Boundary Rule. Solve the Schrödinger equation

i~ ∂ψ/∂t = −(~2 /2m) ∇2 ψ + V ψ (20.2)

in Ω with potential V : Ω → R and boundary condition

∂ψ/∂n (x) = iκψ(x) (20.3)
at every x ∈ Σ, with ∂/∂n the outward normal derivative on the surface, ∂ψ/∂n :=
n(x) · ∇ψ(x) with n(x) the outward unit normal vector to Σ at x ∈ Σ. Then, the rule
asserts,
Pψ0 (t1 ≤ T < t2 , X ∈ B) = ∫_{t1}^{t2} dt ∫_B d2 x n(x) · j ψt (x) (20.4)

for any 0 ≤ t1 < t2 and any set B ⊆ Σ, with d2 x the surface area element and j ψ the
probability current vector field (2.20). In other words, the joint probability density of T
and X relative to dt d2 x is the normal component of the current across the boundary,
jnψt (x) = n(x) · j ψt (x). Furthermore,
Pψ0 (Z = ∞) = 1 − ∫_0^∞ dt ∫_Σ d2 x n(x) · j ψt (x) . (20.5)

This completes the statement of the rule. 

Let us study the properties of the rule. To begin with, the boundary condition (20.3)
implies that the current vector j at the boundary is always outward-pointing: For every
x ∈ Σ,

n(x) · j(x) = (~/m) Im[ψ(x)∗ ∂ψ/∂n (x)] = (~/m) Im[ψ(x)∗ iκψ(x)] = (~κ/m) |ψ(x)|2 ≥ 0 . (20.6)
65
R. Werner: Arrival time observables in quantum mechanics. Annales de l’Institut Henri Poincaré,
section A 47: 429–449 (1987)
R. Tumulka: Distribution of the Time at Which an Ideal Detector Clicks. (2016) http://arxiv.
org/abs/1601.03715

For this reason, (20.3) is called an absorbing boundary condition: It implies that there
is never any current coming out of the boundary. In particular, the right-hand side of
(20.4) is non-negative.
So the rule invokes a new kind of time evolution for a 1-particle wave function as
an effective treatment of the whole system formed by the 1 particle and the detec-
tors together. It is useful to picture the Bohmian trajectories for this time evolution.
Eq. (20.6) implies that the Bohmian velocity field v(x) is always outward-pointing at
the boundary, n(x) · v(x) > 0 for all x ∈ Σ; in fact, the normal velocity is prescribed,
n(x)·v(x) = ~κ/m. In particular, Bohmian trajectories can cross Σ only in the outward
direction; when they do, they end on Σ, as ψ is not defined behind Σ. Put differently,
no Bohmian trajectories begin on Σ, they all begin at t = 0 in Ω with |ψ0 |2 distribu-
tion. In fact, the right-hand side of (20.4) is exactly the probability distribution of the
space-time point at which the Bohmian trajectory reaches the boundary. That is not
surprising, as in a Bohmian world we would expect the detector to click when and where
the particle reaches the detecting surface. As a further consequence, the right-hand side
of (20.5) is exactly the probability that the Bohmian trajectory never reaches Σ. In
particular, (20.4) and (20.5) together define a probability distribution on Z . Had we
evolved ψ0 with the Schrödinger equation on R3 without boundary condition on Σ, then
some Bohmian trajectories may cross Σ several times in both directions; this illustrates
that the trajectory in the presence of detectors can be different from what it would have
been in the absence of detectors.
Since probability can only be lost at the boundary, never gained,
kψt k2 = ∫_Ω d3 x |ψt (x)|2 (20.7)

can only decrease with t, never increase. So here we are dealing with a new kind
of Schrödinger equation whose time evolution is not unitary as the norm of ψ is not
conserved. The time evolution operators Wt , defined by the property Wt ψ0 = ψt , have
the following properties: First, they are not unitary but satisfy kWt ψk ≤ kψk; such
operators are called contractions. Second, Ws Wt = Ws+t and W0 = I; a family (Wt )t≥0
with this property is called a semigroup. Thus, the Wt form a contraction semigroup.
Using the Hille-Yosida theorem from functional analysis, one can prove66

Theorem 20.1. For every κ > 0, the Schrödinger equation (20.2) with the boundary
condition (20.3) defines a contraction semigroup (Wt )t≥0 , Wt : L2 (Ω) → L2 (Ω).

In fact, kψt k2 is the probability that the Bohmian particle is still somewhere in Ω
at time t, that is, has not reached the boundary yet. In particular, as an alternative to
(20.5) we can write
P(Z = ∞) = lim kψt k2 . (20.8)
t→∞

66
S. Teufel and R. Tumulka: Existence of Schrödinger Evolution with Absorbing Boundary Condition.
(2019) http://arxiv.org/abs/1912.12057

The conclusions from our considerations about Bohmian trajectories can also be
obtained from the Ostrogradski–Gauss integral theorem (divergence theorem) in 4 di-
mensions: The 4-vector field j = (ρ, j) has vanishing 4-divergence, as that is what the
continuity equation (2.19) expresses. Integrating the divergence over [0, t] × Ω yields
0 = ∫_0^t dt′ ∫_Ω d3 x div j(t′ , x) (20.9)
= ∫_Ω d3 x ρ(t, x) − ∫_Ω d3 x ρ(0, x) + ∫_0^t dt′ ∫_Σ d2 x n(x) · j(t′ , x) (20.10)
= kψt k2 − 1 + ∫_0^t dt′ ∫_Σ d2 x n(x) · j(t′ , x) . (20.11)

Since the last integrand is non-negative, kψt k2 is decreasing with time and equals 1−
the flux of j into the boundary during [0, t]. In particular,
lim_{t→∞} kψt k2 = 1 − ∫_0^∞ dt′ ∫_Σ d2 x n(x) · j(t′ , x) , (20.12)

so (20.5) is non-negative, and (20.4) and (20.5) together define a probability distribution.
So what is the POVM E? It is given by
E(dt × d2 x) = (~κ/m) Wt† |xihx| Wt dt d2 x (20.13)
E({∞}) = lim_{t→∞} Wt† Wt . (20.14)

Since the E(dt) are not projections, there are in general no eigenstates of detection time.
Variants of the absorbing boundary rule have been developed for moving surfaces,
systems of several detectable particles, and particles with spin.67
Here is why one should expect the absorbing boundary rule in the presence of a
detector. For simplicity, let Ω be an interval in 1d, let x be the coordinate of the particle
P , and let y be the configuration of the detectors D. The whole system S = P ∪ D
evolves unitarily with initial wave function Ψ0 = ψ0 ⊗ ϕ0 . Let A be the region of
y-configurations in which the detectors have not clicked (where the “ready state” ϕ0
is concentrated), B where the left detector has fired, and C the right one. So, Ψ0 is
concentrated in Ω × A, see Figure 20.2.
The interaction between P and D occurs, not in the interior of Ω × A, but only
near the boundary ∂Ω × A: Any probability current in Ω × A that reaches ∂Ω × A will
be transported quickly to ∂Ω × B or ∂Ω × C and then remain in R × B or R × C,
regions of configuration space that are macroscopically separated from Ω × A. Due to
this separation, parts of Ψ that have reached B or C will not be able to propagate back
to A and interfere there with parts of Ψ that have not yet left A; that is, the detection is
67
R. Tumulka: Detection Time Distribution for Several Quantum Particles. (2016) http://arxiv.
org/abs/1601.03871

137

Figure 20.2: Region in configuration space where one should expect the wave function
to propagate in, as explained in the text.

practically irreversible, resulting in decoherence between the parts of the wave function
in A, B, and C, and the motion of the Bohmian configuration from A to B or C is
one-way. As a consequence, the x-component of the current at ∂Ω × A should point
outward. We are thus led to the following picture: (i) The Schrödinger equation (2.1)
holds for ψ inside Ω. (ii) Something happens at ∂Ω, which should not depend sensitively
on the details of the initial detector state ϕ0 . (iii) The evolution of ψt in Ω is still linear,
but no longer unitary because ψt corresponds to only a part of the full wave function Ψt ,
i.e., the part in A. (iv) The current j ψt (x) at x ∈ ∂Ω always points outward. (v) The
evolution of Ψt in A is autonomous, i.e., not affected by whatever Ψt looks like in R × B
or R × C, as those parts cannot propagate back to R × A. (vi) Thus, the evolution
of ψt in Ω should be autonomous, depending only on few parameters (“κ”) encoding
properties of the detectors. These features suggest an absorbing boundary condition at
∂Ω for ψt .
Another remark concerns the fact that while the Bohmian particle is sure to be
absorbed when it reaches ∂Ω, part of the wave arriving at the boundary will be reflected.
For example, in 1d with Ω = (−∞, 0] and the detector at the origin, suppose we start
with a wave packet ψ0 in the left half axis that is close to a plane wave with k > 0
(i.e., sharply peaked in momentum space around k). Then part of the packet will be
absorbed at the origin, and part be reflected. A quick recipe to compute the absorption
coefficient Ak ∈ [0, 1] goes as follows.

Exercise. Consider an eigenfunction ψ : (−∞, 0] → C of the Hamiltonian of the form

ψ(x) = eikx + ck e−ikx (20.15)

(consisting of an incoming plane wave eikx and a reflected wave ck e−ikx ). Use the
boundary condition (20.3) to compute ck for every k > 0. 

The strength of the reflected wave is |ck |2 , so that is the fraction of the wave that
gets reflected, while the fraction Ak := 1 − |ck |2 gets absorbed; Figure 20.3 shows a plot
of Ak . The maximum occurs at k = κ, so the value of κ characterizes the energy ~2 κ2 /2m
at which the detector is maximally efficient.

Figure 20.3: Graph of the absorption strength Ak of the ideal detecting surface as a
function of wave number k in units of κ. The maximum attained at k = κ is equal to 1,
corresponding to complete absorption.

21 Density Matrix and Mixed State
In this chapter we prove a limitation to knowledge in quantum mechanics that follows
from the main theorem about POVMs. Let

S(H ) = {ψ ∈ H : kψk = 1} (21.1)

denote the unit sphere in Hilbert space. Suppose that we have a mechanism that gener-
ates random wave functions Ψ ∈ S(H ) with probability distribution µ on S(H ). Then
it is impossible to determine µ empirically. In fact, there exist different distributions
µ1 6= µ2 that are empirically indistinguishable, i.e., they lead to the same distribution of
outcomes Z for any experiment. We call such distributions empirically equivalent (which
is an equivalence relation) and show that the equivalence classes are in one-to-one cor-
respondence with certain operators known as density matrices or density operators.
To describe these matters, we need the mathematical concept of trace.

21.1 Trace
Definition 21.1. The trace of a matrix A = (Amn ) is the sum of its diagonal elements.
The trace of an operator T is defined to be the sum of the diagonal elements of its
matrix representation Tnm = hn|T |mi relative to an arbitrary ONB {|ni},

tr T = ∑n hn|T |ni . (21.2)

Every positive operator either has finite trace or has trace +∞, and the value of the
trace does not depend on the choice of ONB. The trace class is the set of those operators
T for which the positive operator √(T † T ) has finite trace. For every operator from the
trace class, the trace is finite and does not depend on the ONB.
The trace has the following properties for all operators A, B, . . . from the trace class:
(i) The trace is linear:

tr(A + B) = tr A + tr B , tr(λA) = λ tr A (21.3)

for all λ ∈ C.

(ii) The trace is invariant under cyclic permutation of factors:

tr(AB · · · Y Z) = tr(ZAB · · · Y ) . (21.4)

In particular tr(AB) = tr(BA) and tr(ABC) = tr(CAB), which is, however, not
always the same as tr(CBA).

(iii) If an operator T can be diagonalized, i.e., if there exists an orthonormal basis of


eigenvectors, then tr(T ) is the sum of the eigenvalues, counted with multiplicity
(= degree of degeneracy).

(iv) The trace of the adjoint operator T † is the complex-conjugate of the trace of T :
tr(T † ) = tr(T )∗ .

(v) The trace of a self-adjoint operator T is real.

(vi) If T is a positive operator then tr(T ) ≥ 0.
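A quick numerical sanity check of some of these properties (my own sketch; random matrices stand in for trace-class operators on a finite-dimensional H ):

```python
import numpy as np
rng = np.random.default_rng(2)

d = 5
A = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
C = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
T = A @ A.conj().T                                             # a positive operator

assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # (ii) cyclicity
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))    # (ii) cyclic permutations only
assert np.isclose(np.trace(A.conj().T), np.conj(np.trace(A)))  # (iv)
assert np.isclose(np.trace(T), np.sum(np.linalg.eigvalsh(T)))  # (iii) sum of eigenvalues
assert np.trace(T).real >= 0                                   # (vi) positivity of the trace
print("all trace identities check out on this example")
```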

21.2 The Trace Formula in Quantum Mechanics


Exercise: If kψk = 1, then |ψihψ| is the projection to Cψ.

Suppose that (by whatever mechanism) we have generated a random wave function
Ψ ∈ S(H ) with probability distribution µ on S(H ). Then for any experiment E with
POVM E, the probability distribution of the outcome Z is
P(Z ∈ B) = E hΨ|E(B)|Ψi = ∫_{S(H )} µ(dψ) hψ|E(B)|ψi = tr(ρµ E(B)) , (21.5)

where E means expectation, and


ρµ = E|ΨihΨ| = ∫_{S(H )} µ(dψ) |ψihψ| (21.6)

is called the density operator or density matrix (rarely: statistical operator ) of the
distribution µ. Eq. (21.5) is called the trace formula. It was discovered by John von
Neumann in 1927,68 except that von Neumann did not know POVMs and considered
only PVMs. In case the distribution µ is concentrated on discrete points on S(H ),
(21.6) becomes
ρµ = E|ΨihΨ| = ∑ψ µ(ψ) |ψihψ| . (21.7)

In order to verify (21.5), note first that


 
tr(|ψihψ| E) = hψ|E|ψi (21.8)

because, if we choose the basis {|ni} in (21.2) such that |1i = ψ, then the summands
in (21.2) are hn|ψihψ|E|ni, which for n = 1 is hψ|E|ψi and for n > 1 is zero because
hn|1i = 0. By linearity, we also have that
tr((∑j µ(ψj )|ψj ihψj |) E) = ∑j µ(ψj ) hψj |E|ψj i , (21.9)

68
J. von Neumann: Wahrscheinlichkeitstheoretischer Aufbau der Quantenmechanik. Göttinger
Nachrichten 1(10): 245–272 (1927). Reprinted in John von Neumann: Collected Works Vol. I,
A.H. Taub (editor), Oxford: Pergamon Press (1961)

which yields (21.5) for any µ that is concentrated on finitely many points ψj on S(H ).
One can prove (21.5) for arbitrary probability distribution µ by considering limits.
Now let us draw conclusions from the formula (21.5). It implies that the distribution
of the outcome Z depends on µ only through ρµ . Different distributions µa , µb can
have the same ρ = ρµa = ρµb ; for example, if H = C2 then the uniform distribution
over S(H ) has ρ = ½I, and for every orthonormal basis |φ1 i, |φ2 i of C2 the probability
distribution
½ δφ1 + ½ δφ2 (21.10)
also has ρ = ½I. Two such distributions µa , µb will lead to the same distribution of
outcomes for any experiment, and are therefore empirically equivalent.
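This pair of empirically equivalent distributions can be made concrete in a few lines (my own sketch): the 50/50 mixture of σ3 eigenvectors and the 50/50 mixture of σ1 eigenvectors both have ρ = ½I, so by (21.5) every effect gives the same outcome probability in both ensembles.

```python
import numpy as np
rng = np.random.default_rng(3)

z_up, z_dn = np.array([1, 0], complex), np.array([0, 1], complex)
x_up, x_dn = np.array([1, 1], complex)/np.sqrt(2), np.array([1, -1], complex)/np.sqrt(2)

def rho_of_mixture(states, weights):
    # statistical density matrix (21.7): rho_mu = sum_psi mu(psi) |psi><psi|
    return sum(w * np.outer(s, s.conj()) for s, w in zip(states, weights))

rho_a = rho_of_mixture([z_up, z_dn], [0.5, 0.5])    # mixture of sigma_3 eigenvectors
rho_b = rho_of_mixture([x_up, x_dn], [0.5, 0.5])    # mixture of sigma_1 eigenvectors
assert np.allclose(rho_a, 0.5*np.eye(2)) and np.allclose(rho_b, 0.5*np.eye(2))

# a random effect 0 <= E <= I; both mixtures give the same outcome probability (21.5)
A = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
E = A.conj().T @ A
E = E / (np.linalg.eigvalsh(E).max() + 1e-9)        # rescale so that E <= I
p_a = 0.5*(z_up.conj() @ E @ z_up) + 0.5*(z_dn.conj() @ E @ z_dn)
p_b = 0.5*(x_up.conj() @ E @ x_up) + 0.5*(x_dn.conj() @ E @ x_dn)
print(p_a.real, p_b.real, np.trace(rho_a @ E).real)  # all three agree
```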

21.3 Limitations to Knowledge


We can turn this result into an argument showing that there must be facts we cannot
find out by experiment: Suppose I choose between two options: I choose µ to be either
µa or µb . Suppose that each µ is of the form (21.10), µa for the eigenbasis of σ3 and µb
for that of σ1 . Then I choose n = 10,000 points ψi on S(H ) at random, independently,
with distribution µ, then I prepare n systems with wave functions ψi , and then I hand these systems
to you with the challenge to determine whether µ = µa or µ = µb . As a consequence
of (21.5), you cannot determine that by means of experiments on the n systems. On
the other hand, nature knows the right answer, as I will argue now. I have kept records
of each ψi , so I can make a list of the m ≈ 5,000 systems that I prepared in φ1 . I tell
you that I did choose µb ; I give you the list and predict that for all m systems on the
list, a quantum measurement of σ1 will yield +1, while for all others it will yield −1.
By the laws of quantum mechanics, you will find my prediction confirmed. But had I
prepared half of all systems in |z-upi and the other half in |z-downi, then all outcomes
of σ1 -measurements would have had to be random with equal probability for +1 and −1,
so my predictions would have been wrong in about half of the cases. Thus, nature must
remember at least whether it was a mixture of σ1 -eigenvectors or of σ3 -eigenvectors. (In
fact, nature must remember much more, viz., which systems exactly must yield +1 upon
measurement of σ1 .) There is a fact in nature (viz., whether µ = µa or µ = µb ) that we
cannot discover empirically. Nature can keep a secret. Limitations to knowledge are a
fact of quantum mechanics, regardless of which interpretation we prefer.

21.4 Density Matrix and Dynamics


If the random vector Ψ evolves according to the Schrödinger equation, Ψt = e−iHt/~ Ψ,
the distribution changes into µt and the density matrix into

ρt = e−iHt/~ ρeiHt/~ . (21.11)

In analogy to the Schrödinger equation, this can be written as a differential equation,

dρt /dt = −(i/~) [H, ρt ] , (21.12)

known as the von Neumann equation. The step from (21.11) to (21.12) is based on the
fact that
(d/dt) eAt = AeAt = eAt A . (21.13)
A density matrix is also often called a quantum state. If ρ = |ψihψ| with kψk = 1,
then ρ is usually called a pure quantum state, otherwise a mixed quantum state. A
probability distribution µ has ρµ = |ψihψ| if and only if µ is concentrated on Cψ, i.e.,
Ψ = eiΘ ψ with a random global phase factor.
A density matrix ρ is always a positive operator with tr ρ = 1. Indeed, a sum (or
integral) of positive operators is positive, and µ(ψj )|ψj ihψj | is positive. Furthermore,
tr(ρµ ) = tr(ρµ E(Z )) = ∫_{S(H )} µ(dψ) hψ|E(Z )|ψi = ∫_{S(H )} µ(dψ) = 1 . (21.14)

Conversely, every positive operator ρ with tr ρ = 1 is a density matrix, i.e., ρ = ρµ


for some probability distribution µ on S(H ). Here is one such µ: find an orthonormal
basis {|φn i : n ∈ N} of eigenvectors of ρ with eigenvalues pn ∈ [0, ∞). Then
∑n pn = tr ρ = 1 . (21.15)

Now let µ be the distribution that gives probability pn to φn ; its density matrix is just
the ρ we started with.
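This converse construction is easy to carry out numerically (my own sketch): diagonalize a given positive trace-1 matrix and use the eigenvalues as the weights of the mixture.

```python
import numpy as np
rng = np.random.default_rng(4)

d = 4
A = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real             # a generic density matrix: positive, trace 1

p, phi = np.linalg.eigh(rho)               # eigenvalues p_n and ONB of eigenvectors phi_n
# the mixture mu gives probability p_n to phi_n; its density matrix is rho again
rho_mu = sum(p[n] * np.outer(phi[:, n], phi[:, n].conj()) for n in range(d))
assert np.allclose(rho_mu, rho)
print("eigenvalue weights:", np.round(p, 3), " sum =", p.sum().round(6))
```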

22 Reduced Density Matrix and Partial Trace
There is another way in which density matrices arise, leading to what is called the
reduced density matrix , as opposed to the statistical density matrix of the previous
chapter. Suppose that the system under consideration is bipartite, i.e., consists of two
parts, system a and system b, so that its Hilbert space is H = Ha ⊗ Hb .
Theorem 22.1. In Bohmian mechanics, an experiment in which the apparatus interacts
only with system a but not with system b has a POVM of the form

E(B) = Ea (B) ⊗ Ib , (22.1)

where Ib is the identity on Hb .


Proof: homework exercise. A corresponding theorem holds in GRW theory. The
statement is regarded as true also in orthodox quantum mechanics, but there it is not
possible to give a clean proof because orthodox quantum mechanics does not permit an
analysis of measurement-like processes.

In the case (22.1), the distribution of the outcome is



P(Z ∈ B) = hψ|E(B)|ψi = tr(ρψ Ea (B)) (22.2)

with the reduced density matrix of system a

ρψ = trb |ψihψ| , (22.3)

where trb means the partial trace over Hb . The reduced density matrix and the trace
formula for it were discovered by Lev Landau in 1927.69

22.1 Partial Trace


This means the following. Let {φan } be an orthonormal basis of Ha and {φbn } an or-
thonormal basis of Hb . Then {φan ⊗ φbm } is an orthonormal basis of H = Ha ⊗ Hb . If T
is an operator on H then the operator S = trb T on Ha is characterized by its matrix
elements
hφan |S|φak i = ∑_{m=1}^∞ hφan ⊗ φbm |T |φak ⊗ φbm i , (22.4)

where the inner products on the right hand side are inner products in Ha ⊗ Hb . We
will sometimes write
S = ∑_{m=1}^∞ hφbm |T |φbm i , (22.5)

where the inner products are partial inner products.


The partial trace has the following properties:
69
L. Landau: Das Dämpfungsproblem in der Wellenmechanik. Zeitschrift für Physik 45: 430–441
(1927)

(i) It is linear:
trb (S + T ) = trb (S) + trb (T ) , trb (λT ) = λ trb (T ) (22.6)

(ii) tr(trb (T )) = tr(T ). Here, the first tr symbol means the trace in Ha , the second
one the partial trace, and the last one the trace in Ha ⊗ Hb . This property follows
from (22.4) by setting k = n and summing over n.

(iii) trb (T † ) = (trb T )† . The adjoint of the partial trace is the partial trace of the
adjoint. In particular, if T is self-adjoint then so is trb T .

(iv) trb (Ta ⊗ Tb ) = (tr Tb )Ta .

(v) If T is a positive operator then so is trb T .


 
(vi) trb (S(Ta ⊗ Ib )) = (trb S)Ta .

(vii) trb (S(Ia ⊗ Tb )) = trb ((Ia ⊗ Tb )S) .
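Here is a minimal implementation of the partial trace over Hb (my own sketch), using the index reshuffling that (22.4) amounts to in a finite-dimensional product basis, together with checks of properties (ii), (iv), and (v) on a random entangled state.

```python
import numpy as np
rng = np.random.default_rng(5)

def ptrace_b(T, da, db):
    """Partial trace over the second factor: (da*db x da*db) -> (da x da), cf. (22.4)."""
    T4 = T.reshape(da, db, da, db)          # indices (n, m, k, m')
    return np.einsum('nmkm->nk', T4)        # sum over the b-index m = m'

da, db = 3, 4
# property (iv): tr_b(Ta (x) Tb) = (tr Tb) Ta
Ta = rng.normal(size=(da, da)) + 1j*rng.normal(size=(da, da))
Tb = rng.normal(size=(db, db)) + 1j*rng.normal(size=(db, db))
assert np.allclose(ptrace_b(np.kron(Ta, Tb), da, db), np.trace(Tb) * Ta)

# properties (ii) and (v) for the reduced density matrix of a random entangled unit vector
psi = rng.normal(size=da*db) + 1j*rng.normal(size=da*db)
psi = psi / np.linalg.norm(psi)
rho_a = ptrace_b(np.outer(psi, psi.conj()), da, db)
assert np.isclose(np.trace(rho_a), 1.0)                       # (ii): trace is preserved
assert np.linalg.eigvalsh(rho_a).min() > -1e-12               # (v): positivity
print("purity tr(rho_a^2) =", np.trace(rho_a @ rho_a).real)   # < 1 unless psi is a product
```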

22.2 The Trace Formula (22.2)


From properties (vi) and (ii) we obtain that
   
tr(S(Ta ⊗ Ib )) = tr((trb S)Ta ) . (22.7)

Setting S = |ψihψ| and Ta = Ea (B), we find that trb S = ρψ and


   
hψ|Ea (B) ⊗ Ib |ψi = tr(|ψihψ|(Ea (B) ⊗ Ib )) = tr(ρψ Ea (B)) , (22.8)

which proves (22.2).


From properties (ii) and (v) it follows also that ρψ is a positive operator with trace
1. Conversely, every positive operator ρ on Ha with tr ρ = 1 arises as a reduced density
matrix. Indeed, if ρ = ∑n pn |φn ihφn | with pn ≥ 0, ∑n pn = 1 and orthonormal φn ,
then choose any ONB {χm } of Hb and set ψ = ∑n √pn φn ⊗ χn . Then ψ ∈ Ha ⊗ Hb ,
kψk = 1, and trb |ψihψ| = ρ.

22.3 Statistical Reduced Density Matrix


Statistical density matrices as in (21.6) and reduced density matrices can be combined:
If Ψ ∈ Ha ⊗ Hb is random then set

ρ = E trb |ΨihΨ| = trb E |ΨihΨ| . (22.9)

22.4 The Measurement Problem Again
Statistical and reduced density matrices sometimes get confused; here is an example.
Consider again the wave function of the measurement problem,
X
Ψ= Ψα , (22.10)
α

the wave function ofPan object and an apparatus after a quantum measurement of
the observable A = αPα . Suppose that Ψα , the contribution corresponding to the
outcome α, is of the form
Ψα = cα ψα ⊗ φα , (22.11)
where cα = kPα ψk, ψ is the initial object wave function, ψα = Pα ψ/kPα ψk, and φα
with kφα k = 1 is a wave function of the apparatus after having measured α. Since the
φα have disjoint supports in configuration space, they are mutually orthogonal; thus,
they are a subset of some orthonormal basis {φn }. The reduced density matrix of the
object is
ρΨ = trb |ΨihΨ| = ∑n hφn |ΨihΨ|φn i = ∑α |cα |2 |ψα ihψα | . (22.12)

This is the same density matrix as the statistical density matrix associated with the
probability distribution µ of the collapsed wave function ψ 0 ,
µ = ∑α |cα |2 δψα , (22.13)

since
ρµ = ∑α |cα |2 |ψα ihψα | . (22.14)

It is sometimes claimed that this fact solves the measurement problem. The argu-
ment is this: From (22.10) we obtain (22.12), which is the same as (22.14), which means
that the system’s wave function has distribution (22.13), so we have a random outcome
α. This argument is incorrect, as the mere fact that two situations—one with Ψ as
in (22.10), the other with random ψ 0 —define the same density matrix for the object
does not mean the two situations are physically equivalent. And obviously from (22.10),
the situation after a quantum measurement involves neither a random outcome nor a
random wave function. As John Bell once put it, “and is not or.”

It is sometimes taken as the definition of decoherence that the reduced density ma-
trix is (approximately) diagonal in the eigenbasis of the relevant operator A. In Sec-
tion 11.2 I had defined decoherence as the situation that two or more wave packets
Ψα are macroscopically disjoint in configuration space (and thus remain disjoint for the
relevant future). The connection between the two definitions is that the latter implies
the former if Ψα is of the form (22.11).

It is common to call a density matrix that is a 1-dimensional projection a pure state


and otherwise a mixed state, even if it is a reduced density matrix and thus does not

arise from a mixture (i.e., from a probability distribution µ). A reduced density matrix
ρψ is pure if and only if ψ is a tensor product, i.e., there are χa ∈ Ha and χb ∈ Hb such
that ψ = χa ⊗ χb .

22.5 The No-Signaling Theorem


The no-signaling theorem is a consequence of the quantum formalism: If system a is
located in Alice’s lab and system b in Bob’s, and if the two labs do not interact, then the
statistical reduced density matrix ρa of system a is not affected by anything Bob does.
To prove this, we will now verify that (i) ρa is not affected by any quantum mea-
surement Bob performs, and (ii) ρa does not depend on the Hamiltonian of system b
(and thus not on any external fields that Bob may apply to b). Moreover, a no-signaling
theorem holds also for GRW theory, and this conclusion is based on the further fact
that, if a and b do not interact, then ρa is (iii) not affected by any GRW collapse on b.
To verify (i), suppose that systems a and b together have wave function ψ ∈ Ha ⊗Hb ,
and that Bob measures the observable B, which is a self-adjoint operator on Hb . Let
β denote the eigenvalues of B and Pβ the projection to the eigenspace of eigenvalue β.
The probability that Bob obtains the outcome β is

P(Z = β) = hψ|Ia ⊗ Pβ |ψi . (22.15)

If Bob obtains β then ψ collapses to ψ 0 /Z, where ψ 0 = (Ia ⊗ Pβ )ψ and the normalization
factor is given by Z = kψ 0 k = hψ|Ia ⊗ Pβ |ψi1/2 . Thus, the statistical reduced density
matrix of system a is
ρa = trb ∑β P(Z = β) |ψ ′ ihψ ′ |/Z 2 (22.16)
= ∑β trb [(Ia ⊗ Pβ )|ψihψ|(Ia ⊗ Pβ )] (22.17)
= ∑β trb [|ψihψ|(Ia ⊗ Pβ )] (by property (vii)) (22.18)
= trb [|ψihψ|(Ia ⊗ ∑β Pβ )] (22.19)
= trb |ψihψ| = ρψ , (22.20)

the same as what it was before Bob’s measurement.
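This step is easy to check numerically (my own sketch): for a random entangled two-qubit state, Bob's ideal quantum measurement of σ1 on his qubit leaves Alice's statistical reduced density matrix exactly unchanged.

```python
import numpy as np
rng = np.random.default_rng(6)

def ptrace_b(T, da, db):
    return np.einsum('nmkm->nk', T.reshape(da, db, da, db))

psi = rng.normal(size=4) + 1j*rng.normal(size=4)    # random entangled state of a and b
psi = psi / np.linalg.norm(psi)
rho_a_before = ptrace_b(np.outer(psi, psi.conj()), 2, 2)

# Bob measures sigma_1 on b; the beta-term P(beta) tr_b |psi'><psi'|/Z^2 of (22.16)
# equals tr_b of the unnormalized collapsed projector, as in (22.17)
x_up = np.array([1, 1], complex)/np.sqrt(2)
x_dn = np.array([1, -1], complex)/np.sqrt(2)
rho_a_after = np.zeros((2, 2), complex)
for e in (x_up, x_dn):
    P = np.kron(np.eye(2), np.outer(e, e.conj()))   # I_a (x) P_beta
    psi_beta = P @ psi
    rho_a_after += ptrace_b(np.outer(psi_beta, psi_beta.conj()), 2, 2)

assert np.allclose(rho_a_before, rho_a_after)
print("Alice's reduced density matrix is unchanged by Bob's measurement")
```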


To verify (ii), note that in the absence of interaction the unitary time evolution

operator is Ut = Ua,t ⊗ Ub,t . Thus, the reduced density matrix evolves according to

ρt = trb |Ut ψihUt ψ| (22.21)
= trb [Ut |ψihψ|Ut† ] (22.22)
= trb [(Ua,t ⊗ Ub,t )|ψihψ|(Ua,t† ⊗ Ub,t† )] (22.23)
= trb [(Ua,t ⊗ Ib )|ψihψ|(Ua,t† ⊗ (Ub,t† Ub,t ))] (22.24)
= trb [(Ua,t ⊗ Ib )|ψihψ|(Ua,t† ⊗ Ib )] (22.25)
= Ua,t [trb |ψihψ|]Ua,t† = Ua,t ρψ Ua,t† , (22.26)

which does not depend on Ub,t . The argument extends without difficulty to statistical
reduced density matrices.
To verify (iii), suppose that ψ is a function of N = Na + Nb variables in R3 , and
that, at a particular time t, a GRW collapse hits particle i which belongs to b, so
ψt+ = Ci (X)ψt− / kCi (X)ψt− k (22.27)

with random collapse center X chosen with probability density kCi (x)ψt− k2 at x. Then
the statistical reduced density matrix of a after the collapse is given by

ρt+ = trb ∫_{R3} d3 x kCi (x)ψt− k2 [Ci (x)|ψt− ihψt− |Ci (x)† / kCi (x)ψt− k2 ] (22.28)
= ∫_{R3} d3 x trb [Ci (x)|ψt− ihψt− |Ci (x)† ] (22.29)
= ∫_{R3} d3 x trb [|ψt− ihψt− |Ci (x)† Ci (x)] (by property (vii)) (22.30)
= trb [|ψt− ihψt− | ∫_{R3} d3 x Ci (x)† Ci (x)] (22.31)
= trb |ψt− ihψt− | (22.32)

because ∫ d3 x Ci (x)† Ci (x) = I as in (12.12)–(12.13). Notice the similarity of this reasoning with (22.16).


There is one aspect of this no-signaling argument that remains unsatisfactory: we
have considered GRW collapses and collapses for ideal quantum measurements, but
what about non-ideal quantum measurements? What about experiments that are not
associated with a self-adjoint operator but with a POVM? In fact, how do they collapse
the wave function? This will be discussed in the next section.

22.6 Completely Positive Superoperators


Let T RCL(H ) denote the trace class of H . A superoperator means a C-linear mapping
that acts on operators rather than vectors in H , particularly on density matrices; we

will here consider superoperators of the form C : T RCL(H1 ) → T RCL(H2 ) (including
the possibility H2 = H1 ).

Definition 22.2. A superoperator C is called completely positive if for every integer
k ≥ 1 and every positive operator ρ ∈ Ck×k ⊗ T RCL(H1 ), (Ik ⊗ C )(ρ) is positive, where
Ik denotes the identity operator on Ck×k .

Here, Ck×k means the space of complex k × k matrices. We note that Ck×k ⊗
T RCL(H1 ) = T RCL(Ck ⊗ H1 ), so Ik ⊗ C maps operators on Ck ⊗ H1 to operators
on Ck ⊗ H2 . For k = 1 the condition says that C maps positive operators (from the
trace class) on H1 to positive operators on H2 . (One might have thought that if C
maps positive operators on H1 to positive operators on H2 , then Ik ⊗ C maps positive
operators on Ck ⊗ H1 to positive operators on Ck ⊗ H2 . However, this is not the case,
which is why we demand it explicitly.)
Completely positive superoperators are also often called completely positive maps
(CPMs). They arise as a description of how a density matrix changes under the col-
lapse caused by an experiment: If ρ is the density matrix before the collapse, then
C (ρ)/tr C (ρ) is the density matrix afterwards. The simplest example of a completely
positive superoperator is
C (ρ) = P ρP , (22.33)
where P is a projection. Note that for a density matrix ρ, C (ρ) is not, in general, a
density matrix because completely positive superoperators do not, in general, preserve
the trace.
In order to establish the complete positivity of a given superoperator, the following
facts are useful: If ρ2 is a density matrix on H2 then the mapping C : T RCL(H1 ) →
T RCL(H1 ⊗ H2 ) given by C (ρ) = ρ ⊗ ρ2 is completely positive. Conversely, the
partial trace ρ 7→ tr2 ρ is a completely positive superoperator T RCL(H1 ⊗ H2 ) →
T RCL(H1 ). For any bounded operator R : H1 → H2 , ρ 7→ RρR† is a completely
positive superoperator T RCL(H1 ) → T RCL(H2 ). The composition of completely
positive superoperators is completely positive. Positive multiples of a completely positive
superoperator are completely positive. Finally, when a family of completely positive
superoperators is summed or integrated over, the result is completely positive.
A canonical form of completely positive superoperators is provided by the

Theorem 22.3. (Theorem of Choi and Kraus)70 For every bounded completely positive
superoperator C : T RCL(H1 ) → T RCL(H2 ) there exist bounded operators Ri : H1 →
H2 so that
C (ρ) = ∑_{i∈I} Ri ρ Ri† , (22.34)

where I is a finite or countable index set.


70
M. Choi: Completely Positive Linear Maps on Complex Matrices. Linear Algebra and its Applica-
tions 10: 285–290 (1975).
K. Kraus: States, Effects, and Operations. Berlin: Springer (1983).

Here is an analog of the main theorem about POVMs concerning the post-experiment
quantum state.

Main theorem about superoperators.71 With every experiment E during [ti , tf ] with
finite value space Z is associated a family (Cz )z∈Z of completely positive superoperators
acting on T RCL(Hobj ) such that, whenever Z = z, the density matrix of the system
after the experiment is
ρtf = Cz (ρti ) / tr Cz (ρti ) . (22.35)
Cz is related to the POVM Ez by

tr(ρ Ez ) = tr Cz (ρ) . (22.36)

In particular, ∑_{z∈Z} Cz is trace-preserving. Explicitly, Cz is given by

Cz (ρ) = trapp ([Iobj ⊗ Pzapp ]U [ρ ⊗ ρapp ]U † [Iobj ⊗ Pzapp ]) , (22.37)

where Pzapp is the projection to the subspace of apparatus states in which the pointer is
pointing to the value z, U the unitary time evolution of object and apparatus together
from ti to tf , and ρapp the density matrix of the ready state of the apparatus.

In other words, the superoperator Cz is obtained by solving the Schrödinger equation


for the apparatus together with the system, then collapsing the joint density matrix as
if applying the collapse rule to a quantum measurement of the pointer position, and
then computing the reduced density matrix of the system.
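The following sketch (my own toy model, with an invented CNOT-type coupling) implements (22.37) for a two-level object and a two-level pointer, and checks trace preservation of ∑z Cz as well as the relation (22.36) between Cz and the POVM.

```python
import numpy as np
rng = np.random.default_rng(7)

def ptrace_app(T, d_obj, d_app):
    return np.einsum('nmkm->nk', T.reshape(d_obj, d_app, d_obj, d_app))

d = 2
U = np.eye(4, dtype=complex)[:, [0, 1, 3, 2]]    # CNOT: pointer flips iff the object is in |1>
phi = np.array([1, 0], complex)                  # pointer ready state
rho_app = np.outer(phi, phi.conj())
P_app = {0: np.diag([1.0, 0.0]), 1: np.diag([0.0, 1.0])}   # "pointer points to z"

def C(z, rho):
    """The superoperator (22.37) for outcome z, in this toy model."""
    Pz = np.kron(np.eye(d), P_app[z])
    big = U @ np.kron(rho, rho_app) @ U.conj().T
    return ptrace_app(Pz @ big @ Pz, d, 2)

# the POVM of the same experiment, via E_z psi = <phi| U^dagger (I x P_z) U (psi x phi)>
Viso = np.kron(np.eye(d), phi.reshape(2, 1))
E = {z: Viso.conj().T @ U.conj().T @ np.kron(np.eye(d), P_app[z]) @ U @ Viso for z in (0, 1)}

A = rng.normal(size=(d, d)) + 1j*rng.normal(size=(d, d))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real                   # random initial density matrix of the object

assert np.isclose(np.trace(C(0, rho) + C(1, rho)).real, 1.0)     # sum_z C_z preserves the trace
for z in (0, 1):
    assert np.isclose(np.trace(rho @ E[z]), np.trace(C(z, rho)))  # relation (22.36)
print("outcome probabilities:", [np.trace(C(z, rho)).real for z in (0, 1)])
```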

Now we return to the question of how to prove no-signaling in generality. For this,
we need to consider how an experiment that interacts only with system b of a composite
system a∪b (as in Theorem 22.1 but with the roles of a and b interchanged) will collapse
the state:

Theorem 22.4. In Bohmian mechanics, an experiment in which the apparatus interacts


only with system b but not with system a has completely positive superoperators of the
form
Cz = Ua ⊗ Cb,z , (22.38)
where Ua (ρa ) = Ua ρa Ua† is the unitary evolution from ti to tf on T RCL(Ha ).

Proof. This follows from (22.37) by noting that Hobj = Ha ⊗Hb , as well as Iobj = Ia ⊗Ib
and U = Ua ⊗ Ub,app , and exploiting the rules of the partial trace.
71
For proof and discussion see S. Goldstein, R. Tumulka, and N. Zanghì: The Quantum Formalism
and the GRW Formalism. Journal of Statistical Physics 149: 142–201 (2012) http://arxiv.org/abs/
0710.0885

To complete the general proof of no-signaling, we now show (iv) that ρa is not affected
by any experiment Bob conducts on b. By Theorem 22.4, it suffices to show that ρa is
not affected, other than through the unitary evolution of a, by applying ∑z Cz with Cz
of the form (22.38). Indeed,
of the form (22.38). Indeed,
ρatf = trb ∑z Cz (ρ) (22.39)
= trb ∑z (Ua ⊗ Cb,z )(ρ) (22.40)
= Ua trb ∑z Cb,z (ρ) (22.41)
= Ua trb ρ (22.42)
= Ua ρati (22.43)

because ∑z Cb,z is trace-preserving. This completes the proof.

22.7 Canonical Typicality


This is an application of reduced density matrices in quantum statistical mechanics.
The main goal of quantum statistical mechanics is to derive facts of thermodynamics
from a quantum mechanical analysis of systems with a macroscopic number of particles
(say, N > 1020 ). One of the rules of quantum statistical mechanics asserts that if a
quantum system S is in thermal equilibrium at absolute temperature T ≥ 0, then it has
density matrix
1
ρcan = e−βHS , (22.44)
Z
where Hs is the system’s Hamiltonian, β = 1/kT with k = 1.38 · 10−23 J/K the
Boltzmann constant, and Z = tre−βH the normalizing factor; ρcan is called the canonical
density matrix with inverse temperature β.
While this rule has long been used, its justification is rather recent72 and goes as
follows. Suppose that S is coupled to another system B (the “heat bath”), and suppose
that S and B together have wave function ψ ∈ HS ⊗ HB and Hamiltonian H with
pure point spectrum (this comes out for systems confined to finite volume). Let Imc =
[E, E + ∆E] be an energy interval whose length ∆E is small on the macroscopic scale
72
This was discovered by several groups independently: J. Gemmer, G. Mahler, and M. Michel: Quan-
tum Thermodynamics: Emergence of Thermodynamic Behavior within Composite Quantum Systems.
Lecture Notes in Physics 657. Berlin: Springer (2004)
S. Popescu, A. J. Short, and A. Winter: Entanglement and the foundation of statistical mechanics.
Nature Physics 21(11): 754–758 (2006)
S. Goldstein, J.L. Lebowitz, R. Tumulka, and N. Zanghì: Canonical Typicality. Physical Review
Letters 96: 050403 (2006) http://arxiv.org/abs/cond-mat/0511091
Preliminary considerations in this direction can already be found in E. Schrödinger: Statistical Ther-
modynamics. Second Edition, Cambridge University Press (1952)

but large enough for Imc to contain very many eigenvalues of H; Imc is called a micro-
canonical energy shell. Let Hmc be the corresponding spectral subspace, i.e., the range
of 1Imc (H), and umc the uniform probability distribution over S(Hmc ).

Theorem 22.5. (canonical typicality, informal statement) If B is sufficiently “large,”


and if the interaction between S and B is negligible,

H ≈ HS ⊗ IB + IS ⊗ HB , (22.45)

then for most ψ relative to umc , the reduced density matrix of S is approximately canon-
ical for some value of β, i.e.,
trB |ψihψ| ≈ ρcan . (22.46)

In order to arrive at a typical ψ ∈ S(Hmc ) (and thus at thermal equilibrium between


S and B), it will be relevant to have some interaction between S and B. Large interaction
terms in H, however, will lead to deviations from the form (22.44). It is relevant for
(22.46) that S and B are entangled: If they were not, then the reduced density matrix
of S would be pure, whereas ρcan is usually highly mixed (i.e., has many eigenvalues
that are significantly nonzero).
Canonical typicality explains why we see canonical density matrices: Because “most”
wave functions of S ∪ B lead to a canonical density matrix for S.
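Here is a rough numerical illustration (my own toy model; all parameters are invented and the bath is modeled simply as a long list of energy levels whose density of states grows like e^{βB E}): independent random vectors from the micro-canonical shell give nearly identical reduced statistics for S, with level populations close to the canonical ratio e^{−βB} for this two-level system.

```python
import numpy as np
rng = np.random.default_rng(8)

beta_B, W, n_bath = 2.0, 6.0, 1_000_000   # toy bath: density of states ~ exp(beta_B * E) on [0, W]
E_bath = np.log1p(rng.uniform(size=n_bath) * np.expm1(beta_B * W)) / beta_B

eps = np.array([0.0, 1.0])                # two-level system S
E_star, dE = 4.0, 0.1                     # micro-canonical window [E_star, E_star + dE]

def shell_populations():
    # Shell basis vectors are |s> (x) |b> with eps_s + E_b in the window. Since the window is
    # narrower than the gap of S, the reduced density matrix of S is diagonal in this toy model;
    # a uniformly random shell vector has iid Gaussian coefficients, so its populations are
    # sums of |c|^2 over the bath levels available to each s.
    w = np.zeros(2)
    for s in (0, 1):
        n_s = np.count_nonzero((E_bath >= E_star - eps[s]) & (E_bath < E_star - eps[s] + dE))
        c = rng.normal(size=n_s) + 1j * rng.normal(size=n_s)
        w[s] = np.sum(np.abs(c)**2)
    return w / w.sum()

p1, p2 = shell_populations(), shell_populations()
print("two random shell vectors give populations", np.round(p1, 4), "and", np.round(p2, 4))
print("ratio p[1]/p[0] =", round(p1[1]/p1[0], 4), " vs exp(-beta_B) =", round(np.exp(-beta_B), 4))
```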

23 Quantum Logic
The expression “quantum logic” is used in the literature for (at least) three different
things:

• a certain piece of mathematics that is rather pretty;

• a certain analogy between two formalisms that is rather limited;

• a certain philosophical idea that is rather silly.

Logic is the collection of those statements and rules that are valid in every conceivable
universe and every conceivable situation. Some people have suggested that logic simply
consists of the rules for the connectives “and”, “or,” and “not”, with “∀x ∈ M ” an
extension of “and” and “∃x ∈ M ” an extension of “or” to (possibly infinite) ranges M .
I would say that viewpoint is not completely right (because of Gödel’s incompleteness
theorem73 ) and not completely wrong. Be that as it may, let us focus for a moment on
the operations “and” (conjunction A∧B), “or” (disjunction A∨B), and “not” (negation
¬A), and let us ignore infinite conjunctions or disjunctions.
A Boolean algebra is a set A of elements A, B, C, . . . of which we can form A ∧ B,
A ∨ B, and ¬A, such that the following rules hold:

• ∧ and ∨ are associative, commutative, and idempotent (A∧A = A and A∨A = A).

• Absorption laws: A ∧ (A ∨ B) = A and A ∨ (A ∧ B) = A.

• There are elements 0 ∈ A (“false”) and 1 ∈ A (“true”) such that for all A ∈ A ,
A ∧ 0 = 0, A ∧ 1 = A, A ∨ 0 = A, A ∨ 1 = 1.

• Complementation laws: A ∧ ¬A = 0, A ∨ ¬A = 1.

• Distributive laws: A∧(B∨C) = (A∧B)∨(A∧C) and A∨(B∧C) = (A∨B)∧(A∨C).

It follows from these axioms that ¬(¬A) = A, and that de Morgan’s laws hold, ¬A ∨
¬B = ¬(A ∧ B) and ¬A ∧ ¬B = ¬(A ∨ B).
The laws of logic for “and,” “or,” and “not” are exactly the laws that hold in
every Boolean algebra, with A, B, C, . . . playing the role of statements or propositions
or conditions. Another case in which these axioms are satisfied is that A, B, C, . . . are
sets, more precisely subsets of some set Ω, A ∧ B means the intersection A ∩ B, A ∨ B
means the union A ∪ B, ¬A means the complement Ac = Ω \ A, 0 means the empty set
∅, and 1 means the full set Ω. That is, every family A of subsets of Ω that contains
Ω and is closed under complement and intersection (in particular, every σ-algebra) is
a Boolean algebra. (It turns out that also, conversely, every Boolean algebra can be
realized as a family of subsets of some set Ω.)
73
Gödel provided an example of a statement that is true about the natural numbers, so it follows
from the Peano axioms, but cannot be derived from the Peano axioms using the standard rules of logic,
thus showing that these rules are incomplete.

Now let A, B, C, . . . be subspaces of a Hilbert space H (more precisely, closed sub-
spaces, which makes no difference in finite dimension where every subspace is closed);
let A ∧ B := A ∩ B, A ∨ B := span(A ∪ B) (the smallest closed subspace containing
both A and B), and let ¬A := A⊥ = {ψ ∈ H : hψ|φi = 0 ∀φ ∈ A} be the orthogonal
complement of A; let 0 = {0} be the 0-dimensional subspace and 1 = H the full sub-
space. Then all axioms except distributivity are satisfied. So this structure is no longer
a Boolean algebra; it is called an orthomodular lattice or simply lattice. Hence, a dis-
tributive lattice is a Boolean algebra, and the closed subspaces form a non-distributive
lattice L(H ).
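Here is a simple example of the failure of distributivity: in H = C2 take A = C|upi, B = C|downi, and D = C(|upi + |downi). Then B ∨ D = H , so A ∧ (B ∨ D) = A, whereas A ∧ B = A ∧ D = 0 and hence (A ∧ B) ∨ (A ∧ D) = 0 ≠ A.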
That is nice mathematics, and we will see more of that in a moment. The analogy I
mentioned holds between L(H ) and Boolean algebras, often understood as representing
the rules of logic. The analogy is that both are lattices. In order to emphasize the
analogy, some authors call the elements of L(H ) “propositions” and the operations
∧, ∨, and ¬ “and,” “or,” and “not.” They call L(H ) the “quantum logic” and say
things like, A ∈ L(H ) is a yes-no question that you can ask about a quantum system,
as you can carry out a quantum measurement of the projection to A and get result 0
(no) or 1 (yes).
Here is why the analogy is rather limited. Let me give two examples.
• First, consider a spin-1/2 particle with spinor ψ ∈ C2 , and consider the words "ψ
lies in C|upi.” These words sound very much like a proposition, let me call it P,
and indeed they naturally correspond to a subspace of H = C2 , viz., C|upi. Now
the negation of P is, of course, “ψ lies in H \ C|upi,” whereas the orthogonal
complement of C|upi is C|downi. Let me say that again in different words: The
negation of “spin is up” is not “spin is down,” but “spin is in any direction but
up.”

• Second, consider the delayed-choice experiment in the form discussed at the end
of Section 19.4: forget about the interference region and consider just the two
options of either putting detectors in the two slits or putting detectors far away.
The first option is associated with the PVM Pupper slit + Plower slit = I, the second with the PVM U † Plower cluster U + U † Pupper cluster U = I, where U is the unitary time evolution from the slits to the far regions where the detectors are placed. The two PVMs are identical, as U † Plower cluster U = Pupper slit (and likewise for the other projection); that is, we have two experiments associated with the same observable. If we think of subspaces as propositions, then it is natural to regard “the particle passes through the upper slit” as a proposition and identify it with the subspace A that is
the range of Pupper slit . But if we carry out the second option, detect the particle
in the lower cluster, and say that we have confirmed the proposition A and thus
that the particle passed through the upper slit, then we have committed Wheeler’s
fallacy.
The philosophical idea that I mentioned is that logic as we know it is false, that
it applies in classical physics but not in quantum physics, and that a different kind of
logic with different rules applies in quantum physics—a quantum logic. Why did I call

154
that a rather silly idea? Because logic is, by definition, what is true in every conceivable
situation. So logic cannot depend on physical laws and cannot be revised by empirical
science. As Tim Maudlin once nicely said:

“There is no point in arguing with somebody who does not believe in logic.”

Bell wrote in Against “measurement” (1989, page 216 in the 2nd edition of Speakable
and unspeakable in quantum mechanics):

“When one forgets the role of the apparatus, as the word “measurement”
makes all too likely, one despairs of ordinary logic—hence “quantum logic.”
When one remembers the role of the apparatus, ordinary logic is just fine.”

Nevertheless, there is more mathematics relevant to L(H ), something analogous to


probability theory. Recall that a probability distribution on a set Ω is a normalized
measure, that is, a mapping µ from subsets of Ω to [0, 1] that is σ-additive and satisfies
µ(1) = µ(Ω) = 1. The domain of definition of µ is a σ-algebra, which is a Boolean
algebra with slightly stronger requirements. By analogy, we define that a normalized
quantum measure is a mapping µ̂ : L(H ) → [0, 1] that satisfies µ̂(1) = µ̂(H ) = 1 and
is σ-additive, i.e.,

µ̂( ∨_{n=1}^∞ An ) = Σ_{n=1}^∞ µ̂(An )    (23.1)

whenever An ⊥ Am for all n ≠ m. (The relation A ⊥ B can be expressed through lattice operations as A ≤ (¬B), with A ≤ C defined to mean A ∨ C = C or, equivalently, A ∧ C = A. In L(H ), A ≤ B ⇔ A ⊆ B.)

Theorem 23.1. (Gleason’s theorem74 ) Suppose the dimension of H is at least 3 and at most countably infinite. Then the normalized quantum measures are exactly the mappings µ̂ of the form
µ̂(A) = tr(ρPA ) ∀A ∈ L(H ) , (23.2)
where PA denotes the projection to A and ρ is a density matrix (i.e., a positive operator
with trace 1).

This amazing parallel between probability measures and density matrices has led
some authors to call elements of L(H ) “events” (as one would call subsets of Ω). Again,
this is a rather limited analogy, for the same reasons as above.
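To make (23.2) concrete, here is a small numerical sketch (Python with NumPy; the random density matrix and basis are just an example, not part of the notes). It checks the σ-additivity (23.1) on orthogonal subspaces and the normalization µ̂(H ) = 1:

    import numpy as np
    rng = np.random.default_rng(0)

    n = 3
    # A random density matrix: positive operator with trace 1.
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = M @ M.conj().T
    rho /= np.trace(rho).real

    # A random orthonormal basis, obtained from a QR decomposition.
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    P = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(n)]   # rank-1 projections

    mu = lambda proj: np.trace(rho @ proj).real                 # the measure (23.2)

    # Additivity on orthogonal subspaces: span{e1, e2} corresponds to P[0] + P[1].
    print(np.isclose(mu(P[0] + P[1]), mu(P[0]) + mu(P[1])))     # True
    print(np.isclose(sum(mu(p) for p in P), 1.0))               # True: normalization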

74
A.M. Gleason: Measures on the closed subspaces of a Hilbert space. Indiana University Mathe-
matics Journal 6: 885–893 (1957)

24 No-Hidden-Variables Theorems
This name refers to a collection of theorems that aim at proving the impossibility of
hidden variables. This aim may seem strange in view of the fact that Bohmian mechan-
ics is a hidden-variable theory, is consistent and makes predictions in agreement with
quantum mechanics. So how could hidden variables be impossible? A first observa-
tion concerns what is meant by “hidden variables.” Most no-hidden-variable theorems
(NHVTs) address the idea that every observable A (a self-adjoint operator) has a true
value vA in nature (the “hidden variable”), and that a quantum measurement of A yields
vA as its outcome. This idea should sound dubious to you because we have discussed
already that observables are really equivalence classes of experiments, not all of which
yield the same value. Moreover, we know that in Bohmian mechanics, a true value
is associated with position but not with every observable, in particular not with spin
observables. Hence, in this sense of “hidden variables,” Bohmian mechanics is really a
no-hidden-variables theory.
But this is not the central reason why the NHVTs do not exclude Bohmian mechan-
ics. Suppose we choose, in Bohmian mechanics, one experiment from every equivalence
class. (The experiment could be specified by giving the wave function and configuration of the apparatus together with the joint Hamiltonian of object and apparatus as well as the calibration function.) For example, for every spin observable n · σ we could say we will measure it by a Stern-Gerlach experiment in the direction n and subsequent detection of the object particle. Then the outcome Zn of the experiment is a function of the object wave function ψ and the object configuration Q, so we have associated
with every observable n · σ a “true value” which comes out if we choose to carry out the
experiment associated with n · σ. And it is this situation that NHVTs claim to exclude!
So we are back at an apparent conflict between Bohmian mechanics and NHVTs.
It may occur to you that even a much simpler example than Bohmian mechanics will
prove the possibility of hidden-variable theories. Suppose we choose, as a trivial model,
for every self-adjoint operator A a random value vA independently of all other vA′ with the Born distribution,
P(vA = α) = ‖Pα ψ‖2 .    (24.1)
Then we have not provided a real theory of quantum mechanics as Bohmian mechanics
provides, but we have provided a clearly consistent possibility for which values the
variables vA could have that agrees with the probabilities seen in experiment. Therefore,
all NHVTs must make some further assumptions about the hidden variables vA that are
violated in the trivial model as well as in Bohmian mechanics. We now take a look at
several NHVTs and their assumptions.
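To see how easily such a trivial model can be written down, here is a minimal sketch (Python with NumPy, not part of the notes; the state and directions are arbitrary examples). For a single spin-1/2 particle it draws, for each direction n, an independent value of n · σ with the Born probabilities (24.1):

    import numpy as np
    rng = np.random.default_rng(1)

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    psi = np.array([1.0, 0.0], dtype=complex)          # example state: spin up in z

    def born_value(n):
        """Sample a value of n·σ with P(v = α) = ||P_α psi||^2, independently of all other observables."""
        A = n[0] * sx + n[1] * sy + n[2] * sz
        vals, vecs = np.linalg.eigh(A)                 # eigenvalues ±1 and eigenvectors
        probs = np.abs(vecs.conj().T @ psi) ** 2       # Born probabilities
        return rng.choice(vals, p=probs / probs.sum())

    print(born_value(np.array([0.0, 0.0, 1.0])))       # always +1 for this psi
    print(born_value(np.array([1.0, 0.0, 0.0])))       # +1 or -1, each with probability 1/2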

24.1 Bell’s NHVT


Bell’s theorem implies a NHVT, or rather, the second half of Bell’s 1964 proof is a
NHVT. Let me explain. In the trivial model introduced around (24.1), we have not
specified how the vA change with time. They may change according to some law under

the unitary time evolution; more importantly for us now, they may change whenever ψ
collapses. That is, when a quantum measurement of A is carried out, we should expect
the vA′ (A′ ≠ A) to change. However, there is an exception if we believe in locality.
Then we should expect that Alice’s measurement of α · σ a (on her particle a) will not
alter the value of any spin observable β · σ b acting on Bob’s particle. But Bell’s analysis
shows that this is impossible.
We can sum up this conclusion and formulate it mathematically as the following
theorem. According to the hidden variable hypothesis, every observable A from a certain
collection of observables has an actual value vA . This value will be different in every
run of the experiment, it will be random; so vA is a random variable. Since in each
run each vA has a definite value, the random variables vA possess a joint probability
distribution. (Mathematicians also sometimes express this situation by saying that “the
random variables vA are defined on the same probability space.”)

Theorem 24.1. (Bell’s NHVT, 1964) Consider a joint distribution of random variables
vA , where A runs through the collection of observables

A ∪ B = {α · σ a : α ∈ S(R3 )} ∪ {β · σ b : β ∈ S(R3 )} .    (24.2)

Suppose that a quantum measurement of A ∈ A yields vA and does not alter the value
of vB for any B ∈ B, and that a subsequent quantum measurement of B ∈ B yields vB .
Then the joint distribution of the outcomes satisfies Bell’s inequality (17.30). In partic-
ular, it disagrees with the distribution of outcomes predicted by the quantum formalism.

In short, local hidden variables are impossible. (Here, “local” means that vA cannot
be affected at spacelike separation. As we have discussed in Section 18.2, it should be
kept in mind that Bell’s full proof, of which Bell’s NHVT is just a part, shows that all
local theories are impossible.)
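As a numerical illustration (a sketch in Python, not part of the notes; it uses the CHSH form of the inequality rather than (17.30), which is a standard variant), one can compare the bound obeyed by any pre-assigned ±1 values with the quantum correlation of the spin singlet, E(α, β) = −α · β:

    import numpy as np
    from itertools import product

    # CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b').
    # For pre-assigned values ±1, |S| <= 2 for every assignment, hence for every
    # probability distribution over assignments (local hidden variables).
    bound = max(abs(va * vb - va * vb2 + va2 * vb + va2 * vb2)
                for va, va2, vb, vb2 in product([-1, 1], repeat=4))
    print(bound)                       # 2

    # Quantum prediction for the singlet: E = -cos(angle between the two directions).
    deg = np.pi / 180
    a, a2 = 0 * deg, 90 * deg          # Alice's two directions (all in one plane)
    b, b2 = 45 * deg, 135 * deg        # Bob's two directions
    E = lambda x, y: -np.cos(x - y)
    S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
    print(abs(S))                      # 2*sqrt(2) ≈ 2.83 > 2: no local assignment can reproduce this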

24.2 Von Neumann’s NHVT


John von Neumann presented a NHVT in his 1932 book.75 It is clear that for a hidden-
variable model to agree with the predictions of quantum mechanics, every vA can only
have values that are eigenvalues of A, and its marginal distribution must be the Born
distribution. Von Neumann assumed in addition that whenever an observable C is a
linear combination of observables A and B,

C = αA + βB , α, β ∈ R , (24.3)

then vC is the same linear combination of vA and vB ,

vC = αvA + βvB . (24.4)


75
J. von Neumann: Mathematische Grundlagen der Quantenmechanik. Berlin: Springer-Verlag
(1932). English translation by R. T. Beyer published as J. von Neumann: Mathematical Foundation of
Quantum Mechanics. Princeton: University Press (1955)

Theorem 24.2. (von Neumann’s NHVT, 1932) Suppose 2 ≤ dim H < ∞ and ψ ∈
S(H ), let A be the set of all self-adjoint operators on H , and consider a joint dis-
tribution of random variables vA for all A ∈ A . Suppose that (24.4) holds whenever
(24.3) does. Then for some A the marginal distribution of vA disagrees with the Born
distribution associated with A and ψ.

As emphasized by Bell,76 there is no reason to expect (24.4) to hold. For example,


let H = C2 , A = σ1 , B = σ3 , and C the spin observable in the direction at 45° between the x- and the z-direction; then C = (1/√2) A + (1/√2) B. However, the obvious experiment for C is the Stern-Gerlach experiment in direction n = (1/√2, 0, 1/√2), whereas those for A and B would have the magnetic field point in the x- and the z-direction. Of course, the experiment for C is not based on measuring A and B and then combining their results, but is a completely different experiment. Thus, there is no reason to expect that its outcome would be a linear combination of what we would have obtained, had we applied a magnetic field in the x- or the z-direction. So von Neumann’s assumption is not a
reasonable one.
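In fact, (24.4) cannot even hold if each vA is an eigenvalue of A, as the following small check illustrates (a Python sketch with NumPy, not part of the notes):

    import numpy as np

    # With A = sigma_1, B = sigma_3 and C = (A + B)/sqrt(2), the values (v_A + v_B)/sqrt(2)
    # lie in {-sqrt(2), 0, +sqrt(2)}, while the eigenvalues of C are ±1.
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    C = (sx + sz) / np.sqrt(2)

    print(np.linalg.eigvalsh(C))                                            # [-1.  1.]
    print(sorted({(a + b) / np.sqrt(2) for a in (-1, 1) for b in (-1, 1)}))
    # [-1.414..., 0.0, 1.414...]: none of these is an eigenvalue of C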

24.3 Gleason’s NHVT


As a by-product of Andrew Gleason’s proof of Theorem 23.1, one can obtain a NHVT
that uses the following assumption that is more reasonable than von Neumann’s: When-
ever A and B commute and C = αA+βB is a real linear combination, then (24.4) holds,
vC = αvA + βvB .
The difference is that Gleason restricts the assumption to commuting A and B,
whereas von Neumann did not. I will explain in Corollary 24.5 below why that is more
reasonable. At this point, we note that Gleason makes a weaker assumption (as he
demands (24.4) in fewer cases), so his theorem is stronger (except in dimension 2). It
can also be formulated without talking about probabilities, and it suffices to consider
α = 1 = β.

Theorem 24.3. (Gleason’s NHVT, 1957) Suppose 3 ≤ dim H < ∞, and let A be the
set of all self-adjoint operators on H . There is no mapping v : A → R with the two
properties that (i) vA ∈ spectrum(A) for all A ∈ A and (ii) whenever AB = BA for
A, B ∈ A , then vA+B = vA + vB .

Put a little differently:

Corollary 24.4. Suppose 3 ≤ dim H < ∞ and ψ ∈ S(H ), let A be the set of all
self-adjoint operators on H , and consider a joint distribution of random variables vA
for all A ∈ A . Suppose that vA+B = vA + vB whenever AB = BA. Then for some A
the marginal distribution of vA disagrees with the Born distribution associated with A
and ψ.
76
J.S. Bell: On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics
38: 447–452 (1966)

Proof. The corollary follows from the theorem because the Born distribution enforces
that vA ∈ spectrum(A) with probability 1, so the joint distribution of the vA must be
concentrated on those mappings v : A → R satisfying (i) and (ii), but that is the empty
set.
The motivation for believing in Gleason’s assumption comes from the idea that while
in general a quantum measurement of A may change the values of vB for B 6= A, this
should not happen if A and B can be “simultaneously measured.” It is contained in the
following corollary:

Corollary 24.5. Suppose 3 ≤ dim H < ∞ and ψ ∈ S(H ), let A be the set of all self-
adjoint operators on H , and consider a joint distribution of random variables vA for all
A ∈ A . Suppose that whenever A, B ∈ A commute, then a quantum measurement of A
yields vA and does not alter the value of vB , and that a subsequent quantum measurement
of B yields vB . Then the joint distribution of the outcomes disagrees for some commuting
observables with the distribution of the outcomes predicted by the quantum formalism
using ψ.

Proof. To see how this follows from Corollary 24.4, consider quantum measurements
of A, B, and A + B, where AB = BA. By Theorem 13.1 (the spectral theorem for
commuting self-adjoint operators), there is an ONB of joint eigenvectors φn of A and B.
But if Aφn = αn φn and Bφn = βn φn , then (A + B)φn = (αn + βn )φn . Since A + B
commutes with both A and B, the A-measurement (yielding vA ) does not change vB or
vA+B , the subsequent B-measurement (yielding vB ) does not change vA or vA+B , and
the final A + B-measurement yields vA+B . If the distribution of outcomes agreed with
the quantum prediction, we would have to have vA+B = vA + vB . But that is excluded
by Corollary 24.4.
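The spectral-theorem step used in this proof can be checked in a small example (a Python sketch with randomly generated commuting matrices; purely illustrative):

    import numpy as np
    rng = np.random.default_rng(4)

    n = 4
    # Two commuting self-adjoint operators, built from a common eigenbasis Q.
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    a = rng.integers(-2, 3, size=n).astype(float)       # eigenvalues of A
    b = rng.integers(-2, 3, size=n).astype(float)       # eigenvalues of B
    A = Q @ np.diag(a) @ Q.conj().T
    B = Q @ np.diag(b) @ Q.conj().T

    print(np.allclose(A @ B, B @ A))                    # True: A and B commute
    # The eigenvalues of A + B are exactly the sums a_n + b_n of the joint eigenvalues:
    print(np.allclose(np.sort(np.linalg.eigvalsh(A + B)), np.sort(a + b)))   # True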
Corollary 24.5 can also be obtained from Theorem 24.1 (Bell’s NHVT): indeed, any
α · σ a commutes with any β · σ b , so the assumption of Theorem 24.1 is satisfied under
the assumption of Corollary 24.5. In particular, the assumption of Corollary 24.5 is
violated in any nonlocal hidden-variable theory.
Gleason’s Theorem 24.3 is also often called the Kochen–Specker theorem because
Simon Kochen and Ernst Specker gave a proof of it in 196777 that is very different
from Gleason’s proof. (Kochen and Specker originally stated stronger assumptions than
Theorem 24.3, but their proof could be so formulated that it yields Theorem 24.3.)
Further proofs of Theorem 24.3 were given by Specker (1960)78 and Bell (1966, op.cit.);
simpler proofs by Mermin (1990)79 and Peres (1991).80
77
S. Kochen and E.P. Specker: The Problem of Hidden Variables in Quantum Mechanics. Journal
of Mathematics and Mechanics 17: 59–87 (1967)
78
E. Specker: Die Logik nicht gleichzeitig entscheidbarer Aussagen. Dialectica 14: 239–246 (1960)
79
N.D. Mermin: Simple unified form for the major no-hidden-variables theorems. Physical Review
Letters 65: 3373–3376 (1990)
80
A. Peres: Two simple proofs of the Kochen-Specker theorem. Journal of Physics A: Mathematical
and General 24: L175–L178 (1991)

The fact that the outcome of a quantum measurement of A may depend on which
other observable B commuting with A is “measured” simultaneously with A is some-
times called “contextuality” in the literature (cf. Section 9.9). Correspondingly, a theory
satisfying the assumption of Corollary 24.5 (and whose empirical predictions therefore
deviate from the quantum formalism) is called a theory of non-contextual hidden vari-
ables.

25 Special Topics
25.1 The Decoherent Histories Interpretation
Another view of quantum mechanics was proposed by Murray Gell-Mann, James Hartle,
Bob Griffiths, and Roland Omnès under the name decoherent histories or consistent
histories.81
Even before I describe this view, I need to say that it fails to provide a possible
way the world may be, or a realist picture of quantum mechanics. The ultimate reason
is, in my humble opinion, that the proponents of this view do not think in terms of
reality. Rather, they think in terms of words and phrases, and a central element of
their interpretation is to set up rules for which phrases are legitimate or justified. In
realist theories, words refer to objects in reality (like “particle” in Bohmian mechanics),
and phrases or statements can be true and justified because they express a fact about
reality. Now the spirit of decoherent histories is more that statements are just sequences
of words, and since you have rules for which statements to regard as justified, you do
not think about the situation in reality. The proponents of this view have no consistent
picture of reality in mind, and no such picture is in sight. So if you are looking for such
a picture, you will be disappointed.
The motivation for this view comes from the fact that quantum mechanics provides
via the Born rule a probability distribution over configurations (or the index set of
another ONB in Hilbert space H ) at a fixed time t but not over histories (such as
paths in configuration space). It may seem that if quantum mechanics provided a
probability distribution over histories, then the interpretation of quantum mechanics
would be straightforward: one of these histories occurs, and it occurs with the probability
dictated by quantum mechanics. Now “histories” is taken to mean not just paths in
configuration space, but the following broader concept: Consider, for simplicity, just a
finite set of times {t1 , t2 , . . . , tr }, and for each ti an ONB {φin : n ∈ N} of H (such as the
eigenbasis of an observable); now one talks of the ray Cφin as an “event” at time ti , and
a “history” now means a list (Cφ1n1 , . . . , Cφrnr ) of such events, or briefly just the indices
(n1 , . . . , nr ). Then, for some choices of ONBs, the Born rule does provide a probability
distribution over the set of histories (n1 , . . . , nr ): Suppose that the unitary time evolution
is given by Ut (t ∈ R), and that φi+1,n = Uti+1 −ti φin for all i ∈ {1, . . . , r − 1} and n ∈ N.
Then, trivially, an initial wave function agreeing with some φ1n at t1 would agree with
some basis vector at each ti . Furthermore, an arbitrary initial wave function ψ ∈ H
with ‖ψ‖ = 1 at t1 defines a probability distribution over n and thus a probability distribution over histories, i.e.,
P(n1 , . . . , nr ) = δn1 n2 δn2 n3 · · · δnr−1 nr |⟨φ1n1 |ψ⟩|2 .    (25.1)

One can be more general by allowing for coarse graining. Suppose we allow a sequence
of subspaces Y := (K1 , . . . , Kn ) of arbitrary dimension as a description of a history; let
81
See R.B. Griffiths: Consistent Quantum Theory. Cambridge University Press (2002) and references
therein.

Pi denote the projection to Ki and Ui := Uti −ti−1 . If we made quantum measurements
of Pi at all ti , then the probability of obtaining 1111...1 would be

‖K(Y )ψ‖2    with    K(Y ) = Pn Un · · · P1 U1 .    (25.2)

The decoherent histories approach uses the same formula (25.2) to assign probabilities
to histories Y , but only to histories belonging to special families F of histories, so
called decoherent families, which are closed under coarse graining and for which (25.2)
is additive in Y . A calculation shows that that is the case whenever

Re ⟨ψ|K(Y )† K(Y ′ )|ψ⟩ = 0    for all Y ≠ Y ′ in F ,    (25.3)

a condition called the decoherence condition.


Now the decoherent histories interpretation postulates that for decoherent families,
the statement “The history Y has probability (25.2)” is justified; for families that are
not decoherent, in contrast, it is postulated that there simply do not exist probabilities.
For example, in Wheeler’s delayed choice experiment (Section 6.5) with the screen in
the far position, consider as t1 the time when the electron passes through the double
slit, t2 the time when the electron arrives at the screen, φ11 the wave packet in the upper
slit and φ12 in the lower, φ21 the wave packet at the upper cluster on the screen and φ22
at the lower, and ψ = ψt1 = (1/√2) φ11 + (1/√2) φ12 . Since the unitary evolution is φ11 → φ22 and φ12 → φ21 , the decoherent histories view attributes

prob. 1/2 to “passed the upper slit and arrived at the lower cluster” (25.4)
prob. 0 to “passed the lower slit and arrived at the lower cluster” (25.5)
prob. 0 to “passed the upper slit and arrived at the upper cluster” (25.6)
prob. 1/2 to “passed the lower slit and arrived at the upper cluster.” (25.7)

You can see that the decoherent histories view commits Wheeler’s fallacy (see Sec-
tion 6.5). But this is not the main problem.
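Before turning to the main problem, here is a small sketch of how the probabilities (25.4)–(25.7) come out of formula (25.2) in this two-time example (Python with NumPy; the two basis vectors simply stand in for the wave packets):

    import numpy as np

    e1, e2 = np.eye(2)                          # stand-ins for the two wave packets
    U = np.array([[0.0, 1.0], [1.0, 0.0]])      # toy unitary swapping the packets (upper slit -> lower cluster)
    psi = (e1 + e2) / np.sqrt(2)

    P = {1: np.outer(e1, e1), 2: np.outer(e2, e2)}   # projections to the two packets

    for n1 in (1, 2):                           # slit at t1 (1 = upper, 2 = lower)
        for n2 in (1, 2):                       # cluster at t2 (1 = upper, 2 = lower)
            K = P[n2] @ U @ P[n1]               # K(Y) as in (25.2), with U1 = identity
            print(n1, n2, np.linalg.norm(K @ psi) ** 2)
    # probability 0.5 for (upper slit, lower cluster) and (lower slit, upper cluster), 0 otherwise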
The main problem is that the decoherent histories view does not commit itself to any
particular decoherent family. If there was only one decoherent family, we could assume
that nature chooses one history from that family with the probabilities given above, and
that that history represents the reality. But since there are many decoherent families,
it remains unclear what the reality is supposed to be. Does nature choose one history
from each family? If the electron went through the upper slit in one family, does it have
to go through the upper slit in all other families containing this event? As Goldstein82
pointed out, the no-hidden-variables theorems imply that this is impossible. Then which
family is the one connected to our reality? (And, by the way, why bother about other
families?) The proponents of decoherent histories have no answer to this, and that is, I
think, because they do not think in terms of reality.
I also note that the motivation of the decoherent histories approach is problematical
as it takes for granted that events should correspond mathematically to eigenspaces of
82
S. Goldstein: Quantum Theory Without Observers. Physics Today, Part One: March 1998, 42–46.
Part Two: April 1998, 38–42.

observables. I have pointed out in Section 23 on quantum logic why that analogy is a bad
one. By relying on it, the decoherent histories approach takes the words “observable”
(and “measurement”) too literally, as if they really were observable quantities (and as
if “measurements” were procedures to find their values).

25.2 Nelson’s Stochastic Mechanics


A theory similar to Bohmian mechanics was proposed by Edward Nelson in 1966 and is
known under the name stochastic mechanics.83 The theory uses a particle ontology but
replaces Bohm’s deterministic equation of motion by a stochastic law of motion that
can be written in the form

dQt = uψt (Qt ) dt + σ dWt . (25.8)

Before I explain this equation, let me say something about the solution t 7→ Q(t). It is a
continuous curve in configuration space R3N , and it is a realization of a stochastic process,
which means that random decisions are made during the motion, in fact continuously
in time, so that Q(t1 ) does not fully determine Q(t2 ) at any t2 > t1 . This process is
designed in such a way that, at every time t,

Q(t) ∼ |ψt |2 . (25.9)

The type of motion of Q is called a diffusion process. The simplest and best known
diffusion process is the Wiener process Wt . The Wiener process in 1d can be obtained
as the limit ∆t → 0 of the following random walk Xt : In each time step ∆t, let Xt move either upward or downward, each with probability 1/2, by the amount √∆t (see Figure 25.1), so
Xt+∆t = Xt ± √∆t    (25.10)
for t ∈ ∆tZ. For times between t and t + ∆t, we may keep X constant (so it jumps at t
and t + ∆t) or define it to increase/decrease linearly (and thus be continuous but have
kinks at t and t+∆t as in Figure 25.1); both choices will converge to the same trajectory
t 7→ Wt in the limit ∆t → 0. It turns out that the trajectory t 7→ Wt is everywhere
continuous but nowhere differentiable; its velocity is, so to speak, always either +∞ or
−∞; the trajectory is a very jagged curve reminiscent of the prices at the stock market.
Now a diffusion process is a deformed Wiener process; it is the limit ∆t → 0 of a
deformed random walk given by

Xt+∆t = Xt + u(t, Xt ) ∆t ± σ(t, Xt ) √∆t ,    (25.11)
83
E. Nelson: Derivation of the Schrödinger Equation from Newtonian Mechanics. Physical Review
150: 1079 (1966)
E. Nelson: Quantum Fluctuations. Princeton University Press (1985)
S. Goldstein: Stochastic Mechanics and Quantum Theory. Journal of Statistical Physics 47: 645–667
(1987)


Figure 25.1: Realization of a random walk. The Wiener process is the limit ∆t → 0 of
such a process.

where the value σ ≥ 0, called the diffusion constant or volatility, characterizes the
strength of the random fluctuations, and u, called the drift, represents a further con-
tribution to the motion that would remain in the absence of randomness. A diffusion
process can thus have a tendency to move in a certain direction, and to fluctuate less in
some regions than in others. The realizations of diffusion processes are still very jagged
curves; in fact, stock prices are often modeled using diffusion processes. A diffusion
process is characterized by its drift and volatility, and therefore a common notation for
it is
dXt = u(t, Xt ) dt + σ(t, Xt ) dWt , (25.12)
where dXt = Xt+dt − Xt , and correspondingly dWt for a Wiener process. The Wiener
process and diffusion processes exist in any dimension d; then u becomes a d-dimensional
vector and σ a d×d matrix. Equations of the type (25.12) are called stochastic differential
equations. The probability density ρt of Xt evolves with time t according to the Fokker–
Planck equation
∂ρ/∂t = − Σ_{i=1}^d ∂i [ui ρ] + (1/2) Σ_{i,j,k=1}^d ∂i ∂j [σik σjk ρ]    (25.13)

with ∂i = ∂/∂xi , a version of the continuity equation with probability current


ji = ui ρ − (1/2) Σ_{j,k} ∂j [σik σjk ρ] .    (25.14)
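Stochastic differential equations like (25.12) are easy to simulate via the discretization (25.11); here is a minimal sketch (Python with NumPy; the drift and volatility are arbitrary illustrative choices, and the ±√∆t coin is replaced by a Gaussian step of the same variance, which has the same ∆t → 0 limit):

    import numpy as np
    rng = np.random.default_rng(2)

    def euler_maruyama(u, sigma, x0, t0, t1, n_steps):
        """Simulate dX_t = u(t, X_t) dt + sigma(t, X_t) dW_t, cf. (25.11)-(25.12)."""
        dt = (t1 - t0) / n_steps
        t, x, path = t0, x0, [x0]
        for _ in range(n_steps):
            x = x + u(t, x) * dt + sigma(t, x) * np.sqrt(dt) * rng.standard_normal()
            t += dt
            path.append(x)
        return np.array(path)

    # Example: drift u = -x pulling back toward the origin, constant volatility 1.
    path = euler_maruyama(u=lambda t, x: -x, sigma=lambda t, x: 1.0,
                          x0=0.0, t0=0.0, t1=10.0, n_steps=10_000)
    print(path.var())     # roughly 1/2, the stationary variance of this diffusion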

In the case of Nelson’s stochastic mechanics, σ is chosen to be a constant multiple


of the identity matrix and
uψ = (ℏ/m) Im(∇ψ/ψ) + σ 2 Re(∇ψ/ψ) .    (25.15)

It then follows that ji as in (25.14) is given by

ji = (ℏ/m) Im(ψ ∗ ∂i ψ) + σ 2 Re(ψ ∗ ∂i ψ) − (1/2) σ 2 ∂i (ψ ∗ ψ)    (25.16)
   = (ℏ/m) Im(ψ ∗ ∂i ψ) .    (25.17)

It now follows further from the Schrödinger equation and the Fokker–Planck equation
(25.13) that |ψ|2 is preserved, (25.9). As a consequence, the empirical predictions of
stochastic mechanics agree with the quantum formalism. Furthermore, it solves the
quantum measurement problem in the same way as Bohmian mechanics. Nelson proposed the value σ = √(ℏ/m), but actually any value yields a possible theory, so stochastic
mechanics provides a 1-parameter family of theories. For σ = 0, we obtain Bohmian
mechanics, and in the limit σ → ∞, the fluctuations become so extreme that Qt+dt is
independent of Qt , so we obtain Bell’s second many-world theory (see Section 14.4).
A remarkable fact is that Bohmian mechanics and stochastic mechanics are two the-
ories that make exactly the same empirical predictions; they are empirically equivalent.
Well, one could also say that orthodox quantum mechanics and Bohmian mechanics
make exactly the same predictions; on the other hand, orthodox quantum mechanics
is not a theory in the sense of providing a possible way the world may be, so perhaps
the example of empirical equivalence provided by stochastic mechanics is a more serious
one. This equivalence means that there is no experiment that could test one theory
against the other—another limitation to knowledge.
So how can we decide between these two theories, if there is no way of proving one
of them wrong while keeping the other? There can still be theoretical grounds for a
decision. When we try to extend the theories to relativistic space-time, quantum elec-
trodynamics, or quantum gravity, one theory might fare better than the other. But
already in non-relativistic quantum mechanics, one might be simpler, more elegant, or
more convincing than the other. For example, solutions of ODEs are mathematically
a simpler concept than diffusion processes. Here is another example: In a macroscopic
superposition such as Schrödinger’s cat, the wave function is not exactly zero in con-
figuration space between the regions corresponding to a live and a dead cat, and as a
consequence for large diffusion constant σ, Nelson’s configuration Qt will fluctuate a lot
and also repeatedly pass through regions of small |ψ|2 ; in fact, it will repeatedly switch
from one packet to another, and thus back and forth between a live and a dead cat.
Resurrections are possible, even likely, even frequent, if σ is large enough. As discussed
in Section 14.4 in the context of Bell’s second many-worlds theory, we may find that
hard to believe and conclude that smaller values of σ are more convincing than larger
ones. In Bohmian mechanics, in fact, the configuration tends to move no more than
necessary to preserve the |ψ|2 distribution. These reasons contribute to why Bohmian
mechanics seems more attractive than stochastic mechanics.
Nelson’s motivation for stochastic mechanics was a different one. He defined a
“stochastic derivative” of the non-differentiable trajectories, found that it satisfies a
certain equation, and hoped that the process Qt could be characterized without the use
of wave functions and the Schrödinger equation. Nelson hoped that the Schrödinger
equation would not have to be postulated but would somehow come out. However,

these hopes did not materialize, and the only known way to make sense of stochastic
mechanics is to assume, as in Bohmian mechanics, that the wave function exists as an
independent object and guides the configuration. But then the theory has not much of
an advantage over Bohmian mechanics.

26 Identical Particles
There are two more rules of the quantum formalism that we have not covered yet:
the symmetrization postulate for identical particles, also known as the boson–fermion
alternative, and the spin-statistics rule. They are the subject of this section. We begin
by stating them.

26.1 Symmetrization Postulate


There are several species of particles: electrons, photons, quarks, neutrinos, muons, and
more. Particles belonging to the same species are said to be identical.

Symmetrization postulate. If particles i and j are identical, then

ψ...si ...sj ... (...xi ...xj ...) = ±ψ...sj ...si ... (...xj ...xi ...) , (26.1)

where the right-hand side has indices si and sj interchanged, variables xi and xj inter-
changed, and all else are kept equal. Some species, called bosonic, always have the plus
sign; the others, called fermionic, always minus.

Particles belonging to a bosonic species are called bosons, those belonging to a


fermionic species fermions.

Spin-statistics rule. Species with integer spin are bosonic, those with half-odd spin
are fermionic.

It follows that for a system of N identical fermions and any permutation σ of


{1, . . . , N } (i.e., any bijective mapping {1, . . . , N } → {1, . . . , N }),

ψsσ(1) ...sσ(N ) (xσ(1) ...xσ(N ) ) = (−)σ ψs1 ...sN (x1 ...xN ) , (26.2)

where (−)σ denotes the sign of the permutation σ, i.e., +1 for an even permutation
and −1 for an odd one. (In the following, the word “permutation” will always refer
to a permutation of the particles.) Since for any two permutations σ, ρ, (−)σ◦ρ =
(−)σ (−)ρ , and any transposition (i.e., exchange of two elements of {1, . . . , N }) is odd,
a permutation is even if and only if it can be obtained as the composition of an even
number of transpositions. A function on R3N = (R3 )N satisfying (26.2) is also said to
be anti-symmetric under permutations, while a function satisfying (26.2) without the
factor (−)σ , as appropriate for a system of N identical bosons, is said to be symmetric.
In L2 ((R3 )N , (Cd )⊗N ), the anti-symmetric functions form a subspace Hanti , and the
symmetric ones form a subspace Hsym ; it is easy to see that for N > 1, Hanti ∩ Hsym =
{0}. In fact, Hanti ⊥ Hsym . The projection Panti to Hanti can be expressed as

Panti = (1/N !) Σ_{σ∈SN} (−)σ Πσ ,    (26.3)

where SN denotes the group of all permutations of {1, . . . , N } (which has N ! elements)
and Πσ is the unitary operator on L2 ((R3 )N , (Cd )⊗N ) that carries out the permutation
σ,
(Πσ ψ)s1 ...sN (x1 ...xN ) = ψsσ(1) ...sσ(N ) (xσ(1) ...xσ(N ) ) . (26.4)
Likewise, the projection Psym to Hsym is
Psym = (1/N !) Σ_{σ∈SN} Πσ .    (26.5)
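For N = 2 the two projections can be written down explicitly and checked numerically (a Python sketch, not part of the notes; d is an arbitrary one-particle dimension):

    import numpy as np

    d = 4                                      # one-particle dimension; N = 2 identical particles
    I = np.eye(d * d)

    # The transposition operator on C^d ⊗ C^d: e_i ⊗ e_j -> e_j ⊗ e_i.
    Pi = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            Pi[j * d + i, i * d + j] = 1.0

    P_sym  = (I + Pi) / 2                      # (26.5) for N = 2
    P_anti = (I - Pi) / 2                      # (26.3) for N = 2

    print(np.allclose(P_sym @ P_sym, P_sym), np.allclose(P_anti @ P_anti, P_anti))  # projections
    print(np.allclose(P_sym @ P_anti, 0))      # H_sym ⊥ H_anti
    print(np.trace(P_sym), np.trace(P_anti))   # dimensions d(d+1)/2 = 10 and d(d-1)/2 = 6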

The Pauli principle is another name for the statement that, for any fermionic species such as electrons, the wave function of N identical particles has to be anti-symmetric.
Sometimes people express it by saying that “two fermions cannot occupy the same state”;
this is a very loose way of speaking that would not convey the situation to anyone who
does not understand it already, as a particle belonging to an N -particle system does not
have a state (i.e., a wave function) of its own, only the system has a wave function.
It may seem surprising that not every wave function on (R3 )N is physically possible.
On the other hand, it may seem natural that wave functions of identical particles have
to be symmetric, and thus surprising that they can also be anti-symmetric. In fact, it
seems surprising that there can be two different kinds of identical particles!
To some extent, explanations of these facts are known; this is what we will talk about
in this chapter. The core of the reasoning concerns topology and can be generalized
to arbitrary connected Riemannian manifolds; this will be described in Appendix A.
Another question that arises and will be discussed in this chapter is whether and how
theories such as Bohmian mechanics and GRW are compatible with identical particles.

26.2 Schrödinger Equation and Symmetry


If the initial wave function satisfies the symmetrization postulate, and if the Hamiltonian
is invariant under permutations of particles of the same species, then the wave function
automatically satisfies the symmetrization postulate at every other time. Specifically,
for a system of N identical particles, a Hamiltonian H on L2 ((R3 )N ) is permutation invariant if and only if it commutes with every Πσ . For example, − Σ_{i=1}^N (ℏ2 /2mi ) ∇i2 is
permutation invariant if all masses are equal; a multiplication operator V is permutation
invariant if the function V : (R3 )N → R is, i.e.,
V (xσ(1) , . . . , xσ(N ) ) = V (x1 , . . . , xN ). (26.6)
The sum and exponentials of permutation invariant operators are permutation invariant;
permutation invariant operators commute with Panti and Psym ; therefore, they map Hanti
to Hanti and Hsym to Hsym , as claimed.
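This can again be checked in a toy example (a Python sketch for N = 2 particles on a discrete one-particle space; the Hamiltonian is an arbitrary permutation-invariant example):

    import numpy as np
    rng = np.random.default_rng(3)

    d = 3
    # Transposition operator on C^d ⊗ C^d (N = 2 identical particles).
    Pi = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            Pi[j * d + i, i * d + j] = 1.0

    # A permutation-invariant two-particle Hamiltonian: equal one-particle parts
    # plus a pair potential with v(x1, x2) = v(x2, x1).
    A = rng.normal(size=(d, d)); A = (A + A.T) / 2
    v = rng.normal(size=(d, d)); v = (v + v.T) / 2
    H = np.kron(A, np.eye(d)) + np.kron(np.eye(d), A) + np.diag(v.reshape(-1))

    P_anti = (np.eye(d * d) - Pi) / 2
    print(np.allclose(H @ Pi, Pi @ H))             # H commutes with the permutation operator
    print(np.allclose(H @ P_anti, P_anti @ H))     # hence H maps H_anti to H_anti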

26.3 The Space of Unordered Configurations


Another basic observation in this context is that the elements of the space R3N = (R3 )N
that we usually take as configuration space are ordered configurations (x1 , . . . , xN ), i.e.,

N -tuples of points in R3 . In nature, of course, electrons are not ordered; that is, they
are not numbered from 1 to N , and there is no fact about which electron is electron
number 1. So in reality, there is an unordered configuration {x1 , . . . , xN }, a set of N
points in R3 . The set of all unordered configurations of N particles will henceforth be
denoted
N R3 := {q ⊂ R3 : #q = N } .    (26.7)
Another way of representing an unordered configuration mathematically is to con-
sider ordered configurations but declare that two configurations that are permutations
of each other are equivalent. Then an unordered configuration, and thus a physical
configuration, corresponds to an equivalence class of ordered configurations. Such an
equivalence class has N ! elements unless the ordered configuration contains two particles
at the same point in 3-space (“collision configuration”). Since the collision configura-
tions are exceptions (they form a set of measure zero in R3N ) and will not play a role in
the following, we will remove them from the ordered configuration space and consider
the set of collision-free configurations,

R3N≠ := {(x1 , . . . , xN ) ∈ (R3 )N : xi ≠ xj ∀ i ≠ j}    (26.8)
     = (R3 )N \ ⋃_{1≤i<j≤N} ∆ij    (26.9)
1≤i<j≤N

with ∆ij ⊂ (R3 )N the set where xi = xj , a codimension-3 subspace (the ij-“diagonal”).
So, the “forgetful mapping”

π : (x1 , . . . , xN ) 7→ {x1 , . . . , xN } , (26.10)

which forgets the ordering, maps R3,N 6= to N R3 ; it is many-to-one, in fact always N !-to-
one.
The unordered configuration space N R3 inherits a topology via the mapping π. For
readers familiar with manifolds, I mention that π carries the manifold structure from
R3,N
6= to N R3 , as well as the metric; as a consequence, N R3 is a Riemannian manifold.84
It has curvature zero but is topologically non-trivial. We will investigate its topology
more closely in Appendix A.

26.4 Identical Particles in Bohmian Mechanics


In view of the fact that in reality, particle configurations are unordered, the Bohmian
configuration Q(t) should be an element of N R3 , but Bohmian mechanics as we defined
it in Section 6 leads to curves t ↦ Q̂(t) in R3N . But that is not a problem, for the
84
Indeed, it is known that for a discrete group G acting on a manifold M by diffeomorphisms, the
quotient space M/G is again a manifold if the action is “properly discontinuous,” a property equivalent
to that any two points x 6= y of M have open neighborhoods Ux and Uy such that there are only a
finite number of group elements g with g(Ux ) meeting Uy . This is always satisfied if G is finite. When,
in addition, M is a Riemannian manifold and G acts by isometries, then M/G inherits a Riemannian
metric. This is the case with the action of the permutation group on M = R3,N 6= .

following reason. Given any initial unordered configuration Q(0) ∈ N R3 , there are N ! possible orderings Q̂(0) ∈ R3N of it. They lie in the set π −1 (Q(0)), where

π −1 (q) := {q̂ ∈ R3N≠ : π(q̂) = q} .    (26.11)

If Q̂1 (0) and Q̂2 (0) are two orderings, then they are related through some permutation σ,
Q̂2 (0) = σ Q̂1 (0) ,    (26.12)
where we used the notation

σ(x1 , . . . , xN ) = (xσ(1) , . . . , xσ(N ) ) . (26.13)

Suppose that H is permutation invariant and the wave function is either symmetric or
anti-symmetric at t = 0 and thus at any t. If we solve Bohm’s equation of motion on
the ordered configuration space R3N ,

dQ̂/dt = v̂ ψ (Q̂)    with    v̂ ψ = (ℏ/m) Im(ψ ∗ ∇ψ / ψ ∗ ψ) ,    (26.14)

we obtain curves t ↦ Q̂1 (t) and t ↦ Q̂2 (t) with the property that Q̂1 (t) and Q̂2 (t) are still, for any t ∈ R, related through the same permutation σ,
Q̂2 (t) = σ Q̂1 (t) .    (26.15)

Indeed, that follows from


v̂(σ Q̂) = σ v̂(Q̂) ,    (26.16)
a consequence of the (anti-)symmetry of ψ.
As a consequence of (26.15), π(Q̂2 (t)) = π(Q̂1 (t)). That is, Bohmian mechanics
defines for every Q(0) ∈ N R3 a unique trajectory t 7→ Q(t) in N R3 . Put differently, the
arbitrary choice of ordering does not affect the motion of the particles.
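The key relation (26.16) is easy to verify numerically for small N (a Python sketch for N = 2 particles in one dimension with an antisymmetric ψ; the packets and the evaluation point are arbitrary examples, and ℏ/m is set to 1):

    import numpy as np

    phi1 = lambda x: np.exp(-(x - 1.0) ** 2)
    phi2 = lambda x: np.exp(-(x + 1.0) ** 2 + 0.5j * x)
    psi = lambda x1, x2: phi1(x1) * phi2(x2) - phi1(x2) * phi2(x1)    # antisymmetric

    def v(x1, x2, h=1e-6):
        """Bohmian velocity (v1, v2) = Im(grad psi / psi) at (x1, x2), by finite differences."""
        p = psi(x1, x2)
        d1 = (psi(x1 + h, x2) - psi(x1 - h, x2)) / (2 * h)
        d2 = (psi(x1, x2 + h) - psi(x1, x2 - h)) / (2 * h)
        return np.imag(d1 / p), np.imag(d2 / p)

    q = (0.4, -0.7)
    print(v(*q))             # (v1, v2) at the configuration (x1, x2)
    print(v(q[1], q[0]))     # at the swapped configuration: the same two numbers, swapped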
A different perspective on this fact is that Bohm’s law of motion can also be formu-
lated directly on the unordered space N R3 in the form
dQ/dt = v ψ (Q(t))    (26.17)
where v ψ is a vector field on the manifold N R3 obtained from the vector field v̂ on R3,N
6=
by “projecting down” using the projection mapping π. Technically speaking, it is the
tangent mapping Dπ (also known as the differential or total derivative of π), applied to
v̂(q̂), that yields v(q),
v(q) := Dπ|q̂ (v̂(q̂)) . (26.18)
It is crucial that different orderings q̂ ∈ π −1 (q) of q yield the same vector v(q); this
fact is expressed by the formula (26.16), and in words it means that although different
orderings assign different numbers to each particle, they agree, if one of the particles is
located at x ∈ R3 , about the 3-velocity of the particle at x.

Conversely, this fact can be regarded as an explanation of the boson–fermion alter-
native: Since for a general ψ on R3N that is neither symmetric nor anti-symmetric, the
vector field v̂ ψ defined by (26.14) violates (26.16), it fails to define a vector field v on
N R3 (or, for that matter, trajectories t ↦ Q(t) in N R3 ). Thus, a wave function ψ of N
identical particles should be either symmetric or anti-symmetric.
It may seem surprising that Bohmian mechanics can get along at all with identical
particles, for the following reason. Some authors have proposed that the reason why
general, asymmetric wave functions on R3N are unphysical is the impossibility to decide
which of the electrons at time t1 is which of the electrons at time t2 ; if electrons had
trajectories, then that would define which electron at t1 is which electron at t2 ; since in
orthodox quantum mechanics, electrons don’t have trajectories, there is a symmetriza-
tion postulate in quantum mechanics but not in classical mechanics. We have seen why
this reasoning is questionable.
In fact, there is a sense in which Bohmian mechanics gets along better with identical
particles than orthodox quantum mechanics: While the space Q of physically possible
configurations is the unordered one N R3 , the space Q̂ on which the wave function is defined is the ordered one R3N . In Bohmian mechanics, it is not necessary that Q be the same as Q̂, as long as Q belongs to Q and ψ on Q̂ still defines a vector field v ψ on Q. In orthodox quantum mechanics, in contrast, there is no element of the ontology that could bridge between Q and Q̂, so it remains unintelligible how ψ could be defined on any other space than Q.

26.5 Identical Particles in GRW Theory


GRW theory as formulated in Section 12 has the following problem with the symmetriza-
tion postulate: A collapse of the wave function, corresponding to multiplication by a
Gaussian function in one of the xj as in (12.14) and (12.15), usually leads to a wave
function ΨT + that no longer obeys (26.1) (in particular, for identical particles, no longer
is symmetric or anti-symmetric). Thus, to accommodate the symmetrization postulate,
the equations of GRW theory need to be adjusted. Here is how.85
For a universe with N particles, collapses occur with rate N λ. If the number r of
different species of particles is greater than 1, then the species I of a collapse is chosen
randomly with
P(I = i) = Ni /N , (26.19)
where Ni is the number of particles of species i. Equivalently, for each species i, collapses
occur with rate Ni λ. Define the collapse operator for species I and center X by
CI (X)Ψ(x1 , . . . , xN ) = ( Σ_{j∈II} gX,σ (xj ) )^{1/2} Ψ(x1 , . . . , xN ) ,    (26.20)

85
C. Dove and E. Squires: Symmetric versions of explicit wavefunction collapse models. Foundations
of Physics 25: 1267–1282 (1995)
R. Tumulka: On Spontaneous Wave Function Collapse and Quantum Field Theory. Proceedings of
the Royal Society A 462: 1897–1908 (2006) http://arxiv.org/abs/quant-ph/0508230

where II is the set of the labels of all particles belonging to species I. For a collapse at
time T , choose the location X of the flash randomly with density

ρ(X = x) = ‖CI (x)ΨT − ‖2    (26.21)

and collapse the wave function according to

ΨT + = CI (X)ΨT − / ‖CI (X)ΨT − ‖ .    (26.22)

For example, for N identical particles, instead of multiplying by a Gaussian function


in one xj , we multiply by the square root of the sum of N Gaussians, all with the
same center and width but applied to different variables. Since the collapse operator is a multiplication operator by a permutation invariant function (Σj gX,σ (xj ))1/2 , it is
a permutation invariant operator and thus maps Hsym → Hsym and Hanti → Hanti .
Likewise, ΨT + given by (26.22) still satisfies (26.1). The empirical predictions of this
symmetrized version of the GRW theory are not exactly the same as those of the version
described in Section 12, but both are close to those of the quantum formalism.
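As a concrete check of (26.20)–(26.22), here is a small sketch for N = 2 identical particles in one dimension on a grid (Python with NumPy; the packets, the collapse center, and the width are arbitrary illustrative choices):

    import numpy as np

    x = np.linspace(-5, 5, 201)
    X1, X2 = np.meshgrid(x, x, indexing="ij")
    sigma = 0.5                                      # illustrative collapse width

    g = lambda c, y: np.exp(-(y - c) ** 2 / (2 * sigma ** 2))   # unnormalized Gaussian

    # A symmetric two-particle wave function (two packets, symmetrized).
    phi = lambda a, y: np.exp(-(y - a) ** 2)
    psi = phi(-2, X1) * phi(2, X2) + phi(2, X1) * phi(-2, X2)
    psi /= np.linalg.norm(psi)

    center = 1.3                                     # collapse center X
    C = np.sqrt(g(center, X1) + g(center, X2))       # collapse operator as in (26.20)
    psi_after = C * psi
    psi_after /= np.linalg.norm(psi_after)           # normalization as in (26.22)

    print(np.allclose(psi_after, psi_after.T))       # True: still symmetric under particle exchange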

A Topological View of the Symmetrization Postu-
late
This appendix is mathematically heavier. A modern view of the reasons behind the
symmetrization postulate is based on the topology of the unordered configuration space
N 3
R and goes back particularly to the work of J. Leinaas and J. Myrheim.86 The
symmetrization postulate is then a special case of a more general principle, according
to which for any given topologically non-trivial manifold Q, there are several quantum
theories on Q corresponding to the 1-dimensional unitary representations of the so-called
fundamental group of Q. This will be briefly summarized in this appendix. (There is
also a vector bundle view of the symmetrization postulate, but that is a different story
and will be told elsewhere.)
A manifestation of the non-trivial topology of N R3 is the fact that it is not simply
connected. A topological space Q is said to be simply connected if every closed curve is
contractible, i.e., can be continuously deformed into a point. A space that is not simply
connected is also said to be multiply connected. For example, Rd is simply connected
for every d ≥ 1, whereas Q = R2 \ {0} is not: a curve encircling the origin cannot be
contracted to a point without crossing the origin and thus leaving Q. R3 \ {0} is again
simply connected because when we need to cross the origin we can dodge it by going
into the third dimension. But R3 without the z-axis is multiply connected. The sphere

Sd = {v ∈ Rd+1 : |v| = 1} (A.1)

is simply connected for d ≥ 2. On a cylinder R × S1 , closed curves that “go around the
tube” can’t be contracted, whereas others can; in fact, a closed curve is contractible if
and only if its so-called winding number is zero. (The winding number is the number of
times, possibly negative, that the curve goes around the tube counterclockwise.) Closed
curves are also called loops.

Example A.1. The following loop in N R3 is not contractible: q(t) = {x1 (t), . . . , xN (t)}
for 0 ≤ t ≤ π with

x1 (t) = (cos t, sin t, 0) (A.2)


x2 (t) = (− cos t, − sin t, 0) (A.3)
xj (t) = const. ∀j > 2 , (A.4)

say with xj3 > 0 so collisions cannot occur. It is a loop because q(π) = q(0) as
{e1 , −e1 } = {−e1 , e1 }.
A contraction (i.e., continuous deformation to a constant path) is impossible for
the following reason. Every loop in N R3 , beginning and ending at (say) y ∈ N R3 ,
defines a permutation of y because the particles need to arrive in the same locations but
may switch places. A continuous deformation will not change the permutation, as the
86
J. Leinaas and J. Myrheim: On the theory of identical particles. Il Nuovo Cimento 37B: 1–23
(1977)

permutation would have to jump in order to change. Therefore, loops with non-trivial
permutation cannot be deformed into ones with trivial permutation, in particular cannot
be contracted.

Example A.2. We show that R3,N 6= is simply connected. To begin with, R3N is simply
connected. Suppose an attempt to contract a loop in R3,N 6= intersects the collision set
∪i<j ∆ij . Then we can dodge the intersection: Since a curve can be pulled past a point
in R3 without intersecting it, and since ∆ij has codimension 3, a curve can be pulled
past ∆ij in R3N without intersecting it.

For a manifold Q that is not simply connected, one can algebraically characterize
its way of being multiply connected by means of its fundamental group. It is defined as
follows. Choose a point q ∈ Q and consider all loops that start and end at q. Two such
loops are called homotopic if they can be continously deformed into each other. For
example, a loop is homotopic to the constant path at q if and only if it is contractible.
For example, two loops in the circle S1 are homotopic if and only if they have the same
winding number. Homotopy is an equivalence relation; the set of equivalence classes [g]
of paths g is denoted π1 (Q, q) and becomes a group with the following operations. The
group multiplication [g][h] = [gh] is concatenation of the paths, i.e., the path obtained
by first following h and then g, and the inverse of [g] is obtained by following g in the
opposite direction.
(If we replace g by a homotopic path g 0 and h by h0 then the concatenation of h0
and g 0 is homotopic to that of h and g, so that the product of the equivalence classes is
independent of the choice of representative from each class. Concatenation is automati-
cally associative (up to homotopy, where the relevant homotopy is re-parameterization).
The neutral element of the group is the class of contractible loops. One can verify that
the path obtained by first following g and then following it backwards is contractible,
thereby confirming that we have correctly identified the inverse element in the group.)
This group is called the first homotopy group or the fundamental group of Q based
at q. For different choices q1 , q2 of q, the groups π1 (Q, q1 ) and π1 (Q, q2 ) are isomorphic
to each other, and any curve γ from q1 to q2 defines an isomorphism by first following γ
backwards, then any chosen loop starting at q1 , and then γ; this yields a loop starting
at q2 . For example, the fundamental group of the circle S1 is, for any base point q, given
by the additive group of the integers. In fact, the integer is the winding number, and
when concatenating loops then their winding numbers add. We report without proof
that
Proposition A.3. The fundamental group of N R3 is, for any base point q, isomorphic to the permutation group SN .

Now we need to turn again to the forgetful mapping π : R3,N 6= → N R3 . A dif-


feomorphism is a bijective mapping that is smooth in both directions; π is locally a
diffeomorphism: Every q̂ ∈ R3N≠ has a neighborhood U ⊂ R3N≠ such that π restricted to U is a diffeomorphism to its image π(U ) in N R3 . Even more, π is a covering map. A
covering map is a smooth map p : A → B between manifolds such that for every b ∈ B

there exists an open neighborhood V of b such that p−1 (V ) is a union of disjoint open
sets in A, each of which is mapped diffeomorphically onto V by p. For π this means
that if a neighborhood V of an unordered configuration q is small enough, its pre-image
π −1 (V ) = {q̂ ∈ R3N≠ : π(q̂) ∈ V } consists of N ! disjoint neighborhoods, each one a
neighborhood of one ordering of q.

Example A.4. The mapping p : R → S1 given by

p(θ) = (cos θ, sin θ) (A.5)

is a covering map. If we picture the real line as a helix above the circle (i.e., draw θ ∈ R
as the point (cos θ, sin θ, θ)), then p is the projection to the {z = 0} plane. For every
interval on the circle of length less than 2π, the pre-image consists of an interval in R
and all its translates by integer multiples of 2π.

The set p−1 (q) is called the covering fiber of q. Relative to a given covering map
p : A → B, a deck transformation is a diffeomorphism ϕ : A → A such
that p ◦ ϕ = p. That is, ϕ does not leave the covering fiber. For example, a deck
transformation ϕ relative to π : R3N≠ → N R3 as in (26.10) must be such that at every
ordered configuration q̂, it can change the ordering but not the N points in R3 involved
in q̂. We report without proof that

Proposition A.5. The deck transformations of π as in (26.10) are exactly the permuta-
tion mappings ϕσ (x1 , . . . , xN ) = σ(x1 , . . . , xN ) with σ ∈ SN . The deck transformations
of p : R → S1 as in (A.5) are the mappings of the form ϕk (θ) = θ + 2πk with k ∈ Z.

The deck transformations relative to a given covering map p : A → B form a group


(the product is composition), called the covering group. If A is simply connected, then
p is called a universal covering and A a universal covering space.

Proposition A.6. For a universal covering, the covering group is isomorphic to the
fundamental group of B at any b ∈ B. For every connected manifold B there exist a
universal covering space A and universal covering p : A → B, and they are unique up
to diffeomorphism (i.e., if p : A → B and p0 : A0 → B are both universal coverings, then
there is a diffeomorphism χ : A → A0 such that p0 ◦ χ = p).

In the following, the universal covering space of B will be denoted B̂. Intuitively, B̂ is the “unfolding” of B that looks locally like B but removes the multiple connectedness.
For example, the universal covering space of S1 is R and can be thought of as obtained
by piecing together pieces of S1 in such a way that by going around the circle, you
don’t return to the same point. Instead, like on a spiral staircase, you arrive at the
corresponding location on a different level.

Corollary A.7. The universal covering space of N R3 is R3N≠ , and the universal covering map is π.

For identical particles, the wave function is defined on the universal covering space
of the configuration space. Now we consider the general situation of that kind: the
configuration space Q is a multiply connected Riemannian manifold, and ψt is defined
on the universal covering space, ψt : Q̂ → C. We call the covering map π : Q̂ → Q; Q̂ automatically becomes a Riemannian manifold, and the deck transformations are
automatically isometries (i.e., preserve the metric). On every Riemannian manifold M ,
there is a natural way to define the Laplacian operator ∆ and the gradient ∇ψ of a
scalar function ψ; ∇ψ(x) is a complexified tangent vector at x ∈ M , ∇ψ(x) ∈ CTx M .
The relevant condition on ψ becomes particularly clear from a Bohmian perspective: we
need that the velocity field v of the Bohmian equation of motion
dQ/dt = v ψt (Q(t))    (A.6)
is a vector field on Q, but Bohm’s formula for it,
v̂ ψ = (ℏ/m) Im(∇ψ/ψ) ,    (A.7)

defines a vector field v̂ on Q.


b In order to be able to define

v(q) := Dπ|q̂ (v̂(q̂)) (A.8)

in an unambiguous and consistent way, we need that Dπ|q̂ (v̂(q̂)) is the same vector at
every q̂ in the covering fiber π −1 (q). This will be the case if and only if

v̂(ϕ(q̂)) = Dϕ|q̂ (v̂(q̂)) (A.9)

for all q̂ and all deck transformations ϕ. A natural sufficient condition on ψ ensuring
(A.9) is
ψ(ϕ(q̂)) = γϕ ψ(q̂) , (A.10)
where γϕ is a phase factor, a complex constant of modulus 1 (that depends on ϕ but not
on q̂) called a topological factor. Relation (A.10) means that the values of ψ at different
points in the covering fiber are not independent of each other; it is called a periodicity
condition. It entails that ∇ψ(ϕ(q̂)) = γϕ Dϕ|q̂ (∇ψ(q̂)), so the factor γϕ cancels out of
(A.7), and (A.9) follows.
The periodicity condition (A.10) can only hold if

γϕ1 ◦ϕ2 ψ(q̂) = ψ(ϕ1 ◦ ϕ2 (q̂)) = γϕ1 ψ(ϕ2 (q̂)) = γϕ1 γϕ2 ψ(q̂) , (A.11)

so the gammas need to satisfy


γϕ1 ◦ϕ2 = γϕ1 γϕ2 . (A.12)
Together with γid = 1, this means that the gammas form a 1-dimensional group rep-
resentation of the covering group. Since we assumed |γϕ | = 1, it is in fact a unitary
representation (by unitary 1 × 1 matrices). Such representations are also called char-
acters. Since the covering group is isomorphic to the fundamental group of Q, the

characters of the covering group can be translated to the characters of the fundamental
group. The upshot of the reasoning can be summarized in the following principle.

Character Quantization Principle.87 For quantum mechanics on a multiply-connected Riemannian manifold Q, there are several possible types of wave functions, each corresponding to a character γ of the fundamental group π1 (Q).
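A small sketch may make the principle tangible for the simplest multiply connected example Q = S1 , whose fundamental group is Z with characters n ↦ e^{inα}. A wave function on the covering space R obeying the periodicity condition (A.10), ψ(θ + 2π) = e^{iα} ψ(θ), yields via (A.7) a velocity field that is genuinely 2π-periodic, i.e., well defined on the circle (Python with NumPy; the particular ψ and α are arbitrary illustrative choices, and ℏ/m is set to 1):

    import numpy as np

    alpha = 0.7                                           # an arbitrary character parameter
    theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)

    def psi(th):
        # Quasi-periodic example: a nowhere-zero 2*pi-periodic factor times e^{i alpha th/(2 pi)}.
        return np.exp(1j * alpha * th / (2 * np.pi)) * (2 + np.cos(th)) * np.exp(1j * np.sin(th))

    def velocity(th, h=1e-6):
        dpsi = (psi(th + h) - psi(th - h)) / (2 * h)      # numerical derivative of psi
        return np.imag(dpsi / psi(th))                    # v = Im(psi'/psi), cf. (A.7)

    print(np.allclose(psi(theta + 2 * np.pi), np.exp(1j * alpha) * psi(theta)))   # (A.10) holds
    print(np.allclose(velocity(theta + 2 * np.pi), velocity(theta)))              # v is periodic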

Now let us apply this to the case of identical particles.

Proposition A.8. For N ≥ 2, the permutation group SN (which is the fundamental


group of N R3 ) has exactly two characters: the trivial character γϕσ = 1 and the alter-
nating character γϕσ = (−)σ .

For identical particles, the character quantization principle applied to the unordered
configuration space yields that there are two possible theories of identical particles, one
requiring
ψ(σ q̂) = ψ(q̂) (A.13)
and one requiring
ψ(σ q̂) = (−)σ ψ(q̂) . (A.14)
Obviously, (A.13) is a bosonic wave function and (A.14) a fermionic one. So, we obtain
the boson-fermion alternative as a special case of the character quantization principle.

87
J. Leinaas and J. Myrheim, op.cit.
D. Dürr, S. Goldstein, J. Taylor, R. Tumulka, and N. Zanghı̀: Quantum Mechanics in Multiply-
Connected Spaces. Journal of Physics A: Mathematical and Theoretical 40: 2997–3031 (2007) http:
//arxiv.org/abs/quant-ph/0506173

Contents
1 Course Overview 2

2 The Schrödinger Equation 5

3 Unitary Operators in Hilbert Space 9


3.1 Existence and Uniqueness of Solutions of the Schrödinger Equation . . . 9
3.2 The Time Evolution Operators . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Unitary Matrices and Rotations . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Inner Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5 Abstract Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4 Classical Mechanics 14
4.1 Definition of Newtonian Mechanics . . . . . . . . . . . . . . . . . . . . . 14
4.2 Properties of Newtonian Mechanics . . . . . . . . . . . . . . . . . . . . . 15
4.3 Hamiltonian Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

5 The Double-Slit Experiment 18


5.1 Classical Predictions for Particles and Waves . . . . . . . . . . . . . . . . 18
5.2 Actual Outcome of the Experiment . . . . . . . . . . . . . . . . . . . . . 19
5.3 Feynman’s Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

6 Bohmian Mechanics 24
6.1 Definition of Bohmian Mechanics . . . . . . . . . . . . . . . . . . . . . . 24
6.2 Historical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3 Equivariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.4 The Double-Slit Experiment in Bohmian Mechanics . . . . . . . . . . . . 28
6.5 Delayed Choice Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 30

7 Fourier Transform and Momentum 33


7.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
7.2 Momentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Momentum Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.4 Tunnel Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

8 Operators and Observables 42


8.1 Heisenberg’s Uncertainty Relation . . . . . . . . . . . . . . . . . . . . . . 42
8.2 Self-adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.3 The Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.4 Conservation Laws in Quantum Mechanics . . . . . . . . . . . . . . . . . 49

9 Spin 50
9.1 Spinors and Pauli Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 50
9.2 The Pauli Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.3 The Stern–Gerlach Experiment . . . . . . . . . . . . . . . . . . . . . . . 52
9.4 Bohmian Mechanics with Spin . . . . . . . . . . . . . . . . . . . . . . . . 53
9.5 Is an Electron a Spinning Ball? . . . . . . . . . . . . . . . . . . . . . . . 54
9.6 Is There an Actual Spin Vector? . . . . . . . . . . . . . . . . . . . . . . . 55
9.7 Many-Particle Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.8 Representations of SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
9.9 Inverted Stern–Gerlach Magnet and Contextuality . . . . . . . . . . . . . 58

10 The Projection Postulate 61


10.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
10.2 The Projection Postulate . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
10.3 Projection and Eigenspace . . . . . . . . . . . . . . . . . . . . . . . . . . 62
10.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

11 The Measurement Problem 65


11.1 What the Problem Is . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
11.2 How Bohmian Mechanics Solves the Problem . . . . . . . . . . . . . . . . 66
11.3 Decoherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
11.4 Schrödinger’s Cat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
11.5 Positivism and Realism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

12 The GRW Theory 72


12.1 The Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
12.2 Definition of the GRW Process . . . . . . . . . . . . . . . . . . . . . . . 73
12.3 Definition of the GRW Process in Formulas . . . . . . . . . . . . . . . . . 74
12.4 Primitive Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
12.5 The GRW Solution to the Measurement Problem . . . . . . . . . . . . . 76
12.6 Empirical Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
12.7 The Need for a Primitive Ontology . . . . . . . . . . . . . . . . . . . . . 79

13 The Copenhagen Interpretation 83


13.1 Two Realms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
13.2 Positivism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
13.3 Impossibility of Non-Paradoxical Theories . . . . . . . . . . . . . . . . . 84
13.4 Completeness of the Wave Function . . . . . . . . . . . . . . . . . . . . . 85
13.5 Language of Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . 85
13.6 Complementarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
13.7 Complementarity and Non-Commuting Operators . . . . . . . . . . . . . 86
13.8 Reactions to the Measurement Problem . . . . . . . . . . . . . . . . . . . 89

14 Many Worlds 92
14.1 Schrödinger’s Many-Worlds Theory . . . . . . . . . . . . . . . . . . . . . 92
14.2 Everett’s Many-Worlds Theory . . . . . . . . . . . . . . . . . . . . . . . 94
14.3 Bell’s First Many-Worlds Theory . . . . . . . . . . . . . . . . . . . . . . 95
14.4 Bell’s Second Many-Worlds Theory . . . . . . . . . . . . . . . . . . . . . 96
14.5 Probabilities in Many-World Theories . . . . . . . . . . . . . . . . . . . . 96

15 Special Topics 99
15.1 The Mach–Zehnder Interferometer . . . . . . . . . . . . . . . . . . . . . . 99
15.2 Path Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
15.3 Point Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

16 The Einstein–Podolsky–Rosen Argument 104


16.1 The EPR Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
16.2 Further Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
16.3 Bohm’s Version of the EPR Argument Using Spin . . . . . . . . . . . . . 105
16.4 Einstein’s Boxes Argument . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.5 Too Good To Be True . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

17 Proof of Nonlocality 108


17.1 Bell’s Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
17.2 Bell’s 1964 Proof of Nonlocality . . . . . . . . . . . . . . . . . . . . . . . 112
17.3 Bell’s 1976 Proof of Nonlocality . . . . . . . . . . . . . . . . . . . . . . . 113
17.4 Photons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

18 Discussion of Nonlocality 116


18.1 Nonlocality in Bohmian Mechanics, GRW, Copenhagen, Many-Worlds . . 116
18.2 Popular Myths About Bell’s Proof . . . . . . . . . . . . . . . . . . . . . . 118
18.3 Bohr’s Reply to EPR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

19 POVMs: Generalized Observables 122


19.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
19.2 The Main Theorem about POVMs . . . . . . . . . . . . . . . . . . . . . 126
19.3 Limitations to Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 129
19.4 The Concept of Observable . . . . . . . . . . . . . . . . . . . . . . . . . . 130

20 Time of Detection 133


20.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
20.2 The Absorbing Boundary Rule . . . . . . . . . . . . . . . . . . . . . . . . 135

21 Density Matrix and Mixed State 140


21.1 Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
21.2 The Trace Formula in Quantum Mechanics . . . . . . . . . . . . . . . . . 141
21.3 Limitations to Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 142
21.4 Density Matrix and Dynamics . . . . . . . . . . . . . . . . . . . . . . . . 142

22 Reduced Density Matrix and Partial Trace 144
22.1 Partial Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
22.2 The Trace Formula (22.2) . . . . . . . . . . . . . . . . . . . . . . . . . . 145
22.3 Statistical Reduced Density Matrix . . . . . . . . . . . . . . . . . . . . . 145
22.4 The Measurement Problem Again . . . . . . . . . . . . . . . . . . . . . . 146
22.5 The No-Signaling Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 147
22.6 Completely Positive Superoperators . . . . . . . . . . . . . . . . . . . . . 148
22.7 Canonical Typicality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

23 Quantum Logic 153

24 No-Hidden-Variables Theorems 156


24.1 Bell’s NHVT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
24.2 Von Neumann’s NHVT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
24.3 Gleason’s NHVT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

25 Special Topics 161


25.1 The Decoherent Histories Interpretation . . . . . . . . . . . . . . . . . . 161
25.2 Nelson’s Stochastic Mechanics . . . . . . . . . . . . . . . . . . . . . . . . 163

26 Identical Particles 167


26.1 Symmetrization Postulate . . . . . . . . . . . . . . . . . . . . . . . . . . 167
26.2 Schrödinger Equation and Symmetry . . . . . . . . . . . . . . . . . . . . 168
26.3 The Space of Unordered Configurations . . . . . . . . . . . . . . . . . . . 168
26.4 Identical Particles in Bohmian Mechanics . . . . . . . . . . . . . . . . . . 169
26.5 Identical Particles in GRW Theory . . . . . . . . . . . . . . . . . . . . . 171

A Topological View of the Symmetrization Postulate 173

