Hughes, Structure and Interpretation of Quantum Mechanics

The document discusses the structure and interpretation of quantum mechanics. It covers topics such as vector spaces, states, observables, density operators, measurement, properties, probability and causality.

The book discusses topics related to the structure of quantum theory such as vector spaces, states and observables in quantum mechanics, physical theory and Hilbert spaces. It also covers the interpretation of quantum theory including the problem of properties, quantum logic, probability, causality and explanation.

The introductory chapter discusses the Stern-Gerlach experiment.


The Structure and Interpretation of
Quantum Mechanics
R. I. G. HUGHES

Harvard University Press
Cambridge, Massachusetts, and London, England 1989
Copyright © 1989 by the President and Fellows
of Harvard College
All rights reserved
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

This book is printed on acid-free paper, and its binding
materials have been chosen for strength and durability.

Library of Congress Cataloging in Publication Data


Hughes, R. I. G.
The structure and interpretation of quantum mechanics/
R. I. G. Hughes
p. cm.
Bibliography: p.
Includes index.
ISBN 0-674-84391-6
1. Quantum theory. 2. Physics-Philosophy. I. Title.
QC174.12.H82 1989 88-16551
530.1'2-dc19 CIP
To Nicholas and Catharine
(a.k.a. Nick and Kate)
Contents

Preface xi

Introduction. The Stern-Gerlach Experiment 1

Part I The Structure of Quantum Theory


1 Vector Spaces 11
Vectors 12
Operators 14
Eigenvectors and Eigenvalues 21
Inner Products of Vectors in ℝ² 26
Complex Numbers 28
The Space ℂ² 31
The Pauli Spin Matrices 36
Mathematical Generalization 38
Vector Spaces 40
Linear Operators 42
Inner Products on 𝒱 43
Subspaces and Projection Operators 45
Orthonormal Bases 48
Operators with a Discrete Spectrum 49
Operators with a Continuous Spectrum 51
Hilbert Spaces 55

2 States and Observables in Quantum Mechanics 57


Classical Mechanics: Systems and Their States 57
Observables and Experimental Questions 59
States and Observables in Quantum Theory 63
Probabilities and Expectation Values 70
The Evolution of States in Classical Mechanics 72
Determinism 74
The Evolution of States in Quantum Mechanics 77
Theories and Models 79

3 Physical Theory and Hilbert Spaces 83


Minimal Assumptions for Physical Theory 85
The Representation of Outcomes and Events 86
The Representation of States 88
Determinism, Indeterminism, and the Principle of Superposition 91
Mixed States 93
Observables and Operators 97
Relations between Observables: Functional Dependence and Compatibility 99
Incompatible Observables 104
The Representational Capacity of Hilbert Spaces 107
The Schrödinger Equation 113

4 Spin and Its Representation 119


Symmetry Conditions and Spin States 120
A Partial Representation of Spin in ℝ² 123
The Representation of {S_α} in ℂ² 127
Conclusion 131

5 Density Operators and Tensor-Product Spaces 136


Operators of the Trace Class 136
Density Operators 138
Density Operators on ℂ² 139
Pure and Mixed States 141
The Dynamical Evolution of States 145
Gleason's Theorem 146
Composite Systems and Tensor-Product Spaces 148
The Reduction of States of Composite Systems 149

Part II The Interpretation of Quantum Theory


6 The Problem of Properties 155
Properties, Experimental Questions, and the Dispersion Principle 155
The EPR Argument 158
Bohm's Version of the EPR Experiment 159
The Statistical Interpretation 162
Kochen and Specker's Example 164
Generalizing the Problem 168
The Bell-Wigner Inequality 170
Hidden Variables 172
Interpreting Quantum Theory: Statistical States and Value States 175

7 Quantum Logic 178


The Algebra of Properties of a Simple Classical System 178
Boolean Algebras 182
Posets and Lattices 186
The Structure of S(ℋ) 190
The Algebra of Events 194
A Formal Approach to Quantum Logic 201

An Unexceptionable Interpretation of Quantum Logic 207
Putnam on Quantum Logic 209
Properties and Deviant Logic 212

8 Probability, Causality, and Explanation 218


Probability Generalized 219
Two Uniqueness Results 220
The Two-Slit Experiment: Waves and Particles 226
The Two-Slit Experiment: Conditional Probabilities 232
The Bell-Wigner Inequality and Classical Probability 237
Bell Inequalities and Einstein-Locality 238
Bell Inequalities and Causality 245
Coupled Systems and Conditional Probabilities 248
Probability, Causality, and Explanation 255

9 Measurement 259
Three Principles of Limitation 259
Indeterminacy and Measurement 265
Projection Postulates 271
Measurement and Conditionalization 275
The Measurement Problem and Schrödinger's Cat 278
Jauch's Model of the Measurement Process 281
A Problem for Internal Accounts of Measurement 284
Three Accounts of Measurement 288

10 An Interpretation of Quantum Theory 296


Abstraction and Interpretation 296
Properties and Latencies: The Quantum Event Interpretation 301
The Copenhagen Interpretation 306
The Priority of the Classical World 310
Quantum Theory and the Classical Horizon 312

Appendix A. Gleason's Theorem 321


Appendix B. The Lüders Rule 347
Appendix C. Coupled Systems and Conditionalization 349
References 351
Index 363
Preface

I take it to be an unassailable truth that what Taoism, Confucianism, Zen
Buddhism, and the writings of Carlos Castaneda have in common, they
have in common with quantum mechanics. As truths go, however, this one
isn't very illuminating. Quantum mechanics, one of the two great and
revolutionary theories of physics to appear during the first thirty years of
this century, is essentially a mathematical theory; one will gain little genuine
insight into it without some awareness of the mathematical models it employs.
That is one of the two beliefs which have guided the writing of this book.
The other is that the requisite mathematical knowledge is not, after all,
fearsomely difficult to acquire. In fact, one kind of reader I have in mind is
the reader who, while not seized by paralysis at the sight of a mathematical
formula, does not happen to have a working knowledge of vector-space
theory. In this respect the book is self-contained; the mathematical background
it assumes is that of high school mathematics, and the additional
mathematics needed, the mathematics of vector spaces, is presented in
Chapters 1 and 5.
Another kind of reader has taken physics courses, and solved textbook
problems in quantum mechanics, but, like most of us, continues to find the
theory deeply mysterious. Perhaps rashly, this reader hopes that a philo-
sophical account will clarify matters. Between these two ideal types there is,
if not a continuous spectrum, at least a considerable diversity of readers to
whom the book will prove accessible.
In presenting the mathematics, the strategy I have used is to treat finitely
dimensional spaces, particularly two-dimensional spaces, in some detail,
and then to indicate in general terms how the same ideas are applied in the
infinitely dimensional case. Correspondingly, the quantum-mechanical
quantities I deal with are usually spin components, rather than position and
momentum. It turns out that most of the problematic features of quantum
theory can be presented in terms of the behavior of the spin-½ particle,
whose representation requires no more than the two-dimensional space ℂ².
The first half of the book not only sets out the mathematics of vector
spaces; it also shows how elegantly these structures can model a probabilistic
world. But if, as quantum mechanics suggests, the world they represent is
the actual world, then we face deep problems of interpretation. Defying as it
does any "natural" interpretation, whether in terms of causal processes or of
systems and their properties, quantum theory challenges some of our most
basic metaphysical assumptions.
These issues of interpretation are the subject of the second half of the
book (Chapters 6-10). Most of the well-known problems are aired (like
those of Schrödinger's cat, the two-slit experiment, and the EPR
correlations), but certain topics, such as hidden-variable theories, get very
scanty treatment, and others, like the Pauli exclusion principle, are not
mentioned at all. I confine myself to orthodox (nonrelativistic) quantum
mechanics; I do not discuss, for example, Dirac's relativistic account of the
hydrogen atom, nor do I deal with quantum field theories. This book is in no
sense an encyclopedia of the interpretation of quantum theory.
Both my exposition of the structure of the theory and the positive suggestions
I make concerning its interpretation are, in the broadest sense,
quantum-logical. My account of the theory's structure is essentially that given by
John von Neumann and by George Mackey. The interpretation I lean to, and
which I call the "quantum event interpretation," is in many respects consonant
with that advocated by Jeffrey Bub, by William Demopoulos, and by
Allen Stairs. The general account of physical theories which acts as a backdrop
to these specific discussions of quantum theory is the semantic view
associated with Patrick Suppes and Bas van Fraassen: theories are seen as
supplying models for the phenomena they deal with.
Having thus outlined my program and declared my allegiances, I leave
the reader to decide whether to proceed further, or to open another beer, or
both.

* * *
Among those to whom I owe thanks are Malcolm McMillan and M. H. L.
Pryce of the University of British Columbia, in whose physics classes I
learned about quantum mechanics; I hope that in the pages that follow they
can recognize the beautiful theory that they taught me. For what I learned
when I came to teach courses myself I owe a debt to my students at the
University of British Columbia, at the University of Toronto, at Princeton,
and at Yale. Allen I'llk hllhln, in particular, read most of the manuscript in
his senior year at Yale and would return to me weekly, politely drawing my
attention to obscurities, fallacies, and simple errors of fact. Roger Cooke,
Michael Keane, and W. Moran have kindly allowed me to reprint their
"elementary proof" of Gleason's theorem, and my commentary on it has
been much improved by Roger Cooke's suggestions. For detailed comments
on a late draft of the book I am also indebted to Jon Jarrett, while for specific
advice, encouragement, and appropriate reproof I would like to thank
Steven Savitt, Michael Feld, Clark Glymour, David Malament, and Lee
Smolin.
Sections of the book were written in railway carriages, airport lounges,
and theatrical dressing rooms, but for more tranquil environments I am
grateful to Sue Hughes and Paul Schleicher, to Susan Brison, and to Margot
Livesey, in whose houses whole chapters took shape. The final manuscript
was typed up swiftly and accurately by Caroline Curtis, and the diagrams
elegantly rendered by Mike Leone; Patricia Slatter is even now at work on
the index. At Harvard University Press, Lindsay Waters has been a source of
great encouragement over several years, and Kate Schmit edited the manuscript
with great care and sensitivity. My thanks to all of them.
I saved my two greatest personal debts till last. I met Ed Levy within days
of my arrival at U.B.C.; he it was who first stimulated my interest in the
philosophical foundations of quantum mechanics, who later supervised my
dissertation in that area, and who has continued to help me to clarify my
thoughts on the subject. Bas van Fraassen and I met at the University of
Toronto; since then we have discussed the problems quantum theory raises
(along with the architecture of the Renaissance and the plays of Friedrich
Dürrenmatt) on two continents and in half a dozen countries. Both have
helped me more, perhaps, than they know.
I would also like to express my gratitude to the following firms and
institutions:
To Addison-Wesley Publishing Company, Reading, Massachusetts, for
permission to reprint a diagram from The Feynman Lectures on Physics
(1965), by R. P. Feynman, R. B. Leighton, and M. Sands.
To Kluwer Academic Publishers, Dordrecht, Holland, for permission to
reprint a diagram from J. Earman's A Primer on Determinism (1986).
To Cambridge University Press, Cambridge, U.K., for permission to reprint
in Appendix A "An Elementary Proof of Gleason's Theorem," by R.
Cooke, M. Keane, and W. Moran, from Mathematical Proceedings of the
Cambridge Philosophical Society (1985).
To the Frederick W. Hilles Publication Fund of Yale University, with
whose assistance the manuscript was prepared.
I seik about this warld unstabille
To find ane sentence convenabille,
Bot I can nocht in all my wit
Sa trew ane sentence fynd off it
As say, it is dessaveabille.
-WILLIAM DUNBAR
INTRODUCTION

The Stern-Gerlach Experiment

Quantum mechanics is at once one of the most successful and one of the
most mysterious of scientific theories. Its success lies in its capacity to clas-
sify and predict the behavior of the physical world; the mystery resides in
the problem of what the physical world must be like to behave as it does.
The theory deals with the fundamental entities of physics: particles like
protons, electrons, and neutrons, from which matter is built; photons,
which carry electromagnetic radiation; and the host of "elementary particles"
which mediate the other interactions of physics. We call these "particles"
despite the fact that some of their properties are totally unlike the
properties of the particles of our ordinary, macroscopic world, the world of
billiard balls and grains of sand. Indeed, it is not clear in what sense these
"particles" can be said to have properties at all.
Physicists have been using quantum mechanics for more than half a
century; yet there is still wide disagreement about how the theory is best
understood. On one interpretation, the so-called state functions of quantum
mechanics apply only to ensembles of physical systems, on another, they
describe individual systems themselves; on one view we can say something
useful about a particle only when it interacts with a piece of measuring
equipment, on another such a particle can be perfectly well described at all
times, but to do so we need a language in which the ordinary laws of logic do
not hold.
Are we, perhaps, foolish to seek an interpretation of this theory? Maybe
we should take the advice Richard Feynman (1965, p. 129) offers in one of
his lectures on The Character of Physical Law:
I am going to tell you what nature behaves like ... Do not keep saying to yourself,
if you can possibly avoid it, "But how can it be like that?" because you will get
"down the drain," into a blind alley from which nobody has yet escaped. Nobody
knows how it can be like that.

Feynman illustrates this pessimistic conclusion with a well-known
thought-experiment, the two-slit experiment to show interference effects
with electrons (see also Feynman, Leighton, and Sands, 1965, vol. 3, lecture
1). To the problems raised by this experiment I will return in Chapter 8; in
the meantime, another example will allow us to taste the peculiar flavor of
quantum theory.
In late 1921 Otto Stern and Walther Gerlach performed the first of a series
of experiments on the magnetic properties of various atoms (see Jammer,
1966, pp. 134-136). They vaporized silver in an enameling oven and al-
lowed some of the atoms to escape, collimating them into a narrow beam by
means of diaphragms. The beam then passed between the poles of a spe-
cially shaped magnet and, some distance further on, it struck a glass plate.
The trace the atoms left on the plate showed that they had been deflected as
they traversed the magnetic field, and that the beam had been split into two,
one half of it being deflected downward and the other upward (Figure 1.1).
How was this simple result to be explained?
Clearly an interaction between the atoms and the magnetic field was
responsible for their behavior. It seemed that each atom acted as a tiny
magnet (or, more formally, each had a magnetic moment), and that the
splitting was due to the nonuniformity of the field. The DuBois magnets
used in the experiment were designed to give a very intense field near the
V-shaped pole piece and a less intense field near the other. In a uniform field
a small magnet (a compass needle, say) feels no overall force in any one
direction: if, like a compass needle, it is constrained by a pivot, it will tend to
rotate until it aligns itself with the field; if not so constrained it will precess
round the direction of the field, like a spinning top precessing round the
vertical. In a nonuniform field, on the other hand, the magnet will feel a net
force in one direction or the other, depending on which pole is in the
stronger part of the field.

Figure 1.1 The Stern-Gerlach apparatus (Experiment V). Labels: diaphragm, magnet; inset: DuBois magnet viewed obliquely.

But to picture a silver atom as a tiny compass needle would be wrong. For
if the atoms behaved in that way we would expect to find their magnetic
axes oriented randomly as they entered the field; that being so, those de-
flected most in one direction would be those with their axes aligned parallel
with the field gradient, and those deflected most in the other would be those
with their axes antiparallel with the field. In addition, however, there would
be large numbers of atoms that were not aligned exactly upward or down-
ward and that would suffer deflections intermediate between these two
extremes. In other words, instead of two spots of silver on the glass plate,
Stern and Gerlach would have seen a smeared line.
Of course, we could take the two spots they observed to show that the
magnetic axes of the atoms were oriented either upward or downward but
nowhere in between. When the magnets were rotated 90°, however, the
beam was again split into two, but now one part was deflected to the left and
the other to the right; by parallel reasoning, this would show that the
magnetic axes of all the atoms were oriented horizontally. Clearly, the
simple compass-needle model of a silver atom will not do.
More formally, we can contrast the behavior of these atoms with that of a
compass needle as follows. A classical magnet has a magnetic axis; its
magnetic moment is directed along this axis, but this moment has a compo-
nent in any direction we choose, whose value ranges continuously from a
maximum in the direction of the axis through zero along a line perpendicu-
lar to it, to a maximum negative value in the opposite direction (see Figure
1.2). However, it seems that the components in any direction in space of the
magnetic moment of a silver atom can have only one of two values; these are
numerically the same as each other, but one is positive with respect to that
direction and the other is negative.
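
The contrast just drawn can be made concrete with a small numerical sketch. It is an illustration only, not part of the original argument. It assumes magnets oriented uniformly at random (so that cos θ is uniform between -1 and 1), sets the magnitude μ of the moment to 1 in arbitrary units, and takes the two quantum values to be equally likely, as the equal splitting of the beam suggests. All function names are hypothetical.

```python
import random

def classical_components(n, mu=1.0, seed=0):
    """Components mu*cos(theta) along a fixed axis for n randomly oriented magnets."""
    rng = random.Random(seed)
    # Assumption: directions uniform on the sphere, so cos(theta) is uniform on [-1, 1].
    return [mu * rng.uniform(-1.0, 1.0) for _ in range(n)]

def two_valued_components(n, mu=1.0, seed=0):
    """Components restricted to +mu or -mu, each assumed to occur with probability 1/2."""
    rng = random.Random(seed)
    return [mu if rng.random() < 0.5 else -mu for _ in range(n)]

def histogram(values, bins=5, lo=-1.0, hi=1.0):
    """Count how many values fall into each of `bins` equal intervals of [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[index] += 1
    return counts

classical = histogram(classical_components(10000))
two_valued = histogram(two_valued_components(10000))
print("classical (smeared line):", classical)
print("two-valued (two spots):  ", two_valued)
```

In the classical case every interval between the extremes is populated, which corresponds to the smeared line; in the two-valued case only the outermost intervals are, which corresponds to the two spots.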
At first the experiment was explained in terms of the "magnetic core"
hypothesis of Sommerfeld and Lande, a hypothesis long since discarded,
which attributed the deflection of the beam to the magnetic properties of the
nucleus and inner electrons of an atom. In 1925 an alternative explanation
was to hand, proposed by Goudsmit and Uhlenbeck, and it is this explana-
tion which was incorporated into quantum theory as we now know it.
The explanation is roughly this. An electron possesses an intrinsic angular
momentum, known as "spin," which gives rise to a magnetic moment. A
component of the spin in any direction has one of two values, +½ℏ or −½ℏ;
hence we list electrons among the "spin-½ particles." (The constant ℏ is the
so-called natural unit of action, the omnipresent constant of quantum
theory; it is now usually referred to as Planck's constant, though it would be
historically more accurate to reserve that term for h, equal to 2πℏ.) Like the
direction given to a magnetic field, the positive and negative signs are
attached conventionally. A silver atom contains 47 electrons; 46 of these are
arranged in pairs, with the result that the effects of their spin cancel out and
the observed effect is due to the electron left over. In the experiment shown
in Figure 1.1, which I will call "Experiment V" (for vertical), it is the magnetic
moment due to the vertical component of spin of this unpaired electron
which the Stern-Gerlach apparatus measures: a positive value for this component
means that the silver atom will enter one beam (the "spin-up" beam)
while a negative value means that it will enter the other ("spin-down")
beam. Incidentally, the protons within the nucleus of the atom are also
charged spin-½ particles, but because of their comparatively large mass, their
magnetic moment is much smaller than that of the electrons. The nucleus
does contribute to the total magnetic moment of the atom, but to a negligible
extent. In this discussion no problem arises when we talk of an atom as a
whole, rather than the unpaired electron within it, as having a particular
component of spin.

Figure 1.2 If the magnetic moment μ of the compass needle is directed along the dotted
line, then the component of μ along AB will equal μ cos θ.

The account just given is, in its broad outlines, correct. But in at least one
respect it is seriously misleading. From it we might infer that, when the
atoms entered the magnetic field of the apparatus, some were aligned spin-
up and some spin-down and that the device just sorted them into two
separate beams accordingly. This conclusion would be confirmed were we
to block off one of the beams as it left the apparatus, the spin-down beam,
let us say. The emerging atoms would now all be spin-up, as we can verify
by placing a second magnet in tandem with the first (Figure 1.3). No further
splitting of the beam would take place, though the beam as a whole would
be deflected further upward. Let us call this "Experiment VV."
Now consider a different experiment (Experiment VH) in which the second
apparatus is rotated 90° (Figure 1.4). The incoming beam (that is, the
spin-up beam from the first apparatus) will be split into two horizontally
separated beams, spin-left and spin-right. (So far our account and quantum
theory are entirely in harmony.) However, now let us block off the spin-
right beam. What are we to say of the atoms which now emerge? Our
account suggests that they have been through two filters: they have passed
the first by virtue of having a spin-up vertical component of spin, and the
second by virtue of a spin-left horizontal component of spin. In other words,
it suggests that we can specify both the horizontal and the vertical compo-
nents of spin these atoms possess: were a third apparatus set up to receive
this beam, whether the magnetic field gradient were vertical or horizontal,
no further splitting would occur.

Figure 1.3 Experiment VV.

Figure 1.4 Experiment VH.

Unfortunately, this is not the case (Feynman, Leighton, and Sands, 1965,
vol. 3, lecture 5). With the axis of the third apparatus set in any direction but
horizontal, the emergent beam will be split into two. With it vertical
(Experiment VHV, Figure 1.5) the two parts of the beam will be equal in intensity.
This, at least, is what quantum theory predicts for idealized experiments of
this kind, and all the evidence from actual experiments, some of which are
very close in principle to those described, confirms its predictions. It seems
that, somewhere along the line, there is a divergence between the
quantum-theoretic analysis of what happens and our account of it: at some point an
unwarranted assumption or two has found its way into the latter. Within it
we find at least four separate assumptions at work:
(1) That when we assign a numerical value to a physical quantity for a
system (as when we say that the vertical component of spin of an electron is
+½ℏ), we can think of this quantity as a property of the system; that is, we
can talk meaningfully of the electron having such and such a vertical compo-
nent of spin.
(2) That we can assign a value for each physical quantity to a system at
any given instant (for example, that we can talk of a silver atom as being
both spin-up and spin-left).
(3) That the apparatus sorts out the atoms according to the values of one
particular quantity (such as the values of the vertical component of spin), in
other words, according to the properties they possess.
(4) That as it does so the system's other properties remain unchanged.
The evidence of Experiments VV and VH is consistent with all of these
assumptions; that of Experiment VHV with (1), (2), and (3), but not (4). It
looks as though the spin-left, spin-right measurement effected by the sec-
ond apparatus disturbs the values of the vertical components of spin. But,
oddly enough, it disturbs only half of them. According to our interpretation
of Experiment VV, all the atoms entering the second apparatus of VH have
spin-up vertical components of spin, but as they emerge half of them are
spin-up and half spin-down. Or so Experiment VHV informs us, as interpreted
on the basis of assumptions (1), (2), and (3).
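
What quantum theory predicts for Experiments VV, VH, and VHV can be sketched numerically. The sketch borrows machinery that the book develops only later: vectors in ℂ², and the rule that a probability is the squared magnitude of an inner product. It also assumes a conventional choice of vectors for spin-up, spin-down, spin-left, and spin-right (which horizontal vector is labeled "left" is arbitrary), and that an atom emerging in an unblocked beam is described by the corresponding vector. All names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Assumed conventional representatives of the four spin states in C^2.
UP = (1.0, 0.0)
DOWN = (0.0, 1.0)
LEFT = (S, S)
RIGHT = (S, -S)

def inner(a, b):
    """Inner product of two vectors in C^2, conjugating the entries of a."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def prob(outcome, state):
    """Probability of finding `outcome` when the system is described by `state`."""
    return abs(inner(outcome, state)) ** 2

# Experiment VV: a spin-up beam enters a second vertical apparatus.
p_vv_up, p_vv_down = prob(UP, UP), prob(DOWN, UP)

# Experiment VH: a spin-up beam enters a horizontal apparatus.
p_vh_left, p_vh_right = prob(LEFT, UP), prob(RIGHT, UP)

# Experiment VHV: the surviving spin-left beam enters a third, vertical apparatus.
p_vhv_up, p_vhv_down = prob(UP, LEFT), prob(DOWN, LEFT)

print("VV :", round(p_vv_up, 3), round(p_vv_down, 3))
print("VH :", round(p_vh_left, 3), round(p_vh_right, 3))
print("VHV:", round(p_vhv_up, 3), round(p_vhv_down, 3))
```

On these assumptions the third apparatus splits the beam into two equal parts, even though every atom entering the second apparatus came from the spin-up beam.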
On this analysis, quantum theory owes us an explanation for the selectiv-
ity displayed by the second apparatus. Why is it, we may ask, that half the
atoms entering this apparatus are tipped upside down, while the other half
journey on undisturbed? Quantum theory declines to tell us. Rather, it
suggests that we not only abandon (4) but also look severely at the other
principles involved. Assumption (2) may be the first casualty: there may be
distinct properties which are incompatible. These properties would not just
be mutually exclusive values of one quantity, like spin-up and spin-down,
but also properties associated with two different quantities, vertical and
horizontal components of spin, for example, or the possibly more familiar
pair, position and momentum. The possession of a well-defined value for
one such quantity would rule out its possession for the other: to say that an
atom was spin-up would rule out our saying that it was also spin-left. But if
this is the case, then measurement will not be the simple process suggested
by (3); if the vertical and horizontal components of spin are incompatible,
and a system has a well-defined vertical component, then a measurement of
the horizontal component will not merely reveal what value of the latter the
system possesses. The measurement process may have to be seen as in some
sense bringing this value about. To say that properties are not revealed by
measurement, however, serves to point out an oddity, not only in the
quantum concept of measurement but also in the notion of a property at
work here. If we accept assumption (1), that is, the identification of a
property with a particular value of a physical quantity, then we may find
ourselves dealing with properties of a very peculiar kind. All four assumptions,
not just the last of them, need careful scrutiny.

Figure 1.5 Experiment VHV.

The Stern-Gerlach experiment was highly significant: it supplied the first
nonspectroscopic evidence for the quantization of physical quantities. Only
discrete values of the components of magnetic moment of a system were
permissible, compared with the continuum of values possible on the classical
view. The importance of the work was immediately recognized, Sommerfeld
stating that, "With their bold experimental method Stern and Gerlach
demonstrated not only the existence of space quantization, they also
proved the atomistic nature of the magnetic moment, its quantum-theoretic
origin and its relation to the atomic structure of electricity" (Jammer, 1966,
p. 134).
Yet, notwithstanding its success, the Stern-Gerlach experiment immediately
presents us with some of the problems to which Feynman alluded.*
These problems arise when we try to reconcile quantum theory, and the
experimental results with which it deals, with intuitively appealing princi-
ples of interpretation. To see these problems more clearly, we need to
become better acquainted with the theory itself. As for the principles with
which the theory conflicts, they may be just the legacy of an outmoded
physics, old bottles into which, with the usual result, we are pouring new
wine. It may be that our way of describing the world is inadequate, and the
metaphysical notions implicit in it inappropriate, for dealing with a realm so
far removed in scale from our everyday experience. This, I take it, is what
Feynman suggests, albeit with a more graphic turn of phrase.

* For a contemporary view of the problems it raises, see Einstein and Ehrenfest (1922),
reprinted in Ehrenfest (1959), pp. 452-455.
Part I The Structure of Quantum Theory

1 Vector Spaces

No real insight into quantum theory is possible without an acquaintance
with the mathematics it employs. Luckily it isn't hard to get some feeling for
this mathematics; in fact, apart from some supplementary material in Chapter 5,
the present chapter contains virtually all the background material
drawn on in the rest of this book.
The mathematics in question is the theory of vector spaces (sometimes
called "Hilbert spaces"). In this chapter I give a three-part sketch of
the vector-space theory developed, with quantum mechanics in mind, by
P. A. M. Dirac and John von Neumann in the early 1930s (Dirac, 1930; von
Neumann, 1932; see Bub, 1974, pp. 3-8, on differences between the two).
Sections 1.1-1.4 deal with a simple geometrical example of a vector space,
the plane. This is a two-dimensional real space which we call ℝ²; that is to
say, each point on it can be specified by two real numbers (x and y coordinates
in a standard Cartesian system). Sections 1.5-1.7 generalize this
material to the case of a two-dimensional complex space, ℂ², in which each
point is represented by two complex numbers. The remaining sections (1.8-1.16)
carry the generalization a stage further and present an abstract characterization
of a vector space. These spaces can be of any, even infinite,
dimensionality; however, it so happens that a great many of the problematic
features of quantum theory can be presented in terms of electron spin, and
the quantum theory of the spin-½ particle involves just the two-dimensional
space ℂ². For this reason (and of course for the delectation of the reader)
Section 1.7 comprises a set of problems dealing with ℂ². Apart from one
exercise in Section 1.14, these are the only problems set in the chapter, but
many of the results quoted without proof in other sections can be obtained
by a few minutes' work with paper and pencil. I have indicated the approximate
level of difficulty of each proof by stars (*): the harder the proof, the
greater the number of stars.

1.1 Vectors
Consider the two-dimensional real space of the plane of the paper, ℝ². We
pick a particular point in ℝ² and call it the zero vector, 0. The other vectors in
ℝ² are arrows of finite length which lie within the plane with their tails at
zero; any arrow of this kind is a nonzero vector of ℝ². (See Figure 1.1.)
We can define vector addition, the operation by which we add two vectors
to form a third, as follows. Given two vectors u and v, we construct a
parallelogram with u and v as adjacent sides (see Figure 1.2). The diagonal
of this parallelogram, which passes through 0, will also be a vector: call it w.
This vector is the vector sum of u and v. We write

w = u + v

We also define scalar multiplication, that is, the operation of multiplying a
vector by a number. The vector 2v, for instance, is the arrow like v but twice
as long. Multiplying v by a negative number, -1.5 say, yields an arrow in
the opposite direction to v and half as long again as v (see Figure 1.3). It
follows that, for any v,

$$v + (-1)v = \mathbf{0}$$

Note that $\mathbf{0}$ here denotes the zero vector, not the number zero.
So far we have proceeded entirely geometrically, using a geometrical
construction to obtain u + v and giving a geometrical meaning to av (where
a is a real number). However, an alternative, arithmetical approach is open
to us. We may impose a coordinate system on our space and then refer to
each vector by the coordinates of its tip. Each vector will be designated by a
pair of numbers, which we write as a column, thus:

$$\begin{pmatrix} x \\ y \end{pmatrix}$$

The numbers x and y by which we denote a particular vector v will of course
vary according to the coordinate system we have chosen (see Figure 1.4), or,
as we say, according to the basis we use. Provided we are consistent and
don't switch haphazardly from one to another, in principle it doesn't matter
what basis we choose, though one may be more convenient than another. In
every basis the zero vector is represented by

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Unless otherwise stated, we shall assume from now on that we are using a
single (arbitrarily chosen) fixed basis.

[Figure 1.1]
[Figure 1.2]
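Corresponding to the operations of vector addition and scalar multiplication
carried out on vectors, we can perform very simple arithmetical operations
on representations of vectors. It is easily shown that,

if $v = \begin{pmatrix} x \\ y \end{pmatrix}$ and $u = \begin{pmatrix} x' \\ y' \end{pmatrix}$, then

$$v + u = \begin{pmatrix} x + x' \\ y + y' \end{pmatrix} \quad\text{and}\quad av = \begin{pmatrix} ax \\ ay \end{pmatrix}$$

[Figure 1.3]
[Figure 1.4: the same vector v has one column representation in basis 1 and
another in basis 2; thus the same vector can be represented in many
different ways.]

Note that

$$v + (-1)v = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -x \\ -y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \mathbf{0}$$

as required.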
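The coordinate rules for vector addition and scalar multiplication can be tried out directly. The following is a minimal sketch, assuming vectors are stored as Python pairs in one fixed basis; the helper names `add` and `scale` and the sample values are illustrative and do not come from the text.

```python
# Vectors of the plane stored as (x, y) pairs in one fixed basis.

def add(v, u):
    """Vector sum: add corresponding coordinates."""
    return (v[0] + u[0], v[1] + u[1])

def scale(a, v):
    """Scalar multiple: multiply each coordinate by the real number a."""
    return (a * v[0], a * v[1])

ZERO = (0.0, 0.0)            # the zero vector, not the number zero

v = (2.0, 1.0)               # illustrative values, not from the text
u = (-1.0, 3.0)

w = add(v, u)                # the diagonal of the parallelogram on u and v
stretched = scale(-1.5, v)   # opposite direction, half as long again
cancelled = add(v, scale(-1, v))

print(w, stretched, cancelled)
```

The last value printed is the pair representing the zero vector, in line with v + (-1)v = 0.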

1.2 Operators
We now consider operators on our set of vectors. An operator transforms any
vector in the space into another vector; one example is a rotation operator,
which swings any vector round through a certain angle without altering its
length. We will denote operators by boldface capital letters and write Av for
the vector which results when the operator A acts on the vector v. In Figure
1.5 v' is the vector we get by swinging v round through an angle $\theta$
(counterclockwise), and so we write,

$$v' = R_\theta v$$

where $R_\theta$ is the operator which produces this rotation.


Another kind of operator is a reflection operator, which, as the name
indicates, produces the reflection of any vector on the other side of a given
line. Figure 1.6 shows the effect of the reflection operator $S_y$, which
reflects vectors about the y-axis. Note that if u is a vector lying along this
axis, then $S_y u = u$. In other words, u is mapped onto itself by $S_y$.

[Figure 1.5]

Neither reflection nor rotation operators change the lengths of vectors,
but some operators do. We can consider, for instance, the operator which
just doubles the length of each vector, or the one which reduces its length by
half. There is also a zero operator, which transforms every vector in the space
into the zero vector.
One of the most important classes of operators we deal with is that of
projection operators. An example of such an operator is shown in Figure 1.7.
This is the projection operator $P_x$, which, as we say, projects a vector onto
the x-axis. This takes any vector

$$\begin{pmatrix} x \\ y \end{pmatrix}$$

and transforms it into the vector

$$\begin{pmatrix} x \\ 0 \end{pmatrix}$$

lying along the x-axis with its tip immediately below

$$\begin{pmatrix} x \\ y \end{pmatrix}$$

[Figure 1.6: $u = S_y u$; $v' = S_y v$]
[Figure 1.7]

Notice that, while for a given vector v there is only one vector $P_x v$ (or
else $P_x$ would not be an operator), nevertheless we may well have distinct
vectors u and v such that $P_x u = P_x v$ (see Figure 1.8). In this way
projection operators differ from rotation and reflection operators.
We may, of course, perform a series of operations on a given vector v. We
may, for instance, rotate it through an angle $\theta$ to produce $R_\theta v$
and then project the resulting vector onto the x-axis, producing
$P_x(R_\theta v)$. Now, provided $\theta$ is neither 0° nor 180°, this vector
is different from the one we get if we perform the operations in the reverse
order. In other words, $R_\theta(P_x v) \neq P_x(R_\theta v)$ (unless v is the
zero vector). To see this, consider that any vector $P_x(R_\theta v)$ must lie
along the x-axis, while any vector $R_\theta(P_x v)$ must be along a line at
an angle $\theta$ to this axis.
We can define the operator AB as that operator which, when applied to
an arbitrary vector v, yields the vector A(Bv). In other words, the operator
AB is effectively an instruction to apply first the operator B and then the
operator A. We have just shown that, in general, AB ≠ BA, but the equality
may hold for particular operators A and B; for example, if A and B are both
rotation operators, then AB = BA. We say then that A and B commute. On a
point of notation: in the concatenation AB, both A and B are operators, and
AB, which is also an operator, is called the product of A and B. In the
concatenation Av, however, A is an operator but v is a vector, and one
should not think of what is happening as multiplication.
We saw in the previous section that any vector of ℝ² can be represented by
a pair of numbers. Does a similar arithmetical representation of operators
exist? Yes, provided we restrict ourselves to linear operators. An operator A
is linear provided that, for all vectors u and v and for any number c,

(1.1)  $$A(u + v) = Au + Av$$
(1.2)  $$A(cv) = c(Av)$$

All the operators discussed in this book are linear.


Any linear operator on ℝ² may be represented by a 2 × 2 matrix of real
numbers

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

such that if

$$v = \begin{pmatrix} x \\ y \end{pmatrix}$$

then

(1.3)  $$Av = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$$

It is trivial to prove the converse, that any such matrix represents a linear
operator.

[Figure 1.8]

To perform the manipulations in (1.3), think of taking the top line (a b) of
the matrix, rotating it so that it matches up with the vector v (multiply a by x
and b by y), and then adding ax to by to get the top entry of the vector Av.
The bottom entry is obtained by doing the same with the bottom line of the
matrix. (For help with the proof of this theorem, or for a fuller account of
elementary vector-space theory, consult any book on linear algebra, such as
Lang, 1972.)
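Rule (1.3) and the two linearity conditions (1.1) and (1.2) can be checked numerically. The sketch below assumes a matrix is stored as a pair of rows; the function names and the sample matrix are illustrative choices, not part of the text.

```python
def apply(A, v):
    """Rule (1.3): A is ((a, b), (c, d)) and v is (x, y)."""
    (a, b), (c, d) = A
    x, y = v
    return (a * x + b * y, c * x + d * y)

def add(v, u):
    return (v[0] + u[0], v[1] + u[1])

def scale(k, v):
    return (k * v[0], k * v[1])

A = ((1, 2), (3, 4))          # arbitrary illustrative matrix
u, v, k = (1, -1), (2, 5), 3

# Condition (1.1): A(u + v) = Au + Av
lhs_sum = apply(A, add(u, v))
rhs_sum = add(apply(A, u), apply(A, v))

# Condition (1.2): A(kv) = k(Av)
lhs_scaled = apply(A, scale(k, v))
rhs_scaled = scale(k, apply(A, v))

print(lhs_sum == rhs_sum, lhs_scaled == rhs_scaled)
```

A check on particular vectors is of course not a proof; it only illustrates what the two conditions assert.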
The operators we have looked at all have simple matrix representations.
For instance,

A little thought should show why these operators have the representations
shown, and it is a useful exercise to show that

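$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

We include in our class of operators the identity operator

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$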
This is the operator that leaves any vector as it found it: Iv = v, for all v.
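As a rough illustration of how such matrices act, the sketch below applies the rotation matrix quoted in the text together with what are assumed here to be the usual matrices for projection onto the x-axis and reflection about the y-axis. The constant names `P_X` and `S_Y` and the sample vector are illustrative.

```python
import math

def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def rotation(theta):
    """Counterclockwise rotation through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

P_X = ((1, 0), (0, 0))       # assumed matrix of projection onto the x-axis
S_Y = ((-1, 0), (0, 1))      # assumed matrix of reflection about the y-axis
IDENTITY = ((1, 0), (0, 1))

v = (3.0, 4.0)               # illustrative vector of length 5
rotated = apply(rotation(math.pi / 2), v)   # a quarter turn

print(apply(P_X, v), apply(S_Y, v), apply(IDENTITY, v))
print(rotated, math.hypot(*rotated))        # rotation leaves the length unchanged
```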
If we have two operators A and B, with representations

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} e & f \\ g & h \end{pmatrix}$$

what is the representation of their product AB? It is the matrix which, when
we operate with it on the vector v according to the rule (1.3), yields the same
result as operating with B and A, in that order. A little brisk manipulation
shows that

$$AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix} \quad (*)$$

(We obtain the top left-hand entry by matching the top line of A with the left
column of B, the top right entry by matching the top line of A with the right
column of B, and so on.)
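It's worth going into this in more detail. Let

$$v = \begin{pmatrix} x \\ y \end{pmatrix}$$

Then if

$$(AB)v = \begin{pmatrix} px + qy \\ rx + sy \end{pmatrix}$$

we will know that the matrix representation of AB is

$$\begin{pmatrix} p & q \\ r & s \end{pmatrix}$$

Now

$$Bv = \begin{pmatrix} e & f \\ g & h \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ex + fy \\ gx + hy \end{pmatrix}$$

and so

$$A(Bv) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} ex + fy \\ gx + hy \end{pmatrix} = \begin{pmatrix} a(ex + fy) + b(gx + hy) \\ c(ex + fy) + d(gx + hy) \end{pmatrix} = \begin{pmatrix} (ae + bg)x + (af + bh)y \\ (ce + dg)x + (cf + dh)y \end{pmatrix}$$

Since, by definition, (AB)v = A(Bv), it follows that,

$$AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$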
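The product rule just derived can be written out as a short function and compared against applying B and then A. This is a sketch with illustrative names and sample matrices, assuming matrices are stored as pairs of rows.

```python
def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def product(A, B):
    """Matrix of AB: each line of A is matched against each column of B."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

A = ((1, 2), (3, 4))      # illustrative matrices and vector
B = ((0, 1), (5, -2))
v = (7, -3)

AB = product(A, B)
one_step = apply(AB, v)             # (AB)v
two_steps = apply(A, apply(B, v))   # A(Bv): first B, then A

print(AB, one_step, two_steps)
```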

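We can now confirm a previous result. Examining the matrix representations
of $R_\theta P_x$ and $P_x R_\theta$ we find that

$$R_\theta P_x = \begin{pmatrix} \cos\theta & 0 \\ \sin\theta & 0 \end{pmatrix} \quad\text{and}\quad P_x R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ 0 & 0 \end{pmatrix} \quad (*)$$

Thus, as we showed before, in general $P_x$ and $R_\theta$ do not commute.
(Note, however, what happens when $\theta = 0^\circ$ or $\theta = 180^\circ$.)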
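The failure of commutativity, and the two exceptional angles, can be seen numerically. The sketch below assumes the standard matrix for projection onto the x-axis; the helper `close` compares matrices up to rounding error, and all names are illustrative.

```python
import math

def product(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def close(A, B, tol=1e-12):
    """True if two 2 x 2 matrices agree entry by entry up to rounding."""
    return all(abs(A[i][j] - B[i][j]) <= tol
               for i in range(2) for j in range(2))

P_X = ((1, 0), (0, 0))                 # assumed matrix of projection onto the x-axis

theta = math.radians(30)               # illustrative angle
RP = product(rotation(theta), P_X)     # project first, then rotate
PR = product(P_X, rotation(theta))     # rotate first, then project
print(close(RP, PR))                   # the two orders disagree

R180 = rotation(math.pi)               # one of the exceptional angles
print(close(product(R180, P_X), product(P_X, R180)))
```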

This is a convenient point at which to start what will be a successive
generalization of the notion of a projection operator. So far we have just
considered projections onto the x and y axes; however, we can project onto
any line through 0. That is, given any line L, at an angle $\theta$ to the
x-axis, say, and an arbitrary vector v, we can think of v as the sum of two
other vectors $v_L$ and $v_{L\perp}$, such that $v_L$ is in L and
$v_{L\perp}$ is in the line $L^\perp$ at right angles to L (see Figure 1.9).
That is, the vector v can be written as the sum: $v = v_L + v_{L\perp}$. We
now define the projection operator $P_\theta$ onto the line L as the operator
such that $P_\theta v = v_L$. As an exercise it's worth showing that

(1.4)  $$P_\theta = \begin{pmatrix} \cos^2\theta & \cos\theta\,\sin\theta \\ \cos\theta\,\sin\theta & \sin^2\theta \end{pmatrix}$$

[Figure 1.9: $P_\theta$ projects onto L.]
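The matrix (1.4) can be explored numerically without giving away the exercise. In the sketch below, which uses an illustrative angle and vector, the vector is split into its projection and a remainder; projecting the first part again leaves it unchanged, and projecting the remainder gives the zero vector.

```python
import math

def projection(theta):
    """Matrix (1.4): projection onto the line at angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, c * s), (c * s, s * s))

def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

theta = math.radians(40)      # illustrative angle
P = projection(theta)
v = (2.0, -1.0)               # illustrative vector

v_L = apply(P, v)                          # the part of v lying in L
v_perp = (v[0] - v_L[0], v[1] - v_L[1])    # the remainder, so v = v_L + v_perp

again = apply(P, v_L)         # projecting a second time changes nothing
lost = apply(P, v_perp)       # the remainder is projected to the zero vector
print(v_L, again, lost)
```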

The addition of two linear operators is easily defined: we write, for all
vectors v,

$$(A + B)v = Av + Bv$$

We obtain the matrix representation of A + B by simply adding
corresponding entries; if, as before,

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \quad\text{then}$$

$$A + B = \begin{pmatrix} a + e & b + f \\ c + g & d + h \end{pmatrix}$$

We can also define the multiplication of the operator A by the scalar a: we
specify that, for all vectors v, (aA)v = a(Av). Each element in the matrix of
aA is then a times the corresponding entry in the matrix for A.
Alternatively, we could regard this multiplication as a special case of the
multiplication of operators; to multiply any vector by the scalar a is
effectively to operate on it with the matrix

$$\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$$

Hence we have the operator equation:

$$aA = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} A$$

Note finally that we write A - B for the operator A + (-1)B.

1.3 Eigenvectors and Eigenvalues


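Consider the operator A with matrix representation

$$A = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$$

acting on the vector

$$v = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

In this case

$$Av = \begin{pmatrix} 2 \\ 2 \end{pmatrix} = 2v$$

That is, the vector which results is just a multiple of the vector we started
with. This is not always so with this operator; if we evaluate Au for a vector
u that does not lie along the same line as v, the result is not 2u.
Thus A does not simply double the length of all vectors. In fact, if we
interpret A geometrically, we find that it corresponds to the operation of
first doubling the length of a vector, then rotating it 90° counterclockwise,
and finally reflecting the result about the y-axis. (To see this, check that

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$$

using the method of matrix multiplication given in Section 1.2.)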


[Figure 1.10: v and w are eigenvectors of A, where
$A = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$.]

However, for any vector v' lying along the same line as

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

(that is, any vector lying along the line $L_1$ at 45° to our axes; see
Figure 1.10), and hence of the form

$$\begin{pmatrix} x \\ x \end{pmatrix}$$

we find that Av' = 2v' (*). Also, we can check very quickly that if w is a
vector of the form

$$\begin{pmatrix} -x \\ x \end{pmatrix}$$

that is, a vector lying along $L_2$, then

$$Aw = \begin{pmatrix} 2x \\ -2x \end{pmatrix} = -2w$$

Vectors of the form v' and w are known as eigenvectors of A, and the
eigenvalues corresponding to them are 2 and -2, respectively. More
formally,

(1.5)  v is said to be an eigenvector of a linear operator A, with
       corresponding eigenvalue a, if $v \neq \mathbf{0}$ and Av = av.

Note that we do not allow the zero vector to be an eigenvector of any operator.
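Definition (1.5) translates into a simple test. The sketch below assumes that the operator A of this section has the matrix with rows (0, 2) and (2, 0), which is the symmetric matrix consistent with eigenvalues 2 and -2 along the lines at 45° and 135°; the function name `eigenvalue_of` is hypothetical.

```python
def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def eigenvalue_of(A, v, tol=1e-12):
    """Return a if Av = av for a nonzero v, otherwise None."""
    if v == (0, 0):
        return None                      # the zero vector is excluded by (1.5)
    Av = apply(A, v)
    i = 0 if v[0] != 0 else 1            # a nonzero coordinate fixes the ratio
    a = Av[i] / v[i]
    is_multiple = all(abs(Av[j] - a * v[j]) <= tol for j in (0, 1))
    return a if is_multiple else None

A = ((0, 2), (2, 0))                     # assumed matrix of the operator A

print(eigenvalue_of(A, (1, 1)))          # a vector along the 45 degree line
print(eigenvalue_of(A, (-3, 3)))         # a vector along the 135 degree line
print(eigenvalue_of(A, (1, 0)))          # a vector along neither line
print(eigenvalue_of(A, (0, 0)))          # the zero vector
```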
Not all operators have eigenvectors. For instance, the rotation operator
$R_\theta$ has, in general, no eigenvectors, since, unless $\theta = 0^\circ$
or $\theta = 180^\circ$, the vector $R_\theta v$ cannot lie along the same
line as the vector v. When $\theta = 0^\circ$, $R_\theta$ is the identity
operator I; for the two special cases, I and $R_{180}$, every vector is an
eigenvector; the corresponding eigenvalues are +1 and -1, respectively.
Likewise, any purely multiplicative operator has every vector in the space as
an eigenvector.
Now consider the reflection operator $S_y$ and the projection operator
$P_\theta$. Do these admit eigenvectors, and, if so, how many? In general,
obviously, the vector $S_y v$ does not lie along the same line as v. However,
it does so in two special cases: first, when v lies along the y-axis (so that
$S_y v = v$); and, second, when v lies along the x-axis (so that
$S_y v = -v$). Thus we have two classes of eigenvector and two eigenvalues,
+1 and -1. The projection operator $P_\theta$ maps all vectors onto the line
$L_\theta$ at an angle $\theta$ to the x-axis (see Figure 1.11). Thus any
vector in this line is an eigenvector of $P_\theta$, with eigenvalue 1. Now
consider a vector v along the line $L_{\theta+90}$ at right angles to
$L_\theta$. For this vector we have $P_\theta v = \mathbf{0} = 0v$. As always,
the symbol $\mathbf{0}$ denotes the zero vector, while the symbol $0$ denotes
the number zero. In fact we have here a special case of the eigenvector
equation Av = av in which a = 0. (Note that, although the zero vector is not
an admissible eigenvector, the number zero is a perfectly good eigenvalue.)
We see that the eigenvectors of $P_\theta$ lie along either $L_\theta$ or
$L_{\theta+90}$, and the corresponding eigenvalues are 1 and 0 in the two
cases.

[Figure 1.11: Eigenvectors of $S_y$ lie along the y-axis (eigenvalue 1) or
along the x-axis (eigenvalue -1); eigenvectors of $P_\theta$ lie along
$L_\theta$ (eigenvalue 1) or along $L_{\theta+90}$ (eigenvalue 0).]
These examples suggest some very general conclusions. The operators we
have looked at fall into three classes. One class has no eigenvectors at all:
this class includes all the rotation operators except $R_0$ and $R_{180}$. In
the second class we find the projection and reflection operators and the
example A used at the beginning of this section. In each of these cases all the
eigenvectors lie along one or the other of two lines. With each set of
eigenvectors (that is, with all the eigenvectors lying along a particular line)
is associated a particular eigenvalue; thus to each operator of this type we
can associate a pair of distinct eigenvalues. Now, in the examples we looked
at, the two lines containing the eigenvectors are at right angles one to the
other. While this is not the case for all the operators on ℝ² that (as we say)
admit two eigenvectors with distinct eigenvalues, nevertheless this result
holds for a very significant subclass of such operators, namely those among
them which are symmetric. The matrix representing a symmetric operator on ℝ²
may be recognized by the fact that its top right element is equal to its bottom
left element.
The third class contains operators like I and $R_{180}$. They admit all the
vectors in the space as eigenvectors, all sharing a common eigenvalue. Note
that these operators are also symmetric.
We can now use the operator

$$A = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$$

to illustrate an important result. Clearly A is symmetric: further, it has two
eigenvalues, $a_1$ and $a_2$, where $a_1 = 2$ and $a_2 = -2$. The
eigenvectors lie along two lines, at 45° and at 135° to the x-axis; to the
eigenvalue $a_1$ corresponds the line $L_1$ in Figure 1.10, and to the
eigenvalue $a_2$ corresponds $L_2$. We can find the matrices which represent
the projection operators onto these lines. Using (1.4), we obtain:

$$P_1 = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \quad\text{and}\quad P_2 = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}$$

Then

$$a_1 P_1 + a_2 P_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix} = A$$

This result turns out to be quite general. That is, if we take a symmetric
operator A which admits two eigenvectors with distinct eigenvalues $a_1$ and
$a_2$, then the eigenvectors corresponding to $a_1$ all lie within a line
$L_1$, and those corresponding to $a_2$ all lie within a line $L_2$
($L_1 \perp L_2$). If $P_1$ projects onto $L_1$ and $P_2$ onto $L_2$, then,
as in the case above,

(1.6)  $$A = a_1 P_1 + a_2 P_2$$

It's worth approaching these ideas in a slightly different way. Any linear
operator A on ℝ² may be "decomposed" into the sum of other linear
operators, as follows. Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

Then

However, we are interested in a more narrowly defined sense of
"decomposition."

(1.7)  If A is a symmetric operator on ℝ², then there exists a pair of
       projection operators $P_1$ and $P_2$, projecting onto mutually
       perpendicular lines, such that $A = a_1 P_1 + a_2 P_2$.

This is called the spectral decomposition theorem for ℝ². There are two cases:
(i) $a_1 \neq a_2$, (ii) $a_1 = a_2$.
In both cases (that is, whenever A is symmetric), A admits eigenvectors.
When $a_1 \neq a_2$, the decomposition of A into the weighted sum of two
projection operators is unique. Furthermore, all eigenvectors of A lie either
along the line onto which $P_1$ projects or along the line onto which $P_2$
projects. Those in the first line have corresponding eigenvalue $a_1$, while
those in the second have corresponding eigenvalue $a_2$. When $a_1 = a_2$,
all vectors of ℝ² are eigenvectors of A, and the decomposition is not unique.
For any pair of projection operators, $P_1$ and $P_2$, projecting onto
mutually perpendicular lines, we have $A = a_1 P_1 + a_2 P_2$.
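Both cases of the spectral decomposition theorem can be illustrated numerically. The sketch below builds projection matrices from (1.4) and forms their weighted sum; it assumes the example operator has the matrix with rows (0, 2) and (2, 0), and the helper names and the repeated eigenvalue 3 are illustrative.

```python
import math

def projection(theta):
    """Matrix (1.4): projection onto the line at angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, c * s), (c * s, s * s))

def weighted_sum(a1, P1, a2, P2):
    """Entry-by-entry matrix of a1*P1 + a2*P2."""
    return tuple(
        tuple(a1 * P1[i][j] + a2 * P2[i][j] for j in range(2))
        for i in range(2)
    )

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) <= tol
               for i in range(2) for j in range(2))

# Case (i): distinct eigenvalues 2 and -2 on the lines at 45 and 135 degrees.
P1 = projection(math.radians(45))
P2 = projection(math.radians(135))
A = weighted_sum(2, P1, -2, P2)
print(close(A, ((0, 2), (2, 0))))

# Case (ii): equal eigenvalues; every perpendicular pair of lines will do.
for degrees in (0, 17, 60):
    t = math.radians(degrees)
    M = weighted_sum(3, projection(t), 3, projection(t + math.pi / 2))
    print(degrees, close(M, ((3, 0), (0, 3))))
```

In the second loop each choice of angle rebuilds the same operator, which is what the non-uniqueness in case (ii) amounts to.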

1.4 Inner Products of Vectors in ℝ²

The introduction of the notion of the inner product of two vectors, also called
the dot product or the scalar product, enables us to give numerical expression
to such geometrical ideas as the length of a vector and the orthogonality of
vectors. (Vectors at right angles one to the other are said to be orthogonal.)
Using the notation introduced by Dirac, we denote the inner product of two
vectors u and v by $\langle u|v\rangle$. We define it for ℝ² as follows:

(1.8)  If $u = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ and $v = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$, then $\langle u|v\rangle = x_1 x_2 + y_1 y_2$.

The + here is the plus sign of ordinary arithmetical addition. We see that,
although u and v are vectors, their inner product is just a number. For
instance, let $u = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and
$v = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$; then
$\langle u|v\rangle = (2)(3) + (1)(4) = 10$.
How does this number acquire a geometrical significance? Consider the
case when

$$u = v = \begin{pmatrix} x \\ y \end{pmatrix}$$

In this case $\langle u|v\rangle = \langle u|u\rangle = x^2 + y^2$. Here the
geometrical significance is clear; by Pythagoras' theorem,
$\langle u|u\rangle$ is equal to the square of the length of the vector u. We
denote the length of the vector v by $|v|$ and observe that

(1.9)  $$|v| = \sqrt{\langle v|v\rangle}$$

If $|v| = 1$, then we say that v is normalized. Given any vector v, we can
always produce a normalized vector collinear with v by dividing v by its
own length; in other words $v/|v|$ is a normalized vector along the same line
as v.
In general, the inner product of two vectors yields a number proportional
to the length of each and to the cosine of the angle between them. Thus,
surprisingly, the inner product of two vectors is independent of our choice
of x and y axes; I say "surprisingly" because the way we calculate inner
products (using x and y coordinates) involves reference to a particular
coordinate system. The general result, for two vectors u and v at an angle
$\phi$ to each other, is:

(1.10)  $$\langle u|v\rangle = |u|\,|v|\cos\phi$$

I will not prove this general result, but will show that it yields the right
answer in the case of two normalized vectors, u and v, such that u lies along
the x-axis and v is at an angle $\phi$ to it (see Figure 1.12). In this case

$$u = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad v = \begin{pmatrix} \cos\phi \\ \sin\phi \end{pmatrix}$$

(v is normalized, since $\cos^2\phi + \sin^2\phi = 1$, for all $\phi$). We
obtain $\langle u|v\rangle = (1)(\cos\phi) + (0)(\sin\phi) = \cos\phi$, as
(1.10) requires.
(To prove the more general result, first see what the effect on
$\langle u|v\rangle$ would be if u and v were not normalized but were along
the same lines as before, and then compare the inner products
$\langle u|v\rangle$ and $\langle R_\theta u|R_\theta v\rangle$ for an
arbitrary angle $\theta$.)
Equation (1.10) tells us that when $\cos\phi = 0$, then
$\langle u|v\rangle = 0$. Thus for two vectors at right angles, the inner
product is zero. Such vectors are said to be orthogonal to each other.
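Definition (1.8), the length formula (1.9) and the general result (1.10) can be compared on the numerical example used earlier in this section. This is a sketch; the function names are illustrative, and the angle between the vectors is computed independently from their directions with `math.atan2`.

```python
import math

def inner(u, v):
    """Definition (1.8) for vectors of the plane."""
    return u[0] * v[0] + u[1] * v[1]

def length(v):
    """Equation (1.9)."""
    return math.sqrt(inner(v, v))

u, v = (2, 1), (3, 4)                     # the example from the text
print(inner(u, v))                        # 10

# Angle between the vectors, found from their directions.
phi = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
print(length(u) * length(v) * math.cos(phi))   # agrees with (1.10) up to rounding

# Dividing a vector by its own length produces a normalized vector.
unit = (v[0] / length(v), v[1] / length(v))
print(unit, length(unit))

# Two vectors at right angles have inner product zero.
print(inner((1, 2), (-2, 1)))
```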
Now consider a normalized vector v and a line L at an angle $\phi$ to v. If P
is the projection operator onto L, then

(1.11)  $$\langle v|Pv\rangle = \cos^2\phi$$

[Figure 1.12: $v = \begin{pmatrix} \cos\phi \\ \sin\phi \end{pmatrix}$]

Before we obtain a general proof of this, consider the case when $P = P_x$.
In this case

$$v = \begin{pmatrix} \cos\phi \\ \sin\phi \end{pmatrix} \quad\text{and so}\quad P_x v = \begin{pmatrix} \cos\phi \\ 0 \end{pmatrix}$$

(See Figure 1.13.) It follows that
$\langle v|P_x v\rangle = \cos^2\phi + (\sin\phi)(0) = \cos^2\phi$, in
accordance with (1.11). In the general case, we see from trigonometry that
$|Pv| = |v|\cos\phi$. By (1.10),

$$\langle v|Pv\rangle = |v|\,|Pv|\cos\phi = |v|^2\cos^2\phi = \cos^2\phi$$

since v is normalized. It is also clear that
$\langle v|Pv\rangle = |Pv|^2 = \langle Pv|Pv\rangle$.
Further, if $P_1$ and $P_2$ are projection operators onto two perpendicular
lines (see Figure 1.14), then $|v|^2 = |P_1 v|^2 + |P_2 v|^2$, for any vector
v. Both these considerations show that, if v is normalized, then
$0 \le |Pv|^2 \le 1$, for any projection operator P. The significance of these
and analogous results will be evident when we look at quantum theory. Within
the theory, the probabilities of events are given by expressions of the form
$|Pv|^2$, hence the importance of showing that when v is normalized this
expression can take values only between zero and one.
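The relations in this passage can be checked for a particular normalized vector and a particular pair of perpendicular lines. The sketch below uses the projection matrix (1.4); the two angles are arbitrary illustrative choices.

```python
import math

def projection(theta):
    """Matrix (1.4): projection onto the line at angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, c * s), (c * s, s * s))

def apply(A, v):
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def inner(u, v):
    return u[0] * v[0] + u[1] * v[1]

alpha = math.radians(70)                  # direction of v (illustrative)
theta = math.radians(30)                  # direction of the line L (illustrative)
v = (math.cos(alpha), math.sin(alpha))    # normalized by construction

P1 = projection(theta)
P2 = projection(theta + math.pi / 2)      # the perpendicular line

p1 = inner(apply(P1, v), apply(P1, v))    # |P1 v|^2
p2 = inner(apply(P2, v), apply(P2, v))    # |P2 v|^2
phi = alpha - theta                       # angle between v and L

print(p1, math.cos(phi) ** 2, inner(v, apply(P1, v)))   # three equal numbers
print(p1 + p2)                                          # equals |v|^2 = 1
```

Since the two squared lengths are non-negative and sum to one, each lies between zero and one, which is the property the text goes on to use for probabilities.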

1.5 Complex Numbers


In the vector space ℝ² we have permitted multiplication of vectors by real
numbers. We say that ℝ² is a vector space over the field of the reals. The
next step is to consider a vector space over the field of complex numbers. The
generalization is straightforward, once we have a grasp of what such
numbers are.

[Figure 1.13]
[Figure 1.14]

As is well known, the number 36 is the square of 6, and also the square of
-6; there is no real number x such that $x^2 = -36$. However, we can imagine
the sort of properties such a number would have, if it existed. It would be
twice the square root of -9, for instance, so that it would conform to the
equation $(x/2)^2 = x^2/4$, and it would be a root of the equation
$x^2 + 36 = 0$. In fact, if we included "imaginary" numbers like x in our set
of numbers, then every quadratic equation would be capable of solution.
Equations like $x^2 - 36 = 0$ would have real solutions, whereas those like
$x^2 + 36 = 0$ would have imaginary solutions, the square roots of negative
numbers. If the inclusion of imaginary numbers is worrying, it is worth
considering the sense in which a negative number, -6 say, is real, or, come
to that, the sense in which 6 itself is real. Of course, the sum of your
worries may not be decreased by such considerations.
From what has been said, if a is a positive real number, then -a is
negative, and $\sqrt{-a}$ will be imaginary. Our imaginary numbers are to
conform to the same rules as the real numbers, so

$$\sqrt{-a} = \sqrt{(a)(-1)} = \sqrt{a}\,\sqrt{-1}$$

Because a is positive, $\sqrt{a}$ is a real number. Thus any imaginary number
can be written as the product of a real number and the square root of -1. (We
can in fact define an imaginary number in this way if we wish.) We denote the
square root of -1 by i; thus $\sqrt{-36} = 6i$.

We can add and subtract imaginary numbers:

$$2i + 0.5i = 2.5i = 4i - 1.5i$$

We can multiply them by real numbers:

$$2(3i) = 6i = (3i)2$$

We can also multiply two imaginary numbers together; if we do so the
answer is real. For example,

$$(2i)(3i) = 6i^2 = (6)(-1) = -6$$

When we add a real number a to an imaginary number ib we obtain a
complex number, a + ib. This expression for a complex number cannot be
further simplified.
Notice that a and b are both real numbers; in this section and the next, I
will use a, b, d, e, . . . to denote real numbers; when I wish to talk of
complex numbers I will use $c_1, c_2, \ldots$
We add and subtract complex numbers in a straightforward way:

$$(a + ib) + (d + ie) = (a + d) + i(b + e)$$

The sum of two complex numbers is thus the sum of their real parts plus the
sum of their imaginary parts, and is again a complex number.
Similarly, consider the product of two complex numbers:

$$(a + ib)(d + ie) = ad + a(ie) + (ib)d + (ib)(ie) = ad + iae + ibd - be = (ad - be) + i(ae + bd)$$

This is again a complex number.
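Python's built-in `complex` type follows the same rules, so the sum and product formulas can be checked directly. In this sketch the values of a, b, d and e are arbitrary illustrative choices; Python writes i as `j`.

```python
a, b, d, e = 1, 2, 3, -1       # illustrative real numbers

z1 = complex(a, b)             # a + ib
z2 = complex(d, e)             # d + ie

total = z1 + z2                # built-in complex addition
product = z1 * z2              # built-in complex multiplication

# The same results written out with the formulas from the text.
sum_by_formula = complex(a + d, b + e)
product_by_formula = complex(a * d - b * e, a * e + b * d)

print(total, sum_by_formula)
print(product, product_by_formula)
print((2j) * (3j))             # the product of two imaginary numbers is real
```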


We may remark that both real and imaginary numbers are special cases of
complex numbers. The complex number a + ib is real provided b = 0, and it
is imaginary whenever a = 0.
Now consider the product of a + ib and a - ib. The formula for the
product yields

$$(a + ib)(a - ib) = a^2 + b^2$$

This is both real and positive. We call a - ib the complex conjugate of
a + ib, and conversely. We denote by c* the complex conjugate of the complex
number c. Thus (a + ib)* = a - ib, and (a - ib)* = a + ib. For all c,
(c*)* = c, but c* = c if and only if c is real. We have just shown that
(c*)(c) is always real and positive.
Observe that the use of complex numbers enables us to factorize expressions
like $a^2 + b^2$, which previously resisted factorization.
The quantity $|c| = \sqrt{(c)(c^*)}$ is known as the norm of c. We are often
interested in complex numbers of norm 1; from the definition of the norm, it
follows that if c = a + ib and $|c| = 1$, then $a^2 + b^2 = 1$. This in turn
implies that there is an angle $\theta$ such that $a = \cos\theta$ and
$b = \sin\theta$. Thus any complex number of norm 1 can be written in the
form:

$$c = \cos\theta + i\sin\theta = e^{i\theta}$$

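(The number e is the base of so-called natural logarithms; it is the sum of the
infinite series:

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots$$

For any x,

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

For our purposes, however, we can think of $e^{i\theta}$ simply as a
notational convenience for $\cos\theta + i\sin\theta$.)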
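The conjugate, the norm, and the exponential form of a complex number of norm 1 are all available in Python's standard library, which gives a quick way to see these definitions at work. The sample values below are illustrative.

```python
import cmath
import math

c = complex(3, -4)                       # illustrative complex number
print(c.conjugate())                     # the complex conjugate c*
print((c * c.conjugate()).real)          # (c)(c*) is real
print(abs(c))                            # the norm, sqrt((c)(c*))

theta = 0.75                             # illustrative angle in radians
z = cmath.exp(1j * theta)                # e^{i theta}
w = complex(math.cos(theta), math.sin(theta))
print(abs(z - w) < 1e-12, abs(z))        # same number, and its norm is 1
```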
Finally, I should add that many mathematicians find this approach to
complex numbers, which emphasizes the role of $\sqrt{-1}$, faintly
disreputable. They define complex numbers as ordered pairs (a, b) of reals
which obey certain algebraic relations, so that, for instance,
(a, b) + (d, e) = (a + d, b + e), and (a, b)* = (a, -b). The reader is invited
to reconstruct the material of this section along these lines.

1.6 The Space ℂ²

Let us now return to the study of vectors, and look at the complex space ℂ².
Whereas in ℝ² each vector was represented by a pair of real numbers

$$\begin{pmatrix} x \\ y \end{pmatrix}$$

to each vector of ℂ² we associate a pair of complex numbers

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$

In contrast with the situation in ℝ², however, no direct geometrical
representation of a vector in ℂ² is possible. For present purposes we say that
the pair of numbers is the vector.
Vector addition and scalar multiplication go on much as before.

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} c_3 \\ c_4 \end{pmatrix} = \begin{pmatrix} c_1 + c_3 \\ c_2 + c_4 \end{pmatrix}$$

as in the case when our vectors were pairs of reals. We allow scalar
multiplication by any complex number:

if $v = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$, then $cv = \begin{pmatrix} cc_1 \\ cc_2 \end{pmatrix}$

The number 0 is a complex number, and so, as before, the zero vector is
given by

$$\mathbf{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
An operator on a complex space is like an operator on a real space: it is an
instruction to transform a vector into some other vector. As in ℝ², to each
linear operator on ℂ² there corresponds a 2 × 2 matrix of numbers, but in
this case the numbers are complex. For instance, a typical operator on ℂ² is
represented by the matrix

$$\begin{pmatrix} 2 & 1 - i \\ 1 + i & 3 \end{pmatrix} = A$$

The algorithm for determining Av, given A and v, is the same as before, as is
the procedure for finding the matrix AB, the product of A and B, given those
matrices. For example, let A be as above, and let

$$v = \begin{pmatrix} 1 \\ 1 + i \end{pmatrix}$$

Then

$$Av = \begin{pmatrix} 2 & 1 - i \\ 1 + i & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 + i \end{pmatrix} = \begin{pmatrix} (2)(1) + (1 - i)(1 + i) \\ (1 + i)(1) + (3)(1 + i) \end{pmatrix} = \begin{pmatrix} 4 \\ 4(1 + i) \end{pmatrix} = 4v$$

We see that v is an eigenvector of A, with corresponding eigenvalue 4. It
turns out that the vector

$$u = \begin{pmatrix} 1 - i \\ -1 \end{pmatrix}$$

is also an eigenvector of A. In this case the corresponding eigenvalue is 1
(thus Au = u).
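The two eigenvector claims can be verified with the same matrix-times-vector rule as in the real case, now with complex entries. The sketch below takes the second eigenvector to be the vector with entries 1 - i and -1, which is the vector used in the orthogonality calculation later in this section; the helper name `apply` is illustrative.

```python
def apply(A, v):
    """Rule (1.3), used unchanged with complex entries."""
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2, 1 - 1j), (1 + 1j, 3))    # the example operator on C^2
v = (1, 1 + 1j)                   # eigenvector with eigenvalue 4
u = (1 - 1j, -1)                  # assumed second eigenvector, eigenvalue 1

Av = apply(A, v)
Au = apply(A, u)
print(Av, tuple(4 * x for x in v))   # Av equals 4v
print(Au, u)                         # Au equals u

# The sum of the diagonal elements equals the sum of the eigenvalues.
print(A[0][0] + A[1][1], 4 + 1)
```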
These eigenvalues are real. In this respect the operator A is not quite
typical; it is a member of a particular class of operators known as Hermitian
operators. These operators are going to play an important role when we look
at quantum theory: they will represent physical quantities, and their
eigenvalues will be the possible values of those quantities; clearly it befits
a measurable quantity that its possible values should be real. I will postpone
a formal definition of a Hermitian operator until Section 1.12; it is the
analogue in complex space of a symmetric operator on a real space, and it has
similar identifying characteristics. Whereas the off-diagonal elements of a
symmetric matrix on ℝ² were equal, those of a Hermitian operator on ℂ² are
complex conjugates of each other. (The diagonal elements of a 2 × 2 matrix
are the top left and bottom right elements; the off-diagonals, therefore, are
the bottom left and top right elements.) In the example above, the elements
in question are 1 - i and 1 + i. We also require that the diagonal elements
(2 and 3 in this case) be real. As in this example, the sum of the diagonal
elements of a Hermitian operator is always equal to the sum of its
eigenvalues.
For operators on complex spaces, as for those on real spaces, the maximum
number of distinct eigenvalues is equal to the dimensionality of the
space. In the case of a real space we also found that eigenvectors of
symmetric operators whose eigenvalues were distinct were always at right
angles to each other. The Hermitian operators on ℂ² have the same property,
but it's not informative to say so until we know what meaning we can give to
"at right angles" in the complex case. We approach this via the notion of
inner product.
It is in the definition of inner product that the first important amendment
to our computational rules appears.

Let $u = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ and $v = \begin{pmatrix} c_3 \\ c_4 \end{pmatrix}$. Then

(1.12)  $$\langle u|v\rangle = c_1^* c_3 + c_2^* c_4$$

and so, in general, $\langle u|v\rangle$ is not equal to
$\langle v|u\rangle$. In fact we can prove that one is the complex conjugate
of the other:

(1.13)  $$\langle v|u\rangle = \langle u|v\rangle^*$$

But now consider the inner product $\langle u|u\rangle$. From (1.12),
$\langle u|u\rangle = c_1^* c_1 + c_2^* c_2$. We know that, for any complex
number c, c*c is a positive real number; thus, for any vector u of ℂ²,
$\langle u|u\rangle$ is real and positive.
This means that even in complex space we can talk of the length of a
vector; we define it by writing

(1.14)  $$|v| = \sqrt{\langle v|v\rangle}$$

without the risk of having a length turn out to be either an imaginary
number or the square root of such a number.
As before, we say that v is normalized if $|v| = 1$, and that u and v are
orthogonal if $\langle u|v\rangle = 0 = \langle v|u\rangle$. This gives us a
definition of orthogonality for use in complex spaces, where the geometrical
idea of a right angle is inappropriate. If u and v are orthogonal, we write
$u \perp v$.
Armed with this definition, we can return briefly to the topic of
eigenvectors and eigenvalues. We find that, if $v_1$ and $v_2$ are both
eigenvectors of a Hermitian operator A on a complex space, such that
$Av_1 = a_1 v_1$, $Av_2 = a_2 v_2$, and $a_1 \neq a_2$, then
$v_1 \perp v_2$. In the example given,

$$v_1 = \begin{pmatrix} 1 \\ 1 + i \end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix} 1 - i \\ -1 \end{pmatrix}$$

We see that

$$\langle v_1|v_2\rangle = (1^*)(1 - i) + (1 + i)^*(-1) = (1)(1 - i) + (1 - i)(-1) = 0$$

as required.
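The complex inner product differs from the real one only in the conjugation of the first vector's entries. The sketch below implements (1.12) and checks (1.13), the length formula (1.14), and the orthogonality of the two eigenvectors, taken here to be the vectors with entries (1, 1 + i) and (1 - i, -1); the second pair of vectors is an arbitrary illustration.

```python
import math

def inner(u, v):
    """Definition (1.12): conjugate the entries of the first vector."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]

def length(v):
    """Equation (1.14); <v|v> is real, so only its real part is needed."""
    return math.sqrt(inner(v, v).real)

v1 = (1 + 0j, 1 + 1j)          # eigenvector with eigenvalue 4
v2 = (1 - 1j, -1 + 0j)         # assumed eigenvector with eigenvalue 1

print(inner(v1, v2), inner(v2, v1))    # both zero: v1 and v2 are orthogonal
print(inner(v1, v1), length(v1))       # <v1|v1> is real and positive

# Relation (1.13) on an arbitrary illustrative pair of vectors.
u = (1 + 2j, 3j)
w = (2 - 1j, 1 + 1j)
print(inner(u, w), inner(w, u))        # complex conjugates of each other
```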
As with the definitions of inner product, length, and orthogonality,
wherever possible we find suitable generalizations in ℂ² of the concepts
familiar from the real space ℝ². For instance, the analogues in ℂ² of the
lines of ℝ² are the one-dimensional subspaces of ℂ². If two vectors of ℝ² lie
along the same line, then one is a multiple of the other; similarly, if two
vectors v and v' lie within the same one-dimensional subspace of ℂ², then
v' = cv, where c is some (complex) number, and conversely. We usually use the
term ray instead of the cumbersome one-dimensional subspace.
Let us now generalize the notion of a projection operator. We do this by
following the route taken in Section 1.2; we can usefully use a diagram
(Figure 1.15), provided that we remember that what we see in the diagram is
only the analogue in ℝ² of what we have in ℂ².
Let L be any ray of ℂ², and v be any vector of ℂ². Then there exist two
vectors $v_L$ and $v_{L\perp}$, such that (i) $v_L + v_{L\perp} = v$, (ii)
$v_L$ lies within L, and (iii) $v_{L\perp} \perp v_L$. Further, for a given
vector v and ray L, $v_L$ and $v_{L\perp}$ are unique. As in the analogous
case in ℝ² (see Figure 1.15), we can define the projection operator P onto L
by writing, for any vector v, $Pv = v_L$, where v, $v_L$ and $v_{L\perp}$
stand in the relations given by (i), (ii), and (iii) above.
As in ℝ² we find that, for every vector v and every projection operator P,

$$\langle v|Pv\rangle = |Pv|^2 = \langle Pv|Pv\rangle$$

This is always real and positive; furthermore, whenever v is normalized,

$$0 \le \langle v|Pv\rangle \le 1$$

[Figure 1.15: $v_L = Pv$]

A Hermitian operator on $\mathbb{C}^2$, like a symmetric operator on
$\mathbb{R}^2$, can be decomposed into a weighted sum of projection operators.
Consider the Hermitian operator $A$ on $\mathbb{C}^2$ with distinct (real)
eigenvalues $a_1$ and $a_2$. Let $v_1$ and $v_2$ be corresponding eigenvectors,
and $P_1$ and $P_2$ be projection operators onto the rays containing $v_1$ and
$v_2$, respectively. (We call these rays the subspaces spanned by the vectors.)
Then, as before, the spectral decomposition theorem gives us:

(1.15) $A = a_1 P_1 + a_2 P_2$

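For example, consider again the operator

$$A = \begin{pmatrix} 2 & 1-i \\ 1+i & 3 \end{pmatrix}$$

The projection operator $P_1$ onto the ray containing
$\begin{pmatrix} 1 \\ 1+i \end{pmatrix}$ is

$$P_1 = \frac{1}{3}\begin{pmatrix} 1 & 1-i \\ 1+i & 2 \end{pmatrix}$$

and the projection operator $P_2$ onto the ray containing
$\begin{pmatrix} 1-i \\ -1 \end{pmatrix}$ is

$$P_2 = \frac{1}{3}\begin{pmatrix} 2 & -1+i \\ -1-i & 1 \end{pmatrix}$$

The eigenvalues are 4 and 1, respectively, and a brisk calculation shows that
(1.15) holds. As an exercise (**), it is worth considering how one could
show that $P_1$ and $P_2$ are given by these particular matrices.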
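
The "brisk calculation" can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it assumes the eigenvectors $(1, 1+i)$ and $(1-i, -1)$ read off from the orthogonality calculation above, builds each projector as $vv^\dagger/(v|v)$, and forms the weighted sum $4P_1 + 1P_2$. The helper names (`matmul`, `projector`, `close`) are illustrative only.

```python
# 2x2 complex matrices as nested lists; all helper names are illustrative.

def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def projector(v):
    """Projector onto the ray containing v: P = v v^dagger / (v|v)."""
    norm_sq = sum((c.conjugate() * c).real for c in v)
    return [[v[i] * v[j].conjugate() / norm_sq for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison within a small floating-point tolerance."""
    return all(abs(X[i][j] - Y[i][j]) < tol
               for i in range(2) for j in range(2))

v1 = [1 + 0j, 1 + 1j]    # eigenvector with eigenvalue 4
v2 = [1 - 1j, -1 + 0j]   # eigenvector with eigenvalue 1
P1, P2 = projector(v1), projector(v2)

# Weighted sum of projectors, as in the spectral decomposition
A = [[4 * P1[i][j] + 1 * P2[i][j] for j in range(2)] for i in range(2)]
```

Comparing `A` with the operator under discussion, and checking that `matmul(P1, P1)` returns `P1` while `matmul(P1, P2)` vanishes, confirms the decomposition up to rounding error.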

1.7 The Pauli Spin Matrices


To echo Eco (1979), every book defines the role of its ideal reader. Even so,
some do it less subtly than others. Here are a few problems on the space
$\mathbb{C}^2$. These problems are not simply mathematical exercises; in later
chapters the results will be applied to a particular physical example, namely
to the quantum theory of the fermion or spin-$\frac{1}{2}$ particle. As
indicated in the previous section, in quantum mechanics physical quantities
like momentum and energy are represented by Hermitian operators; Chapter 2
shows how these operators enter into theoretical calculations. The exercises
below involve three operators used to represent the components of spin of a
fermion, the quantities we met in the discussion of Stern and Gerlach's
results in the Introduction. The matrix representations of these operators are
shown below.

$$S_x = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
S_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad
S_z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

The three matrices involved are known as the Pauli spin matrices.

PROBLEMS

1. Show that $S_x$ and $S_y$ do not commute, and evaluate $S_xS_y - S_yS_x$.
   Express this difference in terms of $S_z$, and show that this relation
   holds cyclically among the three operators.
2. Let

   $$y_- = \frac{1}{2}\begin{pmatrix} 1-i \\ -1-i \end{pmatrix}$$

   Show that $x_+$ and $x_-$ are eigenvectors of $S_x$, and that $y_+$ and
   $y_-$ are eigenvectors of $S_y$. In each case, what are the corresponding
   eigenvalues?
3. Show (i) that $x_+$ and $y_+$ are both normalized; and (ii) that $x_+$ is
   orthogonal to $x_-$, and that $y_+$ is orthogonal to $y_-$. Why might one
   expect (ii) to be the case?
4. Determine the eigenvectors and corresponding eigenvalues of $S_z$.
5. Let $P_{x+}$ be the projection operator onto the one-dimensional subspace of
   $\mathbb{C}^2$ containing $x_+$. We extend the notation in an obvious way to
   $P_{x-}$, $P_{y+}$,
and so on. Then

-0
(i) Show that $P_{x-}x_- = x_-$ and $P_{y+}y_+ = y_+$. (ii) Determine the
vector $P_{y+}y_-$. (iii) Show that $P_{y+}$ is indeed the projection operator
onto the ray containing $y_+$. (iv) Evaluate $P_{x+}P_{x-}$ and
$P_{y+}P_{y+}$ $(= P_{y+}^2)$. Why are these results predictable?
6. Evaluate $(x_+|P_{y+}x_+)$, $(x_-|P_{y+}x_-)$, $(y_+|P_{y+}y_+)$, and
   $(y_-|P_{y+}y_-)$. Confirm that these are equal to $|P_{y+}x_+|^2$,
   $|P_{y+}x_-|^2$, $|P_{y+}y_+|^2$, and $|P_{y+}y_-|^2$, respectively.

7. Given $S_y$ and $P_{y+}$, use the decomposition theorem to determine
   $P_{y-}$. Confirm that $P_{y-}y_- = y_-$ and $P_{y-}y_+ = 0$.
8. Evaluate $\frac{1}{2}P_{x+} - \frac{1}{2}P_{x-}$. Why is this result
   predictable?

1.8 Mathematical Generalization


The comparatively simple mathematics of $\mathbb{C}^2$, where no manipulation more
difficult than multiplying complex numbers is involved, enables us to an-
swer basic questions about electrons and their components of spin. To deal
with quantum systems in full generality requires an extension and general-
ization of these ideas. As a preliminary to this generalization, this section is
devoted to the topic of mathematical structures; we will return to the subject
of vector spaces in Section 1.9.
A mathematical structure consists of a collection of objects (usually
mathematical "objects"), together with the relations between those objects, and
the operations we can perform on them. Consider, for instance, the set of the
rotation operators on $\mathbb{R}^2$ whose effect is to rotate vectors through
multiples of 90°. Call this set Rot,

$$\mathrm{Rot} = \{R_0, R_{90}, R_{180}, R_{270}\}$$

There are just four members of this set, since a rotation of 360° is equivalent
to no rotation at all; in fact, we have $R_{360} = R_0 = I$. Now, given any pair
of linear operators $A$ and $B$, we can form their product $AB$. A
distinguishing feature of the set Rot is that the product of any pair of these
operators (including the product of any one of them with itself) is again a
member of the set. We have, for example: $R_{90}R_{180} = R_{270}$,
$R_0R_{90} = R_{90}$, $R_{270}R_{270} = R_{180}$, and so on. We say that
multiplication is a binary operation on the set.
Consider now the set of four numbers, two real and two imaginary,
$\{1, i, -1, -i\}$. We can of course multiply any two of them together in the
usual way, and if we do so we find that they have the same property that we
observed in the set of four rotation operators: the product of any two of
them is also a member of the set: $(i)(-1) = -i$, $(1)(i) = i$,
$(-i)(-i) = -1$, and so on. Observe that these equations involving numbers
exactly match up with the earlier equations involving operators.
We find that each operator from the first set, Rot, corresponds to a number
from the second; furthermore, to the operation of operator multiplication on
Rot corresponds the operation of arithmetical multiplication on the set of
numbers. The two sets are alike, not in the objects they contain, but in their
mathematical structure. When, as in this case, the structural similarity is
such that a perfect one-to-one correspondence exists between two sets, we
say that they are isomorphic.
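
The matching between the two sets can be checked mechanically. The sketch below is an illustration only, under the assumption that each rotation is represented by its angle in degrees and that the correspondence is $R_0 \mapsto 1$, $R_{90} \mapsto i$, $R_{180} \mapsto -1$, $R_{270} \mapsto -i$; the names `ROT`, `PHI`, `compose`, and `is_isomorphism` are hypothetical.

```python
# Rotations are represented by their angle in degrees (an assumption made
# for illustration; the text treats them as operators on the plane).
ROT = [0, 90, 180, 270]

def compose(a, b):
    """Product of two rotations: the angles add modulo 360."""
    return (a + b) % 360

# Proposed correspondence R_0 -> 1, R_90 -> i, R_180 -> -1, R_270 -> -i
PHI = {0: 1, 90: 1j, 180: -1, 270: -1j}

def is_isomorphism(phi, elements, op):
    """True if phi is one-to-one and phi(a op b) == phi(a) * phi(b)
    for every pair of elements."""
    one_to_one = len(set(phi.values())) == len(elements)
    preserves = all(phi[op(a, b)] == phi[a] * phi[b]
                    for a in elements for b in elements)
    return one_to_one and preserves
```

A correspondence that is one-to-one but scrambles the products, such as swapping the images of $R_{90}$ and $R_{180}$, fails the second condition, which is what distinguishes an isomorphism from a mere pairing of elements.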
However, we're interested in a weaker kind of similarity. We want to
specify, for instance, the way in which Rot is similar to the set of rotations
through multiples of 60°. In fact, this set and the two we started with,
together with the product operations on them, are all examples of a very
general kind of mathematical structure known as a group, and defined as
follows.

(1.16) $\mathcal{G}$ is said to be a group if $\mathcal{G}$ comprises a set $G$
       of elements and a binary operation $\circ$ on $G$; one of the elements
       of $G$ has particular properties and is known as the identity element
       of the group; for all elements $a$, $b$, $c$ of $G$,

(1.16a) $(a \circ b) \circ c = a \circ (b \circ c)$
(1.16b) $a \circ I = a = I \circ a$

where $I$ is the identity element. Additionally, for any element $a$ of $G$,
there is a unique element $a'$ of $G$ such that

(1.16c) $a \circ a' = I = a' \circ a$

We may regard $'$ as a singulary operation on $G$, mapping each member of $G$
into another. (For a general introduction to groups, see Eddington, 1935a,
rpt. Newman, 1956, vol. 3, pp. 1558 -1573; see also MacLane and Birkhoff,
1979, chap. 2.)
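
For a finite set, the clauses of (1.16) can be verified by brute force. The following is a minimal sketch under that assumption; `is_group`, `UNITS`, and `times` are illustrative names, and closure of the operation is checked explicitly because it is what "binary operation on $G$" requires.

```python
from itertools import product

def is_group(elements, op, identity):
    """Brute-force check of closure, (1.16a) associativity,
    (1.16b) identity, and (1.16c) unique inverses on a finite set."""
    closed = all(op(a, b) in elements
                 for a, b in product(elements, repeat=2))
    associative = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(elements, repeat=3))
    has_identity = all(op(a, identity) == a == op(identity, a)
                       for a in elements)
    unique_inverses = all(
        sum(1 for b in elements
            if op(a, b) == identity == op(b, a)) == 1
        for a in elements)
    return closed and associative and has_identity and unique_inverses

UNITS = [1, 1j, -1, -1j]

def times(a, b):
    """Ordinary multiplication of complex numbers."""
    return a * b
```

The same function accepts the rotations through multiples of 60° (angles added modulo 360), which is one way to see that the definition does not fix the number of elements.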
In defining a group I have abstracted certain features of these structures
while ignoring others; it is not part of the general definition, for instance,
that a group must have four elements. It is not even required that the
operation $\circ$ be commutative (that, for all $a$ and $b$ in $G$,
$a \circ b = b \circ a$), although this was the case with both of the groups
described here. These two groups belong to a special subclass of groups, the
commutative, or Abelian, groups.
I hope I have said enough to indicate what is involved when we say of a
set that it is a structure of a certain kind, like a group. In saying this we are
not saying what sorts of objects are in the set, nor how many of them there
are (though we may specify this to some degree, as we do when talking of
finite groups). We are merely saying that we have a set of objects and that we
can perform operations on these objects, or on pairs or perhaps trios of these
objects, to yield others of the same kind; further, that these operations
conform to certain rules, like (1.16a) - (1.16c), above. We may also require
that there exist objects in the set with particular properties: for example,
every group has to contain an identity element. To describe a structure, we
make a list: first on the list is the relevant set of objects, then come the
operations performed on that set, and finally we list the elements with
specific properties. Thus the first structure we looked at in this section is
the group $(\mathrm{Rot}, \circ, ', R_0)$, which includes the set Rot, the
binary operation of multiplication, the singulary operation which gives the
inverse of any element of Rot, and the identity element of Rot.
I should emphasize that our present concern is not the mathematical
theory of groups per se. It so happens that a group is a particularly simple
form of mathematical structure and that by talking about groups we can see
what is meant by an operation on a set, by one structure being isomorphic to
another, and so on. Armed with these ideas, we can move to a more complicated
structure, that of a vector space, recalling, as we go, Russell's (1917,
p. 59) suggestion that "Mathematics may be defined as the subject in which
we never know what we are talking about, nor whether what we are saying
is true."

1.9 Vector Spaces


What, then, are the defining properties of a vector space, considered ab-
stractly in this way? We have already seen examples of such spaces: the set
of arrows in a plane radiating from a given point, and the set of pairs of
numbers. In fact, if the numbers are real numbers, the two sets are effec-
tively the same, in that we can translate talk of arrows into talk of pairs of
real numbers, and conversely. To use the vocabulary of Section 1.8, the set
of arrows in the plane and the set of pairs of real numbers are isomorphic.
When the spaces $\mathbb{R}^2$ and $\mathbb{C}^2$ were introduced, in both cases
the operations we met first were vector addition and scalar multiplication. The
latter involves multiplying a vector by a scalar, that is, by a number; thus
prior to any definition of a vector space we need an analysis of the structure
of the set of numbers. The relevant operations on this set are the binary
operations of addition and multiplication and the operation which takes us from
the number $a$ to the number $1/a$. The elements with special properties are 0
and 1. The structure in question is that of a field,
$\mathcal{F} = (F, +, \cdot, {}^{-1}, 0, 1)$, where the operations and
designated elements have familiar properties, such as $a + 0 = a$ for all $a$
in $F$ and $(a)(a^{-1}) = 1$ for all nonzero $a$ in $F$ (see MacLane and
Birkhoff, 1979, chaps. 3 and 8).
Both the set of real numbers and the set of complex numbers have the
structure of a field. Omitting the formal definition of a field, I will move
straight on to the definition of a vector space over a field.
Let $\mathcal{F} = (F, +^\circ, \cdot^\circ, {}^{-1}, 0^\circ, 1)$ be a field.
(The binary operations and zero element are tagged with degree signs to
distinguish them from the operations on the vector space, which are customarily
represented by the same symbols.) The elements of $F$ will be called scalars.
Then

(1.17) $\mathcal{V}$ is called a vector space over $\mathcal{F}$ if
       $\mathcal{V} = (V, +, \cdot, 0)$, where

       $V$ is a nonempty set whose elements are called vectors;
       $+$ is an operation which takes any pair of vectors and yields a
       vector (that is, $+$ is a binary operation on $V$);
       $\cdot$ is an operation which takes a scalar and a vector and yields a
       vector;
       $0$ is a member of $V$ (the zero vector);
       for all $u$, $v$, and $w$ in $V$, and for all $a$ and $b$ in $F$, the
       following identities hold:

(1.17a) $(u + v) + w = u + (v + w)$
(1.17b) $u + v = v + u$
(1.17c) $v + 0 = v$
(1.17d) $a \cdot (b \cdot v) = (a \cdot^\circ b) \cdot v$
(1.17e) $(a +^\circ b) \cdot v = a \cdot v + b \cdot v$
(1.17f) $a \cdot (v + u) = a \cdot v + a \cdot u$
(1.17g) $0^\circ \cdot v = 0$
(1.17h) $1 \cdot v = v$

Clearly the examples of vector spaces we have looked at satisfy these axioms.
However, so do various other sets of mathematical objects. Consider the set of
infinite sequences of numbers; let $x$ and $y$ be members of this set, so that
$x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$, where $x_1$, $y_1$,
$x_2$, $y_2$ are numbers. We can define vector addition and scalar
multiplication by

$$x + y = (x_1 + y_1, x_2 + y_2, \ldots), \qquad a \cdot x = (ax_1, ax_2, \ldots)$$

If we do so, then, because all the clauses of (1.17) are satisfied, we have
defined a vector space in which the sequences are the vectors.
Alternatively, consider the set of all complex-valued functions of a real
number. Examples are the squaring function, which maps a real number
onto its square, the function which maps a real number $x$ onto the complex
number $x + ix$, the function which yields the cosine of $x$, and so on. This
set can be made into a vector space as follows. The functions themselves are
the vectors in the space; given any two functions, $\phi$ and $\psi$, we define
their sum $\phi + \psi$, so that, for all real numbers $x$,

$$(\phi + \psi)(x) = \phi(x) +^\circ \psi(x)$$

and we define scalar multiplication by the equation

$$(a \cdot \phi)(x) = a \cdot^\circ \phi(x)$$

If we do so (1.17) is satisfied and again the result is a vector space. (The
symbol $^\circ$ used to distinguish operations performed on numbers from those
performed on vectors has now served its purpose. It may be omitted without
loss of clarity; however, it is still a useful exercise to see which operation
is being referred to by a particular symbol on any given occasion.)
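
The pointwise definitions translate directly into code. The sketch below is illustrative only: functions are Python callables, `add` and `scale` implement the two defining equations, and `agree` compares two functions at a handful of sample points, which is a spot check of an identity rather than a proof. All names are hypothetical.

```python
import math

def add(phi, psi):
    """Vector addition: (phi + psi)(x) = phi(x) + psi(x)."""
    return lambda x: phi(x) + psi(x)

def scale(a, phi):
    """Scalar multiplication: (a . phi)(x) = a * phi(x)."""
    return lambda x: a * phi(x)

# The three example functions mentioned in the text
square = lambda x: complex(x * x)
diagonal = lambda x: complex(x, x)        # x -> x + ix
cosine = lambda x: complex(math.cos(x))

def agree(f, g, points=(-2.0, -0.5, 0.0, 1.0, 3.0), tol=1e-12):
    """Compare two functions at a few sample points
    (a spot check of an identity, not a proof of it)."""
    return all(abs(f(x) - g(x)) < tol for x in points)
```

For instance, `agree(scale(a + b, diagonal), add(scale(a, diagonal), scale(b, diagonal)))` spot-checks (1.17e) for chosen complex scalars `a` and `b`; here the `+` inside `a + b` acts on numbers and the `add` outside acts on vectors, which is exactly the distinction the degree signs were tracking.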
Unless otherwise stated, in what follows vector spaces are assumed to be
over the field of the complex numbers.

1.10 Linear Operators


The definition of a linear operator given earlier may be repeated without
change. An operator $A$ when applied to any vector $v$ of vector space $\mathcal{V}$
yields another vector $v'$; we write $Av = v'$. We use the general term mapping
to indicate the application of an operator to a vector; a mapping is a rule that
associates every element of a set with an element of another set, or, as in this
case, with an element in the same set. An operator is just a special kind of
mapping, and so is a function. We denote a mapping as we would an
operator or function.

(1.18) A mapping A of the set V into itself is called a linear operator if for all
vectors u and v and for any scalar a,

(1.18a) A(v + u) = Av + Au
(1.18b) A(av) = a(Av)

Examples of linear operators on the space of functions of $x$ are easy to
come by. For instance, given a function $\phi(x)$, the expression
$x \cdot \phi(x)$ also represents a function of $x$, whose value is obtained by
multiplying the value of $\phi(x)$ by $x$. Thus $x$ is here an operator, and
since (1.18a) and (1.18b) hold (writing $x$ for $A$, $\phi$ and $\psi$ for $v$
and $u$), it is a linear operator. Again, the differential operator $d/dx$ is a
linear operator on the space of functions of $x$. However, though the process
of squaring any function $\phi$ of $x$, so that $\phi^2(x) = [\phi(x)]^2$,
could be regarded as the action of an operator, such an operator would not be
linear, because both (1.18a) and (1.18b) would be violated.
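
These three examples can be tested on polynomials, where each operation is exact. The sketch below is an illustration under an assumption not made in the text: a polynomial $a_0 + a_1x + a_2x^2 + \cdots$ is stored as its coefficient list `[a0, a1, a2, ...]`. The function names are hypothetical, and `is_linear_on` only spot-checks (1.18a) and (1.18b) on particular inputs.

```python
# A polynomial a0 + a1 x + a2 x^2 + ... is stored as [a0, a1, a2, ...].

def add(p, q):
    """Sum of two polynomials."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    """Scalar multiple of a polynomial."""
    return [c * a for a in p]

def times_x(p):
    """The operator 'multiply by x'."""
    return [0] + p

def ddx(p):
    """The differential operator d/dx."""
    return [k * a for k, a in enumerate(p)][1:] or [0]

def squared(p):
    """The squaring map phi -> phi^2 (not linear)."""
    out = [0] * (2 * len(p) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(p):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def is_linear_on(op, u, v, a):
    """Spot-check (1.18a) and (1.18b) on particular u, v and scalar a."""
    additive = trim(op(add(u, v))) == trim(add(op(u), op(v)))
    homogeneous = trim(op(scale(a, u))) == trim(scale(a, op(u)))
    return additive and homogeneous
```

With `u = [1, 2, 3]`, `v = [5, -1]`, and `a = 3`, the check passes for `times_x` and `ddx` and fails for `squared`; a single failing instance is enough to show that squaring is not linear.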
The definitions of an eigenvector and an eigenvalue of an operator carry
through to the general case:

(1.19) A vector $v$ is said to be an eigenvector of $A$, with corresponding
       eigenvalue $a$, if $v \neq 0$ and $Av = av$.

We observe, for example, that in our space of functions of $x$, $e^{3x}$ is an
eigenvector of $d/dx$ with eigenvalue 3, since

$$\frac{d}{dx}e^{3x} = 3e^{3x}$$

Sums and products of linear operators are defined as before.

(1.20) If $A$ and $B$ are linear operators on $\mathcal{V}$, then $A + B$ and
       $AB$ are linear operators such that, for all $v$ in $V$,

(1.20a) $(A + B)v = Av + Bv$
(1.20b) $(AB)v = A(Bv)$

1.11 Inner Products on $\mathcal{V}$


Alongside our abstract definition of a vector space we now want a definition
of an inner product. Since the vector spaces we now deal with are not
necessarily geometrical, we will not usually be able to give a direct
interpretation of this quantity, as we could in the case of $\mathbb{R}^2$. Instead, we pick certain
features of that inner product, like the fact that the inner product of any two
vectors is a number, and use them as the defining properties of an inner
product in general.
Let $\mathcal{V} = (V, +, \cdot, 0)$ be a vector space over a field
$\mathcal{F}$. (It is assumed here that $\mathcal{F}$ is the field of complex
numbers; if $\mathcal{F}$ is the field of the reals then the complex
conjugation signs below are redundant.) Then to each pair of vectors $u$ and
$v$ in $V$ we assign a scalar, denoted by $(u|v)$.

(1.21) $(u|v)$ is said to be an inner product on $\mathcal{V}$ if

(1.21a) $(v|v) \ge 0$, and $(v|v) = 0$ if and only if $v = 0$
(1.21b) $(u|v) = (v|u)^*$
(1.21c) $(u|av) = a(u|v)$
(1.21d) $(u|v + w) = (u|v) + (u|w)$
It is clear that the inner products we defined on $\mathbb{R}^2$ and
$\mathbb{C}^2$ conform to these conditions. We can extend those definitions of
inner product to provide a definition of an inner product on the set of
infinite sequences of complex numbers, provided we restrict the space to those
sequences $(x_1, x_2, \ldots)$ such that $\sum_i x_i^*x_i$ is finite. Given
this restriction, we can write, for two vectors $x$ and $y$ in the space such
that $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$:

$$(x|y) = x_1^*y_1 + x_2^*y_2 + \cdots = \sum_i x_i^*y_i$$

The restriction on the sequences is needed to ensure that the inner product
defined in this way will always be finite, that is, will be a scalar. The space
of such sequences is called $\ell^2$.
Similar considerations lead us to define an inner product, not on the
vector space of all functions of $x$, but on the vector space of all
square-integrable functions of $x$, that is, functions $\phi(x)$ such that
$\int_{-\infty}^{\infty} \phi(x)^*\phi(x)\,dx$ is finite. This space is known
as $L^2$; we define an inner product on it by writing, for vectors $\phi$ and
$\psi$,

$$(\phi|\psi) = \int_{-\infty}^{\infty} \phi(x)^*\psi(x)\,dx$$
A remarkable mathematical fact now presents itself, that the spaces $\ell^2$
and $L^2$ are isomorphic. (See Fano, 1971, p. 269.) That is, we can find a
correspondence between sequences in $\ell^2$ and square-integrable functions in
$L^2$ such that, if to the sequences $x$ and $y$ there correspond functions
$\phi$ and $\psi$, then (i) to the sequence $x + y$ corresponds the function
$\phi + \psi$, (ii) to the sequence $ax$ corresponds the function $a\phi$, and
(iii) $(x|y) = (\phi|\psi)$ (provided that each of these inner products is
evaluated in the way appropriate to the vectors involved). This isomorphism is
relevant to the history of the development of quantum mechanics. The early
formulations of the theory, by Heisenberg and Schrödinger, were, respectively,
in terms of sequences and of functions; subsequently Schrödinger established
that, by virtue of this isomorphism, the two formulations were equivalent (von
Neumann, 1932, chap. 1.4; see also Stein, 1972, p. 427, n. 10).
Armed with a definition of inner product on a given space, we can define
the length of a vector, what it is for a vector to be normalized, and what it is
for two vectors to be orthogonal, just as we did when discussing $\mathbb{C}^2$:

(1.22) For all $v$ and $u$ in $V$,

(1.22a) $|v| = \sqrt{(v|v)}$
(1.22b) $v$ is said to be normalized if $|v| = 1$
(1.22c) $v$ is said to be orthogonal to $u$ if $(v|u) = 0$
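
For finite lists of complex numbers these definitions are a few lines of code. The following is a minimal sketch, assuming the componentwise inner product $\sum_i u_i^*v_i$ used for $\mathbb{C}^2$ and for sequences; the function names are illustrative, and the tolerances are there only because floating-point arithmetic is inexact.

```python
import math

def inner(u, v):
    """(u|v) = sum_i u_i* v_i, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    """(1.22a): |v| is the square root of (v|v)."""
    return math.sqrt(inner(v, v).real)

def is_normalized(v, tol=1e-12):
    """(1.22b): |v| = 1, up to rounding error."""
    return abs(norm(v) - 1) < tol

def is_orthogonal(u, v, tol=1e-12):
    """(1.22c): (v|u) = 0, up to rounding error."""
    return abs(inner(u, v)) < tol
```

Applied to the vectors $(1, 1+i)$ and $(1-i, -1)$ from the earlier example, `is_orthogonal` returns `True`, and dividing either vector by its `norm` yields a normalized vector.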

1.12 Subspaces and Projection Operators


While we were dealing with arrows radiating from a point, the notion of a
subspace of the vector space had a readily visualized geometrical interpre-
tation. Given the fact that we confined the arrows to the plane, we had the
following subspaces: the plane itself, all the lines through the zero vector,
and the zero vector itself, or, more strictly, the set containing just the zero
vector-see (1.23), below. If we had investigated the three-dimensional
real space $\mathbb{R}^3$, then the whole space $\mathbb{R}^3$ would have been a subspace, as
would every plane which included the zero vector and, as before, every line
containing the zero vector, and the set containing only the zero vector. In
other words, we would have had subspaces of dimension 3, 2, 1, and O.
Notice that if we add any two vectors lying in a plane, then the result is
another vector lying in the same plane; again, if we multiply a vector v by
any number a, the result av is in the same plane as v. These results hold not
only for the planes of $\mathbb{R}^3$, but for any of its subspaces, and so a generalized
definition of a subspace, applicable to all vector spaces, presents itself.
Let $\mathcal{V}$ be the vector space $(V, +, \cdot, 0)$.

(1.23) $L$ is said to be a subspace of $\mathcal{V}$ if $L$ is a subset of $V$, and

(1.23a) if $u$ and $v$ are in $L$, then so is $u + v$
(1.23b) if $v$ is in $L$, then so is $av$ (where $a$ is any scalar)

How does this apply to vector spaces of functions of $x$? As an example,
consider the vector space (call it $\mathcal{P}$) which contains just those
functions which are polynomials in $x$, that is, functions of the form
$a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$. Then a typical subspace of
$\mathcal{P}$ contains all the polynomials of order two or less. (A function of
the form $a_0 + a_1x + a_2x^2$ is a polynomial of order 2.) Clearly, the
addition of two such functions yields another of the same kind, as does
multiplication by an arbitrary number $a$, and this is all we require.
Orthogonality between vectors was defined in Section 1.11. We also say
that a vector $v$ is orthogonal to a subspace $L$ if $v$ is orthogonal to every
vector in $L$, and that two subspaces $L_1$ and $L_2$ are orthogonal if every
vector in $L_1$ is orthogonal to every vector in $L_2$. Note that in Figure
1.16 the planes $L_1$ and $L_2$ are at right angles to each other but are not
orthogonal, since $L_1$ contains vectors which are not orthogonal to all
vectors in $L_2$. In particular, the vector $v$ is common to both.
Note that the zero subspace (containing just the vector 0) is orthogonal to
all subspaces.
The projection operators we encountered on $\mathbb{R}^2$ and $\mathbb{C}^2$
were projection operators onto rays (one-dimensional subspaces) of
$\mathbb{R}^2$ and $\mathbb{C}^2$, respectively. We can define projection
operators onto subspaces of any dimension; if $L$ is a plane of
$\mathbb{R}^3$, then $P_L$, the projection operator onto $L$, maps vectors into
$L$, as shown in Figure 1.17.
We define projection operators in the general case just as we did for
$\mathbb{C}^2$. That is, given a subspace $L$, we can decompose any vector $v$
into two parts, $v_L$ and $v_{L\perp}$, so that $v_L$ lies in $L$,
$v_{L\perp}$ is orthogonal to $L$ (and hence to $v_L$), and
$v = v_L + v_{L\perp}$. We then write $P_Lv = v_L$, thereby defining the action
of $P_L$ on an arbitrary vector $v$.
However, an alternative, elegant, and equivalent definition is available,
as follows.

Figure 1.16

Figure 1.17

We first define a Hermitian operator on a complex space equipped with an
inner product, and then say what it is for an operator to be idempotent.

(1.24) A linear operator $A$ on $\mathcal{V}$ is said to be Hermitian if, for
       all vectors $u$ and $v$, $(u|Av) = (Au|v)$.
(1.25) A linear operator $A$ on $\mathcal{V}$ is said to be idempotent if
       $AA = A$. (We write $A^2 = A$.)

We saw in Section 1.7, for example, that the operator

on $\mathbb{C}^2$ is idempotent.
Our new definition of a projection operator is:

(1.26) A linear operator $A$ on $\mathcal{V}$ is said to be a projection
       operator if $A$ is both Hermitian and idempotent.

It can be shown that such an operator is an operator which has the property
of "projecting onto a subspace" in the way described above, and conversely
(**). (See Jordan, 1969, pp. 26-27.)
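
In matrix terms the two conditions are easy to test. The sketch below rests on an assumption the text has not yet spelled out at this point: relative to an orthonormal basis, with the componentwise inner product, condition (1.24) amounts to the matrix being equal to its conjugate transpose. The helper names are illustrative, and `P1` is the projector found earlier for the ray containing $(1, 1+i)$.

```python
def dagger(A):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    """Entrywise comparison within a floating-point tolerance."""
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(n) for j in range(n))

def is_hermitian(A):
    """Matrix form of (1.24), assuming an orthonormal basis."""
    return close(A, dagger(A))

def is_idempotent(A):
    """(1.25): AA = A."""
    return close(matmul(A, A), A)

def is_projector(A):
    """(1.26): Hermitian and idempotent."""
    return is_hermitian(A) and is_idempotent(A)

P1 = [[1 / 3, (1 - 1j) / 3], [(1 + 1j) / 3, 2 / 3]]
```

Neither condition suffices alone: the matrix `[[2, 0], [0, 1]]` is Hermitian but not idempotent, and `[[1, 1], [0, 0]]` is idempotent but not Hermitian, so `is_projector` rejects both.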
The set of projection operators (or projectors) on a vector space is in
one-to-one correspondence with the set of subspaces of that space. It includes
the zero operator (which projects onto the zero subspace), that is, the
operator $P_0$ such that, for all $v$, $P_0v = 0$, and also the identity
operator $I$, which projects onto the whole space: for all $v$, $Iv = v$. As we
shall see,
projectors play an enormously important role in quantum theory; in fact, the
discussions of the theory in later chapters are almost entirely in terms of
these operators.
I conclude this section with a general proof of a relation we have met a
couple of times already. Let P be any projection operator and v be any
vector, then:

(1.27) $(v|Pv) = |Pv|^2$

As proof, consider:

$$\begin{aligned}
(v|Pv) &= (v|P^2v) && \text{[idempotence]} \\
       &= (v|P(Pv)) \\
       &= (Pv|Pv) && \text{[Hermiticity]} \\
       &= |Pv|^2
\end{aligned}$$

1.13 Orthonormal Bases


A set $\{v_1, v_2, \ldots, v_n\}$ of vectors spans a space $\mathcal{V}$ if any
vector $v$ in $\mathcal{V}$ can be written as a linear combination of
$v_1, v_2, \ldots, v_n$; if, that is, for any $v$ in $\mathcal{V}$ there exist
scalars $a_1, a_2, \ldots, a_n$ such that
$v = a_1v_1 + a_2v_2 + \cdots + a_nv_n$.

(1.28) The set $\{v_1, v_2, \ldots, v_n\}$ of vectors is said to be an
       orthonormal basis for $\mathcal{V}$ if
(1.28a) $\{v_1, v_2, \ldots, v_n\}$ spans $\mathcal{V}$; and for each $v_i$
        and $v_j$ in $\{v_1, v_2, \ldots, v_n\}$
(1.28b) $v_i \perp v_j$ whenever $i \neq j$, and
(1.28c) $|v_i| = 1$

The set

$$\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$$

forms a convenient orthonormal basis for $\mathbb{R}^2$, and also for
$\mathbb{C}^2$. Notice that there is no unique orthonormal basis for any space.
For example, we see from Section 1.7 that each of the three sets
$\{x_+, x_-\}$, $\{y_+, y_-\}$, and $\{z_+, z_-\}$ is an orthonormal basis for
$\mathbb{C}^2$, as are nondenumerably many other pairs of vectors.
An $n$-dimensional space can be spanned by $n$ mutually orthogonal vectors. If
the space is infinitely dimensional, an infinite set $\{v_i\}$ is required.
If $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for a vector space
$\mathcal{V}$, then any subset of this basis is an orthonormal basis for a
subspace of $\mathcal{V}$. The set $\{v_1, v_3\}$, for example, is an
orthonormal basis for a two-dimensional subspace $L$; we say that $L$ is
spanned by $\{v_1, v_3\}$, and we also talk of the rays containing $v_1$ and
$v_3$ spanning $L$. These rays, of course, are themselves spanned by $\{v_1\}$
and $\{v_3\}$, respectively.
A result that will be important in Section 8.8 is the following. Let
$\{v_1, \ldots, v_n\}$ be any orthonormal basis for $\mathcal{V}$. Then any
linear operator $A$ on $\mathcal{V}$ is uniquely determined by the vectors
$Av_1, \ldots, Av_n$. In other words, to specify $A$ we need only specify its
action on an arbitrary orthonormal basis. This result follows immediately from
the definition of linearity in (1.18). For let $v$ be an arbitrary vector in
$\mathcal{V}$; then for some $c_1, \ldots, c_n$ we have:

$$v = \sum_i c_iv_i$$

And, by linearity,

(1.29) $Av = A\left(\sum_i c_iv_i\right) = \sum_i A(c_iv_i) = \sum_i c_i(Av_i)$
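
Equation (1.29) is effectively a recipe for applying an operator that is known only through its action on a basis. The sketch below illustrates this under one added assumption, a standard fact for orthonormal bases that the text has not stated here: the coefficients are given by $c_i = (v_i|v)$. The basis used is the normalized pair of eigenvectors from the earlier example, with images $4v_1$ and $1v_2$; all names are hypothetical.

```python
import math

def inner(u, v):
    """(u|v) = sum_i u_i* v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply_via_basis(images, basis, v):
    """Compute Av from the images A v_i alone, as in (1.29).
    Assumes the basis is orthonormal, so that c_i = (v_i|v)."""
    coeffs = [inner(b, v) for b in basis]
    dim = len(images[0])
    return [sum(c * img[k] for c, img in zip(coeffs, images))
            for k in range(dim)]

s = math.sqrt(3)
b1 = [1 / s, (1 + 1j) / s]      # normalized eigenvector, eigenvalue 4
b2 = [(1 - 1j) / s, -1 / s]     # normalized eigenvector, eigenvalue 1
images = [[4 * c for c in b1], [1 * c for c in b2]]

# Action of the operator on the vector (1, 0), reconstructed from the images
Av = apply_via_basis(images, [b1, b2], [1, 0])
```

The result agrees, up to rounding, with the first column of the matrix whose eigenvalues are 4 and 1 on these rays, even though that matrix never appears in the computation.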

1.14 Operators with a Discrete Spectrum


A Hermitian operator which admits eigenvectors is said to have a discrete
spectrum, and in this case the spectrum consists of the set of eigenvalues of
the operator. All Hermitian operators on a finitely dimensional vector space
have a discrete spectrum; in the infinitely dimensional case this isn't always
so, and I'll discuss the exceptions, and give a general definition of a
spectrum, in the next section.
It is easily shown, by use of (1.21) and (1.24), that eigenvectors and
eigenvalues of Hermitian operators have striking properties.

(1.30) If $A$ is a Hermitian operator, then its eigenvalues are real.

(1.31) If $A$ is a Hermitian operator, such that $Av_1 = a_1v_1$,
       $Av_2 = a_2v_2$, and $a_1 \neq a_2$, then $v_1 \perp v_2$. (*)

In words, eigenvectors corresponding to distinct eigenvalues are mutually
orthogonal. Thus the maximum number of distinct eigenvalues of a Hermitian
operator $A$ on $\mathcal{V}$ is equal to the dimensionality of $\mathcal{V}$.
This result we have already seen for $\mathbb{R}^2$ and $\mathbb{C}^2$; we now
consider a vector space $\mathcal{V}$ with dimensionality $n$.
Let A be a Hermitian operator on 'Y. There are two possible cases: (i) A has
exactly n distinct eigenvalues; (ii) A has m distinct eigenvalues, 0 < m < n.
In case (i), all the eigenvectors corresponding to a particular eigenvalue,
$a_i$ say, lie within one ray $L_i$ of $\mathcal{V}$, and every vector in $L_i$
is an eigenvector of $A$ with eigenvalue $a_i$. These rays span $\mathcal{V}$
and are mutually orthogonal. If from each ray $L_i$ we select one normalized
vector $v_i$, then the set $\{v_i\}$ forms an orthonormal basis for
$\mathcal{V}$.
In case (ii), to each eigenvalue $a_i$ there again corresponds a subspace $L_i$
of $\mathcal{V}$ such that $Av = a_iv$ if and only if $v$ is in $L_i$, and
again these subspaces are mutually orthogonal; in this case, however, they are
not all one-dimensional. Instead, the following general result holds. Denote by
$d_i$ the dimensionality of the subspace corresponding to eigenvalue $a_i$.
Then

$$\sum_{i=1}^{m} d_i = n$$

[Case (i) corresponds to the case where $m = n$ and, for all $i$, $d_i = 1$.]
By a curious usage, when $d_i > 1$, we say that $a_i$ is degenerate. As in case
(i), we can still choose an orthonormal basis for $\mathcal{V}$ consisting only
of eigenvectors of $A$: we first choose an orthonormal basis for each $L_i$
(each of which will consist, obviously, only of eigenvectors of $A$) and then
form the union of these sets of vectors.
With this as background, here is the general form of the spectral
decomposition theorem for a finitely dimensional vector space $\mathcal{V}$.

(1.32) Let $A$ be a Hermitian operator on a finitely dimensional vector space
       $\mathcal{V}$. Then there are real numbers $a_1, \ldots, a_m$ and
       projectors $P_1, \ldots, P_m$ projecting onto mutually orthogonal
       subspaces of $\mathcal{V}$ ($m \le n$), such that

$$A = \sum_{i=1}^{m} a_iP_i$$

If we add the condition that

(1.32a) $a_i \neq a_j$ unless $i = j$

then this decomposition of $A$ is unique.


Note that in the expression for $A$ above we are adding operators: we have

$$A = a_1P_1 + a_2P_2 + \cdots + a_mP_m = \sum_{i=1}^{m} a_iP_i$$

This sum can now be unpacked in terms of familiar quantities. Each number
$a_i$ is an eigenvalue of $A$. The corresponding $P_i$ is the projection
operator onto $L_i$, the subspace of eigenvectors with eigenvalue $a_i$. If $A$
has $n$ distinct eigenvalues, as in case (i), above, then each projector
projects onto a one-dimensional subspace, and uniqueness of decomposition is
guaranteed. If, as in case (ii), degeneracy occurs, then condition (1.32a)
ensures that each projector $P_i$ projects onto a subspace $L_i$ containing all
the eigenvectors with eigenvalue $a_i$. Now $L_i$ may be more than
one-dimensional, and in such a case a further decomposition violating condition
(1.32a) is possible. Consider the case when $L_i$ is two-dimensional, and let
$L_{i1}$ and $L_{i2}$ be any two orthogonal rays spanning $L_i$, with
projectors $P_{i1}$, $P_{i2}$, respectively. We can show that
$P_i = P_{i1} + P_{i2}$ (**), and so the term $a_iP_i$ in the decomposition of
$A$ may be replaced by $a_iP_{i1} + a_iP_{i2}$. Notice particularly that any
pair of mutually orthogonal rays in $L_i$ could be used for this construction,
and so this further decomposition is itself not uniquely specifiable by, for
example, the requirement that all the $P_i$ project onto rays.
I have offered a discussion and not a proof of the theorem. (See Jordan,
1969, sec. 14; Fano, 1971, chap. 2.3.) On the basis of this discussion,
however, and given one important assumption (which happens to be true), the
reader should be able to supply one.
Exercise. Given that every Hermitian operator on a finitely dimensional
vector space admits eigenvectors, prove (1.32) (**). Hint: Compare the
transformation of an arbitrary vector $v$ effected by $A$ with that effected by
$\sum_i a_iP_i$, using the fact that an orthonormal basis for $\mathcal{V}$
exists consisting only of eigenvectors of $A$.

1.15 Operators with a Continuous Spectrum


When $\mathcal{V}$ is infinitely dimensional, not all Hermitian operators admit
eigenvectors. Some do, and for them an infinitary version of (1.32) holds.
Among those that do not, however, are some operators of great importance in
quantum theory, like those which represent position and momentum. Before seeing
what form the spectral decomposition theorem takes for them, I will present
some of the material of the previous section in a slightly different way. I
first define a straightforward ordering relation between projectors in terms of
the inclusion relation between subspaces, and then introduce the idea of the
spectral measure associated with an operator on $\mathcal{V}$.

(1.33) Let $P_1$ and $P_2$ project onto subspaces $L_1$ and $L_2$ of a vector
       space $\mathcal{V}$; we define the relation $\le$ ("less than or equal
       to") by: $P_1 \le P_2$ if and only if $L_1 \subseteq L_2$, that is, if
       and only if every vector in $L_1$ is in $L_2$.

A spectral measure is a family of projection operators on $\mathcal{V}$
parameterized by the real numbers. In other words, for any real number $x$
there is in the family a projector $P(x)$ corresponding to it. We don't require
that different real numbers always be paired with different projectors.
However, as we move along the real line from $-\infty$ to $+\infty$, we require
that, if $a < b$, then $P(a) \le P(b)$. Consistent with this is a second
requirement: that, as $x$ goes toward $-\infty$, then $P(x)$ goes toward $P_0$
(the zero operator) and that, as $x$ goes toward $+\infty$, $P(x)$ goes toward
$I$. A third requirement ("continuity from the right") I will explain below.
A spectral measure can be associated with any Hermitian operator. Let us
look first at the case discussed in Section 1.14, that of a Hermitian operator
A on an n-dimensional vector space $\mathcal{V}$, such that A has m distinct
eigenvalues. We can arrange these eigenvalues in ascending order, so that
$a_1 < a_2 < \cdots < a_m$. Associated with each eigenvalue $a_j$ there is a subspace
(not necessarily one-dimensional) containing all the corresponding
eigenvectors. Let $P_j$ be the projector onto this subspace. The spectral measure P(x)
for A is now specified as follows.

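$$
P(x) =
\begin{cases}
P_0 & \text{for } x < a_1 \\
P_1 & \text{for } a_1 \le x < a_2 \\
P_1 + P_2 & \text{for } a_2 \le x < a_3 \\
\quad\vdots & \\
P_1 + P_2 + \cdots + P_{m-1} & \text{for } a_{m-1} \le x < a_m \\
P_1 + P_2 + \cdots + P_m = I & \text{for } a_m \le x
\end{cases}
$$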

We can prove that these sums of projection operators, $P_1 + P_2$,
$P_1 + P_2 + P_3$, and so on, are themselves projection operators, because the
subspaces $L_1$, $L_2$, and so on that $P_1$, $P_2$, and so on project onto are all
mutually orthogonal (**). In fact, the projection operator $P_1 + P_2$ is the
projection operator onto the subspace spanned by $L_1$ and $L_2$, that is, the
subspace spanned by the set of eigenvectors with eigenvalue $a_1$ or $a_2$. Clearly,
since the set of all eigenvectors spans $\mathcal{V}$, $P_1 + P_2 + \cdots + P_m = I$.
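The step construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator A = diag(1, 1, 3) on a three-dimensional real space is a hypothetical example (so that the eigenvalue 1 has a two-dimensional eigenspace), and the helper names are not from the text.

```python
# Hypothetical example: A = diag(1, 1, 3), with eigenvalues a_1 = 1 and a_2 = 3.
# P_1 projects onto the two-dimensional eigenspace of a_1, P_2 onto that of a_2.

def mat_add(X, Y):
    """Entrywise sum of two square matrices given as lists of rows."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_mul(X, Y):
    """Ordinary matrix product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
EIGENVALUES = [1, 3]
PROJECTORS = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 0]],  # P_1
    [[0, 0, 0], [0, 0, 0], [0, 0, 1]],  # P_2
]

def spectral_measure(x):
    """P(x): the sum of those P_j whose eigenvalue a_j satisfies a_j <= x."""
    total = ZERO
    for a, P in zip(EIGENVALUES, PROJECTORS):
        if a <= x:
            total = mat_add(total, P)
    return total

def is_projector(P):
    """Idempotent and symmetric (Hermitian, since these matrices are real)."""
    n = len(P)
    symmetric = all(P[i][j] == P[j][i] for i in range(n) for j in range(n))
    return symmetric and mat_mul(P, P) == P
```

The test `a <= x` (rather than `a < x`) is what encodes continuity from the right: the step at an eigenvalue is already included when x reaches that eigenvalue.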
The picture is this. As we move along the real line from $-\infty$ to $+\infty$, P(x)
increases, in the sense given by (1.33), by m "steps." The subspace onto
which P(x) projects just after a step includes the subspace projected onto just
before the step, and each step is itself a projection operator $P_j$. These steps
occur at the eigenvalues; the requirement of continuity from the right
simply means that, for example, $P(x) = P_1$ when $a_1 \le x < a_2$, rather than when
$a_1 < x \le a_2$.

The spectrum of A is the set of points where P(x) changes value, in this
case the set of eigenvalues of A.
Now, when A is a Hermitian operator on an infinitely dimensional space
it is still possible to associate a spectral measure P(x) with A, but it can
happen that, where P(x) increases, it increases continuously rather than by
steps. The sets of points over which P(x) increases are in this case intervals on
the real line. Again, the set of all such points is called the spectrum of A, but
now we say that A has a continuous rather than a discrete spectrum.

What does it mean to say, in the continuous case, that P(x) is associated
with A? We can explain this by analogy with the discrete case. In the discrete
case we have, by (1.32),

$A = \sum_j a_j P_j$

Whence, for any vector v,

(1.34)  $\langle v|Av\rangle = \langle v|\sum_j a_j P_j v\rangle$

        $= \sum_j a_j \langle v|P_j v\rangle$   [by (1.20) and (1.21)]

Let us look at the inner products $\langle v|P_j v\rangle$. We know from (1.27) that any
expression of this form yields a real number. Now, in terms of the spectral
measure of A, the projector $P_j$ is the "step" by which P(x) changes at $a_j$.
Writing $P({<}a_j)$ for the greatest value of P(x) when $x < a_j$, we have
$P(a_j) = P({<}a_j) + P_j$, and so

$\langle v|P(a_j)v\rangle = \langle v|P({<}a_j)v\rangle + \langle v|P_j v\rangle$

whence

$\langle v|P_j v\rangle = \langle v|P(a_j)v\rangle - \langle v|P({<}a_j)v\rangle$

We see that (i) $\langle v|P(x)v\rangle$ increases monotonically as x moves up the real
number line, and that (ii) $\langle v|P_j v\rangle$ is the change in value of $\langle v|P(x)v\rangle$ at
$a_j$. Thus each term in the sum in (1.34) is the product of a real number $a_j$
and the change of value of $\langle v|P(x)v\rangle$ which occurs there.
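A small numerical check may make this reading of (1.34) concrete. The sketch below assumes the same hypothetical operator A = diag(1, 1, 3) and an arbitrarily chosen normalized vector v; it computes the expectation value once directly and once as the sum of each eigenvalue times the jump of the function x ↦ ⟨v|P(x)v⟩ at that eigenvalue.

```python
# Hypothetical operator A = diag(1, 1, 3); its eigenprojectors are diagonal,
# so each is stored as the list of its diagonal entries.
EIGENVALUES = [1.0, 3.0]
PROJECTOR_DIAGONALS = [[1, 1, 0], [0, 0, 1]]

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def F(x, v):
    """The real number <v|P(x)v> for the step-function spectral measure of A."""
    total = 0.0
    for a, diagonal in zip(EIGENVALUES, PROJECTOR_DIAGONALS):
        if a <= x:
            Pv = [d * component for d, component in zip(diagonal, v)]
            total += inner(v, Pv).real
    return total

def expectation_from_jumps(v, eps=1e-9):
    """Sum over eigenvalues of a_j times the jump of F at a_j."""
    return sum(a * (F(a, v) - F(a - eps, v)) for a in EIGENVALUES)

def expectation_direct(v):
    """<v|Av> computed by applying A = diag(1, 1, 3) to v."""
    Av = [1.0 * v[0], 1.0 * v[1], 3.0 * v[2]]
    return inner(v, Av).real

v = [0.6, 0.0, 0.8j]  # normalized: 0.36 + 0.64 = 1
```

For this v both routes give 0.36 × 1 + 0.64 × 3 = 2.28, up to floating-point rounding.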

Turning to the continuous case, and forsaking mathematical rigor, we can
think of $d\langle v|P(a)v\rangle$ as the infinitesimal change in $\langle v|P(x)v\rangle$ which occurs at
x = a. An analogue to (1.34), using an integral, now gives a generalized
version of the spectral decomposition theorem (see Fano, 1971, chap. 5.8).

(1.35) For any Hermitian operator A there is a spectral measure {P(x)} such
that, for any vector v,

$\langle v|Av\rangle = \int_{-\infty}^{\infty} x \, d\langle v|P(x)v\rangle$


As an example of the spectral measure associated with an operator with a
continuous spectrum, consider the operator x (the "position operator") on
the space $L^2$ of square-integrable complex-valued functions of x (see Section
1.11). The spectral measure of x (which we parameterize by the real number
y to avoid confusion) is the family {E(y)} of operators on $L^2$ such that, for any
function $\phi$ in $L^2$,

$$
E(y)\phi(x) =
\begin{cases}
\phi(x) & \text{for } x \le y \\
0 & \text{for } x > y
\end{cases}
$$

Notice that E(y) is indeed an operator on $L^2$, mapping functions onto
functions as required. It is easily shown to be linear, idempotent, and Hermitian,
and is thus a projection operator. (To show Hermiticity, we need the
definition of an inner product on $L^2$ given in Section 1.11.) Equation (1.35) can be
shown to hold in four steps (Jordan, 1969, p. 43), as the reader may care to
verify (**).
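A rough numerical illustration of (1.35) for the position operator, offered as a sketch rather than a proof: the wave function below (a Gaussian centred at x = 1), the grid, and the function names are all assumptions made for the example. The function F(y) plays the role of ⟨φ|E(y)φ⟩, and the integral in (1.35) is approximated by a Riemann–Stieltjes sum.

```python
import math

def phi(x):
    """Hypothetical (unnormalized) wave function: a Gaussian centred at x = 1."""
    return math.exp(-((x - 1.0) ** 2) / 2.0)

DX = 0.05
GRID = [-9.0 + k * DX for k in range(401)]    # sample points covering [-9, 11]
DENSITY = [phi(x) ** 2 for x in GRID]         # |phi(x)|^2 at each sample point
NORM = sum(DENSITY) * DX                      # approximates <phi|phi>

def F(y):
    """<phi|E(y)phi> / <phi|phi>: E(y) keeps phi where x <= y, zero elsewhere."""
    return sum(d for x, d in zip(GRID, DENSITY) if x <= y) * DX / NORM

def stieltjes_expectation():
    """Approximate the integral of y dF(y) by summing y times each increment of F."""
    total, previous = 0.0, 0.0
    for y in GRID:
        current = F(y)
        total += y * (current - previous)
        previous = current
    return total
```

F rises smoothly from 0 to 1 instead of jumping at isolated points, which is the numerical face of a continuous spectrum; for this φ the Stieltjes sum comes out close to 1, the centre of the Gaussian.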
To revert to the general case, the spectral measure P(x) associated with an
operator A is a mapping of real numbers to projectors. We can extend this to
provide a mapping of measurable subsets of the real line to projectors by
writing, for every semi-closed interval $\Delta = \{x : a < x \le b\}$,

$P^A_\Delta = P(b) - P(a)$

and extending this to other measurable sets of reals in a straightforward way
(see Fano, 1971, chap. 4).
We see that, to each Hermitian operator A and measurable subset $\Delta$ of the
reals, there corresponds a projection operator $P^A_\Delta$ defined in terms of the
spectral measure P(x) associated with A. The existence of P(x), and hence of
$P^A_\Delta$ for any measurable set $\Delta$, is guaranteed by the spectral decomposition
theorem.

To summarize: Let $\mathcal{B}(\mathbb{R})$ be the set of all Borel subsets (measurable
subsets) of the reals. Then, given a Hermitian operator A on a vector space $\mathcal{V}$,
the spectral decomposition theorem specifies, for each $\Delta$ in $\mathcal{B}(\mathbb{R})$, a unique
projection operator $P^A_\Delta$. Stretching our previous usage, we call the family
$\{P^A_\Delta : \Delta \in \mathcal{B}(\mathbb{R})\}$ the spectral decomposition of A. It has the following
properties.

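(1.36) In the family $\{P^A_\Delta : \Delta \in \mathcal{B}(\mathbb{R})\}$

(1.36a) $P^A_{\mathbb{R}} = I$; $P^A_{\varnothing} = P_0$

and, for all $\Delta, \Gamma \in \mathcal{B}(\mathbb{R})$

(1.36b) if $\Delta \subseteq \Gamma$, then $P^A_\Delta \le P^A_\Gamma$

(1.36c) if $\Delta$ and $\Gamma$ are disjoint ($\Delta \cap \Gamma = \varnothing$), then $P^A_\Delta$ and $P^A_\Gamma$ project
onto orthogonal subspaces of $\mathcal{V}$.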

We will meet these operators all the time in our discussion of quantum
theory.

1.16 Hilbert Spaces

The vector spaces we shall use are known as "Hilbert spaces," a term coined
by von Neumann (see Stein, 1972, p. 427, n. 10). A Hilbert space is just a
vector space on which an inner product has been defined, and which is also
complete: a vector space is said to be complete if any converging sequence of
vectors in the space converges to a vector in the space. All finitely
dimensional vector spaces are complete. To show what's involved in the infinite
case, let us look at a space which does not meet this condition.
Consider the space S of all finite sequences of real numbers. S includes

$s_0 = (1)$
$s_1 = (1, \tfrac{1}{2})$
$s_2 = (1, \tfrac{1}{2}, \tfrac{1}{4})$

Now $s_0, s_1, \ldots$ forms a converging sequence on S, but its limit, that is, the
sequence to which it converges, is infinite, and so does not lie in S. Thus S is
not complete and so is not a Hilbert space.
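The failure of completeness can be made tangible with exact arithmetic. In the sketch below it is assumed, purely for illustration, that each new term of the sequence is half the previous one, and that distances between finite sequences of different lengths are measured by padding the shorter one with zeros; both choices are assumptions of the example.

```python
from fractions import Fraction

def s(n):
    """The finite sequence (1, 1/2, ..., 1/2**n); the halving pattern is assumed."""
    return [Fraction(1, 2 ** k) for k in range(n + 1)]

def distance_squared(a, b):
    """Squared Euclidean distance, padding the shorter sequence with zeros."""
    length = max(len(a), len(b))
    a = a + [Fraction(0)] * (length - len(a))
    b = b + [Fraction(0)] * (length - len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Successive members get arbitrarily close to one another (the squared distance between s(n) and s(n+1) is 1/4^(n+1)), yet each member is one entry longer than the last, so no finite sequence can serve as the limit.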
On a related topic: in the infinitely dimensional case one should distin-
guish between subspaces and closed subspaces. Any subspace contains all
linear combinations of any finite set of vectors within it [see (1.23)]; a
subspace is closed if, additionally, it contains the limit vector of any con-
verging sequence of vectors within it. Thus a closed subspace of a Hilbert
space is itself a Hilbert space. Quantum mechanics deals with closed sub-
spaces; however, since the examples presented in this book are almost all
finitely dimensional, the distinction will largely be ignored in what follows.
This concludes our hasty introductory survey of vector-space theory. In
Chapter 5 I return to a few selected topics, prompted by some questions
which arise in the discussion of quantum theory in Chapters 2-4.
2
States and Observables in
Quantum Mechanics

To understand the conceptual structure of quantum mechanics we need to
know how such notions as the state of a system and an observable quantity are
represented within the theory, and how they are used in making
predictions. The mathematics used is the mathematical theory of vector spaces,
and in Chapter 3 I will discuss why this is a suitable candidate for the job in
hand. Prior to any discussion of quantum theory, however, I will look at the
way states and observables appear in classical mechanics. This approach
offers a useful introduction to these topics on two counts. In the first place,
classical mechanics is more familiar to many of us than is quantum theory;
second, it's instructive to compare the roles these concepts play in the two
theories, since quantum mechanics appears anomalous to us precisely
where it departs from our classical expectations.

2.1 Classical Mechanics: Systems and Their States


The formulation of classical mechanics I shall use is essentially that given by
Hamilton and Jacobi in the nineteenth century (see also Gillespie, 1970). A
classical system consists of a single particle or of a set of particles. Some of
the particles' properties, like their masses, remain constant with time;
others, like their positions, vary. Thus, for a complete description of the
system and its behavior we need to know, first, the set $\Pi_c$ of its constant,
unchanging properties and, second, how it is at a particular time, that is, the
set $\Pi_v$ of the instantaneous values of those quantities which vary with time.
We also need to know the set $\Lambda$ of laws which govern both the interactions
between the particles and also their interactions with their environment. For
instance, if the particles are electrically charged, they will attract or repel
each other, and they will also experience forces if the system is placed in an
electric field. It is these laws which determine how the system will evolve as
time goes on.
The position and momentum of each particle are particularly significant
members of $\Pi_v$. (The momentum of a particle is the product of its mass and
its velocity.) We express this by saying that specification of the position and
the momentum of each particle at time t gives us the state of the system at
time t. Once the state at time t is specified, then specification of $\Pi_c$ and $\Lambda$
determines the values of all the properties in $\Pi_v$ at that time.
As an example, consider a system of charged particles. The electrostatic
potential energy of that system at time t is determined by (a) the relative
positions of the particles at that time (given by the state), (b) the charge on
each particle (given in $\Pi_c$) and (c) the Coulomb law of electrostatic force
(given in $\Lambda$).
Classical mechanics is usually taken to be deterministic; a complete
specification of $\Pi_c$ and the state at a given time would determine the values of $\Pi_v$
at all other times, provided the system remains isolated. I return to this point
in Section 2.4.
As I mentioned, the state of a system is given by the positions and the
momenta of the particles which compose it. Since physical space is
three-dimensional, to specify the position of each particle we need three numbers
(or position coordinates) $q_x$, $q_y$, and $q_z$ to locate it relative to an appropriate
coordinate system. Similarly, in order to specify the momentum fully, so
that we know not only how fast the particle is going but also in what
direction, we need three more numbers $p_x$, $p_y$, and $p_z$, the momentum
coordinates, often called the components of momentum, parallel to the three axes of
our coordinate system. For each particle these six numbers are independent
of one another. Thus, given a system of n particles, the state is specified by a
total of 6n numbers.
In the same way that we can think of a pair of numbers as specifying a
point in a two-dimensional space like the plane of the paper, and of a trio of
numbers as representing a point in three-dimensional space, we can say that
the state of a system is represented by a point in 6n-dimensional real space.
All that we mean by this is that 6n independent real numbers are needed to
specify the state. This abstract 6n-dimensional space just consists of all
possible sequences of 6n real numbers; it is called the phase space for the
system. We denote the phase space by $\Omega$ and the state of the system by $\omega$.
Clearly, $\omega \in \Omega$.
For illustration I will often consider the simple case of a single particle
constrained to move in one dimension. For this particle the phase space can
be represented by a plane (that is, it can be drawn on the paper);
specification of the x and y coordinates of any point in the plane picks out a possible
state of the particle by telling us its (single coordinate of) position, q, and its
momentum, p; we have $\omega = (q,p)$.

2.2 Observables and Experimental Questions


Let us look in more detail at the way, from $\Pi_c$, $\Lambda$, and the present state of a
system, its other properties are deduced. The total kinetic energy, for
example, is determined by the kinetic energy of each particle, T, and this in turn is
determined by its mass and its momentum. For each particle we have,

$T = \dfrac{p_x^2 + p_y^2 + p_z^2}{2m}$

where m is the mass of the particle in question. I mentioned electrostatic
potential energy in the last section. In contrast, kinetic energy is just the
energy due to motion; the position coordinates of the particles are irrelevant.
The kinetic energy of the whole system is just the sum of the kinetic energies
of the individual particles.
We call physical quantities like kinetic energy observables. Like most com-
mentators, I find this usage unfortunate; like them, I will continue to employ
it. The simplest examples of observables are position and momentum: their
components can be read off from the state by looking at the appropriate
coordinates. More generally, with each observable quantity A we associate a
function $f_A$ which, for every point in the phase space (in other words, for
every state of the system), gives us a real number, the value of A. In
mathematical terms, to each observable A there corresponds a function $f_A : \Omega \to \mathbb{R}$.
Thus, in the case of the single particle moving in one dimension we have,

$T = f_T(q,p) = \dfrac{p^2}{2m}$

Most theoretically significant quantities in classical mechanics have a
continuum of possible values, but experimentally, of course, we content
ourselves with the rationals, and it is possible to construct artificial
"observable quantities" which take on only certain discrete values; an example is
"the observable whose value is 1 when the momentum is positive, and 0
otherwise." To call such quantities "artificial" is not to dismiss them: any
method of testing a system which just gives a yes/no (or pass/fail) answer
measures an observable of this kind, and we can develop an alternative
account of the notion of "state" in terms of such tests. I will return to our
simple example to show what is involved.
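In programming terms, an observable in this classical sense is simply a real-valued function of the state. The following minimal sketch, for the single particle moving in one dimension, assumes a small `State` record and function names that are illustrative only; it shows one observable with a continuum of values and the "artificial" two-valued observable just described.

```python
from typing import NamedTuple

class State(NamedTuple):
    """A point (q, p) of the two-dimensional phase space."""
    q: float  # position
    p: float  # momentum

def kinetic_energy(state: State, mass: float = 1.0) -> float:
    """f_T(q, p) = p**2 / (2m); the position q plays no part."""
    return state.p ** 2 / (2.0 * mass)

def momentum_is_positive(state: State) -> int:
    """The observable whose value is 1 when the momentum is positive, 0 otherwise."""
    return 1 if state.p > 0 else 0
```

Both functions have the whole phase space as their domain; they differ only in the set of values they can return.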

Any measurement made on a system yields answers to questions we can
ask about it. If we obtain a measurement of kinetic energy, for example, we
answer a whole set of questions of the form, "Is the kinetic energy greater
than 1?" "Is the kinetic energy between 1 and 2?" and so on. If we know the
state of the system we can give a definite yes or no answer to each such
question. We can now ask, "What does the state have to be in order that the
kinetic energy shall lie between 1 and 2?" In this case the answer is that q can
take any value but that |p| must lie between $\sqrt{2m}$ and $2\sqrt{m}$, values which we
obtain from the formula for kinetic energy given above. For a particle of unit
mass (that is, for which m = 1) the region of the phase space for which either
$\sqrt{2m} < p < 2\sqrt{m}$ or $-\sqrt{2m} > p > -2\sqrt{m}$ is shown in Figure 2.1. If, and only if,
the state of the system lies within this shaded area can we say that its kinetic
energy lies between 1 and 2.
[Figure 2.1: diagram of the (q, p) phase space, with q marked along the horizontal axis.]

Similarly, for any question we care to ask, there is a region of the phase
space that corresponds to it. Consider the vertical line on the diagram: this
corresponds to the question, "Is the position of the particle 3 units to the
right of the origin?" As the state of the system alters, the point representing
it moves around the diagram, and at any time the answer to any
experimental question will be yes or no, depending on whether at that time the point
lies within the corresponding region or not. Formally, we may regard any
given state as acting as a two-valued function on the set of experimental
questions, that is, as assigning to each question in the set either the number 1
(for yes) or 0 (for no). In this vein let us denote the question, "Does
observable A have a value within $\Delta$?" (where $\Delta$ is some subset of the reals) by $(A,\Delta)$,
and the value assigned to it by the state by $\omega(A,\Delta)$, thus making it clear that
$\omega$ is a function. We then obtain

$\omega(A,\Delta) = 1$ if and only if $f_A(\omega) \in \Delta$

To see that this equivalence holds, consider the conditions under which
each side is true. The left-hand side, $\omega(A,\Delta) = 1$, is true when the state of the
system is such that the experimental question "$(A,\Delta)$", that is, the question
"Does the observable A have a value within $\Delta$?", receives the answer yes.
But, on the right-hand side, $f_A(\omega)$ gives us the value of the observable A
when the system is in state $\omega$; it follows that $f_A(\omega) \in \Delta$ just when $\omega(A,\Delta) = 1$.
The experimental questions we deal with are all of the form $(A,\Delta)$, and to
each of them corresponds a region, technically a subset of the phase space.
Later on we shall be concerned with the algebraic structure of the set of
experimental questions in classical mechanics; unsurprisingly, it has the
structure of the set of subsets of a space.
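The equivalence can be rendered as a short Python sketch. It assumes the one-dimensional particle discussed above, takes Δ to be an open interval given as a pair of endpoints, and uses function names invented for the illustration.

```python
def f_T(q, p, m=1.0):
    """Kinetic energy of a single particle moving in one dimension."""
    return p * p / (2.0 * m)

def f_q(q, p):
    """Position, read straight off the state."""
    return q

def answer(observable, interval, state):
    """omega(A, Delta): 1 if f_A(omega) lies in the open interval Delta, else 0."""
    low, high = interval
    return 1 if low < observable(*state) < high else 0
```

For a particle of unit mass, `answer(f_T, (1, 2), (q, p))` returns 1 exactly when |p| lies strictly between √2 and 2, whatever the value of q; these states make up the shaded region described in the text.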
In the analysis above, the state appeared as a function mapping
experimental questions into 1 (yes) or 0 (no). If this account of it seems willfully
obscure, the following implausible narrative may be helpful. Two
experimenters, one in Moose Jaw and one in Medicine Hat, regularly receive
consignments of identical physical systems. Both of them proceed in the
same way: each new system is treated in some way or other ("prepared")
and then tested. Figure 2.2, which should be thought of as a pair of flow
charts rather than as sketches of experimental arrangements, shows the
principle. Now imagine that each of the experimenters has a variety of
methods of preparation at his or her disposal and that the methods they use
are quite different. All methods of testing, on the other hand, are common to
both, and all their tests are of the pass/fail sort. Clearly, they can soon find
out whether, despite the differences in the modes of preparation, systems
prepared using Method X by Medicine Hat Man are in effectively the same
state as those prepared using Method Y by Moose Jaw Woman. They just list
the tests run on these systems and compare the results. They will also find it
convenient to refer to a prepared state not by specifying its method of
preparation (since these are not common to both experimenters), but in
terms of the test performances which identify it. Thus the state specification
might read, "Test A, pass; Test B, pass; Test C, fail; . . . " and so on. But this
is just to regard a state as assigning a value to each experimental question, in
other words, to treat it as a function. To a set theoretician, in fact, a function
is precisely a set of ordered pairs like those we have here.

[Figure 2.2: Experiments in Moose Jaw and Medicine Hat. Two flow charts, one for each town: Preparation (by some method), then Test, with outcome Pass or Fail.]
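The bookkeeping of the two experimenters can be mimicked directly. In this sketch the test names and outcomes are invented for illustration; a state specification is stored as a mapping from tests to results, and is compared as a set of ordered pairs.

```python
# Hypothetical test records for three preparation methods.
medicine_hat_method_x = {"Test A": "pass", "Test B": "pass", "Test C": "fail"}
moose_jaw_method_y = {"Test A": "pass", "Test B": "pass", "Test C": "fail"}
moose_jaw_method_z = {"Test A": "pass", "Test B": "fail", "Test C": "fail"}

def as_ordered_pairs(results):
    """The state specification as a set of (test, outcome) pairs."""
    return frozenset(results.items())

def effectively_same_state(first, second):
    """Two preparations yield the same state if every test gives the same result."""
    return as_ordered_pairs(first) == as_ordered_pairs(second)
```

Nothing in the comparison refers to how the systems were prepared; only the test performances enter.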
The two experimenters cannot know whether their specification of a state
is complete, that is, whether there is no further test which, with additional
equipment, would sort out some apparently homogeneous state still further.
If their systems are classical, this knowledge would of course be available to
them if they could establish the components of position and momentum for
all the particles of their prepared systems. As we have seen, classical me-
chanics tells us that all significant tests are tests of the values of various
functions of these variables.
From this discussion it seems that in classical mechanics we have two
ways of thinking of the state of a system. We defined it as a sequence of 6n
coordinates (where n is the number of particles in the system), each of which
tells us a component of position or momentum of a particle. This can be
regarded as a description of the system: the specification of the state is
effectively a list of some of the system's properties. On the other hand,
when we regard it as a two-valued function of the set of experimental
questions, then we are drawing attention to the system's dispositions to
behave in certain ways. The distinction between properties and dispositions
may be challenged; all properties, it may be argued, are just dispositions to
certain kinds of behavior. I will put this question to one side; for the present I
will assume that such a distinction can be made (but see Section 10.2). This
granted, then the specification of the state in classical mechanics can be said
to have two distinct aspects. As we shall see, in quantum mechanics this is
less clearly so: while the specification of the state still serves to summarize a
system's dispositions, its descriptive role is moot.


2.3 States and Observables in Quantum Theory


The states I deal with in this section are the so-called pure states of quantum
mechanics; in Chapter 5 I extend the discussion to mixed states. In quantum
theory a pure state of a system is given by a vector in a Hilbert space. For
certain purposes we need not specify all the components of this vector: for
instance, if we are only interested in the spin of an electron we need only
look at two components, whereas if we are interested in observables which
depend only on position and momentum we can disregard those
components which refer to the spin. This is why for certain examples (those to do
with spin) I shall use pairs of complex numbers to represent the state of an
electron, while for others I shall represent its state by a function (which, as
we saw in Chapter 1, is an element of a vector space of infinite
dimensionality). Effectively, an electron has both a spin-state and a position-state.
When these states are pure, each of them can be represented by a vector; the
spin-state vector lies in a two-dimensional Hilbert space and the position
vector lies in an infinitely dimensional space. Both vectors are normalized,
that is, of unit length. Because the spin-state and the position-state are
independent, and much of what I say applies equally well to either, I will
usually use the term "state" to refer to just one of them. I will call the Hilbert
space in which any state is represented the state space for the system.
Thus, as in classical mechanics, states are represented by points in a space.
However, a classical phase space is finitely dimensional (unless
electromagnetic field theory, which requires infinite dimensionality, is being
considered), whereas the Hilbert spaces used in quantum theory may be infinitely
dimensional. Further, two different vectors u and v may both refer to the
same state, if they both lie within the same ray (if, that is, there is a complex
number c such that u = cv). Indeed, it is somewhat more precise to regard a
ray as representing a pure state (and I will adopt this approach in Chapter 5),
but at present the manipulations we perform will involve a representative
vector from that ray, and so we take the vector itself to specify the state.
And, as I have mentioned, it is assumed that the vector is normalized.
The radical differences between classical mechanics and quantum
mechanics appear with the representation of observables. Instead of the
real-valued functions of classical theory, quantum mechanics uses Hermitian
operators in the Hilbert space to represent observables. Typical examples
are the 2 × 2 matrices which represent the components of spin of a fermion,
and the operators x and $-i\,d/dx$ (on the set of square-integrable functions of
x) which represent position and momentum, or, more strictly, their
components in the x-direction.

Many but not all of these operators admit eigenvectors; as noted in
Section 1.15, a notable exception is the position operator x. Such exceptions we
will return to later; for the moment we will confine discussion to the
operators which admit eigenvectors, that is, to the case when, for the operator A,
there are vectors $v_1, v_2, \ldots$ such that, for each i, $Av_i = a_i v_i$ (and, since A is
Hermitian, each $a_i$ is a real number).
In these cases, the eigenvalues $a_1, a_2, \ldots$ of the operator are the
possible values of the observable quantity which the operator represents. We can
see immediately that this aspect of the theory gives us very different results
from classical theory: instead of a continuum of possible values, the
observables we are now dealing with can have only certain specific values. A
measurement of the observable A represented by the operator A will yield a given value
$a_i$ with certainty, provided that $a_i$ is an eigenvalue of A and that the state of
the system on which the measurement is carried out is represented by the
corresponding eigenvector. In general, however, the state, v, of the system
will not be an eigenvector of A; in such a case we cannot say with certainty
what the result of such a measurement would be. Instead we assign to each
eigenvalue (or possible value) of A a probability calculated as follows.
Let $v_i$ be the eigenvector with $a_i$ as corresponding eigenvalue, and denote
by $P_i^A$ the projection operator onto the ray containing $v_i$ (see Figure 2.3).
Then, according to quantum theory, the probability $p_v(A,a_i)$ that a
measurement of A conducted on a system in state v will yield a result $a_i$ is given by

(2.1)  $p_v(A,a_i) = \langle v|P_i^A v\rangle = |P_i^A v|^2$

Figure 2.3


Since v is normalized, we know from a previous discussion (Section 1.6) that
the inner product $\langle v|P_i^A v\rangle$ can only take values between 0 and 1. In other
words its values are appropriate to probability measurements.
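As examples, consider the operators $S_x$, $S_y$, and $S_z$ used to represent three
components of spin of a fermion. They have familiar matrix representations:

$$
S_x = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad
S_y = \frac{1}{2}\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad
S_z = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
$$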

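These are just the spin matrices encountered in Section 1.7. Each operator
has eigenvalues $+\tfrac12$ and $-\tfrac12$, and these eigenvalues are the only possible
values of each component of spin of a fermion. (Note that we are working in
natural units of spin, measuring spin in multiples of the reduced Planck constant $\hbar$.)
The eigenvectors of $S_x$ are the vectors $x_+$ and $x_-$, where, as in Section 1.7,

$$
x_+ = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad
x_- = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

Similarly, for $S_y$ and $S_z$ we have, respectively,

$$
y_+ = \frac{1}{2}\begin{pmatrix} 1+i \\ -1+i \end{pmatrix} \qquad
y_- = \frac{1}{2}\begin{pmatrix} 1-i \\ -1-i \end{pmatrix} \qquad
z_+ = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad
z_- = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
$$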

Recall also that the projection operator $P_{y+}$ onto the (one-dimensional)
subspace spanned by $y_+$ is given by

$$
P_{y+} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}
$$
In Section 1.7 we found that (1) $\langle y_+|P_{y+}y_+\rangle = 1$; (2) $\langle y_-|P_{y+}y_-\rangle = 0$; and
(3) $\langle x_+|P_{y+}x_+\rangle = \tfrac12$. We can now interpret each of these results as the
probability that, when a measurement of a particular observable is carried out on a
system in a certain state, one particular value will appear. In each of the
three cases we are evaluating the probability that an $S_y$ measurement will
yield the result $+\tfrac12$. The eigenvector of $S_y$ with corresponding eigenvalue $+\tfrac12$
is $y_+$, and so $P_{y+}$ is the appropriate projection operator to use in Equation
(2.1). The three results correspond to three different states of the particle,
the states $y_+$, $y_-$, and $x_+$, respectively. In the first case a measurement of $S_y$
will yield $+\tfrac12$ with certainty (we say that $y_+$ is an eigenstate of $S_y$); in the
second the probability of such a result is zero, and in the third the chances of
such a result are fifty-fifty. Of course, the state of the particle need not be an
eigenstate of any of these particular components of spin. For instance, it
might be represented by the (normalized) vector
Call this state u. Then we can quickly show that, if a measurement of $S_y$ is
performed

If a measurement of $S_x$ or $S_z$ is performed

$p_u(S_z, +\tfrac12) = \tfrac{9}{25}$ and $p_u(S_z, -\tfrac12) = \tfrac{16}{25}$

In each case, there are only two possible outcomes, and so the probabili-
ties of these outcomes add to unity.
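The probabilities quoted in this passage can be recomputed from Equation (2.1) with a few lines of Python. Two assumptions should be flagged. First, eigenvectors are fixed only up to a phase factor, and the representatives used below are a common choice that need not coincide with those of Section 1.7; the probabilities do not depend on the choice. Second, the state `u` is a hypothetical vector, picked only because it reproduces the $S_z$ probabilities 9/25 and 16/25.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def projector_onto(w):
    """Matrix of the projector onto the ray containing the normalized vector w."""
    return [[w[i] * w[j].conjugate() for j in range(2)] for i in range(2)]

def apply(M, v):
    """Apply the 2 x 2 matrix M to the vector v."""
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def probability(state, eigenvector):
    """Equation (2.1): <v|Pv>, with P the projector onto the eigenvector's ray."""
    return inner(state, apply(projector_onto(eigenvector), state)).real

R = 1 / math.sqrt(2)
x_plus = [complex(R), complex(R)]       # S_x eigenvector, eigenvalue +1/2
y_plus = [complex(R), complex(0, R)]    # S_y eigenvector, eigenvalue +1/2
y_minus = [complex(R), complex(0, -R)]  # S_y eigenvector, eigenvalue -1/2
z_plus = [1 + 0j, 0j]                   # S_z eigenvector, eigenvalue +1/2
z_minus = [0j, 1 + 0j]                  # S_z eigenvector, eigenvalue -1/2

u = [complex(0.6), complex(0.8)]        # hypothetical normalized state
```

With these definitions `probability(y_plus, y_plus)`, `probability(y_minus, y_plus)`, and `probability(x_plus, y_plus)` come out as 1, 0, and 1/2 (up to rounding), and for the assumed `u` the two probabilities for each component of spin sum to one.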
Before dealing with operators which do not admit eigenvectors, I will
amplify a remark made earlier.
We denote by $L_i^A$ the one-dimensional subspace containing the
eigenvector $v_i$ of the operator A, to which corresponds the eigenvalue $a_i$. Briefly, $L_i^A$ is
the subspace onto which $P_i^A$ projects. Then a measurement of A yields $a_i$
with certainty if, and only if, the vector v representing the system's state lies
within $L_i^A$. In that case $P_i^A v = v$, and

$p_v(A,a_i) = \langle v|P_i^A v\rangle$
$= \langle v|v\rangle$
$= 1$

With this result in mind, we can now extend the discussion to include
those operators which, like the position operator x on the Hilbert space of
square-integrable functions of x, admit no eigenvectors. The possible values
of position lie anywhere along a continuum, and the operator has a
continuous spectrum (see Section 1.15). But, whatever species of observable we are
dealing with, the following holds true.
Let A be an operator representing an observable A. Then to each interval
$\Delta$ on the real line there corresponds a subspace $L^A_\Delta$ of the Hilbert space, such
that a measurement of A yields a value within $\Delta$ with certainty if and only if
the state v of the system lies within $L^A_\Delta$. Let $P^A_\Delta$ be the projector onto $L^A_\Delta$. The
expression (2.1) for the probability of a particular experimental outcome
now has a straightforward generalization. We write $p_v(A,\Delta)$ for the
probability that a measurement of the observable A conducted on a system in state v
will yield a result in the interval $\Delta$, and obtain

(2.2)  $p_v(A,\Delta) = \langle v|P^A_\Delta v\rangle$

This is the fundamental equation, sometimes called the statistical
algorithm, of quantum mechanics, relating experimental outcomes to the
probabilities of their occurrence. What are the projectors $P^A_\Delta$? They are the
operators we met at the end of Section 1.15, belonging to the spectral
decomposition of A.
At the risk of tedious repetition, let me review what has been said, once
more using the Pauli spin matrices to illustrate the general result. In
quantum mechanics all observable quantities are represented by Hermitian
operators on a state space $\mathcal{H}$. Associated with each such operator A is a family
$\{P^A_\Delta : \Delta \in \mathcal{B}(\mathbb{R})\}$ of projectors on $\mathcal{H}$. If two subsets $\Delta$ and $\Gamma$ of the reals are
disjoint (have nothing in common), then the two projection operators $P^A_\Delta$ and $P^A_\Gamma$
project onto orthogonal subspaces. (The converse, however, is not true.) In
the case of the spin matrix $S_y$, there are just four projectors in the family,
corresponding to these four cases:

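(1) If $-\tfrac12 \notin \Delta$ and $+\tfrac12 \notin \Delta$, then $P^{S_y}_\Delta = P_0$;
(2) if $-\tfrac12 \in \Delta$ and $+\tfrac12 \notin \Delta$, then $P^{S_y}_\Delta = P_{y-}$;
(3) if $-\tfrac12 \notin \Delta$ and $+\tfrac12 \in \Delta$, then $P^{S_y}_\Delta = P_{y+}$;
(4) if $-\tfrac12 \in \Delta$ and $+\tfrac12 \in \Delta$, then $P^{S_y}_\Delta = I$.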

When Ll contains both + t and - t , as in case (4), the projection operator P1>-
is the identity operator, which maps every vector in 7f: onto itself; it is the
projector onto the whole space. In case (4), for any pure state v, Pv(Sy,Ll) =
(vllv) = (vlv) = 1. Experiments are certain to yield a result within Ll, since
each outcome is either + t or -t. When Ll contains neither of these numbers,
as in case (1), the projection operator is the zero operator, which maps all
v Lors onto the zero vector. In this case we have, Pv(Sy,Ll) = (vlPov) =
$\langle v | \mathbf{0} \rangle = 0$. The interpretation of this result is
obvious. In cases (2) and (3), $\Delta$ contains just one eigenvalue of $S_y$;
in these cases the projectors $P^{S_y}_\Delta$ are the projection operators
onto the rays containing the corresponding eigenvectors, and so (2.2) reduces
to (2.1):

$p_v(S_y,\Delta) = \langle v | P_{y-} v \rangle$ [case (2)]
$p_v(S_y,\Delta) = \langle v | P_{y+} v \rangle$ [case (3)]

The generalization of this example, to the case of an arbitrary Hermitian
operator which admits eigenvectors, is straightforward. Consider, for
instance, an operator A admitting eigenvectors $v_1, v_2, \ldots$, with
corresponding eigenvalues $a_1, a_2, \ldots$ Then for any interval $\Delta$ on
the real line which contains just one of these eigenvalues, $a_i$ say, we have
$P^A_\Delta = P^A_i$, where $P^A_i$ projects onto the one-dimensional subspace
containing $v_i$. If $\Delta$ contains just $a_i$ and $a_j$, then $P^A_\Delta$
projects onto the two-dimensional subspace spanned by $v_i$ and $v_j$, and so
on.
With complete generality, whether we are dealing with an observable
with a discrete or with a continuous spectrum, we can say that to each
question of the form, "Will a measurement of A on the system yield a result
in the interval $\Delta$?" there corresponds a projection operator
$P^A_\Delta$ onto a subspace $L^A_\Delta$ of the appropriate phase space. The
subspace $L^A_\Delta$ (or, equivalently, the projector $P^A_\Delta$) can be
said to represent the experimental question $(A,\Delta)$.
Now, when the idea of an experimental question was introduced in the
discussion of classical mechanics, each such question corresponded to a
subset rather than to a subspace of the state space (indeed, the notion of a
subspace applies only to vector spaces). In the classical case, a knowledge of
the state enables us to answer yes or no to each experimental question
(depending on whether or not the point representing the state lies within the
relevant subset of state space). We said that the state acted as a two-valued
measure on the set of experimental questions. In contrast, knowledge of the
quantum-theoretical state only enables us to give a definite yes or no in a
few special cases. In general, the state gives us the probability of a certain
result: the state is effectively a probability function on the set of
experimental questions, giving to each question a value in the (closed)
interval from 0 to 1.
Let us express this formally. To the question "Will a measurement of A
yield a result in the interval $\Delta$?" there corresponds a subspace
$L^A_\Delta$ of the Hilbert space. Let $P^A_\Delta$ be the projection operator
onto $L^A_\Delta$. Then each state v of a system defines a function $\mu_v$
such that $0 \le \mu_v(L^A_\Delta) \le 1$, namely, the function such that
$\mu_v(L^A_\Delta) = \langle v | P^A_\Delta v \rangle = p_v(A,\Delta)$.
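As an illustration, the statistical algorithm (2.2) can be sketched in a few lines of Python for the spin observable $S_y$. The sketch assumes the standard matrix representation of $S_y$ (in units of $\hbar$), whose normalized eigenvectors are $(1, i)/\sqrt{2}$ and $(1, -i)/\sqrt{2}$; the names `projector` and `probability`, and the use of a closed interval `(lo, hi)` for $\Delta$, are illustrative choices rather than notation from the text.

```python
import math

# Eigenvalues and normalized eigenvectors of S_y (units of hbar), assumed
# from the standard Pauli-matrix representation.
S = 1 / math.sqrt(2)
EIGENPAIRS = [(+0.5, (S, 1j * S)), (-0.5, (S, -1j * S))]


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def projector(delta):
    """Projector for the closed interval delta = (lo, hi).

    It is the sum of |u><u| over the eigenvectors u whose eigenvalue
    lies in delta, which covers the four cases listed in the text.
    """
    lo, hi = delta
    p = [[0j, 0j], [0j, 0j]]
    for value, u in EIGENPAIRS:
        if lo <= value <= hi:
            for r in range(2):
                for c in range(2):
                    p[r][c] += u[r] * u[c].conjugate()
    return p


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(2)) for r in range(2))


def probability(v, delta):
    """p_v(S_y, delta) = <v | P_delta v> for a normalized state v."""
    return inner(v, apply(projector(delta), v)).real


z_up = (1, 0)  # an illustrative normalized state
print(probability(z_up, (0.0, 1.0)))   # interval containing only +1/2
print(probability(z_up, (-1.0, 1.0)))  # interval containing both eigenvalues
print(probability(z_up, (2.0, 3.0)))   # interval containing neither
```

For the state chosen here the three calls should give approximately 0.5, 1 and 0, matching cases (3), (4) and (1) respectively.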
In quantum mechanics there are strong reasons for denying that specifying
the state does more than assign probabilities to experimental questions.
On this view, the function $\mu_v$ is conceptually prior to the vector v; this
vector then appears as a convenient mathematical way to represent the function
in question, and, whereas in classical mechanics the state could be said to
have both a descriptive and a dispositional aspect, in quantum theory the
descriptive aspect disappears and we are left with the dispositional aspect
alone.
This is a view which, ultimately, I will reject (see Section 10.2), but it is
true that there is no obvious analogue in quantum theory for the equation
$\omega = (p,q)$ of classical mechanics, which specifies the state in terms of
the properties of the system. On this, more later; this is a good point at
which to pause and summarize what has been said. To this end, Table 2.1 sets
out the main differences between the mathematical representation of quantum
theory and that of classical mechanics.

Table 2.1 States and observables in classical and quantum mechanics

| | Classical mechanics | Quantum mechanics |
|---|---|---|
| State space | 6n-dimensional real space, $\Omega$ (the phase space) | Hilbert space $\mathcal{H}$ (complex vector space), often infinitely dimensional |
| Pure state | Point in phase space: $\omega \in \Omega$ | (Normalized) vector in state space: $v \in \mathcal{H}$, $\lvert v \rvert = 1$ |
| Observable A | Real-valued function on phase space, $f_A: \Omega \to \mathbb{R}$ | Hermitian operator on state space, $A: \mathcal{H} \to \mathcal{H}$ |
| Possible values of observables | Range($f_A$), usually a continuum | Two cases: (1) A has a discrete spectrum (admits eigenvectors); possible values are eigenvalues of A. (2) A has a continuous spectrum (no eigenvectors); continuum of possible values |
| Experimental question, "Will measurement of A yield result in $\Delta$?" | Subset of phase space, $f_A^{-1}(\Delta) \subseteq \Omega$ | Subspace of state space, $L^A_\Delta \subseteq \mathcal{H}$ |
| Answer to question | Yes/no answer: yes if and only if $f_A(\omega) \in \Delta$ | Probability answer: $p_v(A,\Delta) = \langle v \mid P^A_\Delta v \rangle$ |
| Alternative way to regard state | Two-valued function on set of experimental questions | Function mapping experimental questions into [0,1] |

2.4 Probabilities and Expectation Values


Two short mathematical notes appear as addenda to the previous section.
Both show how we can use the fundamental Equation (2.1) to get further
results.
The first applies whenever we have a Hermitian operator, A, with a
discrete spectrum. Then a set of normalized eigenvectors, $v_1, v_2, \ldots$,
spans the whole space; for simplicity I will assume that the corresponding
eigenvalues, $a_1, a_2, \ldots$, are all distinct (that there is no
degeneracy). It is trivial to show that in that case the eigenvectors are all
mutually orthogonal, as noted in (1.31). Since they span the whole space, we
have, for any vector v, $v = c_1 v_1 + c_2 v_2 + \cdots$, and their
orthogonality guarantees that the values of the complex numbers,
$c_1, c_2, \ldots$, are uniquely determined for a given v, and also that
$\sum_i |c_i|^2 = 1$, provided v is normalized.
We now obtain a very simple expression for $p_v(A,a_i)$, the probability that
a measurement of A upon a system in state v will yield result $a_i$: in this
case,

(2.3) $p_v(A,a_i) = |c_i|^2$

The proof is simple. Let $P_i$ be the projection operator onto the
one-dimensional subspace spanned by the eigenvector $v_i$. Since the
eigenvectors are mutually orthogonal, we have $P_i v = c_i v_i$. It follows
that

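$p_v(A,a_i) = \langle v | P_i v \rangle$ [by (2.1)]
$= \langle v | P_i P_i v \rangle$ [idempotence, by (1.26)]
$= \langle P_i v | P_i v \rangle$ [Hermiticity, by (1.26)]
$= \langle c_i v_i | c_i v_i \rangle$
$= c_i^* \langle v_i | c_i v_i \rangle$
$= c_i^* c_i \langle v_i | v_i \rangle$
$= |c_i|^2$ [normalization]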
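The result can be checked numerically. The following Python sketch expands an illustrative state in the eigenbasis of $S_y$ and confirms that the squared moduli of the coefficients sum to 1 and that the coefficients reproduce the state. The eigenvectors are those of the standard matrix form of $S_y$, which is an assumption here, and the state `v` and the function names are chosen purely for illustration.

```python
import math


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def coefficients(v, basis):
    """Expansion coefficients c_i = <v_i|v> in an orthonormal basis."""
    return [inner(e, v) for e in basis]


def outcome_probabilities(v, basis):
    """Probabilities |c_i|^2 of the outcomes a_i."""
    return [abs(c) ** 2 for c in coefficients(v, basis)]


def recombine(coeffs, basis):
    """Rebuild v = c_1 v_1 + c_2 v_2 + ... from its coefficients."""
    dim = len(basis[0])
    return tuple(sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(dim))


s = 1 / math.sqrt(2)
Y_BASIS = [(s, 1j * s), (s, -1j * s)]  # eigenvectors of S_y for +1/2 and -1/2
v = (0.6, 0.8j)                        # a normalized state, chosen for illustration

probs = outcome_probabilities(v, Y_BASIS)
print(probs, sum(probs))
print(recombine(coefficients(v, Y_BASIS), Y_BASIS))
```

For this particular state the two probabilities should come out close to 0.98 and 0.02.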
The second result is an expression for the expectation value of an
observable, that is, the average value we would expect to obtain if we
measured the value of A in a large number of trials on systems all of which
were in the same state.
We denote the expectation value of A by $\langle A \rangle$. It is obtained by
weighting each possible outcome, $a_j$, of the measurement by its probability
$p_v(A,a_j)$. As before, we confine ourselves to those observables with a
discrete spectrum; en route to our conclusion we use a result argued for at
the end of Section 1.14, that, for an operator A admitting eigenvectors,
$A = \sum_j a_j P_j$ (where $P_j$ projects onto the space spanned by the
eigenvector $v_j$, as before).
We have, then,

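$\langle A \rangle = \sum_j p_v(A,a_j)\, a_j$

$= \sum_j \langle v | P_j v \rangle\, a_j$

$= \langle v | \sum_j a_j P_j v \rangle$

(by the properties of the inner product), and so

(2.4) $\langle A \rangle = \langle v | A v \rangle$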

Clearly, although it does not appear in the conventional notation,
$\langle A \rangle$ is a function of v.
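Both routes to the expectation value can be compared numerically. The Python sketch below computes $\langle S_y \rangle$ once as the probability-weighted sum of eigenvalues and once as $\langle v | S_y v \rangle$. It assumes the standard matrix form of $S_y$ in units of $\hbar$; the state and the function names are illustrative.

```python
import math

# Spin matrix S_y in units of hbar (standard form, assumed) and its eigenpairs.
S_Y = [[0, -0.5j], [0.5j, 0]]
s = 1 / math.sqrt(2)
EIGENPAIRS = [(+0.5, (s, 1j * s)), (-0.5, (s, -1j * s))]


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def apply(m, v):
    """Apply the square matrix m to the vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m)))


def expectation_weighted(eigenpairs, v):
    """Sum over j of p_v(A, a_j) * a_j, with p_v(A, a_j) = |<v_j|v>|^2."""
    return sum(abs(inner(u, v)) ** 2 * value for value, u in eigenpairs)


def expectation_direct(a, v):
    """<A> = <v | A v>, as in (2.4)."""
    return inner(v, apply(a, v)).real


v = (0.6, 0.8j)  # a normalized state, chosen for illustration
print(expectation_weighted(EIGENPAIRS, v))
print(expectation_direct(S_Y, v))
```

The two printed values should agree to within rounding error; for this state both are close to 0.48.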
Those with a taste for such things may note that the summation sign
appears in three distinct usages in this brief derivation; I leave to them the
task of justifying these procedures.
More important, note that although (2.4) was derived only for an operator
with a discrete spectrum, it also holds quite generally. The general case, of
course, would involve deriving (2.4) from (2.2) rather than from (2.1).
Note also that, in our presentation of quantum theory, we could have
postulated (2.4) rather than (2.1); in fact, when A is the projection operator
$P_j$, (2.1) appears as a special case of (2.4). A projection operator (and the
subspace it projects onto) acts as an experimental question, which, as we
saw in Section 2.2, is a special kind of observable. To $(P_j, 1)$ corresponds
the question, "Will a measurement of A yield the result $a_j$?" $P_j$ is thus
the observable whose value is 1 when the measurement of A yields $a_j$ and 0
when the measurement yields any other result. (Recall that the eigenvalues
of any projection operator are 1 and 0.) Its expectation value is the weighted
average of yes and no answers it elicits, the probability, in other words, that
a measurement of A will yield $a_j$. Thus $\langle P_j \rangle = p_v(A,a_j)$,
and since, from (2.4), $\langle P_j \rangle = \langle v | P_j v \rangle$, we
obtain Equation (2.1): $p_v(A,a_j) = \langle v | P_j v \rangle$.


2.5 The Evolution of States in Classical Mechanics


Both classical mechanics and quantum mechanics specify how the state of a
system evolves with time. Obviously, at any instant that a classical system
has a nonzero momentum, its position is changing with time, and under the
action of a force it will change its momentum. The forces dealt with by
classical mechanics are those, like gravity, which depend on the relative
positions of pairs of particles or the position of each particle in a field of
forces. In the Hamilton-Jacobi treatment of the evolution of states, talk of
such forces is replaced by talk of energy, and anything that can be said in
terms of the former can also be said in terms of the latter; for instance, if we
have a particle on the end of a spring, we can specify the behavior of the
spring-and hence the motion of the particle-either in terms of the force
needed to stretch or compress it by a given amount, or in terms of the
mechanical energy stored in it when we do so.
Like any other property of the system, its total energy is determined by its
state. It is a function of the position and momentum coordinates of the
particles comprising the system. Write q for the sequence of the numbers
giving all the (components of) position coordinates for the particles, and p
for the sequence giving all the momentum coordinates; then (q,p) specifies
the point in phase space which represents the system's state. Thus we have

Total energy = H(q,p)

where H is a function known as the Hamiltonian function for the system.


It is this function which dictates how the state of a classical system evolves
through time. Since the state is specified by 6n coordinates, to establish how
it changes with time we need to know how each coordinate changes. It turns
out that their rates of change can be elegantly expressed by Hamilton's
equations, simple formulae involving the Hamiltonian function H. The rate
of change of any position coordinate qi (for example, the y-coordinate of
position of the mth particle) is expressed in terms of the dependence of H on
the corresponding momentum coordinate (in the example, the y-coordinate
of momentum of the mth particle), and conversely. For the whole system we
have 3n pairs of equations:

(2.5) $\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$
Note that H is assumed to be differentiable, a point I will return to in the next
section. Provided that this assumption holds, the coordinates which specify
the state of the system at any time t appear as solutions to this set of
differential equations.
Just to see the theory in action, let us take a particularly simple example.
Consider a system consisting of a single particle moving in one dimension,
along a line. Assume further that this particle is in a force field such that
the forces it experiences are just those it would experience were it on the
end of a spring; in fact, we will talk as though that were the case. For
simplicity, we set the origin of our coordinate system to be the point
occupied by the particle when the "spring" is unextended, that is, when the
force on it is zero. (See Figure 2.4.)
Since we have a single particle moving in one dimension, two numbers, q
and p, suffice to specify the state. The phase space for the particle is a
two-dimensional space representable in the plane of the paper. As the
particle oscillates to and fro under the influence of the "spring," the energy
of the system at any instant is the sum of the kinetic energy of the particle,
$p^2/2m$, and the energy stored in the "spring," $kq^2/2$, where p is momentum,
m is mass, k is the force on the particle per unit displacement from the origin
(numerically, the force needed to stretch the "spring" a unit distance), and q
is the position coordinate. We have, then,

Total energy = $H(q,p) = \frac{p^2}{2m} + \frac{kq^2}{2}$

It follows that

$\frac{dq}{dt} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q} = -kq$
A propos of these equations, note that dq/dt is the rate of change of
position with time - in other words, the velocity v of the particle - and also
that p = mv; thus the left-hand equation informs us that v = v, which is
reassuring, if not very enlightening. Note, however, that the right-hand
equation yields Newton's second law of motion, since kq is the force pushing
the particle back toward the origin. More relevant to our present concerns is
that these equations govern the evolution of the system's state. Assume for
argument's sake that the particle is displaced by a distance d and
instantaneously at rest at the time t = 0, so that its initial state is (d,0).
Then the equations tell us that the particle's position and momentum as time
goes on are given by the graphs shown in Figure 2.5. To put this another way,
as time goes by the state of the particle will follow the trajectory in phase
space shown by Figure 2.6; in the absence of retarding forces like friction,
this will be an ellipse.

Figure 2.4

Figure 2.5 Variation of position and momentum with time.
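The evolution described here can be reproduced numerically. The following Python sketch integrates Hamilton's equations for the "spring" Hamiltonian from the initial state (d,0) and checks the result against the familiar closed-form solution $q = d\cos\omega t$, $p = -m\omega d\sin\omega t$ with $\omega = \sqrt{k/m}$. The choice of the classical fourth-order Runge-Kutta method, the parameter values, and the function names are assumptions made for illustration; nothing in the text prescribes a numerical method.

```python
import math


def hamilton_rhs(q, p, m, k):
    """Right-hand sides of Hamilton's equations for H = p^2/2m + k q^2/2."""
    return p / m, -k * q


def evolve(q, p, t, m=1.0, k=1.0, steps=2000):
    """Integrate the state (q, p) from time 0 to time t.

    Uses the classical fourth-order Runge-Kutta scheme with a fixed step.
    """
    h = t / steps
    for _ in range(steps):
        k1q, k1p = hamilton_rhs(q, p, m, k)
        k2q, k2p = hamilton_rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p, m, k)
        k3q, k3p = hamilton_rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p, m, k)
        k4q, k4p = hamilton_rhs(q + h * k3q, p + h * k3p, m, k)
        q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p


def energy(q, p, m=1.0, k=1.0):
    """Total energy H(q, p); constant along a trajectory."""
    return p * p / (2 * m) + k * q * q / 2


# Illustrative values: displacement d = 2, mass m = 1, spring constant k = 4.
d, m, k, t = 2.0, 1.0, 4.0, 1.0
omega = math.sqrt(k / m)
q_t, p_t = evolve(d, 0.0, t, m=m, k=k)
print(q_t, d * math.cos(omega * t))
print(p_t, -m * omega * d * math.sin(omega * t))
print(energy(q_t, p_t, m, k), energy(d, 0.0, m, k))
```

Because the energy stays constant, the points $(q_t, p_t)$ all satisfy $p^2/2m + kq^2/2 = kd^2/2$, which is the equation of the ellipse mentioned above.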

2.6 Determinism
In the last decade of the eighteenth century, the Marquis de Laplace wrote:
We ought then to regard the present state of the universe as the effect of its anterior
state and as the cause of the one which is to follow. Given for one instant an
intelligence which could comprehend all the forces by which nature is animated and
the respective situations of the beings who compose it-an intelligence sufficiently
vast to submit these data to analysis-it would embrace in the same formula the
movements of the greatest bodies of the universe and those of the lightest atom; for
it, nothing would be uncertain, and the future, as well as the past, would be present
to its eyes. (1951 [1814], pp. 3-4)

In this way he formulated the doctrine of determinism, the doctrine that,
given the present state of the world, all future events are inexorably
determined by the laws of nature. Laplace put this metaphysical thesis in
epistemic terms by talking of the knowledge available to a "supermind"; this
supermind could work out the answer to any question about the future or
the past if it had a complete description of how things are now-the
"situation of the beings" who comprise the world-and the forces which
determine how the world changes with time.
The epistemic thesis is stronger than the metaphysical one. The
metaphysical thesis is that (1) there is exactly one state $w_1$ of the world
at time $t_1$ which is physically compatible with its state $w_0$ at $t_0$
($t_1 > t_0$); further, (2) these states, $w_0$ and $w_1$, determine the values
of all physical quantities in the world at the times in question. The
metaphysical thesis might be true, and the epistemic one false: $w_1$ might not
be calculable from $w_0$, even by a supermind. (For a discussion, see Earman,
1986, chap. 2.) Both theses have been associated with the classical world
picture. Indeed, the stronger, epistemic, thesis finds precise expression in
the Hamilton-Jacobi version of classical mechanics. Or so it would appear.
Consider a system of particles. We may think of the specification of its
state as a precise formulation of what Laplace meant by the "situation of the
beings" which comprise it. In order to know all there is to know about the
present, a Laplacean supermind would have to know the state of the entire
universe. To deduce from this a description of the universe at any time in the
past or future, this supermind would need in addition to "comprehend all
the forces by which nature is animated," or, equivalently, to know the
Hamiltonian function for the entire cosmos.
Our minds, alas, fall short of the Laplacean ideal. Given a system of any
complexity, the Hamiltonian may be impossible for us to ascertain, or too
cumbersome for us to employ. Nonetheless, Laplace's vision can, and indeed
did, function as a regulative ideal for classical physics. That is to say, a
metaphysical presupposition, that the universe is deterministic, can govern
the search for scientific laws. And even our finite intelligences can work out
what happens to a particle on the end of an (ideal) spring, as the example in
the previous section shows. We obtained the graphs in Figure 2.5 by solving
Hamilton's equations. Because Hamilton's equations are first-order
differential equations, they can only be solved to within a constant term; to
obtain unique solutions we plug in the particular values of p and q at one
specified time (in this case, when t = 0). But from the resulting graphs we can
now read off the state of the particle $(q_t, p_t)$ at any time t in the
future, and from this state we can deduce all its (mechanical) properties at
that time. All this knowledge is available to us through our knowledge of "the
forces by which nature is animated" (or, equivalently, the Hamiltonian for the
system we are looking at), and the "situation of the beings who compose it," in
this case the initial state of the single particle involved.

Figure 2.6 Trajectory of state of particle through phase space.

We can see why Laplace took the universe to be both classically governed
and deterministic, but the link between the two is not as clear-cut as he
assumed; the laws of classical physics do not entail the thesis of determi-
nism. As Earman (1986, chap. 3) has pointed out, classical physics can be
made deterministic only by the adoption of seemingly ad hoc assumptions.
Such assumptions are needed, for example, to ensure that the universe is a
closed system; within a framework of Newtonian space-time, they turn out
to be deeply problematic. Thus, far from entailing the deterministic thesis,
classical physics may not even be compatible with it.
Here I will set these fundamental problems on one side and merely
indicate how less problematic, but certainly nontrivial, assumptions must be
made if the Hamilton-Jacobi formulation of classical mechanics is to be a
deterministic theory.
If the state of a classical system of n particles is to evolve
deterministically, then all 6n differential equations describing this
evolution must have unique solutions for any time t. To guarantee this, H must
be continuously differentiable (more precisely, it must be differentiable in
principle) with respect to q and p for all physically possible states of the
system. The curves showing the variation of H with each position and momentum
coordinate must be smooth, and exhibit no singularities. This is not an empty
requirement. It rules out, for instance, the view that atoms are
incompressible spheres of a certain definite radius which exert forces on each
other only when they touch. If they were, then the graph representing the force
exerted by one atom on another would leap incontinently to infinity as they
made contact, and the requirement would be violated (see Figure 2.7). On the
assumption that classical mechanics is true, this would mean not merely that no
solutions to the set of equations governing the evolution of the universe were
calculable, either by our finite minds or by a supermind, but that no unique
solutions to these equations existed. Both versions of the thesis of
determinism, the epistemic and the metaphysical, would fail to hold; there
could be (at least) two distinct states of the world at time $t_1$, both of
which were compatible with a given state at $t_0$.

Figure 2.7 Force-distance graphs for (left) incompressible and (right)
compressible spheres.

2.7 The Evolution of States in Quantum Mechanics


Like classical mechanics, quantum theory tells us how the state of a system
evolves with time. The key role in the equation governing this evolution is
played by an operator rather than by the Hamiltonian function, in line with
the general principle that, in quantum mechanics, operators represent
physical quantities. As in the classical case, the quantity in question is the
total energy of the system; it is represented in quantum theory by a Hermitian
operator H which we call the Hamiltonian operator for the system. The rate
of change of the state v of a system is given by

(2.6) $i\hbar \frac{\partial v}{\partial t} = Hv$

and this equation is known as Schrödinger's time-dependent equation, or
sometimes simply as Schrödinger's equation.
There is an equivalent way to describe what happens as time goes on. It is
possible to use H to construct an operator $U_t$, which, as the notation
implies, is a function of the time. We use this operator to obtain a simple
expression for the state $v_t$ of a system at some time t in terms of its
present state $v_0$:

(2.7) $v_t = U_t v_0$
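$U_t$ is the sum of an infinite series of operators:

(2.8a) $U_t = I - \frac{it}{\hbar}H + \frac{1}{2!}\left(\frac{it}{\hbar}\right)^2 H^2 - \frac{1}{3!}\left(\frac{it}{\hbar}\right)^3 H^3 + \cdots$

Each term in this series is well defined in the algebra of operators, and the
series converges. Its sum is more easily expressed (see Section 1.5) as

(2.8b) $U_t = e^{-itH/\hbar}$

$U_t$ is not Hermitian; it is a unitary operator.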

(2.9) We say that U is a unitary operator on $\mathcal{V}$ if U is a linear
operator on $\mathcal{V}$ which has an inverse, $U^{-1}$, such that
$UU^{-1} = I = U^{-1}U$, and, for all v in $\mathcal{V}$, $|Uv| = |v|$.

Unitary operators are the analogues in complex spaces of rotation operators
on $\mathbb{R}^2$ and $\mathbb{R}^3$. They leave the lengths of vectors
unchanged; thus if $v_0$ is normalized, so is $U_t v_0 = v_t$; a pure state
evolves into a pure state.
The details of the calculation in Equations (2.8a-b) need not concern us.
The important point brought out by Equations (2.8) and (2.9) is that, since
$U_t$ is determined by the operator H and the time t, the future state is
uniquely specified by these two quantities and the present state.
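For a small example the construction of $U_t$ can be carried out explicitly. The Python sketch below sums the exponential series for $e^{-itH/\hbar}$ term by term for a two-dimensional Hamiltonian and checks that the resulting operator preserves the length of a state vector. The Hamiltonian chosen (the Pauli matrix $\sigma_x$), the units with $\hbar = 1$, the truncation at 40 terms, and all function names are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def evolution_operator(h, t, hbar=1.0, terms=40):
    """Partial sum of the series for exp(-i t H / hbar).

    Adds the terms (-i t H / hbar)^n / n! for n = 0, ..., terms - 1.
    """
    n = len(h)
    x = [[-1j * t / hbar * h[i][j] for j in range(n)] for i in range(n)]
    term = [[complex(i == j) for j in range(n)] for i in range(n)]  # identity
    total = [row[:] for row in term]
    for order in range(1, terms):
        # term now holds x^order / order!
        term = [[e / order for e in row] for row in matmul(term, x)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def norm(v):
    """Length |v| of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))


H = [[0, 1], [1, 0]]  # illustrative Hermitian Hamiltonian (Pauli sigma_x)
v0 = [1, 0]           # normalized initial state
U = evolution_operator(H, 0.7)
vt = apply(U, v0)
print(vt, norm(vt))
```

For this Hamiltonian the series sums to $\cos t \, I - i \sin t \, \sigma_x$, so the computed state should be close to $(\cos 0.7, -i \sin 0.7)$ and its length close to 1.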
Thus, as far as the evolution of states is concerned, quantum mechanics
seems thoroughly Laplacean. How is it, then, that the theory is usually taken
to model an indeterministic world? The answer lies in the relation between
the quantum state and the values assigned to physical quantities. Recall
from Section 2.6 that a deterministic theory is one that (1) not only specifies
uniquely the evolution of a system's state, but (2) also assigns, via the state,
values to all the physical quantities associated with a system. Quantum
theory fulfills the first requirement, but not the second. As I remarked in
Section 2.3, a crucial difference between quantum theory and classical
mechanics is perhaps this: whereas classical states are essentially descrip-
tive, quantum states are essentially predictive; they encapsulate predictions
concerning the values that measurements of physical quantities will yield,
and these predictions are in terms of probabilities.
But a bit more needs to be said. It's also true of classical mechanics that the
state descriptions it supplies yield predictions about the values that
observables will be found to have. Ideally, however, the probability assigned
to any experimental question by a pure state of classical theory will be either
1 or 0; classically, probabilities can take values between 0 and 1 either
because measurement processes are less than ideal or because information about
the state is less than complete. In the quantum case, even given ideal
measurements and a precise specification of the state, we obtain nonextremal
values of probability.
Thus, in the state-space models we supply for determinist (classical)
processes on the one hand and inherently probabilistic (quantum) processes on
the other, the distinction between them appears neither as a radical
divergence between accounts of the evolution of states, nor simply as a
distinction between descriptive and dispositional accounts of states. It
appears as a difference between the kinds of predictions a state makes
available. Only in the determinist case are these predictions, as we say,
dispersion-free.
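The contrast can be made concrete with a small calculation. The Python sketch below takes "dispersion" to mean the statistical variance $\langle A^2 \rangle - \langle A \rangle^2$ of measurement results, which is the usual reading but is an assumption here, since the text does not define the term at this point. It evaluates the dispersion of $S_y$ (standard matrix form, units of $\hbar$) in two illustrative states.

```python
import math

S_Y = [[0, -0.5j], [0.5j, 0]]  # spin matrix S_y in units of hbar (standard form, assumed)


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def apply(m, v):
    """Apply the square matrix m to the vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m)))


def expectation(a, v):
    """<A> = <v | A v> for a Hermitian matrix a and normalized state v."""
    return inner(v, apply(a, v)).real


def dispersion(a, v):
    """Variance <A^2> - <A>^2, using <A^2> = <Av | Av> for Hermitian A."""
    av = apply(a, v)
    return inner(av, av).real - expectation(a, v) ** 2


s = 1 / math.sqrt(2)
y_plus = (s, 1j * s)  # eigenvector of S_y with eigenvalue +1/2
z_up = (1, 0)         # a state that is not an eigenvector of S_y

print(dispersion(S_Y, y_plus))
print(dispersion(S_Y, z_up))
```

The first value should be zero to within rounding error, the second close to 0.25: the pure state `z_up` is precisely specified, yet its predictions for $S_y$ are not dispersion-free.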
But at this point we can scent a problem. Assume we have a quantum
system Q and a measurement apparatus M. If the measurement process is to
conform to quantum theory, we would expect the state of the coupled
system Q + M to evolve according to Schrödinger's equation, that is,
deterministically; nothing so far suggests that a complex system offers an
exception to that equation. But if we associate different experimental results
with different states of M (its "pointer readings"), and if the evolution of
Q + M is deterministic, how is it that results have probabilities other than 1
or 0? I postpone discussion of this question to Chapter 9; for the present, a
faint whiff of the problem of measurement can be left to hang in the air.

2.8 Theories and Models


Table 2.1 shows how states and observables are represented in quantum
theory; in Section 2.7 we saw how the time-evolution of states is expressed
in terms of the action of a family of unitary operators on the vector
representing the state. Quantum mechanics, we may say, uses the models
supplied by Hilbert spaces.
Implicit in this way of presenting quantum mechanics is a general account
of scientific theories. A theory T displays a set of models within which the
behavior of ideal "possible systems" (or "T-systems") can be represented.
For a realist, at least, to accept T is to say that there exist actual systems
which are T-systems. (For an antirealist but still model-theoretic view, see
van Fraassen, 1980.) The actual solar system, for example, is (approximately)
a Newtonian system, that is, a system representable within the mathematical
models supplied by the theory of classical mechanics. A system S is a quantum
system if the behavior of S is representable within a Hilbert-space model in
the way I have outlined.
This model-theoretic account of a scientific theory is by no means
original - it can even be called "the new orthodoxy" in the philosophy of
science. (See Suppes, 1967; Giere, 1979; Suppe, 1977, pp. 221-230.) It
stands in contrast to "the received view" (the phrase is Putnam's: Putnam,
1962), which takes an axiomatic approach to theories and emphasizes the
role of theoretical laws (see Suppe, 1977, pp. 3-61). While I don't quite
share Schopenhauer's view of the Euclidean method (it is, he said, as if a
man were to cut off both legs in order to be able to walk on crutches;
Blanche, 1962), I would reject any claim that an axiom system is the ideal,
canonical form for the expression of a scientific theory. The point is this. For
any axiom system there exists a class of models; Peano's axioms for
arithmetic, for example, have as a model the set of natural numbers. And within
science we are not interested in axioms for their own sake, but in the class of
models they define. It does not matter how this class is specified, provided
that the specification is precise. When we investigate a theory, demands
typical of the axiomatic approach-like the requirement that the specification
be expressed in a first-order language, or that the predicates of this
language be divided into two classes, observational and theoretical-give
undue prominence to linguistic matters and are extraneous to our concerns.
Thus van Fraassen (1980, p. 44):

The syntactic picture of a theory identifies it with a body of theorems, stated in one
particular language chosen for the expression of that theory. This should be con-
trasted with the alternative of presenting a theory in the first instance by identifying
a class of structures as its models. In this second, semantic, approach the language
used to express the theory is neither basic nor unique; the same class of structures
could well be described in radically different ways, each with its own limitations.
The models occupy center stage.

But when we say that quantum theory uses the models supplied by
Hilbert spaces, what sort of models are these? They are models in two
apparently dissimilar senses. In the first place, they are models as that term
is used in contemporary mathematics; in other words, they are mathemati-
cal structures of the kind described in Section 1.8, containing sets of ele-
ments on which certain operations and relations are defined. More surpris-
ingly, they are also models in the way that a Tinkertoy construction can be a
model of the Eiffel Tower. Just as a point on the model can represent a point
on the tower, so, for example, an operator on a Hilbert space can represent a
physical quantity.
The two senses are linked in the following way. When we recognize that
the Tinkertoy model is a model of the Eiffel Tower, we not only see that
points on the model represent points on the tower, but also that certain
important relations are preserved in this representation; for example, we
would expect the ratio of the overall height to the length of one side of the
base to be the same for both the tower and the model. That is to say, we
expect the tower and the model to be isomorphic. But isomorphic structures
are just the subject matter of model theory in the first, mathematical, sense.
The outline of quantum theory given in this chapter uses the mathematical
structure of Hilbert space (a model in the first sense) to provide a
representation (a model in the second sense) of the behavior of systems. This
behavior has itself been described in very abstract terms; there is a wide gap
between the way a working physicist uses quantum theory and the account
of the theory I have offered. Of such accounts, Cartwright (1983, pp. 135-136)
says,

One may know all of this and not know any quantum mechanics. In a good
undergraduate text these . . . principles are covered in one short chapter. It
is true that the Schrödinger equation tells how a quantum system evolves
subject to the Hamiltonian; but to do quantum mechanics, one has to know how to
pick the Hamiltonian. The principles that tell us how to do so are the real
bridge principles of quantum mechanics.

Cartwright gives an instructive account of how an inventive physicist
bridges the gap by using models of particular processes "to hook up
phenomena with intellectual constructions" (p. 144). "To have a theory of the
ruby laser, or of bonding in a benzene molecule," she says, "one must have
models for those phenomena which tie them to descriptions in the
mathematical theory" (p. 159). These models, however, have a very different
function from the mathematical model in which we represent states and
observables. They are essentially models in the second, Tinkertoy, sense,
which represent actual entities, like a ruby laser, in terms of fictional
elements ("two-level atoms" in this instance) whose behavior is amenable to
theoretical treatment. These are just useful representations, simulacra of
what they represent, and are contrasted with the underlying mathematical
theory: "a model - a specially prepared, usually fictional description of the
system under study - is employed whenever a mathematical theory is
applied to reality . . . Without [models] there is just abstract mathematical
structure, formulae with holes in them, bearing no relation to reality" (pp.
158-159). This view of the mathematical theory is at odds with my suggestion
that the mathematical models supplied by Hilbert spaces are also
representational. Such models are not simulacra, nor are they to be contrasted
with the theory; in fact, to present the theory is just to exhibit this class of
models. In what sense, then, are they more than "abstract mathematical
structures"? What, we may ask, do they represent?
Well , 10 ask this question is precisely to seek an interpretation of quantum
Illt'ory . Wh 'n we constru ct models of the Eiffel Tower or of the ruby laser,
we start from these objects and proceed to the task of model building. In the
case of quantum theory, we have certain notions like "state" and "observ-
able" which find a representation in the model. Antecedent to the theory,
however, these are very insubstantial concepts. We rely on the theory's
models to tell us how they are to be understood. The process of interpreting
quantum theory is thus the reverse of that of building a model of a preexist-
ing object. We judge our models of the Eiffel Tower and the ruby laser by
how well they represent the objects modeled. When we try to interpret
quantum theory we assume that the representation the theory offers is a
good one and ask Feynman's forbidden question: what sort of world could it
represent? In the most abstract, perhaps metaphysical sense, what must the
world be like, if it is representable by the mathematical models that quan-
tum theory employs?
3
Physical Theory and
Hilbert Spaces

The previous chapter outlined, in rather summary fashion, the way Hilbert
spaces supply mathematical models for quantum theory. In fact, Hilbert-space
theory was developed for just this purpose. If someone were to ask,
"Why Hilbert spaces?" we might think the question a little peculiar; the
obvious answer would be, "Because that's the way the world is." But we
can refine the question, and ask what it is about the mathematical theory of
Hilbert spaces which makes it clearly suitable for the representation of the
physical world. More specifically, given the task of representing the
quantum world within a mathematical framework, why might we turn to
Hilbert space theory?
The example of classical mechanics shows us that there are possible
representations of physical theories which do not involve Hilbert spaces. Of
course, this doesn't mean that classical mechanics could not be reformulated
in this way. In fact, our strategy for providing a partial answer to the
question, "Why Hilbert spaces?" will be to show that the theory of vectors
has very general application. We will take as an example a particular
physical situation and model it mathematically. The situation will be
paradigmatically of the kind with which physical theory deals, but our description
will be general enough to leave open the question of what sorts of processes,
deterministic or indeterministic, are involved. Similarly its representation,
in terms of a vector space, will be general enough to be employed for a
variety of physical theories; the particular features of quantum mechanics
on the one hand, or classical mechanics on the other, will then appear as
additional constraints on these mathematical structures.


The key to the representation is the fact that Pythagoras' theorem, or its
analogue, holds in any vector space equipped with an inner product.
Consider the space $\mathbb{R}^3$. For any vector $v$ in $\mathbb{R}^3$,

$v = v_x + v_y + v_z$

Here $v_x$, $v_y$, and $v_z$ are the projections of $v$ onto an orthogonal triple of rays
spanning $\mathbb{R}^3$, or, as we can call them, the axes of our coordinate system (see
Figure 3.1).
Pythagoras' theorem tells us that

$|v|^2 = |v_x|^2 + |v_y|^2 + |v_z|^2$

and so, if $v$ is normalized,

(3.1) $|v_x|^2 + |v_y|^2 + |v_z|^2 = 1$

Let us now assume that we wish to represent three mutually exclusive
events that together exhaust all possibilities, and that each event has a
certain probability. For instance, if we were rolling a die, the events might
be: x = die shows even number; y = die shows 1; z = die shows 3 or 5. If we
use the axes of $\mathbb{R}^3$ to represent the events x, y, and z, we can construct a
normalized vector $v$ to represent any probability assignment to these events.
We simply take vectors $v_x$, $v_y$, and $v_z$ along these axes such that $|v_x|^2 = p(x)$,
$|v_y|^2 = p(y)$, and $|v_z|^2 = p(z)$, and then add them (vectorially) to yield $v$.
Since the events x, y, and z are mutually exclusive and jointly exhaustive,
we know that $p(x) + p(y) + p(z) = 1$ and it follows from (3.1) that $v$ is
normalized.

This almost trivial construction lies at the heart of the use of vector spaces
in physical theory.
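
The construction just described can be checked numerically. The following is only a minimal sketch in Python: it assumes a fair die (the text does not fix the probabilities), so that p(x) = 1/2, p(y) = 1/6, and p(z) = 1/3, and the helper name `state_vector` is hypothetical.

```python
import math

def state_vector(probabilities):
    """Components of v along the axes: each component has squared length p."""
    return [math.sqrt(p) for p in probabilities]

# Assumed fair die: x = even number, y = shows 1, z = shows 3 or 5.
p = [1 / 2, 1 / 6, 1 / 3]
v = state_vector(p)

# Pythagoras' theorem: |v|^2 is the sum of the squared components.
norm_squared = sum(c * c for c in v)
```

Under these assumptions `norm_squared` equals 1 up to rounding error, which is just equation (3.1) read in reverse.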

Figure 3.1

3.1 Minimal Assumptions for Physical Theory


In developing our general representation of a physical theory we start from
one assumption, that the world is such that in certain specifiable
circumstances various events can be assigned definite probabilities. I take this
assumption to be minimal if we are to have any physical theory at all: we
assume that there are links, albeit only probabilistic ones, between one set of
occurrences (the initial circumstances) and another (the resulting events).
If the world were fully determined then the assumption would still hold,
although ultimately all the probabilities involved would take either one or
zero as values.
To place our theory in a specific context, let us imagine modified versions
of the schematic experiments described in Section 2.2. In each of those
experiments, a preparation of a system was followed by a test, and the result
of this test was assumed to depend on the mode of preparation. The tests
were all of the pass/fail kind, and it was tacitly assumed that a given method
of preparation would always yield the same test results. We may relax both
these conditions. We will consider a test for which there are a number of
possible outcomes: for present purposes we will assume their number to be
at most denumerably infinite, so that they may be labeled $x_1$, $x_2$, $x_3$, and so
on. This allows us to consider any test which involves assigning a rational
number to some physical quantity (indeed, it goes beyond the bounds of
physical plausibility). Further, we will assume that there is a statistical
correlation between a given mode of preparation and a particular outcome;
in other words, that once a system has been prepared, each outcome $x_i$
acquires a certain probability $p(x_i)$ whose value depends on the method of
preparation used.
The question arises whether, by talking in terms of these schematic
experiments, we introduce additional assumptions, thereby losing the
generality of approach we are after. In particular, are such assumptions brought
into play by our talk of a system, which is first to be prepared and then
tested? As long as we talk only of one particular preparation-measurement
procedure, they are not. Effectively, all we are assuming is that there is a
physical interaction between one piece of equipment, the preparation
apparatus, and another, the measurement apparatus; we express this by saying
that a system prepared by one is tested by the other, but this could be
thought of purely as a figure of speech.
Perhaps the situation becomes more problematic when we discuss how
experimental outcomes using one apparatus are related to outcomes from
another. Certainly the term system then refers (or, to the scrupulous, appears
to refer) to whatever is in common between two, possibly very different,
preparation-measurement procedures. But this just shows that the minimal
assumption we started with, that in certain circumstances various events
can be assigned definite probabilities, does not, on its own, ground the
activity of theorizing. If indeed talk of "systems" betrays a second assump-
tion, that there can be something in common between two different pro-
cesses, this is scarcely problematic. How else does theorizing proceed? No
doubt "something" is vague, and we may be inclined to mistake the nature
of the beast in question - after all, experiment clearly showed phlogiston to
have a negative weight- but that is just to say that our theory may be
wrong. It's not the aim of this chapter to show that only correct theories can
use a vector-space representation.

3.2 The Representation of Outcomes and Events


As we shall see in Section 3.7, a crucial distinguishing feature of quantum
mechanics is the way in which observable quantities are related one to the
other. Nonetheless, for the next five sections I will consider a single observ-
able, measured, moreover, by one specific type of experiment.
We assume that this measurement allows a set $\{x_i\}$ of outcomes. (The
labels $x_1$, $x_2$, and so on need not refer to numerical values: they are merely
our way of distinguishing one outcome from another and so could
abbreviate such phrases as "Light D went on," "An explosion occurred," and so
forth.) This list of outcomes is to be exhaustive, and they are to be mutually
exclusive: each repetition of the measurement must yield exactly one
outcome from the set. Our first task is to represent this set mathematically. We
could, for instance, represent each outcome by a sequence of zeroes and
ones, $x_1$ by (1,0,0,0, . . . ), $x_2$ by (0,1,0,0, . . . ), $x_3$ by (0,0,1,0, . . . ), and
so on. (Of course, if we have a finite set of outcomes then each sequence can
also be finite.) Or, more obscurely, we could raise the index so that it
becomes an exponent and think of each outcome as an integral power of $x$,
of $x_3$ as $x^3$, for example. As it happens, in both cases, our representation of an
outcome is as a basis vector of a vector space (see Section 1.13); in the first
instance the space is the space of sequences of real numbers, and in the
second it is the space of the polynomials in $x$ with no constant term. This
suggests a more general approach. We represent each outcome, not by a
vector, but by a subspace of a vector space $\mathcal{V}$; to emphasize that the
outcomes are mutually exclusive we make these subspaces mutually
orthogonal, and to show that they exhaust the alternatives we specify the vector
space to be the span of the set of subspaces. (If $L$ is the span of two subspaces
$M$ and $N$, expressed $L = M \oplus N$, then $L$ is the set of vectors $au + bv$ where
$u \in M$, $v \in N$, and $a$ and $b$ are scalars: the span of a plane in $\mathbb{R}^3$ and a line
perpendicular to it is the whole of $\mathbb{R}^3$.)

Need the subspaces be one-dimensional? No: by leaving the dimension
unspecified we allow for the fact that our tests may be coarse-grained.
Another test which we regard as a refinement of our original procedure
might persuade us that what we had previously regarded as one outcome,
$x_2$ say, should properly be regarded as two: $x_{2a}$ and $x_{2b}$. In that case we
would come to regard the subspace corresponding to $x_2$ as the span of two
others. Of course, if we have reason to believe that no further discrimination
is possible, that the outcomes are in a sense atomic, then we are at liberty to
make the subspaces representing them one-dimensional.


Let us take our original set of outcomes and enlarge it so that it is closed
under various operations. We do this by considering subsets of the set $\{x_i\}$ of
outcomes: each such subset we call an event, $e_i$. The operations of union
($e_1 \cup e_2$), intersection ($e_1 \cap e_2$), and complementation ($\bar{e}_1$) can now be
brought into play.

To each outcome there corresponds an event: to $x_2$ (for instance)
corresponds the event $e_2 = \{x_2\}$. If $e_1 = \{x_1\}$ and $e_2 = \{x_2\}$, then $e_1 \cup e_2 = \{x_1, x_2\}$;
we may say that $e_1 \cup e_2$ occurs provided either that $x_1$ occurs or that $x_2$
occurs. A parallel infinitary operation, $\bigcup_i$, yields, for instance, the event
$\bigcup_i \{x_i\}$, which is certain to occur, since we took the original set $\{x_1, x_2, \ldots\}$ to
be exhaustive. We see that, although to every outcome corresponds an
event, not every event corresponds to a single outcome.

In like manner, if $e_a$ is the event $e_1 \cup e_3$ and $e_b$ is the event $e_2 \cup e_3$, then the
intersection $e_a \cap e_b$ will be the event $e_3$; and if we have an infinite set of
events $\{e_i\}$, then $\bigcap_i \{e_i\}$ will be the set of those outcomes which all the
members of that infinite set have in common. Notice that if $\cup$ and $\cap$ are
everywhere defined, then our set of events has to contain the null event,
which never happens; this is the event $E_0$ (which, of course, is not an
outcome).

The set of events, together with the operations on it, finds a ready
representation in our vector space $\mathcal{V}$. The subspace $L_i$ corresponding to the
outcome $x_i$ also represents the event $\{x_i\}$. To the operations $\cup$ and $\cap$
correspond, respectively, the operations of span ($\oplus$) and intersection ($\cap$) on the
set of subspaces of $\mathcal{V}$, and infinitary versions of these correspond to $\bigcup_i$ and
$\bigcap_i$. Not every subspace of $\mathcal{V}$ represents an event, just those subspaces which
are (possibly infinitary) spans of the mutually orthogonal subspaces
representing the outcomes, together with the zero subspace to represent the null
event. As we would hope, the resulting set of subspaces is closed under the
operations.
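
The correspondence between operations on events and operations on subspaces can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not in the text: there are three atomic outcomes, each represented by a coordinate axis, so that a subspace representing an event is fixed by the 0/1 diagonal of its projection operator. The function names are hypothetical.

```python
def projector_diagonal(event, n):
    """Diagonal of the projector onto the span of the axes listed in `event`."""
    return [1 if i in event else 0 for i in range(n)]

def span(d1, d2):
    """Diagonal of the projector onto the span of two coordinate subspaces."""
    return [max(a, b) for a, b in zip(d1, d2)]

def meet(d1, d2):
    """Diagonal of the projector onto the intersection of two coordinate subspaces."""
    return [a * b for a, b in zip(d1, d2)]

n = 3            # assumed: three atomic outcomes x_1, x_2, x_3 (indexed 0, 1, 2)
e_a = {0, 2}     # the event e_1 U e_3
e_b = {1, 2}     # the event e_2 U e_3

L_a = projector_diagonal(e_a, n)
L_b = projector_diagonal(e_b, n)

# Union of events corresponds to span; intersection to intersection.
union_matches = span(L_a, L_b) == projector_diagonal(e_a | e_b, n)
meet_matches = meet(L_a, L_b) == projector_diagonal(e_a & e_b, n)

# The null event is represented by the zero subspace.
null_event = projector_diagonal(set(), n)
```

In this restricted setting the set of subspaces generated from the outcome axes is visibly closed under both operations, as the text requires.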
All this is very nice, but one might wonder what it achieves; after all, the
set of events already has a structure, that of a field of sets. What is the point,
one may well ask, of introducing all the extra structure built into a vector
space (a vector space, moreover, on which an inner product must be
defined, since we talk of "orthogonality")? The answer comes with the
introduction of probabilities.

3.3 The Representation of States


As a result of any given method of preparation, each outcome $x_i$ acquires a
certain probability $p(x_i)$ of occurrence. If we regard the outcome $x_i$ as the
event $\{x_i\}$, then the function $p$ can be extended over the whole set $\mathcal{E}$ of events
in such a way that the Kolmogorov probability axioms hold; in other words
so that

(3.2a) $p(E_0) = 0$
(3.2b) $p(E_1) = 1$, where $E_1 = \bigcup_i \{x_i\}$
(3.2c) for events $e_i$ and $e_j$, $p(e_i \cup e_j) = p(e_i) + p(e_j)$, provided $e_i \cap e_j = E_0$.
Two methods of preparation are identified if and only if to each outcome
one gives the same probability as the other. Then, by definition, each dis-
tinct method of preparation results in a different assignment of probabili-
ties, that is, a different function p.
Now let us turn to the vector space $\mathcal{V}$. Let $L_i$ be the subspace
corresponding to the event $e_i$, and $P_i$ the projection operator onto $L_i$. Since subspaces
and projection operators are in one-to-one correspondence we may regard
$P_i$ as representing $e_i$. Note that the zero operator, $P_0$, corresponds to $E_0$ and
the identity operator, $I$, to $E_1$. We define the length of a vector $v \in \mathcal{V}$, as
usual, in terms of the inner product with which we have equipped $\mathcal{V}$ and
denote it by $|v|$. The projection $P_i v$ of the vector $v$ onto the subspace $L_i$ will be
of length $|P_i v|$.

Now let $v$ be a normalized vector of $\mathcal{V}$. We have $P_0 v = 0$ and $Iv = v$,
whence

(3.3a) $|P_0 v|^2 = 0$

and

(3.3b) $|Iv|^2 = |v|^2 = 1$

Further, if $P_i$ and $P_j$ project onto orthogonal subspaces $L_i$ and $L_j$, then $P_i v$ is
orthogonal to $P_j v$; writing $L_i \oplus L_j = L_k$, we obtain

(3.3c) $|P_k v|^2 = |P_i v|^2 + |P_j v|^2$

by Pythagoras' theorem.

Within the limited set of subspaces we are considering, two subspaces are
orthogonal if and only if they have just the zero subspace in common. Thus
the condition on (3.3c) is that $L_i \cap L_j = 0$ (the zero subspace), and the match
between (3.3a-c) and the probability axioms (3.2a-c) is evident.

We can make the match more explicit as follows. Let any (normalized)
vector $v \in \mathcal{V}$ define a function $\mu_v$ on the set of subspaces of $\mathcal{V}$ with values in
the interval [0,1], such that

$\mu_v(L_i) = |P_i v|^2$

For the present we restrict this function to the set of subspaces in
correspondence with the set of events. We then obtain

(3.4a) $\mu_v(0) = 0$
(3.4b) $\mu_v(\mathcal{V}) = 1$
(3.4c) for subspaces $L_i$ and $L_j$, $\mu_v(L_i \oplus L_j) = \mu_v(L_i) + \mu_v(L_j)$ provided
$L_i \cap L_j = 0$.
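
A small numerical check may make the match with the probability axioms concrete. The Python sketch below is illustrative only: it assumes a three-dimensional real space whose coordinate axes represent three atomic outcomes, takes an arbitrary normalized vector, and uses the hypothetical helpers `project` and `mu` for the projection and for the function defined by |Pv|^2.

```python
import math

def project(v, axes):
    """Project v onto the subspace spanned by the coordinate axes in `axes`."""
    return [c if i in axes else 0.0 for i, c in enumerate(v)]

def mu(v, axes):
    """The squared length |P v|^2 of the projection onto a coordinate subspace."""
    return sum(c * c for c in project(v, axes))

# An arbitrary normalized vector (assumed for illustration).
v = [math.sqrt(0.5), math.sqrt(0.3), math.sqrt(0.2)]

mu_zero = mu(v, set())          # zero subspace, as in (3.4a)
mu_whole = mu(v, {0, 1, 2})     # whole space, as in (3.4b)

# Additivity over orthogonal subspaces, as in (3.4c).
additive_gap = mu(v, {0, 1}) - (mu(v, {0}) + mu(v, {1}))
```

Here `mu_zero` is 0, `mu_whole` is 1 up to rounding, and `additive_gap` vanishes up to rounding, mirroring the three axioms in turn.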

It appears that our representation of the set of outcomes within the vector
space $\mathcal{V}$ has enabled us to represent not only each possible event in $\mathcal{E}$, but
also the probability measures on that set. The probabilities of the various
events are physically determined by the method of preparation, or, as we
may say, by the state of the system being tested. Thus, already, from our
generalized "theory" we can begin to see the rationale behind the use of
vectors to represent quantum-mechanical states; further, if we look back at
Equation (2.1) we find that, both here and in quantum mechanics,
probabilities are computed in the same way from the state vector: for any vector $v$ and
projection operator $P_i$ we have

(3.5) $\langle v | P_i v \rangle = |P_i v|^2$ [see (1.27)]

Equation (3.5) shows that to every normalized vector $v \in \mathcal{V}$ there
corresponds a probability measure on $\mathcal{E}$, namely the probability function $p$ such
that, for any outcome $x_i$,

$p(x_i) = \langle v | P_i v \rangle = |P_i v|^2$

(where the subspace $L_i$ and the projection operator $P_i$ represent the outcome
in question). We can also show the converse, that any probability function $p$
on the particular set of events we are dealing with can be represented by a
vector. Let each outcome $x_i$ be represented by the subspace $L_i$. Now we
choose within each subspace $L_i$ a normalized vector $v_i$, and construct the
vector $v$ as a weighted vector sum of all the vectors $v_i$: we write

$v = \sum_i c_i v_i$

and specify that, for each $i$,

$|c_i|^2 = p(x_i)$

Notice that all the vectors $v_i$ are mutually orthogonal; whence, by
Pythagoras' theorem, we know that

$|v|^2 = \sum_i |c_i|^2 |v_i|^2 = \sum_i |c_i|^2$

(since $|v_i| = 1$ for each $i$), and also that

$\sum_i |c_i|^2 = \sum_i p(x_i) = 1$

Thus

$|v|^2 = 1$

or in other words, $v$ is normalized.


Before showing how $v$ could be constructed, I emphasized that the
probability function it represented was a function defined on a specific set of
events, namely the set $\mathcal{E}$ of events associated with the particular experiment
we are concerned with. This emphasis is necessary because quantum theory
assigns probabilities to events associated with whole families of
observables, and these may be measured by a variety of experimental
arrangements. It turns out that, although all the events in this enlarged set $\mathcal{E}^*$ can be
represented by subspaces of the same vector space, only one of the results
shown above is generalizable to $\mathcal{E}^*$. It remains true that any normalized vector
in the space can represent a probability function on $\mathcal{E}^*$; however, not all
probability functions on $\mathcal{E}^*$ are representable by normalized vectors in the
space. I discuss the representation of the others in Section 3.5.
Let us return to the limited class of events associated with a single type of
measurement procedure and to the construction which yielded a vector $v$
for each probability measure on $\mathcal{E}$. It is clear from this construction that $p$
does not have a unique representation in $\mathcal{V}$, or, to put it another way, not
every vector in $\mathcal{V}$ defines a distinct state. During the construction we
chose from each subspace $L_i$ an arbitrary normalized vector $v_i$: a different
choice would have resulted in a different vector $v$ representing the same
probability measure. There are two reasons why a choice of vectors is
available to us. The first is that we have not claimed that our test outcomes
are atomic: we have given each subspace $L_i$ arbitrary dimensionality. The
second is that, even within a one-dimensional subspace, there is more than
one normalized vector. If $\mathcal{V}$ is a vector space over the reals, and $v_i$ is a
normalized vector within a one-dimensional subspace $L_i$, then so is $-v_i$; if
$\mathcal{V}$ is a complex vector space (and nothing we have said so far rules it out),
then each one-dimensional subspace $L_i$ contains an infinite number of
normalized vectors: if $v_i \in L_i$ and $|v_i| = 1$, then $cv_i$ will also be a normalized
vector within $L_i$ provided that $|c| = 1$, that is, provided $c$ is expressible in
the form $\cos\theta + i\sin\theta$ (see Section 1.5). In passing, we may recall from
Section 2.3 that, in quantum mechanics too, the second of these
considerations applies, and that a pure state is properly thought of as represented by a
one-dimensional subspace of a Hilbert space.
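
The second source of non-uniqueness can be seen in a short Python sketch. It assumes, purely for illustration, a complex space in which each outcome is represented by a coordinate ray, and a made-up probability assignment; the function names are hypothetical. Two vectors built from the same probabilities but with different phase factors of unit modulus yield the same measure.

```python
import cmath
import math

def state_from_probabilities(p, phases):
    """Coefficients c_i with |c_i|^2 = p(x_i), each carrying an arbitrary phase angle."""
    return [math.sqrt(pi) * cmath.exp(1j * theta) for pi, theta in zip(p, phases)]

def probabilities(v):
    """p(x_i) = |P_i v|^2, each L_i being taken as the i-th coordinate ray."""
    return [abs(c) ** 2 for c in v]

p = [0.5, 0.25, 0.25]                                # assumed probability assignment
v = state_from_probabilities(p, [0.0, 0.0, 0.0])
w = state_from_probabilities(p, [0.3, 1.7, -2.0])    # same p, different phases

same_measure = all(math.isclose(a, b) for a, b in zip(probabilities(v), probabilities(w)))
different_vectors = any(abs(a - b) > 1e-9 for a, b in zip(v, w))
```

Both flags come out true: `v` and `w` are distinct normalized vectors, yet they represent one and the same probability measure on the outcomes.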

3.4 Determinism, Indeterminism, and the Principle of Superposition

Although the previous sections have dealt with the representation of a
single experiment, in another respect they have been entirely general: no
constraints have been laid on the kind of processes to be modeled. The
models supplied by vector-space theory are, thus far, suitable for
representing all sorts of possible physical processes, deterministic and probabilistic
alike. In this section and the next I will show how the differences between
such processes are modeled in the theory; they will appear as differences
between the sets of possible states which the theory permits.

In Section 3.3 I showed that every normalized vector $v \in \mathcal{V}$ defines a
probability measure on the set of events associated with a particular
experiment, and also that there is a vector corresponding to each probability
measure on that set. Further, I have up to now equated these probability
measures with the possible states of the system tested. Among all the
various theories which we can formulate using vector spaces, however, there
are some in which only certain vectors are eligible to represent states, and, as
noted in the last section, there are others in which some states are not
representable by a vector at all. I turn now to physical theories of the first
kind, and defer discussion of the second to Section 3.5.


Consider, for instance, a theory modeling testing procedures which were
fully deterministic: in our description of the experiments there is nothing to
say that we must be dealing with indeterministic processes. In the
deterministic case, if we could specify our preparation procedures with sufficient
precision, then each mode of preparation would yield one outcome with
certainty. Let us call the states corresponding to these modes of preparation
the pure states of the theory. The only probability measures involved would
then be those which yielded $p(x_i) = 1$ for some outcome $x_i$ and $p(x_j) = 0$ for
each other outcome $x_j$; thus the only vectors which would represent pure
states in this theory would be normalized vectors lying within the subspaces
representing the individual outcomes: only for a (normalized) vector $v_i$ lying
within $L_i$ do we have

$|P_i v_i| = 1$ but
$|P_j v_i| = 0$ whenever $j \neq i$

Bearing this in mind, let us look at one of the ways in which the difference
between classical mechanics and quantum theory has been characterized.
In chapter 1 of his Principles of Quantum Mechanics, Dirac (1930, pp. 10-18)
locates the major difference between the two theories in the role played by
the principle of superposition in quantum mechanics. Put in general terms,
the principle states that,

If there are pure states of a system which yield probability measures
$p_1$ and $p_2$ on a set of outcomes, then, if $a$ and $b$ are a pair of real
numbers such that $0 \le a \le 1$ and $0 \le b \le 1$, and $a + b = 1$, and $p_3$ is
the probability measure $p_3 = ap_1 + bp_2$, then there is a pure state of
the system which yields the probability measure $p_3$.

In terms of the vectors representing the pure states, it reads,

If $v_1$ and $v_2$ represent possible pure states of a system, then any vector
$v_3 = c_1 v_1 + c_2 v_2$ such that $|v_3| = 1$ also represents a possible pure
state of that system.

We can now see the significance of this principle. It is clear that no
deterministic theory can include it, for on such a theory the only vectors allowed
to represent pure states lie within the subspaces $L_1$, $L_2$, . . . which
represent the outcomes $x_1$, $x_2$, . . . . Though these vectors span the whole space,
that is, any vector $v$ can be written as a sum $\sum_i c_i v_i$ of such vectors, we are not
free to regard every vector constructed in this way as representing a
physically possible pure state. In the two-dimensional case, where there are two
possible outcomes $x_1$ and $x_2$ represented by the one-dimensional subspaces
$L_1$ and $L_2$, respectively (see Figure 3.2), then, on a deterministic theory, the
(normalized) vectors $v_1 \in L_1$ and $v_2 \in L_2$ may represent pure states, but the
vector $v_3 = (1/\sqrt{2})v_1 + (1/\sqrt{2})v_2$ may not.

Within quantum mechanics, on the other hand, the principle holds; thus
all vectors in the space $\mathcal{V}$ can represent possible physical states, and they
may all be written in the form $\sum_i c_i v_i$, that is, as linear sums of vectors within
the subspaces which represent outcomes.

Figure 3.2 The principle of superposition: $v_3 = (v_1 + v_2)/\sqrt{2}$.
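
The two-dimensional case can be worked through in a few lines of Python. This is only a sketch under the assumption that the two outcome subspaces are the coordinate rays of a real plane; the helper names `measure` and `is_deterministic` are hypothetical.

```python
import math

v1 = (1.0, 0.0)   # pure state yielding x_1 with certainty
v2 = (0.0, 1.0)   # pure state yielding x_2 with certainty
v3 = tuple((a + b) / math.sqrt(2) for a, b in zip(v1, v2))   # the superposition

def measure(v):
    """Probabilities assigned to x_1 and x_2: squared lengths of the projections."""
    return tuple(c * c for c in v)

def is_deterministic(p, tol=1e-12):
    """True if every probability is (numerically) zero or one."""
    return all(min(abs(q), abs(q - 1.0)) < tol for q in p)

p1, p2, p3 = measure(v1), measure(v2), measure(v3)
```

The measures `p1` and `p2` take only the values zero and one, whereas `p3` assigns probability one half to each outcome, which is why a deterministic theory cannot admit `v3` as a pure state.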

3.5 Mixed States


Let us now look at a theory in which certain states are not represented by
vectors. Consider, for instance, the theory which represents an
experimental apparatus of the kind used to demonstrate a binomial distribution. A
steel ball is "prepared" by being dropped down a vertical funnel slightly
greater in diameter than the ball itself. As it emerges it is "tested" by
dropping into an array of horizontal pins, as shown in Figure 3.3. The pins
are arranged in horizontal lines, and in any line the distance between
adjacent pins is again slightly greater than the diameter of the ball. The pins of
each line are staggered with respect to those in the lines immediately above
and below, so that, when a ball passes through a gap in one line, it will strike
a pin in the line below. Beneath the array there is a series of boxes, each box
directly below a gap in the lowest line of pins. Each box corresponds to a
different outcome of the test, and each outcome has a certain probability of
occurrence.

Figure 3.3

If the apparatus were symmetrical we would expect it to be equally likely
that a ball would bounce to the left as to the right after striking a pin. In that
case we could assign an expected probability to each outcome as follows.
Given $n$ rows of pins there will be $n + 1$ different outcomes, which we can
label from the left to right, $x_0$, $x_1$, . . . , $x_n$, and the binomial theorem
would lead us to expect that:

$p(x_k) = \frac{1}{2^n} \frac{n!}{k!(n - k)!}$

However, we need not confine ourselves to the symmetrical case: we may
just assume that some definite probability distribution or other results from
the way the apparatus is set up.
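
For the symmetrical case the expected probabilities are easy to tabulate. The Python sketch below assumes an array with four rows of pins (the text leaves n unspecified) and uses exact fractions; `binomial_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_probabilities(n):
    """p(x_k) = (1 / 2^n) * n! / (k! (n - k)!) for k = 0, ..., n."""
    return [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]

# Assumed for illustration: four rows of pins, hence five boxes.
p = binomial_probabilities(4)
total = sum(p)
```

With four rows the five boxes receive probabilities 1/16, 1/4, 3/8, 1/4, and 1/16, and `total` is exactly 1, as the exhaustiveness of the outcomes requires.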
Now we may want to accommodate this experiment within a determinist
account; we may believe that if we had a truly precise description of the
trajectory of the ball as it left the funnel, then we could predict with certainty
its path through the array and the outcome that would result. But the
probabilities assigned to any outcome by the pure states of a determinist
system can only be zero or one, while here we have probabilities lying
between these extremes. If we are to deal with these probabilities without
abandoning our determinist views, we will have to call on a new notion of
state. What we do, in fact, is to describe the ball as it leaves the funnel as
being a mixed state and to regard each mixed state as a weighted sum of pure
states. Let us see first how these mixed states can be represented, and then
how they may be interpreted.
In a determinist system each pure state yields probability one to some
outcome or other. Given $n$ outcomes there are effectively $n$ distinct pure
states, each corresponding to a probability function $p_i$ such that

$p_i(x_i) = 1$ for some $x_i$

and $p_i(x_j) = 0$ for $i \neq j$. Now, because of the way the apparatus is set up,
each of these pure states may have a particular probability of occurring. Let
the probability of occurrence of the pure state corresponding to $p_i$ be $b_i$. Then
the probability function $p$ on the set of outcomes can be expressed as a
weighted sum of the functions $p_i$: for each outcome $x_j$, we have:

$p(x_j) = \sum_i b_i p_i(x_j)$

and we can write,

$p = \sum_i b_i p_i$

We see that, provided there are at least two coefficients $b_j$ and $b_k$ greater than
zero, $p$ will be the probability function corresponding to a mixed state rather
than a pure state.
Within our vector-space representation, each probability function $p_i$ is
represented by the function $\mu_i$ on the set of subspaces such that

$\mu_i(L_j) = p_i(x_j)$

Thus $p$ is represented by the function $\sum_i b_i \mu_i$.

Now recall that each function $\mu_i$ is such that

$\mu_i(L_j) = |P_j v_i|^2$

where $v_i$ is some vector representing the pure state corresponding to $p_i$. Note,
incidentally: for the reasons given in Section 3.3, more than one vector can
represent a given pure state. We may, however, pick a representative $v_i \in L_i$ and
proceed as though there was no such degeneracy. Given, then, a mixed state
represented by the weighted sum $\sum_i b_i \mu_i$, one may ask why we may not represent
it by a vector which is a suitable weighted sum of the vectors $v_i$. It is certainly
not mathematically impossible to do so. In Section 3.3 we saw that, as long
as we are confining ourselves to events associated with one particular
experiment, we can represent any probability measure on the set of these
events by a vector. In fact, if we write

$v = \sum_i \sqrt{b_i}\, v_i$

then, for any outcome $x_i$,

$|P_i v|^2 = b_i = p(x_i)$

By doing so, however, we violate the principle that we use vectors to
represent only pure states; in a determinist theory the only vectors that do this are
the vectors $v_i$. To put it another way, by doing so we use the principle of
superposition.
If we are not to lose an important distinction, we need to find a way of
representing the weighted sum of two or more probability functions which
is distinct from merely adding the (suitably weighted) vectors which repre-
sent them. We do so by finding an alternative representation of pure states.
We have already noted that a ray in $\mathcal{V}$ serves to represent a pure state; we
use, not the ray, but the projection operator onto it. Mixed states are then
represented by weighted sums of projection operators in a very direct way:
if each $p_i$ is represented by the projection operator $P_i$, then $\sum_i b_i p_i$ is
represented by $\sum_i b_i P_i$. Since $\sum_i b_i P_i$ is not a projector unless there is exactly one
coefficient $b_i$ which is nonzero (and hence equal to one), the distinction
between pure states and mixed states is made clear.
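
The claim that a weighted sum of projectors is a projector only in the pure case can be checked directly. The Python sketch below assumes two outcomes represented by the coordinate rays of a real plane and writes operators as 2x2 nested lists; the weights 3/4 and 1/4 and all function names are illustrative choices, not taken from the text at this point.

```python
def weighted_sum(weights, projectors):
    """Return sum_i b_i P_i for 2x2 matrices given as nested lists."""
    return [[sum(b * P[r][c] for b, P in zip(weights, projectors))
             for c in range(2)] for r in range(2)]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def is_projector(A, tol=1e-12):
    """Idempotence test A A = A (the matrices used here are symmetric by construction)."""
    AA = matmul(A, A)
    return all(abs(AA[r][c] - A[r][c]) < tol for r in range(2) for c in range(2))

P1 = [[1.0, 0.0], [0.0, 0.0]]   # projector onto the ray representing x_1
P2 = [[0.0, 0.0], [0.0, 1.0]]   # projector onto the ray representing x_2

pure = weighted_sum([1.0, 0.0], [P1, P2])     # exactly one nonzero weight
mixed = weighted_sum([0.75, 0.25], [P1, P2])  # two nonzero weights
```

Here `is_projector(pure)` holds while `is_projector(mixed)` fails, so the operator representation keeps the two kinds of state apart where the vector representation would not.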
This treatment of states is developed in Chapter 5. There I will treat the
problem of finding an algorithm to relate probabilities to these weighted
sums of projectors in the way that the equation

$p(x_i) = \langle v | P_i v \rangle$

relates probabilities to the vectors which represent pure states.


Within the situation I have described, mixed states find a ready interpre-
tation: they represent our ignorance of the precise state of affairs as the ball
leaves the funnel. For example, if the mixed state were given by

and so was representable bytpj + iPk- this would be taken to mean that the
ball was actually in one of the states I1j or 11k; the cumulative effect of factors
individually too small to allow for means that we do not know which state it
is in, but we do know that the ball is thr tim's li S lik 'Iy to be in III as Il, .
I'hysit'll l '/'II/'II/ Y 11"" 1//1111'1'1 S,"/t't'S 97

Clearly, any classical theory dealing with systems about which our
information is less than complete can use the notion of a mixed state
interpreted in this way. Perhaps more surprisingly, mixed states appear in
quantum theory as well, but there the "ignorance interpretation," as we may
call it, gives rise to a number of problems. I discuss these in Chapter 5.
One question we can pose at this stage is this: what, in quantum mechanics,
distinguishes the mixed state represented by $\sum_i b_i P_i$ from the state
represented by the vector $\sum_i c_i v_i$, where $|c_i|^2 = b_i$? Each
assigns the same probability to any given outcome $x_i$ of our experiment
(which we may take as measuring some quantum-mechanical magnitude). To put the
question more generally: what is the empirical content of the principle of
superposition?
A full answer to this is given in Section 3.9, but this much can be said in
anticipation. If we are to distinguish operationally between the mixed state
$S_m$, represented by $\sum_i b_i P_i$, and the pure state $S_p$, represented
by the vector $\sum_i c_i v_i$, then we need a new experiment for which the
probability functions given by $S_m$ and $S_p$ will differ. In other words, in
order to give content to the principle of superposition we need to consider
more than the single experiment which has occupied us so far.

3.6 Observables and Operators


I hope the preceding sections have shown how well-suited vectors are to
represent the pure states of a quantum-mechanical system. In the same vein
we can indicate why it is natural to use an operator to represent a physical
magnitude, or observable. The easiest approach is to consider these
observables from an operationalist standpoint. (A devout operationalist views
the meaning of a physical quantity as being wholly determined by the
experimental procedure used to measure it; we can adopt an operationalist
approach in this instance, however, without thereby committing ourselves to
the whole doctrine.)
To speak guardedly then, at least some physical quantities are the sorts of
things which may be measured by the kinds of tests we have described. In
such cases the different outcomes of the tests correspond to different values
of the quantity in question. Let us take as a simple example an observable A
whose value can only be one of the numbers $a_1, a_2, \ldots, a_n$. (Thus, in
the language of Chapter 2, A has a finite discrete spectrum.) We assume that
our test effectively measures A; this means that the test has as fine a mesh
as A requires, so that with each outcome $x_i$ of the test we can associate
some unique value $a_i$ of the quantity A.
Consider the vector space $\mathcal{V}$ in which we have represented the set
of outcomes of the test, each outcome $x_i$ being represented by a subspace
$L_i$. Since each outcome corresponds to a particular value of the observable
A, we could regard it as atomic (in the sense described in Section 3.2), and
make each subspace $L_i$ one-dimensional. We need not do this, however, and in
the remainder of this section I shall not assume we have done so. As before,
we denote by $P_i$ the projection operator onto $L_i$. We now construct the
operator $\sum_i a_i P_i$ on $\mathcal{V}$, and claim that this operator
represents the observable A: in fact, we show this by using the same letter
for the operator as for the observable and writing

$A = \sum_i a_i P_i$
It remains to show just what this claim involves and how it is justified.
As a preliminary, let us distinguish what is happening here from what
was going on in the previous section when we constructed the mixed state
$\sum_i b_i P_i$. There each $P_i$ represented a pure state, and (on the
ignorance interpretation) each $b_i$ represented the probability of its
occurrence. Here every $P_i$ represents an outcome of an experiment, and each
$a_i$ the value of the observable to which the outcome corresponds. Now let us
consider the claim itself.
First, we may observe that any operator on $\mathcal{V}$ of the form
$\sum_i a_i P_i$ (where all the numbers $a_i$ are real) is Hermitian.
Conversely, the spectral decomposition theorem (1.32) tells us (i) that any
Hermitian operator on a finite-dimensional vector space is expressible in this
way, as a weighted sum of projectors onto mutually orthogonal subspaces, and
(ii) that, if all the $a_i$ are distinct, then this decomposition is unique.
(One further condition, the compactness of the operator, is required if the
space is infinite-dimensional: see Fano, 1971, pp. 81, 291.) This means that
we cannot construct the same operator in two distinct ways: if

$A = \sum_i a_i P_i = \sum_j b_j P'_j$

(where all the $a_i$ are distinct from one another, as are all the $b_j$), then
$\{a_1, \ldots, a_n\} = \{b_1, \ldots, b_n\}$, $\{P_i\} = \{P'_j\}$, and, for
any $i$ and $j$, if $P_i = P'_j$ then $a_i = b_j$. Thus, locked up, as it were,
in the operator A is all the information we have about the observable A: that
the observable can take the values $a_1$, $a_2$, and so on; that we take an
outcome $x_i$ of the test to mean that the value of this observable for the
system is $a_i$; and that we represent this outcome $x_i$ within our vector
space by the subspace $L_i$ (projection operator $P_i$).
It is worth noting that the values $a_i$ are the eigenvalues of the operator A
we have constructed, and that each corresponding eigenvector $v_i$ lies within
the subspace $L_i$. As in quantum theory, eigenvalues of an operator are the
permissible values of the corresponding observable.
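
The construction can be illustrated with a small numerical sketch. It is an illustration only, under stated assumptions: the space is taken to be three-dimensional and real, the values 2 and 5 and the orthonormal vectors `u1`, `u2`, `u3` are arbitrary choices, and the second subspace is deliberately two-dimensional, since the text does not assume that each subspace is a ray. All helper names are hypothetical.

```python
import math

# Sketch: build A = a_1 P_1 + a_2 P_2 from projectors onto mutually
# orthogonal subspaces and check that vectors in each subspace are
# eigenvectors of A with the corresponding value as eigenvalue.

def outer(u):
    """Projector onto the ray through the unit vector u."""
    return [[x * y for y in u] for x in u]

def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def apply(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def is_eigenvector(m, v, value, tol=1e-12):
    return all(abs(x - value * y) < tol for x, y in zip(apply(m, v), v))

s = 1 / math.sqrt(2)
u1, u2, u3 = [s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]  # orthonormal (assumed)

P1 = outer(u1)                    # projector onto L_1 = span{u1}
P2 = add(outer(u2), outer(u3))    # projector onto L_2 = span{u2, u3}
a1, a2 = 2.0, 5.0                 # illustrative values of the observable
A = add(scale(a1, P1), scale(a2, P2))

w = [x + y for x, y in zip(u2, u3)]   # an arbitrary vector lying in L_2
print(is_eigenvector(A, u1, a1), is_eigenvector(A, w, a2))
```

Because `w` is not an eigenvector for the value 2, the same check also shows that the values cannot be reassigned between the subspaces without changing the operator.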

But this mathematical object, the operator A, is not just a memory bank
within which we store information about the observable in question. That
alone might be enough to justify the claim that A represents the observable,
but more can be said. For we may use this operator, together with the vector
representing the (pure) state of the system, to calculate probabilities and
expectation values. The algorithms are exactly as they are in quantum
theory: from Equation (2.1) we know that, in quantum mechanics, the
probability that a measurement of observable A will yield value $a_i$ is given
by

$p_v(A, a_i) = |P_i^A v|^2$
where $P_i^A$ is the projection operator onto the subspace $L_i^A$ containing
the eigenvector $v_i$ with corresponding eigenvalue $a_i$. In our experiment,
this possible value $a_i$ of the observable is associated with outcome $x_i$,
and that in turn is represented by the subspace $L_i$, or, equivalently, by the
projection operator $P_i$. This operator $P_i$, like the operator $P_i^A$ in
quantum theory, is a projection operator from the spectral decomposition of
the operator A, which represents our observable. Both projection operators
enter in the same way into the calculation of probabilities, for in Section
3.3 we defined the state vector $v$ as the vector which yielded probabilities
to the experimental outcomes according to Equation (3.5)

$p(x_i) = |P_i v|^2$

From what has been said it is obvious that we should identify $p(x_i)$ in this
equation with $p_v(A, a_i)$ from the earlier one.
Given identical procedures for assigning probabilities to the various
possible values of a given observable, we could hardly compute expectation
values, denoted $\langle A \rangle$, differently in our general representation
and in quantum theory: in each case they are calculated by weighting the
various possible values by the probability of their occurrence. As in Section
2.4, we obtain

$\langle A \rangle = \sum_i |P_i v|^2 a_i = \langle v | A v \rangle$
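
As a check on the two routes to the expectation value, here is a brief sketch under illustrative assumptions: a two-dimensional complex space, projectors onto the coordinate axes, values +1/2 and -1/2, and an arbitrarily chosen normalized state. None of these particular choices comes from the text, and the helper names are hypothetical.

```python
# Sketch: probabilities p(x_i) = |P_i v|^2 and the expectation value computed
# two ways, as a probability-weighted sum and as the inner product <v|Av>.

def apply(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def norm_sq(v):
    return inner(v, v).real

P = [[[1, 0], [0, 0]],      # projector onto the first axis
     [[0, 0], [0, 1]]]      # projector onto the second axis
a = [0.5, -0.5]             # illustrative values of the observable
v = [0.6 + 0j, 0.8j]        # normalized state: 0.36 + 0.64 = 1

probs = [norm_sq(apply(Pi, v)) for Pi in P]
weighted = sum(p * ai for p, ai in zip(probs, a))

# A = sum_i a_i P_i, assembled entry by entry
A = [[sum(ai * Pi[r][c] for ai, Pi in zip(a, P)) for c in range(2)]
     for r in range(2)]
direct = inner(v, apply(A, v)).real

print(probs, weighted, direct)
```

The phase of the second component of `v` drops out of both calculations, which is why a complex state and a real one with the same moduli yield the same probabilities for this single observable.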

3.7 Relations between Observables: Functional Dependence and Compatibility

So far we have looked at experiments involving a single type of measurement;
though different modes of preparation have been considered, we have not
investigated how the results of one kind of test might be related to those of
another. We saw in the last section that each test can be thought of as a
measurement of a physical quantity, or observable; in this section we will
look at some of the ways in which two observables can be related.
As before, we associate an observable with a measurement procedure; the
various outcomes from the measurement correspond to values of the ob-
servable in question. Again, for simplicity, I will not consider observables
with a continuous spectrum; for an observable of that kind, an outcome
corresponds to a range of values (a Borel set of the reals), rather than to one
value in particular. Most of what we could say about such observables can
be inferred from the discussion of observables with a point (or discrete)
spectrum.
Let us consider, then, two observables A and B: the values of A are
associated with the various outcomes $x_1, x_2, \ldots$ of a suitable
experiment, and values of B with outcomes $y_1, y_2, \ldots$ of another. We now
ask, what relationships can exist between the probabilities $p(x_i)$ assigned
to the outcomes of an A-experiment by a given state and the probabilities
$p(y_j)$ which that state assigns to the outcomes of the B-experiment? More
formally, let $\mathcal{H}_A$ be the vector space within which we represent the
(outcomes associated with) observable A, and $\mathcal{H}_B$ the vector space
within which we represent observable B. Then within $\mathcal{H}_A$ there is a
set $\{v_A\}$ of normalized vectors which represent admissible probability
measures on the outcomes of measurements of A: we may call these the
admissible pure A-states. Similarly, let $\{v_B\}$ be the set of admissible
pure B-states. Then an ordered pair $(v_A, v_B)$ will represent a probability
measure which simultaneously assigns probabilities to A-outcomes and to
B-outcomes. Any relationship that obtains between observables A and B will
effect a constraint on the set of ordered pairs which we regard as admissible
pure AB-states.
Consider first the relation (or nonrelation) of independence. In this case
there are no constraints on the set: if A and B are independent, then the
ascription of a set of probabilities to A-outcomes gives us no information
about the B-outcomes. We may say that A and B are independent if and only
if each ordered pair $(v_A, v_B)$ represents an admissible AB-state. Within
classical mechanics each component of linear momentum and of position is
independent of all the others, and within quantum theory each component
of linear momentum is independent of each component of spin. The condition
for the independence of A and B requires us to treat $\mathcal{H}_A$ and
$\mathcal{H}_B$ as two distinct vector spaces. We may, if we wish, think of the
state of a system as a vector in the direct sum of these,
$\mathcal{H}_A \oplus \mathcal{H}_B$, and use the ordered pair $(v_A, v_B)$ to
represent this vector. If we do so, $v_A$ and $v_B$ will be the components of
$(v_A, v_B)$ in the subspaces $\mathcal{H}_A$ and $\mathcal{H}_B$ of
$\mathcal{H}_A \oplus \mathcal{H}_B$. This, in fact, is how the independent
observables linear momentum and spin are dealt with in quantum theory. Of
course, both $v_A$ and $v_B$ need to be normalized, and so, while we may allow
superpositions on $\mathcal{H}_A$, and on $\mathcal{H}_B$, we do not allow
superpositions on $\mathcal{H}_A \oplus \mathcal{H}_B$. Such a superposition
could yield a component in $\mathcal{H}_B$, say, with norm different from one,
and this component would not act as a probability measure on the possible
values of B.
Of more interest are the cases in which we can represent A-outcomes and
B-outcomes in such a way that both sets of subspaces span the same space,
so that $\mathcal{H}_A = \mathcal{H}_B$. Although we can represent independent
observables A and B each by an operator on a common vector space
$\mathcal{H}_A \oplus \mathcal{H}_B$, neither the set of subspaces representing
the A-outcomes nor the set of those representing the B-outcomes spans the
whole of this space. Clearly, however, if $\mathcal{H}_A = \mathcal{H}_B$,
consistency demands that $(v_A, v_B)$ represents an admissible AB-state only if
$v_A = v_B$, and so the constraints on $(v_A, v_B)$ take a particularly simple
form. All the relations we will consider from now on will be of this kind.
To deal first with the most trivial case, it could be that A and B both
measure exactly the same physical quantity; although they may use different
experimental arrangements, a one-to-one correspondence exists between the
outcomes $\{x_i\}$ of the A-experiment and the outcomes $\{y_j\}$ of the
B-experiment such that, for any state and any corresponding pair of outcomes
$(x_i, y_j)$, we have $p(x_i) = p(y_j)$. In this case it is somewhat less than
remarkable that we can find a representation in which
$\mathcal{H}_A = \mathcal{H}_B$. Here, where the relation is that of identity,
we use the same subspace to represent each of a corresponding pair of outcomes
$(x_i, y_j)$. In general, when observables A and B are related to each other,
these relations will appear within the Hilbert-space representation as
relations between the subspaces representing the A- and the B-outcomes.
Let us take a slightly more complex relationship, that of functional de-
pendence.

(3.6) A is functionally dependent on B provided that each value (outcome)
$a_i$ of A corresponds to a set of values (outcomes)
$\{b_{i1}, \ldots, b_{ij}, \ldots\}$ of B in the following sense: the
probability $p(a_i)$ assigned to $a_i$ by any state is the sum
$p(b_{i1}) + \cdots + p(b_{ij}) + \cdots$ of the probabilities the state
assigns to $b_{i1}, \ldots, b_{ij}, \ldots$

It follows that each state which assigns probability 1 to any of the outcomes
$b_{i1}, \ldots, b_{ij}, \ldots$ of B also assigns probability 1 to the outcome
$a_i$ of A. Hence the sets $\{b_{ij}\}$ corresponding to different $a_i$'s are
mutually exclusive.
In terms of the vector-space representation, we represent an outcome $a_i$
of A by the span $L_i^A$ of the subspaces $L_{ij}^B$ corresponding to different
outcomes $b_{i1}, \ldots, b_{ij}, \ldots$ of B. Thus, by construction,
$\mathcal{H}_A = \mathcal{H}_B$. It also follows that the projector
$P_i^A = \sum_j P_{ij}^B$.
If $b_k$ is any one of the possible values of B which correspond to the value
$a_i$, then we write $f(b_k) = a_i$, and so define a function $f$ that maps the
set of possible values of B onto the set of possible values of A. It follows
immediately that, if we represent B in our vector space by

$B = \sum_k b_k P_k^B$

we obtain

(3.7) $A = \sum_i a_i P_i^A = \sum_i a_i \sum_j P_{ij}^B = \sum_k f(b_k) P_k^B =_{\mathrm{df}} f(B)$

As the notation suggests, the last equality is a definition of $f(B)$.
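
A toy case may make Definition (3.6) and Equation (3.7) concrete. The sketch below is an illustration under assumptions that are not in the text: B has the three values -1, 0, and 1, each represented by a coordinate ray of a three-dimensional real space, the dependence is the hypothetical choice f(b) = b squared, and the state `v` is arbitrary.

```python
from collections import defaultdict

# Sketch: A = f(B) with f(b) = b * b (a hypothetical choice). Each value of A
# collects the B-values mapped onto it, and p(a_i) is the sum of the
# probabilities of those B-values, as Definition (3.6) requires.

def f(b):
    return b * b

b_values = [-1.0, 0.0, 1.0]      # values of B, one per coordinate ray (assumed)
v = [0.5, 0.5, 0.5 ** 0.5]       # normalized state: 0.25 + 0.25 + 0.5 = 1

# p(b_k) = |P_k^B v|^2; for coordinate rays this is the squared component
p_B = {b: v[k] ** 2 for k, b in enumerate(b_values)}

# p(a_i) = sum of p(b_k) over all b_k with f(b_k) = a_i
p_A = defaultdict(float)
for b, p in p_B.items():
    p_A[f(b)] += p

# f(B) = sum_k f(b_k) P_k^B is diagonal here, so <A> is a weighted sum
f_of_B_diagonal = [f(b) for b in b_values]
expectation_A = sum(a * v[k] ** 2 for k, a in enumerate(f_of_B_diagonal))

print(dict(p_A), expectation_A)
```

The value 1 of A is represented by the span of the two rays for b = -1 and b = 1, so its projector is the sum of the two B-projectors, in line with the relation between the A-projectors and the B-projectors stated in the text.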


Classical mechanics offers many examples of functional dependencies,
albeit between observables with continuous spectra. Consider, for instance,
the observables momentum $p$ and kinetic energy $T$ for a single particle
moving in one dimension. Since $T = p^2/2m$, to each nonzero value of $T$ there
correspond two values of $p$, one positive and one negative, and so $T$ is
functionally dependent on $p$. In fact, as the formula shows, it is a
continuous function of $p$. Similar dependencies exist within quantum theory.
In passing, note that in the three-dimensional (classical) case,
$T = (p_x^2 + p_y^2 + p_z^2)/2m$, and so $T$ is functionally dependent on three
independent observables, $p_x$, $p_y$, and $p_z$. I won't discuss these more
complicated functional dependencies here, though they could be accommodated
within the framework we are using.
Instead, let us turn to the case when two observables, A and B, are both
functionally dependent on an observable C. In this case A and B are
compatible. We obtain a representation in which
$\mathcal{H}_A = \mathcal{H}_C = \mathcal{H}_B$ by starting with the mutually
orthogonal subspaces $L_k^C$ corresponding to C-outcomes. Then, since A is
functionally dependent on C, each A-outcome can be represented by the span of
some of the spaces $L_k^C$. So can each B-outcome, and so within
$\mathcal{H}_C$ we obtain subspaces corresponding to A- and B-outcomes. The
A-subspaces, as we may call them, and the B-subspaces are mutually orthogonal
where they do not overlap.
It is convenient to extend the use of the word compatible to this relation
between subspaces. As an illustration of what the condition involves, we
can ask which subspaces within $\mathbb{R}^3$ are compatible with the plane $L$
shown in Figure 3.4. The zero subspace is orthogonal to every subspace, and so
is compatible with $L$. Of the lines through 0, all those within $L$ are
compatible with $L$, as is the line through 0 at right angles to $L$. Of the
planes in $\mathbb{R}^3$ only these are compatible with $L$: $L$ itself and the
planes (shown by dotted lines in the diagram) at right angles to it. Finally,
the whole space $\mathbb{R}^3$ is compatible with $L$.

Figure 3.4 Subspaces compatible with $L$ are (a) the zero subspace $\{0\}$;
(b) any line in $L$, and the line $L^\perp$ perpendicular to $L$; (c) the plane
$L$, and any plane obtained by rotating $L_a$ about the line $L^\perp$ (for
example, $L_a$, $L_b$, $L_c$); (d) the whole space $\mathbb{R}^3$.

Formally,

(3.8) In any vector space $\mathcal{V}$, subspaces $L_a$ and $L_b$ are said to
be compatible if there exist mutually orthogonal subspaces $L_{a0}$, $L_{b0}$,
and $L_c$ in $\mathcal{V}$ (any or all of which may be the zero subspace) such
that

$L_a = L_{a0} \oplus L_c \qquad L_b = L_{b0} \oplus L_c$
In the theory of vector spaces it is not hard to show that two projection
operators $P_n$ and $P_m$ commute (that is, $P_n P_m = P_m P_n$) if and only if
they project onto compatible subspaces. This theorem gives us an alternative,
and highly convenient, definition of the relation of compatibility between
subspaces.
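
The commutation criterion lends itself to a direct numerical check on the three-dimensional example. In this sketch the plane L is assumed to be the xy-plane of real three-space, and the test subspaces (a line in L, the perpendicular line, an oblique line, a perpendicular plane, and an oblique plane) are illustrative choices. The helper names `projector` and `commute` are hypothetical.

```python
import math

# Sketch: test compatibility of subspaces of R^3 with a plane L by checking
# whether the corresponding projectors commute.

def projector(*basis):
    """Projector onto the span of the given orthonormal vectors."""
    n = len(basis[0])
    p = [[0.0] * n for _ in range(n)]
    for u in basis:
        for i in range(n):
            for j in range(n):
                p[i][j] += u[i] * u[j]
    return p

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commute(p, q, tol=1e-12):
    pq, qp = matmul(p, q), matmul(q, p)
    n = len(p)
    return all(abs(pq[i][j] - qp[i][j]) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
ex, ey, ez = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
oblique = [s, 0.0, s]                      # at 45 degrees to the plane L

P_L = projector(ex, ey)                    # the plane L (assumed: xy-plane)
line_in_L = projector(ex)
line_perp = projector(ez)
line_oblique = projector(oblique)
plane_perp = projector(ex, ez)             # a plane at right angles to L
plane_oblique = projector(oblique, ey)     # a plane tilted relative to L

for name, p in [("line in L", line_in_L), ("perpendicular line", line_perp),
                ("oblique line", line_oblique),
                ("perpendicular plane", plane_perp),
                ("oblique plane", plane_oblique)]:
    print(name, commute(P_L, p))
```

Only the oblique line and the oblique plane fail to commute with the projector onto L, which agrees with the list of compatible subspaces given for Figure 3.4.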

To summarize, if two observables A and B are compatible, then (1) their
representations can span a common vector space, and (2) in this representation
any pair of subspaces corresponding to their outcomes is compatible. Consider
now the operators A and B on this space, corresponding to these observables.
In our usual notation,

$A = \sum_i a_i P_i^A \qquad B = \sum_j b_j P_j^B$

From (2), all the projection operators $P_i^A$ and $P_j^B$ commute with each
other; this in turn guarantees that the operators A and B commute. Thus we
obtain the elegant result that compatible observables may be represented by
commuting operators.

3.8 Incompatible Observables


This chapter began with the question: what is it about the mathematical
theory of Hilbert spaces that makes it suitable for providing models for a
physical theory? The introduction of incompatible observables prompts a
different question: what is it about quantum mechanics that makes its
representation in Hilbert spaces so natural? For in quantum mechanics we
find observables which are not compatible but yet have a minimal represen-
tation on the same Hilbert space. By a "minimal representation" I mean a
representation on which each value of the observable is represented by a
one-dimensional subspace of the space. (Thus, for the moment, I am con-
tinuing to talk only of observables with a discrete spectrum.)
To see what's involved, consider a pair of observables, each of which has
two possible values: observable A has values $a_1$ and $a_2$, and observable B
has values $b_1$ and $b_2$. We look first at a single mode of preparation (or
state), which assigns probabilities $p(a_1)$, $p(a_2)$, $p(b_1)$, and $p(b_2)$
to the values of these observables, such that
$p(a_1) + p(a_2) = 1 = p(b_1) + p(b_2)$.
We can represent the A-state as a vector $v_A$ in a two-dimensional space
whose axes correspond to $(A, a_1)$ and $(A, a_2)$; likewise the B-state can be
represented by $v_B$ in another two-dimensional space. (See Figures 3.5 and
3.6.) Now, as long as we deal with just one state, we can always superimpose
the two diagrams, so that the same vector yields both pairs of probabilities.
As Figure 3.7 shows, it is just a matter of picking up the B-diagram and
rotating it until the vector $v_B$ in it coincides with $v_A$ in the A-diagram.
In general, however, we wouldn't expect that this particular superimposed
picture would be useful in representing a different preparation procedure. A
new state would assign new probabilities to the A-outcomes and to the
B-outcomes, representable by two new vectors, $v'_A$ and $v'_B$; we would
not expect that exactly the same rotation of the B-diagram as before would
suffice to make $v'_A$ and $v'_B$ coincide (see Figure 3.8). But the remarkable
feature of quantum mechanics (or of the systems quantum mechanics describes)
is precisely this: that certain observables are related in a way that
makes the superimposed picture work for all states.
Figure 3.5 (the A-diagram: axes $a_1$, $a_2$; vector $v_A$)
Figure 3.6 (the B-diagram: axes $b_1$, $b_2$; vector $v_B$)
Figure 3.7 (the two diagrams superimposed)
Figure 3.8 (the superimposed diagrams for a new state, with vector $v'_A$)

Consider the components of spin of a spin-1/2 fermion, $S_x$, $S_y$, and $S_z$.
Each of these components has two possible values, $+\tfrac{1}{2}$ and
$-\tfrac{1}{2}$. (These values are in "natural units," such that $\hbar = 1$.)
Accordingly, we can represent the outcomes $x^+$ and $x^-$ of an
$S_x$-experiment within a two-dimensional Hilbert space $\mathcal{H}_x$, those
of an $S_y$-experiment within a two-dimensional Hilbert space $\mathcal{H}_y$,
and those of an $S_z$-experiment within a two-dimensional Hilbert space
$\mathcal{H}_z$. Thus, vis-a-vis this trio of observables, any state can be
represented by a triple $(v_x, v_y, v_z)$ of vectors, where
$v_x \in \mathcal{H}_x$, $v_y \in \mathcal{H}_y$, and $v_z \in \mathcal{H}_z$.
But it turns out that these vectors are not independent; we can use the same
two-dimensional Hilbert space to represent all three observables, so that,
for any pure state, $v_x = v_y = v_z$, and this vector will assign
probabilities to all three pairs of outcomes. To do so we first need to make
$\mathcal{H}_x$, $\mathcal{H}_y$, and $\mathcal{H}_z$ complex (that is, to use
the space $\mathbb{C}^2$ for all three of them) and then to rotate
$\mathcal{H}_x$ and $\mathcal{H}_y$, as it were, to fit them on top of
$\mathcal{H}_z$.
To speak geometrically (that is, analogically, since $\mathbb{C}^2$ is complex
rather than real), within $\mathbb{C}^2$ the rays we use to represent, say,
$x^+$ and $z^+$ can be obliquely inclined to each other in a way that captures
the relation between $p(x^+)$ and $p(z^+)$ for all states of the system. For
any pair of spin observables, some, though not all, states are representable
in $\mathbb{R}^2$; within the partial representation of $S_x$ and $S_z$ which
$\mathbb{R}^2$ affords, the $x^+$ ray must be at 45° to the $z^+$ ray, as shown
in Figure 3.9. Any normalized vector in $\mathbb{R}^2$ represents a possible
assignment of probabilities both to $x^+$ and $x^-$ and also to $z^+$ and
$z^-$. Both in $\mathbb{C}^2$ and in the partial representation in
$\mathbb{R}^2$, the rays corresponding to $x^+$ and to $z^+$ are oblique one to
the other, and hence are, in the technical sense, incompatible. The
observables $S_x$ and $S_z$ are likewise incompatible: the operators
representing them do not commute (see problem 1 in Section 1.7).

Figure 3.9 Partial representation of $S_z$ and $S_x$.
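
Both claims, the failure to commute and the 45° inclination, can be checked in a few lines. The sketch assumes the standard spin-1/2 matrices in units where the reduced Planck constant is 1 (the text refers to Section 1.7 for its own matrices, which are not reproduced here), and it works in the real partial representation only.

```python
import math

# Sketch: S_x and S_z do not commute, and in the real partial representation
# the x+ ray is at 45 degrees to the z+ ray. The matrices are the standard
# spin-1/2 forms, assumed to match those of Section 1.7.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Sx = [[0.0, 0.5], [0.5, 0.0]]
Sz = [[0.5, 0.0], [0.0, -0.5]]

# Commutator [S_x, S_z] = S_x S_z - S_z S_x; nonzero means incompatible.
SxSz, SzSx = matmul(Sx, Sz), matmul(Sz, Sx)
commutator = [[SxSz[i][j] - SzSx[i][j] for j in range(2)] for i in range(2)]

s = 1 / math.sqrt(2)
z_plus, z_minus = [1.0, 0.0], [0.0, 1.0]   # eigenvectors of S_z
x_plus = [s, s]                            # eigenvector of S_x, value +1/2

def prob(state, outcome):
    """Squared overlap of two real unit vectors."""
    return sum(a * b for a, b in zip(outcome, state)) ** 2

angle = math.degrees(math.acos(sum(a * b for a, b in zip(x_plus, z_plus))))

print(commutator)
print(angle, prob(x_plus, z_plus), prob(x_plus, z_minus))
```

The x+ state assigns probability one half to each S_z-outcome, which is what a ray at 45° to both z-axes yields under the squared-projection rule.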
The term incompatible here is a bit misleading. When, for example, we say
of two spin components that they are incompatible, we do not merely mean
to deny that they are compatible; we also mean to say that a very strong
relationship holds between them, of being representable in the same Hilbert
space. We are particularly interested in incompatible observables which are,
to speak loosely, of the same sort. $S_x$, $S_y$, and $S_z$ are all "of the
same sort," whereas $S_x$ and $S_z^2$ are not, despite the fact that they are
representable on the same Hilbert space.* We can begin to make this intuitive
notion more precise as follows. Consider first a Hilbert space on which $S_z$
is represented. A pair of orthogonal rays in this space represents outcomes
$z^+$ and $z^-$ associated with positive and negative values of $S_z$. Now
observe what happens when we rotate these axes (or perform the complex-space
analogue): the axes will come to represent the same values, but of a different
component of spin. In the partial representation supplied by $\mathbb{R}^2$,
when we have rotated them through 45° they will represent the outcomes $x^+$
and $x^-$ of $S_x$.
It turns out that a very simple relation between the operators $S_x$ and $S_z$
corresponds to the fact that we can "rotate" axes to transform a
representation of $S_z$-outcomes into a representation of $S_x$-outcomes, and
we can use this relation to specify what exactly being "of the same sort"
involves. Recall from Section 2.7 that a unitary operator $U$ is the
complex-space analogue of a rotation operator.

(3.9) We say that two observables are mutually transformable if (a) they are
representable in a Hilbert space $\mathcal{H}$ by operators A and B, and (b)
there exists a unitary operator $U$ on $\mathcal{H}$ such that $A = UBU^{-1}$.

* $S_z^2$ is the observable quantity that is representable by the operator $S_z^2$.

This relation, which it is tempting to call the furry relation, is reflexive,
symmetric, and transitive. Note in particular that, if $A = UBU^{-1}$, then
$B = U^{-1}AU$.
As an example, let $A = S_x$ and $B = S_z$; the matrix representations of $S_x$
and $S_z$ appear in Section 1.7. We now choose $U$, and hence $U^{-1}$, so that

$U = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \qquad U^{-1} = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$

It is simple to show that $S_x$ and $S_z$ are mutually transformable.
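
The verification can be carried out numerically. This sketch assumes the standard spin-1/2 matrices (with the reduced Planck constant set to 1) and takes U to be the real rotation through 45° with entries plus or minus one over the square root of two; whether these coincide in every detail with the matrices of Section 1.7 is an assumption. The helper names are hypothetical.

```python
import math

# Sketch: check that U S_z U^{-1} = S_x for a 45-degree rotation U, so that
# S_x and S_z satisfy Definition (3.9) with A = S_x and B = S_z.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

r = math.sqrt(2) / 2
U = [[r, -r], [r, r]]
U_inv = [[r, r], [-r, r]]       # transpose of U; U is real and orthogonal

Sx = [[0.0, 0.5], [0.5, 0.0]]   # assumed standard forms
Sz = [[0.5, 0.0], [0.0, -0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]

is_unitary = close(matmul(U, U_inv), identity)
forward = close(matmul(matmul(U, Sz), U_inv), Sx)     # A = U B U^{-1}
backward = close(matmul(matmul(U_inv, Sx), U), Sz)    # B = U^{-1} A U

print(is_unitary, forward, backward)
```

The backward check illustrates the symmetry of the relation: the inverse of U carries S_x back to S_z.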
Where there are no incompatible observables, the relationship of mutual
transformability becomes trivial, as the "transformations" involved reduce
to a relabeling of the outcomes of a single experiment. However, mutual
transformability is an important characteristic of sets of observables in
quantum mechanics, and I discuss one such set in detail in Chapter 4.
Unlike Definition (3.9), the definitions of functional dependence and
compatibility given in Section 3.7 made no direct reference to the
representation of observables within a Hilbert space. Both definitions,
however, can be reformulated in these terms; the definition of functional
dependency is a bit cumbersome, and I omit it, but a definition of
compatibility of striking simplicity presents itself:
(3.10) Two observables are said to be compatible if they are representable on
a Hilbert space $\mathcal{H}$ by commuting operators, A and B.

One advantage of Definitions (3.9) and (3.10) is that neither is restricted to
observables with a discrete spectrum. Note in particular that the position
and momentum observables, represented by operators $Q$ and $P$ on $L^2$
(where $Q = x$ and $P = -i\,d/dx$), are mutually transformable. There is a
unitary operator $U$ on $L^2$ such that $UPU^{-1} = Q$ (see Busch and Lahti,
1985, pp. 65-66). It is known as the Fourier-Plancherel operator on $L^2$, and
$P$ and $Q$ are said to be Fourier-connected; this is just a special case of
mutual transformability.

3.9 The Representational Capacity of Hilbert Spaces


No physical theory is in fact developed merely by setting up experiments
and observing the frequency of occurrence of each of the possible outcomes.
The reason is obvious: no experiment takes place in a conceptual vacuum.
Only within the context of a theory do we know what experiments are
worth performing, or even what procedure is to count as an experiment.
Nonetheless, let us imagine this approach being taken. Then the existence of
incompatible observables A and B could be shown as follows: for a range of
states (that is, modes of preparation of the system dealt with) the
probabilities of the various A-outcomes and B-outcomes could be compared and
the incompatibility of A and B inferred from the relations among these
probabilities. In contrast, on an orthodox approach to quantum theory, we
deduce these probability relations from the fact that the operators
corresponding to incompatible observables do not commute.
On both approaches, "operational" or orthodox, probability relations
associated with incompatible observables give rise to the uncertainty princi-
ple. I discuss this principle in detail in Chapter 9; roughly, it tells us that
certainty about the anticipated result of a given experiment can only be
bought at the expense of uncertainty about the anticipated results of others.
For the present (and without gross distortion) we can take it to say that there
are incompatible observables.
While Dirac took the principle of superposition to be the crucial innovative
principle of quantum mechanics, others have cast the uncertainty principle in
this role; witness Hanson's remark (1967, p. 45) that "John von Neumann
generated all of quantum mechanics from an operationally suitable statement of
the uncertainty relations alone." The principle of superposition tells us
something about the set of admissible states, the uncertainty principle
something about the set of observables encountered in the theory. Any theory
which includes either of these principles is, we may say, inherently
probabilistic; that is, each principle entails that there are pure states
which assign to the outcomes of certain experiments probabilities other than
one or zero. When the principle of superposition holds we can construct
such states from any pair of states which assign a probability of one to
different outcomes of a given experiment. For instance, given states $p_i$ and
$p_j$ such that, for two distinct outcomes $x_i$ and $x_j$, $p_i(x_i) = 1$ and
$p_j(x_j) = 1$, we can construct a third pure state, $p_k$, such that, for any
outcome $x_n$ of the experiment in question,
$p_k(x_n) = c_i p_i(x_n) + c_j p_j(x_n)$, where $0 < c_i \le c_j < 1$ and
$c_i + c_j = 1$. Then $0 < p_k(x_i) = c_i < 1$. Likewise, when observables A and
B are incompatible, there are noncompatible subspaces corresponding to
outcomes $x_i$ and $y_j$ of A- and B-experiments, respectively. (In geometrical
terms these subspaces are oblique one to the other.) In this case, if for some
state $p(x_i) = 1$, then $0 \neq p(y_j) \neq 1$; as an example, consider the
observables $S_x$ and $S_z$, with their outcomes $x^+$ and $z^+$. Only if all
(nonindependent) observables are compatible can we have a determinist theory,
if by that we understand that the pure states of the theory assign to
experimental questions no values other than one and zero.
Note that, even if we accept (as empirically adequate) an inherently
probabilistic theory $T$, we do not therefore have to deny the thesis of
determinism. The theory could be true but incomplete: by proper
supplementation it might be made into a determinist theory $T'$ (see Bohm,
1957). Of course this would mean that the "pure states" of $T$ had, so to
speak, been misidentified; presumably they would appear as mixed states in
$T'$. Supplementary "hidden-variable" theories of this kind have in fact been
proposed for quantum mechanics, and I discuss them in Section 7.8.
Although both the principles under discussion entail that the theory is
inherently probabilistic, they are conceptually independent. The existence
of incompatible observables does not entail that we can add any (suitably
weighted) pair of pure states to obtain another; conversely, we can envisage
a theory in which all pairs of observables are either compatible or
independent but in which the principle of superposition holds. In the latter
case, however, when all pairs of nonindependent observables are compatible, the
principle of superposition may have no empirical content. In the absence of
incompatible observables there may be no way to distinguish a superposition
of two pure states from a mixture of them.
To see what's involved here, let us return to the familiar incompatible
observables $S_x$ and $S_z$ and the (pure) states $p_{z+}$ and $p_{z-}$ which
assign probability 1 to outcomes $z^+$ and $z^-$, respectively, of an
$S_z$-experiment. Note that we have

In the space $\mathbb{C}^2$ the states $p_{z+}$ and $p_{z-}$ (the eigenstates
of the observable $S_z$) are represented by the vectors

$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$

(the eigenvectors of the $S_z$ matrix: see Section 1.7). Now consider the state
represented by the vector

$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$

(see Figure 3.2). This is a superposition of the two eigenvectors of $S_z$; it
is an eigenvector of the $S_x$ matrix, and represents an eigenstate of the
observable $S_x$. If $p_{x+}$ is the probability function defined by this
eigenvector, then

This function $p_{x+}$ is thus a pure state such that, for each $S_z$-outcome $z_i$,

On the other hand, if we construct a mixed state $p$ from the equally
weighted sum of $p_{z+}$ and $p_{z-}$, then, while as before, for each
$S_z$-outcome $z_i$,

we now obtain

or, in other words,

The (pure) superposition $p_{x+}$ is distinguished from the mixture $p$ not by the
probabilities it assigns to the $S_z$-outcomes, but by those assigned to the
$S_x$-outcomes. It is the existence of an observable $S_x$ incompatible with $S_z$
which enables us to distinguish the mixed state $p$ from the pure state $p_{x+}$.
The fact that different probabilities are assigned to the $S_x$-outcomes by $p$ and
$p_{x+}$ is associated with the fact that the subspaces in $\mathbb{C}^2$ representing these
outcomes are (geometrically speaking) obliquely inclined to those representing
the $S_z$-outcomes: as we noted, in the partial representation of $S_x$ and
$S_z$ available in $\mathbb{R}^2$ (see Figure 3.9), the x+ line is at 45° to both the z+ line and
the z- line.
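
The contrast can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the usual representation of states by 2×2 density matrices, and the helper names (`projector`, `mix`, `prob`) are illustrative only. It computes the probabilities that the pure state x+ and the equally weighted mixture of z+ and z- assign to the outcomes of $S_z$ and $S_x$.

```python
import math

def projector(v):
    """Return the 2x2 matrix |v><v| for a normalized vector v."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def mix(weights, states):
    """Weighted sum of 2x2 density matrices (a mixed state)."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]

def prob(rho, v):
    """Probability of the outcome represented by the ray through v: <v|rho|v>."""
    total = sum(v[i].conjugate() * rho[i][j] * v[j]
                for i in range(2) for j in range(2))
    return complex(total).real

s = 1 / math.sqrt(2)
z_plus, z_minus = [1 + 0j, 0j], [0j, 1 + 0j]
x_plus, x_minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

pure_x_plus = projector(x_plus)                      # the superposition
mixture = mix([0.5, 0.5], [projector(z_plus), projector(z_minus)])

for name, rho in (("pure x+", pure_x_plus), ("mixture", mixture)):
    sz = [prob(rho, v) for v in (z_plus, z_minus)]
    sx = [prob(rho, v) for v in (x_plus, x_minus)]
    print(name, "S_z outcomes:", sz, "S_x outcomes:", sx)
```

Under these assumptions the two states agree on the $S_z$-outcomes and disagree only on the $S_x$-outcomes, which is the point made in the paragraph above.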
If all the outcomes in question could be represented by mutually orthogonal
subspaces, or by subspaces all of which were generated from one set
of mutually orthogonal rays (if, in other words, the observables were
compatible), then such differences would not occur. Assume, for instance,
that each outcome $a$ of observable $A$, and each outcome $b$ of observable $B$
(which is not independent of $A$), can be represented by subspaces $M_a$ and $M_b$
such that

where each of the rays $L_{a_i}$ and $L_{b_j}$ is a member of a set $\{L_i\}$ of mutually
orthogonal rays of a space $\mathcal{H}$. Assume further that the set $\{M_a\}$ of subspaces
corresponding to $A$-outcomes spans $\mathcal{H}$, as does the set $\{M_b\}$ of those corresponding
to $B$-outcomes. Clearly, $A$ and $B$ are compatible.
We see that any function $\mu$ on the set $\{L_i\}$, such that (a) $0 \le \mu(L_i) \le 1$ for
each $L_i$ in $\{L_i\}$ and (b) $\sum_i \mu(L_i) = 1$, determines a probability function $p$ on the
sets of $A$-outcomes and of $B$-outcomes such that

$$p(a) = \sum_i \mu(L_{a_i}) \qquad p(b) = \sum_j \mu(L_{b_j})$$

Also, to any probability function $p$ there corresponds a function $\mu$, though
the latter is not necessarily unique.
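
A small illustration may help here. It is only a sketch under stated assumptions: four mutually orthogonal rays, indexed 0 to 3, and two hypothetical two-valued observables whose outcomes are spanned by the pairs of rays listed in `A_OUTCOMES` and `B_OUTCOMES` (these names and the particular weights are invented for the example). It shows a function $\mu$ on the rays determining the probabilities of all outcomes, and two different functions $\mu$ yielding the same probability function.

```python
# Each outcome is spanned by the rays whose indices are listed (an assumption
# made for this example; any partition of the rays would do).
A_OUTCOMES = {"a1": (0, 1), "a2": (2, 3)}
B_OUTCOMES = {"b1": (0, 2), "b2": (1, 3)}

def is_measure(mu, tol=1e-12):
    """Conditions (a) and (b): each weight in [0, 1], weights summing to 1."""
    return all(0 <= m <= 1 for m in mu) and abs(sum(mu) - 1) < tol

def probabilities(mu):
    """Probability of each outcome: the sum of mu over the rays spanning it."""
    if not is_measure(mu):
        raise ValueError("mu must satisfy conditions (a) and (b)")
    table = {}
    for outcomes in (A_OUTCOMES, B_OUTCOMES):
        for name, rays in outcomes.items():
            table[name] = sum(mu[i] for i in rays)
    return table

mu_uniform = [0.25, 0.25, 0.25, 0.25]
mu_skewed = [0.5, 0.0, 0.0, 0.5]

print(probabilities(mu_uniform))
print(probabilities(mu_skewed))
```

Both weightings assign the same probability to every outcome of both observables, so the probability function alone does not determine $\mu$ in this example.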
Using these ideas, one can easily show that, if $p_1$, $p_2$, and $p_3$ are probability
functions on these sets of outcomes, then we can only choose numbers $d_1$
and $d_2$ such that

provided that

Thus, where all observables are compatible, if one probability function is
the weighted sum of two others with respect to one set of outcomes, then it
must be so with respect to all sets of outcomes.
The conclusion that emerges from this analysis is that, in the absence of
incompatible observables, the evidence of experiments like those we are
considering would provide no way to distinguish pure states which yielded
probabilities other than zero and one from mixtures which gave the same
probabilities. We may say that the principles of uncertainty and superposi-
tion are conceptually but not epistemically independent. Where no incom-
patibility obtains, it is consistent with any evidence of the kind we are
considering to regard the appearance of all probabilities in the theory (save
zero and one) as the result of our ignorance about an essentially determined
state of affairs. On this view the only possible pure states would be those
represented by functions $\mu_j$ such that

$$\mu_j(L_i) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

All other states would be mixtures.



Some final remarks about the significance of incompatible observables
need to be made. We have seen that a Hilbert-space representation is possible
for a wide class of theories; we would, however, regard it as peculiarly
fitted for a theory which had these features: in the space in which we
represent the states and observables of the theory, (1) each ray (or normalized
vector within that ray) represents a pure state, (2) every subspace of $\mathcal{H}$
represents an experimental question, and (3) every Hermitian operator represents
an observable. To echo a point made just now, these features are not
entirely distinct. If all observables were compatible, not only would the
Hilbert space have, as it were, some surplus capacity for the representation
of observables, but a number of rays would represent the same pure state; in
the simple two-dimensional case shown in Figure 3.10, if any outcome of
any experiment could be represented using just one pair of mutually orthogonal
subspaces, $L_1$ and $L_2$, then no distinction could be made between the
pure states represented by $v_1$ and $v_2$: whichever we projected onto $L_1$ and
$L_2$, we would obtain the same probabilities.
Be that as it may, a theory with all these features would employ all the
representational capacity, so to speak, of the Hilbert space. The question is,
does quantum mechanics do so? Well, the principle of superposition, whenever
it obtains, guarantees the first feature. There are systems which do not
exhibit strictly quantum behavior, however, and for which the principle
fails. These include, obviously, classical systems which can be, at different
times, in distinct states $S_1$ and $S_2$ but can never be in a superposition of the
two. There are also other, nonclassical examples: for example, it is useful to
consider a proton and a neutron as two different states of a nucleon, but no
superposition of proton-state and neutron-state exists (Beltrametti and Cassinelli,
1981, chap. 5). However, there are many systems for which feature
(1) does hold, and for which, therefore, any vector in the Hilbert space can
represent a pure state.

Figure 3.10
Now the existence of incompatible observables is not enough to guarantee
either (2) or (3). For example, in the Hilbert space of square-integrable
functions of $x$ there are incompatible observables $P$ and $Q$ (momentum and
position) but, it seems, no genuine observables corresponding to the Hermitian
operators $P + Q$ or $PQ + QP$, to name but two (see Wigner, 1973,
p. 369). For the Hilbert spaces representing spin systems, however, (1), (2)
and (3) all hold; this was established by Swift and Wright (1980). To demonstrate
(3) Swift and Wright showed that under certain idealizing
assumptions (in particular, the assumption that we can create in the laboratory
any electromagnetic field consistent with Maxwell's equations) an
arbitrary Hermitian operator on a spin system can be measured using a
suitable generalization of the Stern-Gerlach experiment. (They also ignore
masking effects due to charge; see Section 10.1.)
Thus, at least in the case of spin systems, quantum theory makes use of
the full representational capacity of a Hilbert space.

3.10 The Schrödinger Equation


In quantum theory, the state of a system at any time $t_1$ specifies the probabilities
attaching to outcomes of any experiment performed at that time. If the
experiment is carried out at a later time $t_2$ ($t_1 < t_2$), the probabilities will not,
in general, be the same as at $t_1$; we say that between $t_1$ and $t_2$ the state of the
system has evolved. Whereas at $t_1$ it was representable by a vector $v_1$ in the
appropriate Hilbert space $\mathcal{H}$, at $t_2$ it is representable by $v_2$. (I assume here
that the initial state is a pure state.) It is the latter state $v_2$ which assigns
probabilities to outcomes of experiments conducted at time $t_2$.
The Schrödinger equation of quantum theory describes how the state
evolves through time. That is to say, it enables us to use the present state of
the system to assign probabilities to future experiments. As we saw in
Section 2.7, if $t_2 - t_1 = t$, then the dynamical evolution of a system's state is
described by the equation

$$v_2 = U_t v_1$$

where $U_t$ is a unitary operator on $\mathcal{H}$. Furthermore, this operator is a complex
function of the Hamiltonian operator $H$, which represents the total energy
of the system; we have Equation (2.8b):

$$U_t = e^{-iHt}$$

(I have here suppressed the constant $\hbar$.)



As this equation shows, $H$ defines not just a single unitary operator $U_t$, but
a family $\{U_t\}$ of such operators indexed by the time $t$. The question we now
address is: why should the dynamical evolution of states be given by operators
of this kind? Is there, so to speak, an a priori derivation of Schrödinger's
equation?
Note first that the family $\{U_t\}$ has a structure: it forms a one-parameter
group parameterized by the real numbers. This statement needs some amplification.
Consider two sets of numbers, the set $\mathbb{R}$ = {$t$ : $t$ is a real number} and the
set $P$ = {$e^t$ : $t$ is a real number}. $\mathbb{R}$ forms a group under the operation of
addition, and the identity element of this group is the number zero (see
Section 1.8). Since (i) for all $t_1, t_2 \in \mathbb{R}$, $e^{t_1 + t_2} = e^{t_1} \cdot e^{t_2}$ and (ii) $e^0 = 1$, it
follows that $\langle \mathbb{R}, +, 0 \rangle$ is isomorphic to $\langle P, \cdot, 1 \rangle$. In other words, $P$ also forms a
group (under multiplication) whose identity element is 1.
The set we are interested in, $\{U_t\}$, is a set not of numbers but of operators,
each expressible in the form $e^{-iHt}$. However, the rules for operator multiplication
echo those for arithmetical multiplication:

$$e^{-iHt_1} \cdot e^{-iHt_2} = e^{-iH(t_1 + t_2)}$$

Hence the set $\{U_t\}$ also forms a group isomorphic to $\langle \mathbb{R}, +, 0 \rangle$; the group
operation is operator multiplication, and the identity element is the identity
operator $I$. It should be clear what is meant when we say that this group is
parameterized by the real numbers.
We can show that,

(3.11) If $A$ is any Hermitian operator on $\mathcal{H}$, then
       (i) $e^{-iAt}$ is a unitary operator on $\mathcal{H}$;
       (ii) $\{e^{-iAt}\}$ forms a group parameterized by the real numbers.
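
Both parts of (3.11) can be checked numerically in the simplest case. The sketch below is illustrative rather than a proof: it assumes a two-dimensional space, writes a Hermitian operator as $H = h_0 I + h_x \sigma_x + h_y \sigma_y + h_z \sigma_z$ with real coefficients (the Pauli matrices of Section 1.7), and uses the closed form $e^{-iHt} = e^{-ih_0 t}(\cos(rt) I - i \sin(rt)\, \hat{n} \cdot \sigma)$, where $r$ is the length of $(h_x, h_y, h_z)$ and $\hat{n}$ its direction. The function names and the particular coefficients are assumptions of the example.

```python
import cmath
import math

def evolution(h, t):
    """exp(-i H t) for H = h0*I + hx*sx + hy*sy + hz*sz with real h0, hx, hy, hz."""
    h0, hx, hy, hz = h
    r = math.sqrt(hx * hx + hy * hy + hz * hz)
    phase = cmath.exp(-1j * h0 * t)
    if r == 0:
        return [[phase, 0j], [0j, phase]]
    c, s = math.cos(r * t), math.sin(r * t)
    nx, ny, nz = hx / r, hy / r, hz / r
    # cos(rt) I - i sin(rt) (n . sigma), multiplied by the overall phase
    return [[phase * (c - 1j * s * nz), phase * (-1j * s * (nx - 1j * ny))],
            [phase * (-1j * s * (nx + 1j * ny)), phase * (c + 1j * s * nz)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
H = (0.3, 1.0, -0.5, 2.0)      # arbitrary real coefficients, so H is Hermitian
t1, t2 = 0.7, -1.9

U1, U2 = evolution(H, t1), evolution(H, t2)
is_unitary = close(matmul(dagger(U1), U1), IDENTITY)             # part (i)
composes = close(matmul(U1, U2), evolution(H, t1 + t2))           # part (ii)
has_identity = close(evolution(H, 0.0), IDENTITY)
inverse_is_negative_time = close(matmul(U1, evolution(H, -t1)), IDENTITY)
print(is_unitary, composes, has_identity, inverse_is_negative_time)
```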

Of more interest to us is the converse theorem, due to Stone. (Fano, 1971,
shows this for the finite-dimensional case; see also Jordan, 1969,
pp. 51-52.)

(3.12) If $\{U_t\}$ forms a (weakly) continuous group of unitary operators on $\mathcal{H}$
       parameterized by the real numbers, then there is a unique Hermitian
       operator $A$ on $\mathcal{H}$ such that, for all $t$,

       $$U_t = e^{-iAt}$$

The significance of this theorem is this: if we can show why the dynamical
evolution of states should be given by a weakly continuous one-parameter
group of unitary operators, then it will follow from the theorem that there is
a single Hermitian operator governing this evolution. (See Jordan, 1969,
p. 52; weak continuity is defined below, but see also Fano, 1971, p. 331.)
What such an investigation will not show is why this operator should be the
Hamiltonian (the energy operator) for the system.
Let us ignore, for the moment, the fact that a Hilbert-space representation
of the states of systems exists, and consider a state just as a probability
function on a set of experimental questions, a set {$(A, a_i)$ : $A$ an observable, $a_i$
an outcome of an $A$-experiment}. We assume that the state $p_2$ at time $t_2$ is
specifiable in terms of the state $p_1$ at $t_1$ ($t_1 \le t_2$), whatever the latter may be.
Thus we can write,

$$p_2 = V_{t_1}^{t_2}(p_1)$$

where $V_{t_1}^{t_2}$ is some function on the set $S$ of states; formally $V_{t_1}^{t_2} : S \to S$ is a
mapping of the set of states into itself.
If the state $p_1$ is in turn specifiable in terms of the state $p_0$ at $t_0$ ($t_0 \le t_1$),
that is, if

$$p_1 = V_{t_0}^{t_1}(p_0)$$

then

$$p_2 = V_{t_1}^{t_2}(V_{t_0}^{t_1}(p_0))$$

and, using the standard notation for the composition of functions, we may
write

$$V_{t_0}^{t_2} = V_{t_1}^{t_2} \cdot V_{t_0}^{t_1}$$

We have, of course, $V_{t_1}^{t_1} = V_{t_2}^{t_2} = I$, where $I$ is the identity function. In
general, if time is homogeneous (if, that is, no point in time is to be
distinguished from any other), $V_{t_1}^{t_2}$ will depend only on the interval $t_2 - t_1$,
so that

This simplifies our notation considerably. We define $V_t$ by

and obtain

$$V_{t_1}^{t_2} = V_t \quad \text{where } t = t_2 - t_1$$

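The definition of the product of these functions now gives us, for all $t_1$, $t_2$,
and $t_3$,

(3.13a) $V_{t_1} \cdot V_{t_2} = V_{t_1 + t_2} = V_{t_2} \cdot V_{t_1}$
(3.13b) $V_{t_1} \cdot (V_{t_2} \cdot V_{t_3}) = (V_{t_1} \cdot V_{t_2}) \cdot V_{t_3}$
(3.13c) $V_{t_1} \cdot V_0 = V_{t_1} = V_0 \cdot V_{t_1}$
(3.13d) $V_0 = I$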

Thus from just two assumptions, (1) statistical determinism, that the state
at time $t_2$ is a function of the state at time $t_1$ ($t_1 \le t_2$), and (2) homogeneity, that
time is homogeneous, it follows that the evolution of states is governed by a
family $\{V_t\}$ of functions having the structure of a one-parameter commutative
semigroup. By adding the further assumption, (3) continuity, that the
probabilities given by the state vary continuously with time (so that small
changes in time result in small changes in probability), we give $\{V_t\}$ the
structure of a continuous one-parameter commutative semigroup.
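
A toy example, which is not drawn from the text and is not a quantum system, may make the semigroup structure concrete. Suppose, purely for illustration, that a state is fixed by the probability $p$ it gives to a single yes-no question, and that this probability relaxes exponentially toward a hypothetical equilibrium value `P_EQ`. The maps $V_t$ so defined satisfy (3.13a), (3.13c), and (3.13d) and vary continuously with $t$.

```python
import math

P_EQ = 0.5  # hypothetical equilibrium probability (an assumption of this toy model)

def V(t):
    """Return the map V_t on probabilities p in [0, 1], defined for t >= 0."""
    if t < 0:
        raise ValueError("the semigroup is only defined for t >= 0")
    decay = math.exp(-t)
    return lambda p: P_EQ + decay * (p - P_EQ)

def compose(f, g):
    """The product f . g, read as 'apply g, then f'."""
    return lambda p: f(g(p))

p0 = 0.9
t1, t2 = 0.4, 1.1

lhs = compose(V(t1), V(t2))(p0)        # V_t1 . V_t2
rhs = V(t1 + t2)(p0)                   # V_(t1 + t2)
swapped = compose(V(t2), V(t1))(p0)    # V_t2 . V_t1
unchanged = V(0.0)(p0)                 # V_0 acts as the identity

# The formal inverse of V_t1 applied to p = 1 leaves the interval [0, 1], so in
# this toy model the maps form a semigroup but not a group of maps of S into S.
formal_inverse_of_one = P_EQ + math.exp(t1) * (1.0 - P_EQ)
print(lhs, rhs, swapped, unchanged, formal_inverse_of_one)
```

The last line of the sketch bears on the further assumption (4) discussed next: nothing in assumptions (1) to (3) guarantees that each $V_t$ has an inverse on the set of states.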
If $\{V_t\}$ is to be a group, then (4) each mapping $V_t$ of $S$ into $S$ must be
one-to-one. That is, to each mapping $V_t : S \to S$ there must correspond an
inverse mapping $V_t^{-1} : S \to S$, so that

(3.13e) $V_t \cdot V_t^{-1} = V_0 = V_t^{-1} \cdot V_t$

Mackey (1963, p. 81) called this assumption (4) "reversibility," but this name
is "not quite appropriate," as Stein (1972, p. 390 and n. 21) has remarked,
because the assumption does not imply that, for each possible dynamical
evolution of the system, there is another evolution like the first but in the
reverse order.
We may associate each inverse mapping $V_t^{-1}$ with a negative number $-t$
by writing

$$V_{-t} = V_t^{-1}$$

and thus obtain what we want, a continuous one-parameter group parameterized
by the reals; note, however, that on this account each mapping $V_{-t}$
(each mapping that, so to say, moves the state backward through time)
obtains its physical significance only from $V_t$, the member of the original
semigroup of which it is the inverse.
Two more assumptions are needed to ensure that each operator $V_t$ can be
represented by a unitary operator $U_t$ on a Hilbert space $\mathcal{H}$. The first is (5)
preservation of pure states, that $V_t$ maps pure states into pure states. Then its
representation in $\mathcal{H}$ maps vectors into vectors, and so is an operator $U_t$ on
$\mathcal{H}$. Furthermore, since all vectors representing pure states are normalized,
$U_t$ leaves the lengths of such vectors unchanged.
A second assumption is needed to ensure that $U_t$ is linear; this may be
expressed by either of two requirements. The first is (6a) preservation of
superpositions, that $U_t$ preserves superpositions on $\mathcal{H}$: that for all scalars $a$
and $b$, and for all vectors $u$ and $v$,

$$U_t(au + bv) = aU_t u + bU_t v$$

The second requirement (equivalent, given (4), to (6a)) is (6b) preservation
of inner product, that, for all $u, v \in \mathcal{H}$,

$$\langle U_t u | U_t v \rangle = \langle u | v \rangle$$

We may get some feel for the physical consequences of (6b) from the
following considerations. Let $p_0$ and $q_0$ be two pure states, and let us assume
that for some experimental question $(A,a)$, $p_0(A,a) = 1$. Now let $p_0$ and $q_0$
evolve under the same evolution operator $V_t$ to pure states $p_t$ and $q_t$, respectively,
such that $p_t(B,b) = 1$ for some new experimental question $(B,b)$. Then,
provided that (6b) holds for the operator $U_t$ representing $V_t$,

$$q_0(A,a) = q_t(B,b)$$

To use a term we have not hitherto come across, (6b) guarantees that
transition probabilities between states are preserved under dynamical evolution.
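
The preservation of transition probabilities can be seen in a short numerical sketch. The assumptions here are mine: a two-dimensional space, one particular unitary matrix `U`, and, for contrast, a linear map `SHEAR` that is not unitary. The transition probability between two normalized vectors is taken to be the squared modulus of their inner product.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def normalize(v):
    length = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / length for x in v]

def transition_probability(u, v):
    return abs(inner(u, v)) ** 2

s = 1 / math.sqrt(2)
U = [[s + 0j, 1j * s], [1j * s, s + 0j]]        # unitary: its conjugate transpose is its inverse
SHEAR = [[1 + 0j, 1 + 0j], [0j, 1 + 0j]]        # linear and invertible, but not unitary

u = [1 + 0j, 0j]
v = [0.6 + 0j, 0.8j]

before = transition_probability(u, v)
after_unitary = transition_probability(apply(U, u), apply(U, v))
after_shear = transition_probability(normalize(apply(SHEAR, u)),
                                     normalize(apply(SHEAR, v)))
print(before, after_unitary, after_shear)
```

The unitary map leaves the transition probability unchanged; the shear, even after the image vectors are renormalized, does not.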
Assumptions (5) and (6) between them ensure that each $U_t$ is a linear
operator which leaves the lengths of vectors invariant. Since we have assumed
(4) that each $V_t$ has an inverse, it follows from Definition (2.9) that
each $U_t$ is a unitary operator on $\mathcal{H}$.
Hence, given assumptions (1)-(6), we know that the Hilbert-space representations
of the evolution operators satisfy the antecedent of Stone's
theorem (3.12). It follows that, if these assumptions are satisfied, then
Schrödinger's equation takes the form

$$i \frac{\partial v}{\partial t} = A v$$

where $A$ is a Hermitian operator. As was stated earlier, however, the assumptions
do not tell us why this Hermitian operator should be the energy
operator for the system.*

* For an argument by analogy with classical mechanics, see Jordan, 1969, pp. 101-102.
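
As a check on the form of the equation, the following sketch compares the two sides numerically for one simple case. It assumes, for illustration only, that $A$ is the Pauli matrix $\sigma_x$ on a two-dimensional space, so that $e^{-iAt} = \cos(t) I - i \sin(t) A$, and it approximates the time derivative by a central difference with step `h`.

```python
import math

def state_at(t, v0):
    """v(t) = exp(-i A t) v0 for A = sigma_x, using cos(t) I - i sin(t) A."""
    c, s = math.cos(t), math.sin(t)
    return [c * v0[0] - 1j * s * v0[1], -1j * s * v0[0] + c * v0[1]]

def apply_A(v):
    """Action of sigma_x: swap the two components."""
    return [v[1], v[0]]

def schrodinger_residual(t, v0, h=1e-5):
    """Largest component of |i dv/dt - A v|, with dv/dt from a central difference."""
    ahead, behind = state_at(t + h, v0), state_at(t - h, v0)
    lhs = [1j * (a - b) / (2 * h) for a, b in zip(ahead, behind)]
    rhs = apply_A(state_at(t, v0))
    return max(abs(l - r) for l, r in zip(lhs, rhs))

v0 = [0.6 + 0j, 0.8j]                                  # a normalized initial vector
residual = schrodinger_residual(1.3, v0)
norm = sum(abs(x) ** 2 for x in state_at(1.3, v0))     # stays 1 under unitary evolution
print(residual, norm)
```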
4
Spin and Its Representation

In Chapter 3 I showed in general terms why a vector-space representation is
appropriate for a theory involving probabilities. I noted that a characteristic
feature of quantum theory was that the observables it deals with are, in the
technical sense, incompatible, and I focused on observables having a finite
number of values. In this chapter I will look at a particular family of such
observables, namely the components of spin of the spin-½ particle, and their
representation. The representation of three of these observables ($S_x$, $S_y$, and
$S_z$) has been discussed in Sections 1.7 and 2.3.
In part, then, this chapter provides a specific example to illustrate the
rather abstract discussion of the last one, but it also addresses a general
question, first broached in Section 3.8: from the point of view of physics,
does any significance attach to the structure of Hilbert spaces? It might seem
that the vector-space formalism is so well adapted to the representation of
probabilistic theories that it could be adopted for pretty well any theory of
that kind; perhaps its use in quantum theory merely indicates a decision to
represent states and observables in a mathematically convenient way.
Along these lines Cartwright writes (1983, pp. 135 -136), in a passage I have
already quoted,
[Within quantum mechanics] states are to be represented by vectors; observable
quantities are represented by operators; and the average value of a given quantity in
a given state is represented by a certain product involving the appropriate operator
and vector . . .
But notice: one may know all of this and not know any quantum mechanics
. . . to do quantum mechanics, one has to know how to pick the Hamiltonian.
The principles that tell us how to do this are the real bridge principles of
quantum mechanics. These give content to the theory.

And Feyerabend (1975, p. 42, n. 9) suggests that

The quantum theory can be adapted to a great many difficulties. It is an open theory,
in the sense that apparent inadequacies can be accounted for in an ad hoc manner, by
adding suitable operators or elements to the Hamiltonian, rather than by recasting
the whole structure.

Both sets of remarks are true, but both ignore the fact that the Hilbert-
space formalism is, in an important sense, not theory-neutral. This fact has
been hinted at in the discussion of minimal representations in Section 3.8
and of representational capacity in Section 3.9. In this chapter it is illustrated
by an analysis of one particular problem.
The problem is this. Suppose that we neglect, for the moment, the physi-
cal significance of spin, the interaction of spin with a magnetic field, for
instance. Are there very general constraints to which the family $\{S_\alpha\}$ of
components of spin conform, and which guarantee that the family is representable
in $\mathbb{C}^2$ in just the way that quantum theory tells us? According to
quantum mechanics, we can represent $S_x$, $S_y$, and $S_z$ by the Pauli spin
matrices; we can also produce a general form of matrix by which to repre-
sent any component of spin. What is it about spin that establishes that this
representation must be the right one? Come to that, what is it about spin that
establishes that a minimal representation in a Hilbert space exists? We shall
find that the possibility of such a representation depends crucially on certain
features of the family $\{S_\alpha\}$; we can portray systems, not very dissimilar to
quantum systems, whose behavior cannot be modeled in this way. These
results will give us good reason to think that Hilbert spaces provide repre-
sentations of quantum behavior which are not only versatile and adaptable,
but physically significant.

4.1 Symmetry Conditions and Spin States


We are dealing here with a family $\{S_\alpha\}$ of observables. The index $\alpha$ picks out
a direction in space; intuitively, we can set out to measure the component of
spin in any direction we choose. We specify $\alpha$ as we would a point on a
sphere, by picking out the azimuthal angle $\phi$ and the longitude $\theta$ (see Figure
4.1) and writing

$$\alpha = (\phi, \theta)$$

So that each point on the sphere is represented by just one pair of coordinates
we set

$$-\pi < \phi \le \pi \qquad -\frac{\pi}{2} < \theta \le \frac{\pi}{2}$$


Figure 4.1 Angular coordinates of points on the sphere.

The point $\alpha'$ on the sphere diametrically opposed to $\alpha$ has coordinates
$(\phi - \pi, \theta)$ if $\phi$ is positive and $(\phi + \pi, \theta)$ if $\phi$ is negative; $\alpha'$ represents the
direction antiparallel to the $\alpha$-direction.
Each observable $S_\alpha$ is assumed to have just two values, "plus" and
"minus." The units we use are not significant, and so we omit the customary
½ or ½ħ. For the questions $(S_\alpha, +)$ and $(S_\alpha, -)$ we write $\alpha+$ and $\alpha-$, respectively.
If the two values + and - are associated with the directions parallel
and antiparallel to $\alpha$, then we have

We assume that a state $w$ of a system assigns a probability to each question
$\alpha+$. In other words, to each point $\alpha$ on the unit sphere of $\mathbb{R}^3$ a state $w$ assigns
a number $p_w(\alpha+)$, so that, for all $\alpha$,

The pure states of the system are those which assign probability 1 to exactly
one question $\alpha+$ (and hence probability 0 to the complementary question
$\alpha-$).
Let us now see what the effect is of imposing some very general con-
straints, like symmetry and continuity, on the way that the probability
varies over the sphere. We assume the following to be the case.

(4.1a) There exists a family $\{S_\alpha\}$ of observables, indexed by points on the
       unit sphere $S$ of $\mathbb{R}^3$ (in other words, by directions in physical space).
(4.1b) For each point $\alpha$ on $S$, the observable $S_\alpha$ has two possible values, +
       and -, which we associate with directions parallel and antiparallel to
       $\alpha$.
(4.1c) The pure states $w$ of the system assign probabilities $p_w$ to all values of
       the members of $\{S_\alpha\}$.
(4.1d) (i) For each pure state $w$ there is one direction in space $\alpha_w$ such that
       $p_w(\alpha_w+) = 1$.
       (ii) For each direction in space $\alpha$ there is one pure state $w$ such that
       $p_w(\alpha+) = 1$.

Alternatively:

(4.1d) There is a one-to-one correspondence between states $w$ and directions
       in space $\alpha$, such that for $w$ and the corresponding direction in
       space $\alpha_w$, $p_w(\alpha_w+) = 1$. For ease of notation we write $p_\alpha$ for the
       probability function corresponding to $\alpha$.
(4.1e) For any pure state $w$, the probability assignments vary continuously
       over the sphere.
(4.1f) For any pure state $w$, the assignments of probability over the sphere
       are symmetric about the axis defined by $\alpha_w$.
(4.1g) The set of pure states displays spherical symmetry.

Assumptions (4.1e-g) need some commentary. Let us take the case of a
pure state $w$ associated with a particular point $X$ on the sphere, that is, the
pure state $w$ such that $\alpha_w = X$. For ease of description I will use a geographical
vocabulary and refer to $X$ as the N-pole of the sphere, to $X'$ as the S-pole,
and so on. To $X$ the state $w$ assigns probability 1, and to $X'$ it assigns 0.
Assumption (4.1e) tells us that, as we move from $X$ to $X'$, we don't get sudden
jumps in probability between neighboring points. It rules out, for example,
an assignment whereby all points on the northern hemisphere (except X) are
assigned t, and all points on the equator and the southern hemisphere
(except X') are assigned t .
Assumption (4.1f) tells us that the assignments of probability are symmetrical
with respect to the polar axis $XX'$. Thus all points on the same line of
latitude will be given the same probability, and for every point $\alpha$ on the
equator,

$$p_X(\alpha+) = p_X(\alpha'+) = \frac{1}{2}$$

More formally, assumption (4.1f) guarantees that the probability assigned
to $\alpha+$ is a function of the angular separation of $\alpha$ and $X$ on the sphere's
surface.
Assumption (4.1g) now tells us that this function is independent of the
particular point $X$ we choose in specifying the state. For example, if the state
$w_X$ assigns probability t to points on the sphere at an angular separation of
60° from $X$, then the state $w_\xi$ (associated with a point $\xi$ on the sphere) will
assign t to those points on the sphere at an angle of 60° from $\xi$.

In sum, under the assumptions (4.1a-g),

(4.2) A continuous function $t$ exists, mapping angles into probabilities
      ($t : [0, \pi] \to [0, 1]$), such that, for all pairs of points $\alpha$ and $\beta$ on the unit
      sphere of $\mathbb{R}^3$,

      $$p_\alpha(\beta+) = t(\widehat{\alpha\beta})$$

      where $\widehat{\alpha\beta}$ is the angular separation of $\alpha$ and $\beta$. Further, $t(0) = 1$ and
      $t(\pi) = 0$.

We have also seen that $t(\pi/2) = \frac{1}{2}$ and, in general, that

(4.3) $t(\pi - \lambda) = 1 - t(\lambda)$ for all $\lambda$ such that $0 \le \lambda \le \pi$
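
The constraints collected in (4.2) and (4.3) can be explored with a short sketch. Two things in it are assumptions rather than anything stated in the text. First, the embedding `to_cartesian`, which takes $\phi$ as the angle from the z-axis and $\theta$ as a turn of the x-axis toward y; it is chosen only because it places the points $(0,0)$, $(\pi/2,0)$, and $(\pi/2,\pi/2)$ on three mutually perpendicular axes. Second, the two candidate functions `cos2_half` and `linear`, which are merely examples of functions meeting the constraints so far.

```python
import math

def to_cartesian(phi, theta):
    """Assumed embedding of the angular coordinates (phi, theta) in R^3."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def separation(a, b):
    """Angular separation of two unit vectors, in [0, pi]."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

def antipode(a):
    return tuple(-x for x in a)

def cos2_half(lam):
    return math.cos(lam / 2) ** 2

def linear(lam):
    return 1 - lam / math.pi

def satisfies_constraints(t, samples=50, tol=1e-12):
    """Check t(0) = 1, t(pi) = 0 and t(pi - lam) = 1 - t(lam) on a grid of angles."""
    ok = abs(t(0.0) - 1) < tol and abs(t(math.pi)) < tol
    for k in range(samples + 1):
        lam = math.pi * k / samples
        ok = ok and abs(t(math.pi - lam) - (1 - t(lam))) < tol
    return ok

alpha = to_cartesian(0.4, 0.3)
beta = to_cartesian(-1.2, 0.9)

# Replacing alpha by its antipode turns a separation lam into pi - lam,
# which is the geometrical fact behind (4.3).
lam = separation(alpha, beta)
lam_from_antipode = separation(antipode(alpha), beta)
print(lam + lam_from_antipode, math.pi)
print(satisfies_constraints(cos2_half), satisfies_constraints(linear))
```

That both candidates pass suggests that assumptions (4.1) alone do not fix the function $t$; a further condition is needed to single one out.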

4.2 A Partial Representation of Spin in $\mathbb{R}^2$

We now come to the question of whether the family $\{S_\alpha\}$ of observables,
together with a set $W$ of states, can be represented in a Hilbert space. There
are two possible values for every observable in the family, and so an ideally
simple Hilbert-space representation would use a space of two dimensions.
In fact quantum theory tells us that there exists a representation of $\{S_\alpha\}$ in $\mathbb{C}^2$.
But before we look at this representation, it will be instructive to see why, for
a family $\{S_\alpha\}$ of observables and a set $W$ of states conforming to (4.1), a
representation in $\mathbb{R}^2$ is ruled out.
It will be useful to use a rectangular coordinate system for $\mathbb{R}^3$ as well as the
angular coordinates we have used so far. We label as the z+, x+, and y+ axes
of this rectangular coordinate system the directions in space passing
through the points $(0,0)$, $(\pi/2, 0)$, and $(\pi/2, \pi/2)$ on $S$, respectively, as in
Figure 4.1. The corresponding observables, $S_z$, $S_x$, and $S_y$, have values denoted
by z+, z-, x+, x-, y+, and y-.
Let us restrict ourselves, for the moment, to the subset $\{S_\phi\}$ of observables,
where $\{S_\phi\} = \{S_\alpha : \alpha = (\phi, 0)\}$. In other words, we limit ourselves to the observables
associated with the great circle $G$ on the unit sphere through
which both the z-axis and the x-axis pass. For this set $\{S_\phi\}$ of observables we
consider in turn three sets of pure states, $W_1$, $W_G$, and $W_S$, and the probabilities
they assign. From (4.1d), we know that each set corresponds to a set of
points on the unit sphere $S$. $W_1$ contains just the state corresponding to the
point $(0,0)$; using an obvious notation, we denote by $p_{z+}$ the associated
probability function, so that $p_{z+}(z+) = 1$ and, from (4.3), $p_{z+}(x+) = \frac{1}{2} = p_{z+}(x-)$.
$W_G$ is the set of all states corresponding to points on $G$, while $W_S$ is the
set of all pure states and corresponds to the whole sphere $S$.
For each set $W$ the question is: are the states in $W$ and the observables in
$\{S_\phi\}$ representable within the two-dimensional space $\mathbb{R}^2$? It turns out that
when $W = W_1$ a representation in $\mathbb{R}^2$ is always available; when $W = W_G$ a
representation is possible provided that a certain condition holds; however,
when $W = W_S$ no representation in $\mathbb{R}^2$ is possible. A fortiori, no representation
of $W_S$ and $\{S_\alpha\}$ in $\mathbb{R}^2$ is possible.
To avoid ambiguity we must distinguish between the physical space containing
the directions $(\phi, 0)$ and the representation space from which, by the
usual algorithm, we can generate probability assignments. The two values
of any $S_\phi$ in $\{S_\phi\}$ must be represented by orthogonal rays in the representation
space, and so we are required to map the points $(\phi, 0)$ on the unit circle $G$
in physical space onto rays $L(\phi)$ in the representation space, in such a way
that any two diametrically opposed points on $G$ are represented by orthogonal
rays in $\mathbb{R}^2$ (see Figure 4.2). Arbitrarily, we show the ray $L(\phi)$ in the first
quadrant when $\phi > 0$ and in the fourth quadrant when $\phi < 0$.
The mapping we need is obvious. Recall that in the first instance we are
concerned with a single pure state $p_{z+}$. This is to be represented by a unit
vector z+ in $L(z+)$. As usual, the square of the length of the projection of this
vector onto a ray $L(\phi)$ is to give the probability of $(\phi, 0)+$. [I write $(\phi, 0)+$ for
the question $\alpha+$ when $\alpha = (\phi, 0)$.] To obtain, for example, $p_{z+}(x+) = \frac{1}{2} = p_{z+}(x-)$,
we orient $L(x+)$ and $L(x-)$ at $\pi/4$ (45°) to the ray $L(z+)$. In general, for
any point $(\phi, 0)$ in $G$, we orient the ray $L(\phi)$ at an angle $\psi_\phi$ to the ray $L(z+)$
given by

$$\cos^2 \psi_\phi = p_{z+}[(\phi, 0)+]$$

Figure 4.2 Unit circle in physical space (left) and representation space $\mathbb{R}^2$ (right).

Since, for all $\alpha$, $p_{z+}(\alpha+) = 1 - p_{z+}(\alpha'+)$, $L(\phi)$ is orthogonal to $L(\phi - \pi)$, as required.
In this way we obtain a representation of $\{S_\phi\}$ and $W_1$ within $\mathbb{R}^2$. Can this
construction also give us a representation of $\{S_\phi\}$ and $W_G$? The question to be
answered is this. We have mapped the unit circle $G$ into the set of rays of $\mathbb{R}^2$
in a way that yields the probabilities $p_{z+}[(\phi, 0)+]$ for each $S_\phi$ in $\{S_\phi\}$. These are
the probabilities assigned by the pure state z+. But does the construction
hold good for pure states associated with other points on $G$? Are the rays
$L(\phi)$ oriented in such a way as to yield the correct probabilities for all such
states? For instance, consider the state x+, such that $p_{x+}(x+) = 1$. This state
must be represented by a unit vector in $L(\pi/2)$. Now this certainly gives the
correct probabilities to the possible values of $S_z$, since we have

$$p_{x+}(z+) = \frac{1}{2} = p_{x+}(z-)$$

and, by our previous construction, $L(x+)$ is at 45° to $L(z+)$ and $L(z-)$ (see
Figure 4.3).
However, consider the angle $\phi$ such that $\psi_\phi = \pi/8$, in other words, the
point $(\phi, 0)$ on $G$ such that $p_{z+}[(\phi, 0)+] = \cos^2(\pi/8)$. The subspace $L(\phi)$ is at an
angle $\pi/8$ (22.5°) to $L(z+)$. Clearly, if our representation is to hold good for
the state x+, then the question $(\phi, 0)+$ has to be assigned the same value by x+
as by z+. But, on the assumptions (4.1), this means that the point $(\pi/4, 0)$,
equidistant from $(0,0)$ and $(\pi/2, 0)$, must be among the points of $G$ mapped
onto $L(\phi)$. (Note that, on the assumptions (4.1), the function $t$ need not be
one-to-one.) This implies that $t(\pi/4) = \cos^2(\pi/8)$, where, as in (4.2),
$t(\widehat{\alpha\beta}) = p_\alpha(\beta+)$. But now observe, using the results from Section 4.1, that

$$t(\pi) = 0 = \cos^2\left(\frac{\pi}{2}\right)$$

$$t\left(\frac{\pi}{2}\right) = \frac{1}{2} = \cos^2\left(\frac{\pi}{4}\right)$$

$$t\left(\frac{\pi}{4}\right) = \cos^2\left(\frac{\pi}{8}\right)$$

Figure 4.3

In fact, an extension of the argument given above to pure states associated
with the points $(\pi/4, 0)$, $(\pi/8, 0)$, and so on, shows that, for every nonnegative
integer $n$,

$$t\left(\frac{\pi}{2^n}\right) = \cos^2\left(\frac{\pi}{2^{n+1}}\right)$$

Use of the relation $t(\lambda) = 1 - t(\pi - \lambda)$, together with the continuity assumption,
now gives us:

(4.4) $t(\lambda) = \cos^2\left(\frac{\lambda}{2}\right)$ for $0 \le \lambda \le \pi$

Thus the only consistent representation of $\{S_\phi\}$ and $W_G$ in $\mathbb{R}^2$ is remarkably
simple: the ray $L(\phi)$ must be at an angle $\phi/2$ to $L(z+)$, and only if (4.4) holds
does our vector space representation hold good for all the pure states associated
with points on the great circle $G$.
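
The role of (4.4) can be illustrated numerically. In the sketch below (my own construction, following the recipe of this section), the rays $L(\phi)$ are first placed so that they reproduce the probabilities assigned by the state z+ for a given candidate function $t$; the sketch then asks whether the same rays also reproduce the probabilities that $t$ requires of the state x+. The candidate `linear` is a hypothetical alternative, introduced only for contrast.

```python
import math

def cos2_half(lam):
    return math.cos(lam / 2) ** 2

def linear(lam):
    """A hypothetical alternative that also satisfies t(0)=1, t(pi)=0 and (4.3)."""
    return 1 - lam / math.pi

def ray_angle(t, phi):
    """Angle of the ray L(phi) from L(z+), fixed by cos^2(psi) = t(|phi|).

    The ray is placed in the first quadrant for phi > 0 and in the fourth
    quadrant for phi < 0, as in the text.
    """
    psi = math.acos(math.sqrt(t(abs(phi))))
    return psi if phi >= 0 else -psi

def max_mismatch(t, samples=181):
    """Largest gap, over points phi in [-pi/2, pi/2], between the probability
    the rays give in the state x+ and the probability t requires."""
    psi_x = ray_angle(t, math.pi / 2)
    worst = 0.0
    for k in range(samples):
        phi = -math.pi / 2 + k * math.pi / (samples - 1)
        from_rays = math.cos(ray_angle(t, phi) - psi_x) ** 2
        required = t(math.pi / 2 - phi)   # separation of (phi, 0) from (pi/2, 0)
        worst = max(worst, abs(from_rays - required))
    return worst

print(max_mismatch(cos2_half), max_mismatch(linear))
```

With $t(\lambda) = \cos^2(\lambda/2)$ the mismatch vanishes up to rounding, whereas the linear candidate gives rays that serve the state z+ but not the state x+.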
To recapitulate, (4.2) told us that, given certain assumptions about $\{S_\alpha\}$,
the probability $p_\alpha(\beta+)$ is a function of the angular separation of $\alpha$ and $\beta$; (4.4)
tells us what this function must be if we are to represent the subset $\{S_\phi\}$ of
$\{S_\alpha\}$, together with its associated pure states, in $\mathbb{R}^2$; (4.4) is a necessary
condition for obtaining a representation of $\{S_\phi\}$ and $W_G$ in $\mathbb{R}^2$.
Equation (4.4) does indeed hold for spin-½ probabilities, and so the representation
we have constructed is perfectly adequate, as far as it goes. But it
does not go far enough. The only states that find representation in it are
those associated with points on $G$; for full generality we need to consider,
and to represent, the full set $W_S$ of states, or every state corresponding to a
point on $S$. The state y+, for example, such that, in accordance with (4.2) and
(4.3),

cannot be represented in Figure 4.3. Thus, even as regards all possible
probability assignments to members of $\{S_\phi\}$, the representation in $\mathbb{R}^2$ is
inadequate.
Moreover, any attempt to represent other members of $\{S_\alpha\}$ on Figure 4.3 is
doomed to fail. For what ray is to correspond to $(S_y, +)$? The fact that, for the
state z+, $p_{z+}(y+) = \frac{1}{2} = p_{z+}(x+)$ suggests that $L(y+) = L(x+)$; but, by parallel
reasoning in terms of the state x+, $L(y+) = L(z+)$.
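
The difficulty can be made vivid by a brute-force search, offered here only as an illustrative sketch. It takes the rays $L(z+)$ and $L(x+)$ at angles 0 and $\pi/4$ in the plane, as constructed above, and looks for a ray at angle $\psi$ onto which both the z+ vector and the x+ vector project with squared length ½, as a ray for $(S_y, +)$ would require. The grid size is an arbitrary choice.

```python
import math

PSI_Z = 0.0             # angle of the ray L(z+)
PSI_X = math.pi / 4     # angle of the ray L(x+), at 45 degrees to L(z+)

def mismatch(psi):
    """How far a ray at angle psi is from receiving probability 1/2
    from both the state z+ and the state x+."""
    from_z = math.cos(psi - PSI_Z) ** 2
    from_x = math.cos(psi - PSI_X) ** 2
    return max(abs(from_z - 0.5), abs(from_x - 0.5))

def best_mismatch(samples=3600):
    """Smallest mismatch over a grid of rays (a ray is fixed by an angle in [0, pi))."""
    return min(mismatch(math.pi * k / samples) for k in range(samples))

print(best_mismatch())
```

No ray in the plane comes close to meeting both requirements, which agrees with the conclusion that $(S_y, +)$ has no place in Figure 4.3.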
Limited though its success is, nevertheless the representation of $\{S_\phi\}$ and
$W_G$ in $\mathbb{R}^2$ is not without interest. Effectively, the only consistent representation
available is a uniform map of points on the great circle $G$ in physical
space to points on a semicircle in the representation space, such that the
angular separation of any two points on $G$ is twice the angular separation of
their images. This suggests that any consistent representation of the set $\{S_\alpha\}$
of observables and the set $W_S$ of states must respect the symmetries of the set
$\{\alpha\}$ of points in physical space. And, in a very precise sense, this is what is
achieved by the representation of $\{S_\alpha\}$ and $W_S$ within $\mathbb{C}^2$.

4.3 The Representation of {S_α} in ℂ²


By a symmetry of a set of objects we mean a set of mappings of the set onto
itself (or automorphisms of the set) which leave invariant some relation or
identity characteristic of the set (see Weyl, 1952). If we rotate the unit
sphere S of ℝ³, for example, about an axis through its center, so that the
point (x_1, y_1, z_1) is carried to the point (x_2, y_2, z_2), then

$$x_1^2 + y_1^2 + z_1^2 = x_2^2 + y_2^2 + z_2^2$$

The identity x² + y² + z² = 1 is invariant under rotations.
We can readily show that a set of transformations under which an identity
is invariant forms a group (see Section 1.8). The symmetry group of S is just
the set SO(3) of all rotations of S about its center. Inter alia, this leaves
invariant the angular separation of pairs of points on the sphere.
Let us now look at the way symmetry considerations enter into the problem
of finding the conditions under which a Hilbert-space representation of
{S_α} and W_S exists. As we have seen, one task is to find a mapping of
points of S onto the rays of some two-dimensional representation space which
yields probability assignments consistent with assumptions (4.1). Within the
representation space these probabilities are determined by the "angles"
between rays. (The term angle is metaphorical, if we are in a complex space:
in general, probabilities are given by expressions of the form |⟨u|v⟩|²,
where u and v are normalized vectors within the two rays.) The symmetry
assumptions (4.1f) and (4.1g) require that, to any automorphism of S under
which the angular separation of points of S is invariant (that is, to any
rotation of S), there correspond an automorphism of the set of rays of the
representation space which leaves invariant the "angles" between them; to
such an automorphism, in turn, will correspond a unitary operator on the
representation space (see Section 2.7).
We may express this by saying that assumptions (4.1f) and (4.1g) require
the group SO(3) of rotations of S to have a representation in the
representation space. A group 𝒢 is said to have a representation within a
space 𝒱 if there exists a set of unitary operators on 𝒱 which, under the
operation of operator multiplication, forms a group isomorphic to 𝒢. Using
this terminology, we can attribute the partial success and ultimate
inadequacy of ℝ² as the representation space to the fact that, while
(obviously) there exists within ℝ² a representation of SO(2) (the group of
rotations of the unit circle G), there is no representation within it of
SO(3).
But, as Felix Klein showed in the late nineteenth century, SO(3) does have
a representation within ℂ² which is effectively unique (see Goldstein, 1950,
chap. 4.5 and bibliography on p. 140). (I say the representation is
"effectively" unique because any rotation can be mapped onto two matrices, M
and −M, in ℂ².) Further, this representation (which is a mapping of rotation
operators on ℝ³ onto unitary operators on ℂ²) is consistent with a particular
mapping of points of S onto subspaces of ℂ², namely the mapping which takes
the point α = (φ,θ) ∈ S into the ray L(α), whose projector P(α) is given by

$$P(\alpha) = \begin{pmatrix} \cos^2\frac{\phi}{2} & \cos\frac{\phi}{2}\sin\frac{\phi}{2}\,e^{-i\theta} \\ \cos\frac{\phi}{2}\sin\frac{\phi}{2}\,e^{i\theta} & \sin^2\frac{\phi}{2} \end{pmatrix} \tag{4.5}$$

(Compare this projector with P_θ, discussed in Section 1.2.)
The argument so far has shown that, if the probabilities associated with
{S_α} conform to assumptions (4.1a-g), then the only possible representation
of {S_α} within ℂ² will use the mapping given above. But it has not yet been
shown that the probability function given by this representation is the
function $t(\widehat{\alpha\beta}) = p_\alpha(\beta+)$ which actually obtains
in quantum theory, still less that it is the one which must obtain. In the
remainder of this section I will deal with the first of these issues; the
second I postpone to Section 4.4.
The subspace L(α) projected onto by P(α) is to represent the experimental
question α+. The pure state w such that p_w(α+) = 1 can be represented by a
normalized vector v_α in L(α), where

$$v_\alpha = \begin{pmatrix} \cos\frac{\phi}{2}\,e^{-i\theta/2} \\ \sin\frac{\phi}{2}\,e^{i\theta/2} \end{pmatrix}$$

It is trivial to show that v_α is indeed in L(α).


If we choose the polar axis to be the z-axis of our coordinate system, as in
Figure 4.1, then we obtain, for x+, x−, y+, y−, z+, and z−, vectors familiar
from Section 1.7 as the eigenvectors of S_x, S_y, and S_z. These are shown in
Table 4.1.
Notice, incidentally, the vectors z+, z−, x+, and x−, and compare them
with Figure 4.3. We see that the ℝ²-representation obtained in Section 4.2
(the representation, that is, of states and observables associated with
points in S for which θ = 0) is embeddable in the ℂ²-representation of {S_α}.
In order to obtain a general expression for $t(\widehat{\alpha\beta})$, we
need only take the most straightforward case, since we know that spherical
symmetry obtains. Accordingly, let us assume that the system is in the state
z+. We then expect, from (4.2), that the probability of a result α+, where
α = (φ,θ), depends only on the angle φ. In fact we get

$$p_{z+}(\alpha+) = \langle z_+ \mid P(\alpha)\, z_+ \rangle$$

and, using the expressions for P(α) and z+, we obtain

$$P(\alpha)\, z_+ = \begin{pmatrix} \cos^2\frac{\phi}{2} \\ \sin\frac{\phi}{2}\cos\frac{\phi}{2}\,e^{i\theta} \end{pmatrix}$$

whence

$$p_{z+}(\alpha+) = \cos^2\frac{\phi}{2}$$

Given this representation, the function t has a particularly simple form.
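
The calculation just described can be checked numerically. The following is only a sketch, not part of the original argument: it assumes that P(α) has the Hermitian matrix form whose first column is the vector P(α)z+ displayed above, and the function names (`projector`, `prob_alpha_plus_in_z_plus`, and so on) are illustrative choices.

```python
import cmath
import math

def projector(phi, theta):
    """Assumed matrix of P(alpha) for alpha = (phi, theta), as a 2x2 nested list."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c * c, c * s * cmath.exp(-1j * theta)],
            [c * s * cmath.exp(1j * theta), s * s]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matrices_close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def prob_alpha_plus_in_z_plus(phi, theta):
    """p_{z+}(alpha+) computed as <z+ | P(alpha) z+>."""
    z_plus = [1, 0]
    return inner(z_plus, apply(projector(phi, theta), z_plus)).real

phi, theta = 1.1, 0.7                      # an arbitrary point on the sphere
P = projector(phi, theta)
assert matrices_close(matmul(P, P), P)     # P(alpha) is idempotent
assert abs(prob_alpha_plus_in_z_plus(phi, theta) - math.cos(phi / 2) ** 2) < 1e-12
```

The probability depends on φ alone, as spherical symmetry requires; θ enters P(α) only through a phase that cancels in ⟨z+|P(α)z+⟩.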


It has become apparent that the representation of SO(3) we are forced to
by symmetry considerations (provided, that is, some representation is
possible) is exactly that used in quantum theory. As a final confirmation of
this, and also to display a result of great elegance, let us consider the
matrices on ℂ² which, on this account, are to represent the observables S_α.
Since we are effectively assuming the possible values of S_α to be +1 and
−1, we know by the spectral decomposition theorem that, for any α = (φ,θ),

$$S_\alpha = P(\alpha) - P(\alpha')$$

Before doing this calculation note that, for each point α on the unit sphere
S of ℝ³, there will be an operator S_α. Although the steps of the calculation
are best performed using the angular coordinates of α, in the final stages it
is worth moving to Cartesian coordinates, so that α = (x,y,z), where
x² + y² + z² = 1. We set φ = 0 along the z-axis and θ = 0 along the x-axis, as
before.
A wonderfully simple result now presents itself:

$$S_\alpha = \begin{pmatrix} z & x - iy \\ x + iy & -z \end{pmatrix} \tag{4.7}$$
The Pauli matrices S_x, S_y, and S_z appear as special cases of (4.7). In
terms of these matrices we obtain

$$S_\alpha = xS_x + yS_y + zS_z \tag{4.8}$$

Table 4.1 Special cases of the formula
v_α = (cos(φ/2) e^{−iθ/2}, sin(φ/2) e^{iθ/2})

Spin states of the spin-½ particle

             z+       z′+ = z−   x+          x′+ = x−     y+             y′+ = y−
φ            0        π          π/2         −π/2         π/2            −π/2
θ            0        0          0           0            π/2            π/2
cos(φ/2)     1        0          1/√2        1/√2         1/√2           1/√2
sin(φ/2)     0        1          1/√2        −1/√2        1/√2           −1/√2
e^{−iθ/2}    1        1          1           1            (1 − i)/√2     (1 − i)/√2
e^{iθ/2}     1        1          1           1            (1 + i)/√2     (1 + i)/√2
v_α          (1, 0)   (0, 1)     (1, 1)/√2   (1, −1)/√2   (1−i, 1+i)/2   (1−i, −1−i)/2
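
As a hedged numerical illustration of the "wonderfully simple result", the sketch below builds S_α in two ways and compares them: once as P(α) − P(α′), and once directly from the Cartesian coordinates of α. It assumes that α′ is the point antipodal to α, namely (π − φ, θ + π), and that P(α) has the Hermitian form used in this section; the function names are illustrative.

```python
import cmath
import math

def projector(phi, theta):
    """Assumed matrix of P(alpha) for alpha = (phi, theta)."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c * c, c * s * cmath.exp(-1j * theta)],
            [c * s * cmath.exp(1j * theta), s * s]]

def spin_from_projectors(phi, theta):
    """S_alpha = P(alpha) - P(alpha'), with alpha' taken to be the antipode."""
    p = projector(phi, theta)
    q = projector(math.pi - phi, theta + math.pi)
    return [[p[i][j] - q[i][j] for j in range(2)] for i in range(2)]

def spin_from_cartesian(phi, theta):
    """The matrix [[z, x - iy], [x + iy, -z]] for the unit vector alpha."""
    x = math.sin(phi) * math.cos(theta)
    y = math.sin(phi) * math.sin(theta)
    z = math.cos(phi)
    return [[z, x - 1j * y], [x + 1j * y, -z]]

def matrices_close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# The two constructions agree at an arbitrary point of the sphere.
assert matrices_close(spin_from_projectors(0.9, 2.3), spin_from_cartesian(0.9, 2.3))

# The coordinate axes give the three special cases.
along_x = spin_from_projectors(math.pi / 2, 0.0)
along_y = spin_from_projectors(math.pi / 2, math.pi / 2)
along_z = spin_from_projectors(0.0, 0.0)
assert matrices_close(along_x, [[0, 1], [1, 0]])
assert matrices_close(along_y, [[0, -1j], [1j, 0]])
assert matrices_close(along_z, [[1, 0], [0, -1]])
```

Because the Cartesian form is linear in x, y, and z, any S_α is the corresponding linear combination of the three axis matrices.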

4.4 Conclusion
The conditions imposed by (4.1) guarantee that, if a representation of {S_α}
and W_S exists in ℂ², then it is the one which employs the Pauli spin
matrices. Further, if this representation is faithful, then the function t of
(4.2) is given by

$$t(\widehat{\alpha\beta}) = \cos^2 \tfrac{1}{2}(\widehat{\alpha\beta}) \tag{4.9}$$

It follows that, unless we can show why this is the only t-function possible,
we have not established that {S_α} must be representable in ℂ². But (4.9)
cannot be derived from (4.1a-g). Any monotone function t_ψ of the form

$$t_\psi(\phi) = \cos^2 \psi(\phi)$$

(where, as the notation implies, ψ(φ) is a function of φ) is consistent with
these assumptions, provided that

$$\psi(\pi - \phi) = \frac{\pi}{2} - \psi(\phi)$$

Typical admissible variations of ψ(φ) with φ are shown in Figure 4.4.
As an illustration, consider this whimsical example, proposed by Mielnik
(1968, p. 55; see also Beltrametti and Cassinelli, 1981, pp. 204-207).
Imagine that we have a spherical container, exactly half full of some liquid.
Imagine, further, that the surface of the liquid in the sphere is always a plane
through the sphere's center. This container, we assume, can be divided in
half by a thin partition along any plane through its center, and whenever
this is done we find that all the liquid ends up on one side or other of the
partition; thus the liquid exhibits quantum behavior. Furthermore, the side
of the partition that the liquid moves to is not determined; rather, there is a
certain probability of the liquid's moving to one side of the partition rather
than the other, and this probability depends on the orientation of the
partition to the original surface of the liquid, as follows. If V_L is the
(volume of the) hemisphere originally occupied by the liquid, and V_A is the
hemisphere on side A of the partition, then the probability that all the
liquid will be found on side A of the partition is given by

$$p = \frac{\mathrm{vol}(V_L \cap V_A)}{\mathrm{vol}(V_L)}$$

Figure 4.4 Functions ψ(φ) such that ψ(π − φ) = π/2 − ψ(φ).

Thus, for example, if the partition is introduced along the existing surface
of the liquid, there is zero probability that the liquid will move to occupy
the other half of the sphere.
This imaginary device conforms to (4.1a-g). For any point α on the
sphere (see Figure 4.5) there is an observable S_α which we "measure" by the
experiment of introducing a partition along the plane equatorial with respect
to α. The "value" of this observable is positive if the liquid moves to
that side of the partition where α lies, and negative if it moves to the other
side. The state w of the system is given by the original orientation of the
liquid's surface; the point α_w is the polar point of the hemisphere
originally occupied by the liquid.
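
The contrast between the quantum rule (4.9) and the liquid device can be explored numerically. The sketch below rests on two assumptions that are not taken verbatim from the text: that the device's probability is the fraction of the liquid's hemisphere lying on the chosen side of the partition, which for two hemispheres whose poles are an angle φ apart works out to 1 − φ/π; and that an admissible t-function should at least satisfy t(0) = 1, t(φ) + t(π − φ) = 1, and monotonicity. The function names are illustrative.

```python
import math

def t_quantum(phi):
    """The quantum-mechanical rule (4.9): cos^2(phi / 2)."""
    return math.cos(phi / 2) ** 2

def t_liquid(phi):
    """Assumed rule for the liquid device: overlap fraction of two hemispheres."""
    return 1 - phi / math.pi

def satisfies_constraints(t, steps=180, tol=1e-12):
    """Check t(0) = 1, t(phi) + t(pi - phi) = 1, and that t never increases."""
    angles = [math.pi * k / steps for k in range(steps + 1)]
    values = [t(a) for a in angles]
    certain = abs(values[0] - 1) < tol
    complementary = all(abs(t(a) + t(math.pi - a) - 1) < tol for a in angles)
    monotone = all(u >= v for u, v in zip(values, values[1:]))
    return certain and complementary and monotone

# Both rules pass the three checks ...
assert satisfies_constraints(t_quantum)
assert satisfies_constraints(t_liquid)

# ... yet they disagree away from phi = 0, pi/2, and pi.
assert abs(t_quantum(math.pi / 3) - 0.75) < 1e-12
assert abs(t_liquid(math.pi / 3) - 2 / 3) < 1e-12
```

On these assumptions the symmetry constraints alone do not single out the cosine-squared rule; something further is needed to exclude the linear one.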
It's easy to verify that all seven clauses of (4.1) hold but, as Mielnik points
out, there is no Hilbert-space representation of such a device. And, in the
light of our previous discussion, we can see why: the dependence of the
probability p_α(β+) on the angle $\widehat{\alpha\beta}$ is given not by

$$p_\alpha(\beta+) = \cos^2 \tfrac{1}{2}(\widehat{\alpha\beta})$$

but instead by

$$p_\alpha(\beta+) = 1 - \frac{\widehat{\alpha\beta}}{\pi}$$
Figure 4.5

What constraint, then, must we add to (4.1) to guarantee that (4.9) holds?
Well, what is nowhere expressed in the assumptions (4.1) is the sense in
which the members of {S_α} are components of a physical quantity. From
(4.8) we see that, if we assume that {S_α} is representable as a set of
Hermitian operators in ℂ², then these are indeed vector operators, which can
be resolved into components (Messiah, 1958, vol. 2, p. 509). But it's not
obvious how such a relation might be expressible just in terms of the
probabilities that states assign to values of S_α. Clearly such probabilities
cannot add vectorially, on pain of yielding probabilities less than zero.
However, a possible condition on expectation values presents itself. We
write ⟨S_α⟩_w for the expectation value of S_α, as in Section 2.4; then

$$\langle S_\alpha \rangle_w = p_w(\alpha+) - p_w(\alpha-) = 2p_w(\alpha+) - 1$$

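With respect to an arbitrarily chosen Cartesian coordinate system, let α have
coordinates α_x, α_y, and α_z (see Figure 4.6). Thus
α = (φ_α, θ_α) = (α_x, α_y, α_z).
We now add to assumptions (4.1) the assumption

$$\langle S_\alpha \rangle_w = \alpha_x \langle S_x \rangle_w + \alpha_y \langle S_y \rangle_w + \alpha_z \langle S_z \rangle_w \tag{4.10}$$

Given (4.10) it follows that

$$p_w(\alpha+) = \cos^2 \tfrac{1}{2}(\widehat{\alpha\alpha_w})$$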

To see this, assume that the system is in a pure state w and that the angular
separation of α and α_w is φ. We now choose a coordinate system such that
α = (φ,0) = (sin φ, 0, cos φ) and α_w = (0,0) = (0,0,1). Then

$$\langle S_x \rangle_w = 0 \quad \text{and} \quad \langle S_z \rangle_w = 1$$

whence, from (4.10),

$$\langle S_\alpha \rangle_w = \cos\phi\,\langle S_z \rangle_w = \cos\phi$$

$$2p_w(\alpha+) - 1 = \cos\phi$$

$$p_w(\alpha+) = \cos^2 \tfrac{1}{2}\phi \qquad \text{[Q.E.D.]}$$

Figure 4.6 Components of α = (φ,0).

Equivalent to (4.10), given assumptions (4.1), is:

(4.11) For any mutually orthogonal triple of points (α,β,γ) in ℝ³,

The question posed at the beginning of the chapter now has an answer.
Under the assumptions (4.1) and (4.10), a family {S_α} of observables and a
set W_S of states has a representation in ℂ², and this representation,
involving the Pauli spin matrices, is just that employed in quantum mechanics
for the spin-½ particle. Further, these assumptions are nontrivial; as
Mielnik's example shows, there could be "quantum systems" for which no such
minimal representation was possible.
Two more general conclusions can be drawn. The first is that any inter-
pretation of quantum mechanics must recognize that the theory deals with
families of observables which are knitted together in a way precisely cap-
tured by the Hilbert-space representation. The mutual interdependence of
the members of {S_α} is not a functional interdependence of the kind found in
classical mechanics, but an essentially probabilistic interdependence; the
observables are, in the technical sense, mutually transformable, as defined
in (3.9). Prima facie, any interpretation which invites us to consider them
independently should be mistrusted.
The second is that the way in which the relations between the observables
S_α in quantum mechanics are determined by the symmetries of
three-dimensional physical space typifies the way in which the relations within
any family of mutually transformable observables are determined by un-
derlying symmetries in nature.
5
Density Operators and
Tensor-Product Spaces

When the idea of a mixed state was introduced in Chapter 3, I suggested that
a weighted sum of projectors could represent such a state but postponed the
problem of providing a statistical algorithm. The problem is that of finding a
natural generalization of Equation (2.1):

$$p_v(A,\Delta) = \langle v \mid P^A_\Delta v \rangle$$

that is, of the equation whereby to each experimental question (A,Δ) the
state assigns a probability.
I will attend to this problem first. In the rest of the chapter I will discuss
the vector-space representation of states of complex systems; when two
hitherto independent systems interact, they behave as one complex system,
and we can represent the states of this complex system, and observables on
it, within a new vector space, the tensor product of the spaces appropriate to
the two component systems.

5.1 Operators of the Trace Class


As a first step toward the discussion of mixed states, I introduce the concept
of the trace of an operator.
Consider a Hermitian operator A on a Hilbert space ℋ. A is said to be
positive if, for all v in ℋ, ⟨v|Av⟩ ≥ 0. In fact it follows from this condition
alone that, if A is positive, then (i) A is Hermitian, and (ii) the eigenvalues
of A are nonnegative (*). Now let {v_i} be an orthonormal basis for ℋ (see
Section 1.13). If A is positive and ℋ is finitely dimensional, we can always
evaluate Σ_i ⟨v_i|Av_i⟩, and even in the infinitely dimensional case there are
still positive operators A for which Σ_i ⟨v_i|Av_i⟩ is finite. We say that an
operator A belongs to the trace class provided A is positive and
Σ_i ⟨v_i|Av_i⟩ is finite. (See Fano, 1971, chap. 5.12.)
This definition is acceptable because, surprisingly, the value of
Σ_i ⟨v_i|Av_i⟩ is independent of the particular orthonormal basis {v_i} which
is chosen. [It's a comparatively simple exercise (**) to show this.] Thus its
value depends only on A, and we call it the trace of A:

$$\mathrm{Tr}(A) =_{\mathrm{df}} \sum_i \langle v_i \mid A v_i \rangle \tag{5.1}$$

where {v_i} is any orthonormal basis for ℋ.
Since we're at liberty to choose any orthonormal basis whatever to evaluate
Tr(A), we may as well use the basis which makes life easiest. For instance,
let P be a projection operator onto a ray of ℋ. In this case we choose an
orthonormal basis {v_j} in which one vector, v_i, lies within the ray in
question. The other vectors in this basis are all orthogonal to this ray, and
so we have

Pv_i = v_i and Pv_j = 0 whenever i ≠ j

whence

$$\mathrm{Tr}(P) = \sum_j \langle v_j \mid P v_j \rangle = \langle v_i \mid v_i \rangle = 1 \tag{5.2}$$

It is easy to show that, for any projection operator P onto an n-dimensional
subspace of ℋ (where n is finite),

$$\mathrm{Tr}(P) = n \tag{5.3}$$

In addition, we can use the spectral decomposition theorem (1.32) to
show that if A is in the trace class and there is no degeneracy, then the trace
of A is the sum of its eigenvalues (*); since A is Hermitian, it follows that
Tr(A) is always real. When A is given a matrix representation, the sum of its
diagonal elements gives the trace of A.
The trace has the following properties. If a is any real number and A and B
are operators in the trace class, we have

(5.4) Tr(aA) = aTr(A) (*)

(5.5) Tr(A + B) = Tr(A) + Tr(B) (*)

An important result involves the product of a trace-class operator and a
bounded linear operator. B is bounded if there is a real number b such that,
for all v ∈ ℋ, |Bv| ≤ b|v|; all continuous operators are bounded (and
conversely; see Jordan, 1969, sec. 6), and all linear operators on a finitely
dimensional vector space are bounded.

(5.6) If A is a trace-class operator and B is a bounded linear operator, then
AB and BA are both in the trace class, and
Tr(AB) = Tr(BA). (***)

(See Jordan, 1969, sec. 22.)
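
The basis independence of the trace, and the identity Tr(AB) = Tr(BA), are easy to check for 2 × 2 matrices. The sketch below is illustrative only: the matrices `A` and `B` and the second orthonormal basis `ROTATED` are arbitrary choices made for the example, not objects taken from the text.

```python
import math

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace_in_basis(a, basis):
    """Sum of <v_i | A v_i> over an orthonormal basis, following (5.1)."""
    return sum(inner(v, apply(a, v)) for v in basis)

r = 1 / math.sqrt(2)
STANDARD = [[1, 0], [0, 1]]
ROTATED = [[r, 1j * r], [1j * r, r]]   # a second orthonormal basis of C^2

A = [[2, 1 - 1j], [1 + 1j, 3]]         # a positive Hermitian matrix (illustrative)
B = [[0, 1], [2, 5j]]                  # an arbitrary matrix (illustrative)

# The same value is obtained in either basis: the sum of the diagonal elements.
assert abs(trace_in_basis(A, STANDARD) - 5) < 1e-12
assert abs(trace_in_basis(A, ROTATED) - 5) < 1e-12

# Tr(AB) = Tr(BA), even though AB and BA are different matrices.
assert abs(trace_in_basis(matmul(A, B), STANDARD)
           - trace_in_basis(matmul(B, A), STANDARD)) < 1e-12
```

In a finite-dimensional space every linear operator is bounded, so the boundedness condition of (5.6) plays no role in this small example.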

5.2 Density Operators


We are particularly interested in a subset of the trace class:

(5.7) D is said to be a density operator if D is a trace-class operator and
Tr(D) = 1.

The terms statistical operator and density matrix are also used.
From what has been said, any projection operator P projecting onto a ray
of ℋ is a density operator. Further, let {P_i} be a family of projection
operators projecting onto rays of ℋ. Then, by (5.4) and (5.5),

(5.8) D = Σ_i a_i P_i is a density operator, provided (a) 0 ≤ a_i, for each
a_i, and (b) Σ_i a_i = 1. (*)

We see that (5.8) gives us a recipe for constructing density operators from
projectors. But does it also give us a prescription for decomposing a density
operator? Specifically, (i) can we always express a density operator as a
weighted sum of projectors, and (ii) is this decomposition unique?
The answer to (i) is yes. Every density operator D admits a set {a_i} of
eigenvalues. (This is because every density operator is compact: see Fano,
1971, pp. 376, 291.) Assume, for the moment, that there is no degeneracy
(see Section 1.14). From the discussion in the previous section, these
eigenvalues are all nonnegative and add to one, and the spectral
decomposition theorem (1.32) then guarantees that a set {P_i} of projectors
exists (each projector P_i projecting onto a ray containing eigenvectors of D
with eigenvalue a_i) and that D = Σ_i a_i P_i.
Even if there is degeneracy, we can still apply the spectral decomposition
theorem and stipulate that each P_i project onto a ray of ℋ. We will then find
that not all the a_i are distinct, that a_j = a_k, for instance. But all this
means is that some a_i are going to appear more than once in the summation
that yields Σ_i a_i = 1 in clause (b) of (5.8).
The possibility of degeneracy, however, is one reason we cannot guarantee
a unique decomposition for D (in other words, why the answer to the
second question is no). Assume, for instance, that we have a_j = a_k. Then the
rays onto which P_j and P_k project span a plane L_jk in ℋ, and, if P′_j and
P′_k are projectors onto any two orthogonal rays of L_jk, we can replace P_j
and P_k in {P_i} by P′_j and P′_k to form a new family {P′_i} of projectors
(such that, for j ≠ i ≠ k, P′_i = P_i). We then obtain,

$$D = \sum_i a_i P_i = \sum_i a_i P'_i$$

As an example, consider the projection operators associated with the
Pauli spin matrices (see Section 1.7). The rays projected onto by P_x+ and
P_x− span the whole space ℂ², as do those projected onto by P_y+ and P_y− and
those projected onto by P_z+ and P_z−. Numerical computation confirms that

$$\tfrac{1}{2}P_{x+} + \tfrac{1}{2}P_{x-} = \tfrac{1}{2}P_{y+} + \tfrac{1}{2}P_{y-} = \tfrac{1}{2}P_{z+} + \tfrac{1}{2}P_{z-}$$

More fundamentally, the very construction employed in (5.8) ensures
that density operators do not, in general, have a unique decomposition. For
in that construction there was no requirement that the rays onto which the
projectors P_i projected were to be mutually orthogonal. Yet we know from
the spectral decomposition theorem that for each D there exists a set {P′_i}
of projectors onto mutually orthogonal rays such that D = Σ_i b_i P′_i. Thus,
in general, we have

$$D = \sum_i a_i P_i = \sum_i b_i P'_i \quad \text{but} \quad \{P_i\} \neq \{P'_i\}$$

and so a density operator has a nonunique decomposition. In fact, any
density operator D which is not itself a projector is expressible in an infinite
number of ways as a weighted sum of projectors onto rays, according to the
formula D = Σ_i a_i P_i (with a_i ≥ 0 for each a_i and Σ_i a_i = 1).
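
The "numerical computation" mentioned above can be sketched as follows. The spin vectors used are the standard eigenvectors of the Pauli matrices (an assumption about the conventions of Section 1.7, including the overall phases chosen for y+ and y−); `ray_projector` and `mix` are illustrative helper names.

```python
import math

def ray_projector(v):
    """Projector |v><v| onto the ray through the normalized vector v."""
    return [[v[i] * complex(v[j]).conjugate() for j in range(2)]
            for i in range(2)]

def mix(weights, projectors):
    """Weighted sum of 2x2 matrices."""
    return [[sum(w * p[i][j] for w, p in zip(weights, projectors))
             for j in range(2)] for i in range(2)]

def matrices_close(a, b, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

r = 1 / math.sqrt(2)
z_plus, z_minus = [1, 0], [0, 1]
x_plus, x_minus = [r, r], [r, -r]
y_plus, y_minus = [(1 - 1j) / 2, (1 + 1j) / 2], [(1 - 1j) / 2, (-1 - 1j) / 2]

def equal_mixture(u, v):
    """The density operator (1/2)P_u + (1/2)P_v."""
    return mix([0.5, 0.5], [ray_projector(u), ray_projector(v)])

half_identity = [[0.5, 0], [0, 0.5]]

# Three different pairs of orthogonal rays yield one and the same operator.
assert matrices_close(equal_mixture(x_plus, x_minus), half_identity)
assert matrices_close(equal_mixture(y_plus, y_minus), half_identity)
assert matrices_close(equal_mixture(z_plus, z_minus), half_identity)
```

Each mixture is half the identity operator, so nothing in the operator itself records which pair of rays was used to build it.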

5.3 Density Operators on ℂ²


In this section, following Beltrametti and Cassinelli (1981, chap. 4.2), I
quote a number of results for the operators on ℂ². They are generalized for
operators on the space ℂⁿ by U. Fano (1957, sec. 7). The reader is invited to
supply the proofs of these statements.

Consider the four operators on ℂ²:

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

These are, of course, familiar: σ_1 = 2S_x, σ_2 = 2S_y, and σ_3 = 2S_z (see
Section 1.7).
Let A be a linear operator on ℂ².

(5.9) If A is Hermitian, then there are real numbers p_1, p_2, p_3, and p_4
such that

$$A = p_1\sigma_1 + p_2\sigma_2 + p_3\sigma_3 + p_4 I$$

[see Section 1.6]

(5.10) If A is a density operator, then p_4 = ½.

(5.11) If A is a projection operator, then
(i) p_4 = ½ and
(ii) p_1² + p_2² + p_3² = ¼ (by idempotence).

Hence, writing r_1 = 2p_1, and so on,

(5.12) If A is a projection operator, then A may be written in the form

$$A = \tfrac{1}{2}(I + r_1\sigma_1 + r_2\sigma_2 + r_3\sigma_3)$$

where r_1² + r_2² + r_3² = 1.

(5.13) Any three real numbers r_1, r_2, and r_3 such that
r_1² + r_2² + r_3² = 1 specify a projection operator on ℂ²; the set of all
projection operators on ℂ² is in one-to-one correspondence with the set of
points on the unit sphere of ℝ³. (*)

Let p_1 and p_2 be the points on the unit sphere of ℝ³ corresponding to the
projectors P_1 and P_2 on ℂ².

(5.14) A density operator on ℂ² expressible as the weighted sum of P_1 and
P_2 is represented by a point within the unit sphere of ℝ³ on the
line p_1p_2 (see Figure 5.1). (*)

Figure 5.1 The set of density operators on ℂ²; D = a_1P_1 + a_2P_2 =
b_1P_3 + b_2P_4; P_1 is orthogonal to P_2.

(5.15) If A is a density operator on ℂ², then A may be written in the form

$$A = \tfrac{1}{2}(I + r_1\sigma_1 + r_2\sigma_2 + r_3\sigma_3)$$

where r_1² + r_2² + r_3² ≤ 1.
The last two results of this section are included solely on account of their
elegance; they will not be used in what follows.

(5.16) The set of Hermitian operators on ℂ² forms a four-dimensional vector
space over the reals, and {σ_1, σ_2, σ_3, I} forms a basis for this
space.
(5.17) ½Tr(AB) supplies an inner product for this space; with respect to this
inner product, the basis {σ_1, σ_2, σ_3, I} is orthonormal (see Section
1.9). (*)
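
Results (5.12) through (5.17) can be illustrated with a short computation. The sketch below assumes the standard Pauli matrices for σ_1, σ_2, σ_3 and uses the inner product ½Tr(AB) of (5.17) to read off the coefficients; the names `bloch_vector` and `from_bloch_vector` are illustrative labels, not terminology from the text.

```python
SIGMA = [
    [[0, 1], [1, 0]],        # sigma_1
    [[0, -1j], [1j, 0]],     # sigma_2
    [[1, 0], [0, -1]],       # sigma_3
]
IDENTITY = [[1, 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    """Sum of the diagonal elements."""
    return a[0][0] + a[1][1]

def bloch_vector(a):
    """Coefficients r_i = Tr(A sigma_i) of a Hermitian matrix with trace 1."""
    return [complex(trace(matmul(a, s))).real for s in SIGMA]

def from_bloch_vector(r):
    """The matrix (1/2)(I + r_1 sigma_1 + r_2 sigma_2 + r_3 sigma_3)."""
    return [[0.5 * (IDENTITY[i][j] + sum(r[k] * SIGMA[k][i][j] for k in range(3)))
             for j in range(2)] for i in range(2)]

def squared_length(r):
    """r_1^2 + r_2^2 + r_3^2."""
    return sum(x * x for x in r)

P_x_plus = [[0.5, 0.5], [0.5, 0.5]]      # projector onto the ray of (1, 1)/sqrt(2)
P_z_plus = [[1, 0], [0, 0]]              # projector onto the ray of (1, 0)
mixture = [[0.5 * (P_x_plus[i][j] + P_z_plus[i][j]) for j in range(2)]
           for i in range(2)]

# A projector onto a ray sits on the unit sphere, a proper mixture inside it.
assert abs(squared_length(bloch_vector(P_x_plus)) - 1) < 1e-12
assert abs(squared_length(bloch_vector(mixture)) - 0.5) < 1e-12

# The three coefficients determine the operator.
rebuilt = from_bloch_vector(bloch_vector(mixture))
assert all(abs(rebuilt[i][j] - mixture[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The mixture's point (½, 0, ½) is the midpoint of the chord joining the points (1, 0, 0) and (0, 0, 1) of the two projectors, in line with (5.14).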

5.4 Pure and Mixed States


We can now answer the question posed at the beginning of this chapter:
what algorithm can generate the quantum-mechanical probabilities for
mixed states?
Let P_v be the projector onto the ray containing a normalized vector v, and
let Q be any projector on the space ℋ. If {v_i} is any orthonormal basis for
ℋ, then, by definition,

$$\mathrm{Tr}(QP_v) = \sum_i \langle v_i \mid QP_v v_i \rangle = \mathrm{Tr}(P_vQ) \quad [\text{by } (5.6)]$$

Using the strategy used to derive (5.2), let us take a basis {v_i} containing
v as one of its members. Then P_v v = v, and P_v v_i = 0 when v_i ≠ v, whence

$$\mathrm{Tr}(QP_v) = \langle v \mid Qv \rangle$$

But this is exactly the expression occurring in the statistical algorithm of
quantum theory; if Q = P^A_Δ (that is, Q is the projector onto the subspace
representing the experimental question (A,Δ)) we have, by (2.2),

$$\langle v \mid Qv \rangle = \langle v \mid P^A_\Delta v \rangle = p_v(A,\Delta)$$

It follows that, if we represent a pure state by the projector P_v rather than
the vector v, then p_v(A,Δ), the probability that this pure state assigns to
(A,Δ), is given by

$$p_v(A,\Delta) = \mathrm{Tr}(P_v P^A_\Delta) \tag{5.18}$$

Notice that, by taking P_v to represent a certain pure state, we eliminate the
oddity we noticed in Section 3.3, that two distinct vectors can represent the
same state. This happens when u = cv and |c| = 1. Since any two such
vectors lie within the same ray, there is only one projector corresponding to
the state they represent.
We see that a projector P_v onto a ray acts as a probability measure μ_v on
the set S(ℋ) of subspaces of ℋ, such that, for any L ∈ S(ℋ),

$$\mu_v(L) = \mathrm{Tr}(P_v P_L)$$
(where P_L projects onto L). It is straightforward to show that, if we have a
(finite) set of probability measures {μ_i}, then any weighted sum Σ_i a_i μ_i
of such measures is also a probability measure, provided that (a) a_i ≥ 0, for
each a_i, and (b) Σ_i a_i = 1. (To see this, confirm that the Kolmogorov axioms
(3.2) and their generalizations (8.4) all hold.)
But now consider a density operator D, where D = Σ_i a_i P_i. Each P_i projects
onto a ray of ℋ, and to each corresponds a probability measure μ_i on S(ℋ).

For any subspace L and projector P_L we have, using (5.4) and (5.5),

$$\mathrm{Tr}(DP_L) = \mathrm{Tr}\Bigl(\sum_i a_i P_i P_L\Bigr) = \sum_i a_i \mathrm{Tr}(P_i P_L) = \sum_i a_i \mu_i(L)$$

Since D is a density operator, the constraints on the a_i are just those we
need; thus to D there corresponds the probability measure μ_D = Σ_i a_i μ_i on
the set S(ℋ) of subspaces of ℋ. To each subspace L of ℋ it assigns the
weighted sum of the probabilities assigned by the pure states P_i according
to the algorithm

$$\mu_D(L) = \mathrm{Tr}(DP_L) \tag{5.19}$$

Thus, if L represents the experimental question (A,Δ), we can now generalize
the statistical algorithm of quantum mechanics as follows:

$$p_D(A,\Delta) = \mathrm{Tr}(DP^A_\Delta) \tag{5.20}$$

Within this equation, the density operator D represents the state of a system.
The use of density operators allows us to give a vector-space representation
to mixed states. Mathematically, these are just appropriately weighted
sums of pure states, so that, for instance, if P_1 and P_2 represent distinct
pure states, then any density operator D = a_1P_1 + a_2P_2 (with a_1 > 0,
a_2 > 0, and a_1 + a_2 = 1) represents a mixed state. We express this fact by
saying that the set of states forms a convex set, of which the extremal points
are the pure states. This geometrical mode of expression seems particularly
apt in the case of ℂ², where the terms convex set and extremal point find a
literal representation. Recall from Section 5.3 that the set of density
operators on ℂ² (that is, the set of all states) can be put into one-to-one
correspondence with the set of points in the unit ball of ℝ³. Within the set
of states, the extremal points, or pure states, represented by projectors onto
the rays of ℂ², are in one-to-one correspondence with the points on the
surface of this ball (in other words, with the points on the unit sphere of
ℝ³). Of course, after the discussion of the spin-½ particle in Chapter 4, this
latter fact should hardly come as a surprise.
Let me once more emphasize the distinction between a superposition
and a mixture of two pure states, using, yet again, the example of spin.
Consider the pure states z+ and z− (equivalently, P_z+ and P_z−). We can form
a superposition ½z+ + ½z− of these states, which we normalize to yield
√2(½z+ + ½z−) = x+ (equivalently, P_x+). This is a pure state. However,
½P_z+ + ½P_z− represents a mixed state: as we saw in Section 5.2,

$$\tfrac{1}{2}P_{z+} + \tfrac{1}{2}P_{z-} = \tfrac{1}{2}P_{x+} + \tfrac{1}{2}P_{x-} = \tfrac{1}{2}P_{y+} + \tfrac{1}{2}P_{y-}$$

This particular mixed state, in which the particle is, as we say, completely
unpolarized, is one we shall come across again in future chapters.
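
The difference between the superposition and the mixture shows up in the probabilities they assign. The following sketch applies the trace rule Tr(DQ) to both; it assumes the standard vectors for z+, z−, and x+, and the helper names are illustrative.

```python
import math

def ray_projector(v):
    """Projector |v><v| onto the ray through the normalized vector v."""
    return [[v[i] * complex(v[j]).conjugate() for j in range(2)]
            for i in range(2)]

def probability(d, q):
    """Tr(DQ): the probability the state D assigns to the question with projector Q."""
    return sum(d[i][j] * q[j][i] for i in range(2) for j in range(2)).real

r = 1 / math.sqrt(2)
P_z_plus = ray_projector([1, 0])
P_z_minus = ray_projector([0, 1])
P_x_plus = ray_projector([r, r])

# Superposition: the normalized sum of z+ and z- is the pure state x+.
superposition = P_x_plus

# Mixture: equal weights on the projectors P_z+ and P_z-.
mixture = [[0.5 * (P_z_plus[i][j] + P_z_minus[i][j]) for j in range(2)]
           for i in range(2)]

# Both states give z+ probability one half ...
assert abs(probability(superposition, P_z_plus) - 0.5) < 1e-12
assert abs(probability(mixture, P_z_plus) - 0.5) < 1e-12

# ... but only the superposition makes x+ certain.
assert abs(probability(superposition, P_x_plus) - 1.0) < 1e-12
assert abs(probability(mixture, P_x_plus) - 0.5) < 1e-12
```

A measurement of the z-component alone could not tell the two states apart; a measurement of the x-component can.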
The customary interpretation of mixed states used to be the ignorance
interpretation. According to this interpretation, a system in a state D =
a_1P_1 + a_2P_2 was really in some pure state (P_1 or P_2), and the
coefficients a_1 and a_2 represented the likelihoods of its being in one or
the other; these were epistemic probabilities, representing our best
estimates of the chances.
This interpretation of a mixed state is clearly appropriate to a classical
theory (see Section 3.5), but it is open to two objections in the
quantum-mechanical case. The first stems from the nonuniqueness of
decomposition: as we saw in Section 5.2, any density operator D which is not
itself a projector can be decomposed in an infinite number of ways. Now this
may just mean that our ignorance when we represent a state by D is (vastly)
greater than we had assumed; still, it does seem odd that when we cannot say
which are the possible pure states of a system, we can assign to a particular
pair of them probabilities which add to one. In the case of the unpolarized
spin-½ particle, for instance, can we say that there is a probability of 0.5
that the particle is in the x+ state and a probability of 0.5 that it is in the
x− state, and that the same holds true for the y+ state and the y− state, and
for the z+ state and the z− state, not to mention the nondenumerable infinity
of other pairs of states associated with different directions in space? And
this is not merely a difficulty associated with the central point of the set
of states; all mixed states allow an infinite number of decompositions.
It may be that the particular decomposition we should consider is in all
cases determined for us by the preparation the system has undergone. If so,
this is a fact that the formal specification of the state fails to reveal. And
there still remains a second, possibly more telling, objection against the
ignorance interpretation, which I will spell out in Section 5.8.
Nonetheless, even though the ignorance interpretation is suspect, the
following remains true.
Assume that we prepare an ensemble of systems in a mixed state D and
that D can be decomposed according to the equation D = Σ_i a_i P_i. Then our
estimate of the relative frequency of any given experimental result from this
ensemble is exactly what we would get if the ensemble consisted of various
subensembles, each in a pure state P_i, and each of these subensembles were
represented in the whole ensemble with relative frequency a_i. This follows
from the fact that, for any projector P,

$$\mathrm{Tr}(DP) = \sum_i a_i \mathrm{Tr}(P_i P)$$

5.5 The Dynamical Evolution of States


When we use density operators to represent states, Schrödinger's equation
takes the form

$$D_t = V_t D_0 V_t^{-1} \tag{5.21}$$

V_t is the same unitary operator that appears in (2.8b):

Equation (5.21) extends Schrödinger's equation to mixed states. A notable
feature of the dynamical evolution it describes is that it leaves invariant
the convex structure of the set of states. Assume, that is, that a mixture D
is a weighted sum of two pure states, P_a and P_b, so that

D = aP_a + bP_b

Let D, P_a, and P_b evolve under (5.21) in time t to D′, P′_a, and P′_b,
respectively. Then

D′ = aP′_a + bP′_b

A corollary of this rule is that if we prepare an ensemble in a mixed state D
which is statistically indistinguishable from a collection of subensembles,
each in a pure state P_i (as in the case discussed in the last paragraph of the
previous section), then the ensemble and the collection will remain
indistinguishable under dynamical evolution.
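
The preservation of convex structure can be illustrated with a toy example. The sketch below assumes that a state D evolves by conjugation with a unitary operator, D ↦ V D V⁻¹, and picks, purely for illustration, the unitary family generated by ½σ_3, so that V_t is the diagonal matrix with entries e^{−it/2} and e^{it/2}. Neither this choice nor the function names come from the text.

```python
import cmath

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose, which is the inverse when a is unitary."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def unitary(t):
    """Illustrative V_t = exp(-i A t) with A = (1/2) sigma_3 (an assumption)."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def evolve(d, t):
    """Evolve the state d for a time t by conjugation with V_t."""
    v = unitary(t)
    return matmul(matmul(v, d), dagger(v))

def weighted_sum(a, pa, b, pb):
    """The matrix a*pa + b*pb."""
    return [[a * pa[i][j] + b * pb[i][j] for j in range(2)] for i in range(2)]

def matrices_close(x, y, tol=1e-12):
    """True if two 2x2 matrices agree entry by entry within tol."""
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))

P_a = [[0.5, 0.5], [0.5, 0.5]]   # pure state x+
P_b = [[1, 0], [0, 0]]           # pure state z+
a, b, t = 0.3, 0.7, 1.9
D = weighted_sum(a, P_a, b, P_b)

# Evolving the mixture gives the same mixture of the evolved pure states.
assert matrices_close(evolve(D, t),
                      weighted_sum(a, evolve(P_a, t), b, evolve(P_b, t)))

# A pure state stays pure: the evolved projector is still idempotent.
P_a_later = evolve(P_a, t)
assert matrices_close(matmul(P_a_later, P_a_later), P_a_later)
```

The first check is just the linearity of conjugation; the point of interest is the converse question of whether every convexity-preserving evolution must take this unitary form.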
Of course, if the ignorance interpretation of mixed states is the correct
one, this is as it should be; an ensemble in a mixed state is not just
statistically indistinguishable from a collection of subensembles, it is such
a collection, and the preservation of the convex structure of the set of
states is just what we would expect.
If the ignorance interpretation is rejected, however, the assumption that,
statistically, mixed states behave as though it were true is one that leads to
14 6 '/'''1 ' S /I'll r //I/'t' 0/ (>/1(1/1/11/11 '1'111'01,11

striking results. A th eorem du e 10 Kndison (1 9 I), d fecl ivel y the COl1 w rsl'
of the result quoted above, shows th e consequ ences of assumin g ( ) prl'sl' r
vation of convexity: that the convex stru cture of th e set o f sta tes is preserved
under dynamical evolution.
Let ft be a mapping of the set S of density operators on a Hilbert space 71
onto itself: ft : S -+ S. Then

(5. 22) If ft preserves the convex structure of the set S, then there is a unitary
operator V t on 'Jf (with inverse V;-l) such that, for every density
operator D in S,

Recall now the "derivation" of the Schrödinger equation offered in Section
3.10. We see that the assumption (C) does the work of the assumptions
(4), (5), and (6) made there. In other words, if we assume (1) statistical
determinism, (2) homogeneity of time, (3) continuity, and (C) preservation
of convexity, then the dynamical evolution of a system is given by a family
$\{V_t\}$ of unitary operators forming a weakly continuous one-parameter group
parameterized by the reals (see Simon, 1976; also Beltrametti and Cassinelli,
1981, pp. 52-55, 252-254). As before, Stone's theorem tells us that there is
a Hermitian operator $A$ such that, for all $t$,

$V_t = e^{-iAt}$
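
A small numerical check may make the group property concrete. The sketch below, in plain Python, assumes a particular 2 x 2 Hermitian matrix `A` (my own choice) and computes the exponential by a truncated power series, which is adequate only for small matrices of modest norm; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def expm(m, terms=40):
    """Matrix exponential as a truncated power series sum_k m^k / k!."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def distance(a, b):
    """Largest entrywise difference between two matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))

# Illustrative Hermitian operator (an assumption, not taken from the text).
A = [[1.0, 0.5 - 0.2j],
     [0.5 + 0.2j, -0.3]]

def V(t):
    """V_t = exp(-i A t)."""
    return expm([[-1j * t * x for x in row] for row in A])

s, t = 0.4, 1.1
identity = [[1, 0], [0, 1]]
group_error = distance(V(s + t), matmul(V(s), V(t)))          # V_{s+t} = V_s V_t
unitarity_error = distance(matmul(V(t), dagger(V(t))), identity)  # V_t V_t^{-1} = I
print(group_error < 1e-10, unitarity_error < 1e-10)
```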

5.6 Gleason's Theorem


In Section 5.4 I showed that the probability measures on the set
$S(\mathcal{H})$ of subspaces of a Hilbert space include not only those
representable by normalized vectors (the pure states), but also those
representable by density operators on the space (both pure states and mixed
states). The vectors and density operators generate probabilities according to
the (by now) familiar algorithms

$p_v(A,\Delta) = \langle v | P^A_\Delta v \rangle$ and $p_D(A,\Delta) = \mathrm{Tr}(DP^A_\Delta)$

respectively.
The question arises: does this exhaust the set of possible probability
measures on $S(\mathcal{H})$? In other words, is every probability measure on
$S(\mathcal{H})$ representable by a density operator? To this question, "The
affirmative answer was assumed by von Neumann, conjectured by Mackey, and
proved by Gleason" (Beltrametti and Cassinelli, 1981, p. 115; see Mackey,
1963; Gleason, 1957).
The formal statement of Gleason's theorem runs as follows.

(5.23) Let $\mu$ be any measure on the closed subspaces of a separable (real or
complex) Hilbert space $\mathcal{H}$ of dimension at least 3. There exists a
positive self-adjoint operator $T$ of the trace class such that, for all
closed subspaces $L$ of $\mathcal{H}$,

$\mu(L) = \mathrm{Tr}(TP_L)$

The term self-adjoint is effectively synonymous with Hermitian (but see
Fano, 1971, p. 279). If we demand that $\mu$ be a probability measure, thus
requiring that $\mu(\mathcal{H}) = 1$, then $\mathrm{Tr}(T) = 1$; in other words,
$T$ is a density operator.
Note that Gleason's theorem only applies to Hilbert spaces of dimensionality
higher than two. Thus the space $\mathbb{C}^2$ used for most of our examples is
in this regard anomalous. This doesn't mean that we can't represent spin states
by density operators on $\mathbb{C}^2$, but rather that we can't know that this
exhausts the set of possible states. As will appear in Chapter 6, this fact is
linked to the possibility of a "hidden-variable reconstruction" of the spin
statistics for the spin-½ particle.
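
To see how two dimensions escape the theorem, consider the following sketch in plain Python. It uses the real plane $\mathbb{R}^2$ for simplicity, and the two-valued assignment `f` is a hypothetical example of my own, not one given in the text. It gives every ray the value 1 or 0 in such a way that the values on any orthonormal basis add to 1.

```python
import math

def f(v):
    """Two-valued assignment on unit vectors of R^2 (a hypothetical example)."""
    angle = math.atan2(v[1], v[0]) % math.pi   # v and -v receive the same angle
    return 1 if angle < math.pi / 2 else 0

def basis_at(theta):
    """Orthonormal basis of R^2 obtained by rotating the standard basis."""
    return [(math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta))]

# Sample bases whose vectors stay clear of the boundary directions.
angles = [0.1 + 0.37 * k for k in range(17)]
sums = [sum(f(v) for v in basis_at(theta)) for theta in angles]
values = sorted({f(v) for theta in angles for v in basis_at(theta)})
print(sums)
print(values)
```

Every sampled basis receives total weight 1, yet `f` takes only the values 0 and 1. Any function of the form $\langle v | Tv \rangle$ varies continuously with $v$, so a function that jumps between 0 and 1 cannot be generated by an operator in that way.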
To use the discreet euphemism preferred by mathematicians, Gleason's
original proof of the theorem is nontrivial. However, in 1985 an "elemen-
tary" proof was given by Cooke, Keane, and Moran, and this is reproduced
in Appendix A.
The heart of the theorem is the proof that, in Gleason's terms, every frame
function is regular. A frame function of weight $W$ for $\mathcal{H}$ is a
real-valued function $f$ defined on the unit sphere of $\mathcal{H}$ (that is, an
assignment of real numbers to the normalized vectors of $\mathcal{H}$) such
that, for every orthonormal basis $\{v_i\}$ of $\mathcal{H}$,

$\sum_i f(v_i) = W$

In other words, whatever orthonormal basis we choose, the assignments $f$
makes to its members always add to the same result.
It follows that a frame function for $\mathcal{H}$ is also a frame function for
a closed subspace of $\mathcal{H}$, albeit with a (possibly) different weight,
and hence that all normalized vectors in a ray are assigned the same value by a
given frame function (*). At the risk of belaboring the obvious, frame
functions of weight 1 are significant to us because we regard any set of
mutually orthogonal rays which spans $\mathcal{H}$ as representing a set of
mutually exclusive and jointly exhaustive outcomes of a possible experiment;
the probabilities assigned to these rays should therefore add to 1.
A frame function is said to be regular if there exists a self-adjoint
(Hermitian) operator $T$ on $\mathcal{H}$ such that, for all normalized vectors
$v$,

$f(v) = \langle v | Tv \rangle$

It is straightforward (*) to show that (5.23) follows from the fact that all
frame functions are regular.
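
The sense in which a regular function is a frame function can be illustrated numerically. The sketch below, in plain Python, assumes a real three-dimensional space and a particular symmetric matrix `T` with trace 1 (both are my own illustrative choices). It sums $f(v) = \langle v | Tv \rangle$ over several randomly generated orthonormal bases; the helper `random_orthonormal_basis` is hypothetical and uses Gram-Schmidt orthogonalization.

```python
import math
import random

# Illustrative symmetric, positive matrix with trace 1 (an assumption).
T = [[0.5, 0.1, 0.0],
     [0.1, 0.3, 0.05],
     [0.0, 0.05, 0.2]]

def f(v):
    """Regular frame function f(v) = <v|Tv> on real vectors."""
    return sum(v[i] * T[i][j] * v[j] for i in range(3) for j in range(3))

def random_orthonormal_basis(rng, dim=3):
    """Gram-Schmidt applied to random Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for b in basis:
            overlap = sum(x * y for x, y in zip(v, b))
            v = [x - overlap * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:               # discard nearly dependent draws
            basis.append([x / norm for x in v])
    return basis

rng = random.Random(0)
weights = [sum(f(v) for v in random_orthonormal_basis(rng)) for _ in range(5)]
trace = sum(T[i][i] for i in range(3))
print(all(abs(w - trace) < 1e-12 for w in weights))
```

Each basis yields the same total, namely the trace of `T`. This is the easy direction; Gleason's theorem supplies the converse for dimension 3 and above.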
The importance of the theorem can be summarized in this way. A
quantum-mechanical state gives a simultaneous assignment of probabilities to all
experimental questions involving observables in a given family (for example,
to all questions involving components of spin). Quantum theory allows
us to represent all members of this family on the same Hilbert space
$\mathcal{H}$, and tells us that certain states are representable by vectors in
$\mathcal{H}$. With respect to these (pure) states, the structure of the set of
all these experimental questions - the structure of the set of
quantum-mechanical events - is that of the set $S(\mathcal{H})$ of subspaces of
$\mathcal{H}$. Gleason's theorem tells us what the set of all possible states on
this structure is: it contains just those states which are representable by
density operators on $\mathcal{H}$; they form a convex set with the pure states
as its extremal points.
As we shall see in the next chapter, any straightforward account of the
properties of a quantum-mechanical system is ruled out by this result.

5.7 Composite Systems and Tensor-Product Spaces


When two quantum-mechanical systems interact, they form a composite
system. States and observables of this composite system are then represented
in a vector space $\mathcal{H}^A \otimes \mathcal{H}^B$ formed from the spaces
$\mathcal{H}^A$ and $\mathcal{H}^B$ in which the states of the two component
systems, A and B, are represented; $\mathcal{H}^A \otimes \mathcal{H}^B$ is
known as the tensor product of $\mathcal{H}^A$ and $\mathcal{H}^B$. (See Jauch,
1968, chap. 11.7, 11.8; Beltrametti and Cassinelli, 1981, chap. 7.)
We construct $\mathcal{H}^A \otimes \mathcal{H}^B$ so that, if $\{v_i^A\}$ is an
orthonormal basis for $\mathcal{H}^A$ and $\{u_j^B\}$ is an orthonormal basis for
$\mathcal{H}^B$, then the set of pairs $(v_i^A, u_j^B)$ forms an orthonormal
basis for $\mathcal{H}^A \otimes \mathcal{H}^B$. We use the notation
$v_i^A \otimes u_j^B$ for the pair $(v_i^A, u_j^B)$.
The inner product of the tensor-product space is defined in terms of the
inner products on $\mathcal{H}^A$ and $\mathcal{H}^B$:

(5.24) $\langle v_i^A \otimes u_j^B | v_k^A \otimes u_l^B \rangle = \langle v_i^A | v_k^A \rangle \langle u_j^B | u_l^B \rangle$

Since the set $\{v_i^A \otimes u_j^B\}$ spans $\mathcal{H}^A \otimes \mathcal{H}^B$,
this equation defines an inner product on the whole tensor-product space. In any
vector space, $|v| = 0$ if and only if $v$ is the zero vector [see (1.21)]; it
follows from (5.24) that, for any $v^A \in \mathcal{H}^A$ and
$u^B \in \mathcal{H}^B$,

(5.25) $v^A \otimes 0 = 0 = 0 \otimes u^B$

For our purposes, the details of the construction of
$\mathcal{H}^A \otimes \mathcal{H}^B$ are not important (see Jauch, 1968, chap.
11.7; van Fraassen, 1972, pp. 351-362). But a highly significant result of this
construction is that the set of vectors expressible in the form
$v^A \otimes u^B$ is only a proper subset of
$\mathcal{H}^A \otimes \mathcal{H}^B$. In other words, although every vector in
the space we construct is a linear sum of vectors expressible in the form
$v^A \otimes u^B$, not every vector in the space is itself expressible in that
form. Thus the tensor product of $\mathcal{H}^A$ and $\mathcal{H}^B$ is not simply
the Cartesian (or topological) product of $\mathcal{H}^A$ and $\mathcal{H}^B$,
but includes it as a proper subset.
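
For two two-dimensional spaces there is a simple test for whether a vector is expressible as a product. Writing the vector as $\sum_{ij} c_{ij}\, v_i^A \otimes u_j^B$, it is a product exactly when the 2 x 2 coefficient matrix has zero determinant. The sketch below, in plain Python, applies that test; the function names and the two sample vectors are my own illustrative choices.

```python
import math

def tensor(v, u):
    """Coefficient matrix c[i][j] = v[i] * u[j] of the product vector v (x) u."""
    return [[vi * uj for uj in u] for vi in v]

def is_product(c, tol=1e-12):
    """In C^2 (x) C^2 a nonzero vector is a product iff det(c) = 0."""
    return abs(c[0][0] * c[1][1] - c[0][1] * c[1][0]) < tol

e0, e1 = [1, 0], [0, 1]
r = 1 / math.sqrt(2)

# A vector built as a single product.
product = tensor([0.6, 0.8], [r, 1j * r])

# A linear sum of two products: (e0 (x) e1 - e1 (x) e0) / sqrt(2).
entangled = [[r * x - r * y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(tensor(e0, e1), tensor(e1, e0))]

print(is_product(product), is_product(entangled))
```

The second vector is a linear sum of product vectors but fails the test, so it lies in the tensor-product space without being of the form $v^A \otimes u^B$.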
Since all vectors in a space are linear sums of the basis vectors, we can
define linear operators in terms of the transformations they effect on the
latter (see Section 1.13). We use this fact to define an operator
$A^A \otimes A^B$ on $\mathcal{H}^A \otimes \mathcal{H}^B$ in terms of the action
of linear operators $A^A$ and $A^B$ on $\mathcal{H}^A$ and $\mathcal{H}^B$,
respectively, by writing:

(5.26) $(A^A \otimes A^B)(v_i^A \otimes u_j^B) = A^A v_i^A \otimes A^B u_j^B$

and extending this, by linearity, to the whole of
$\mathcal{H}^A \otimes \mathcal{H}^B$.


These operators are Hermitian, provided that $A^A$ and $A^B$ are. They
represent observables on the composite system, measurable by measuring $A^A$ for
system A and $A^B$ for system B. If a measurement is performed on only one of
the component systems ($A^B$ on system B, say), then we represent this as a
measurement of $I \otimes A^B$ on the composite system.
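
In matrix terms the operator of (5.26) is the Kronecker product of the two component matrices. The following sketch in plain Python assumes two-dimensional component spaces and uses two Pauli matrices as the component operators; these choices, and the helper names `kron`, `kron_vec`, `matvec`, and `is_hermitian`, are mine and purely illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices: the matrix of A (x) B."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def kron_vec(v, u):
    """Components of the product vector v (x) u."""
    return [vi * uj for vi in v for uj in u]

def matvec(m, v):
    """Apply a matrix to a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def is_hermitian(m, tol=1e-12):
    """Check that m equals its conjugate transpose."""
    n = len(m)
    return all(abs(m[i][j] - complex(m[j][i]).conjugate()) < tol
               for i in range(n) for j in range(n))

A = [[0, 1], [1, 0]]          # Pauli sigma_x, acting on system A
B = [[0, -1j], [1j, 0]]       # Pauli sigma_y, acting on system B
I = [[1, 0], [0, 1]]          # identity
v = [0.6, 0.8j]               # a vector for system A
u = [1, 0]                    # a vector for system B

lhs = matvec(kron(A, B), kron_vec(v, u))          # (A (x) B)(v (x) u)
rhs = kron_vec(matvec(A, v), matvec(B, u))        # Av (x) Bu
print(all(abs(x - y) < 1e-12 for x, y in zip(lhs, rhs)))
print(is_hermitian(kron(A, B)), is_hermitian(kron(I, B)))
```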

5.8 The Reduction of States of Composite Systems


We represent states of a composite system just as we do states of a simple
system, by density operators, or (in the special case of pure states) by
normalized vectors. But these operators and vectors are now to be defined
on a tensor-product space. If the two component systems are in pure states
$v^A$ and $u^B$, then the composite system is also in a pure state
$v^A \otimes u^B$. However, because not all vectors in
$\mathcal{H}^A \otimes \mathcal{H}^B$ are expressible in the form
$v^A \otimes u^B$, the converse is not, in general, true. The question then
arises, does every state, pure or mixed, of the composite system allow a unique
reduction into states of the component systems?
Let us make this question more precise. Let $D$ be a density operator on
$\mathcal{H}^A \otimes \mathcal{H}^B$ representing a state of the composite
system. Assume that the spectral decompositions of arbitrary Hermitian operators
$A^A$ and $A^B$ on $\mathcal{H}^A$ and $\mathcal{H}^B$ are given by
$\{P^A_\Delta\}$ and $\{P^B_\Gamma\}$, respectively. The question is now, are
there states $D^A$ and $D^B$ of the component systems which, for all observables
$A^A$ and $A^B$, and for all $\Delta$ and $\Gamma \subseteq \mathbb{R}$, satisfy
the equations below?

(5.27a) $\mathrm{Tr}[D(P^A_\Delta \otimes I)] = \mathrm{Tr}(D^A P^A_\Delta)$

(5.27b) $\mathrm{Tr}[D(I \otimes P^B_\Gamma)] = \mathrm{Tr}(D^B P^B_\Gamma)$

In each case, the trace is to be defined on the appropriate space.


These equations just express a consistency requirement: probabilities of
outcomes of measurements on either system are to be the same whether or
not we consider that system as a component of a larger one.
It turns out that, for any state $D$ of the composite system, $D^A$ and $D^B$
are uniquely specified by (5.27) (see Jauch, 1968, chap. 11.8). But there are
pure states $D$ of the composite system which reduce into mixed states $D^A$ and
$D^B$ of the component systems. (An example of this is discussed in Chapter 8.)
This fact is of considerable importance for our interpretation of mixed states,
since it shows that, in this case at least, an ignorance interpretation cannot
be maintained. (See Section 5.4.) Consider a composite system in the pure
state $D$, of which the component states are the mixed states $D^A$ and $D^B$.
For the sake of argument, assume that $D^A = a_1P_1^A + a_2P_2^A$ while
$D^B = b_1P_1^B + b_2P_2^B$, with $a_1 \neq a_2$ and $b_1 \neq b_2$, so that there
are no problems of degeneracy. Then, according to the ignorance interpretation
of $D^A$ and $D^B$, system A is really in one of the pure states $P_1^A$ or
$P_2^A$, and system B is really in one of the pure states $P_1^B$ or $P_2^B$.
These four states may also be represented by vectors $v_1^A$, $v_2^A$, $u_1^B$,
and $u_2^B$, respectively, such that $P_1^A v_1^A = v_1^A$, and so on. But this
would mean that the composite system is really in one of the four states
$v_1^A \otimes u_1^B$, $v_1^A \otimes u_2^B$, $v_2^A \otimes u_1^B$, or
$v_2^A \otimes u_2^B$, with probabilities $a_1b_1$, $a_1b_2$, $a_2b_1$, $a_2b_2$,
respectively - in other words, that the composite system is in a mixed state.
Since this contradicts our original assumption, the ignorance interpretation
simply will not do. I return to this point in Section 9.6.
Another significant feature of the relation between composite and component
states is that, in the event that the component states are mixed states
$D^A$ and $D^B$, then $D^A \otimes D^B$ is not the only composite state
satisfying (5.27). In other words, the composite state is not uniquely defined
by $D^A$ and $D^B$. This suggests that there is, in general, more information
available from a specification of a composite state than from a specification of
its component states. The importance of this will appear in Chapter 8.
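
Both points can be illustrated with a small calculation. The sketch below, in plain Python, assumes two two-dimensional component spaces and real matrix entries. It computes the component states required by (5.27) by "tracing out" the other system, first for a pure composite state and then for a product of two mixed states. The sample states and the helper names (`outer`, `reduce_to_A`, `reduce_to_B`, `purity`) are my own choices; note also that the reduced states here are degenerate, unlike the nondegenerate case assumed in the argument above.

```python
import math

def outer(psi):
    """Density operator |psi><psi| for a real normalized vector psi."""
    return [[x * y for y in psi] for x in psi]

def reduce_to_A(d):
    """Trace out system B; the pair (i, k) is stored at index 2*i + k."""
    return [[sum(d[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def reduce_to_B(d):
    """Trace out system A."""
    return [[sum(d[2 * i + k][2 * i + l] for i in range(2)) for l in range(2)]
            for k in range(2)]

def purity(d):
    """Tr(D^2): equal to 1 for a pure state, less than 1 for a mixed one."""
    n = len(d)
    return sum(d[i][j] * d[j][i] for i in range(n) for j in range(n))

r = 1 / math.sqrt(2)
D_pure = outer([0, r, -r, 0])   # pure state not expressible as a product
D_product = [[0.25 if i == j else 0.0 for j in range(4)]
             for i in range(4)]  # (I/2) (x) (I/2), a mixed state

reduced_pure_A = reduce_to_A(D_pure)
reduced_product_A = reduce_to_A(D_product)
same_reduction = all(
    abs(reduced_pure_A[i][j] - reduced_product_A[i][j]) < 1e-12
    for i in range(2) for j in range(2))

print(purity(D_pure), purity(reduced_pure_A), purity(D_product))
print(same_reduction)
```

The pure composite state has purity 1 while its component state has purity 1/2, so a pure state has reduced to a mixed one. The product state yields the same component states, so the component states alone do not fix the composite state.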
In sum, if the composite and component states satisfy (5.27), then:

(5.28a) If the component states are pure (that is, representable by vectors
$v^A$ and $u^B$), then the composite state is pure and is represented by
$v^A \otimes u^B$.

(5.28b) If the component states are mixed, then the composite state is not
uniquely defined by them; in particular, it may sometimes be a pure
state not expressible in the form $v^A \otimes u^B$.

(5.28c) Any composite state $D$ defines uniquely two component states, $D^A$
and $D^B$.

(5.28d) If (and only if) the composite state is expressible in the form
$v^A \otimes u^B$ are the component states pure.
Part II The Interpretation of Quantum Theory

6 The Problem of Properties

This book is really an extended examination of the statistical algorithm of
quantum mechanics, that is, of the equation

$p_D(A,\Delta) = \mathrm{Tr}(DP^A_\Delta)$

which, in the case of pure states, reduces to

$p_v(A,\Delta) = \langle v | P^A_\Delta v \rangle$

In Part One, I looked at the right-hand side of these equations; I was
concerned to sort out the mathematical theory of Hilbert spaces and to show
how naturally and elegantly they lend themselves to the representation of a
probabilistic theory. In Part Two I turn to the left-hand side and to the
problems which appear when we seek a deeper understanding of what the
algorithm tells us. These problems are easier to state than to resolve. First,
how are we to understand the quantities $(A,\Delta)$ to which the theory assigns
probabilities? Second, what concept of probability does the theory invoke?
Third, what account can such a theory give of the measurements to which
the algorithm implicitly refers? More briefly, in Part Two I ask whether
looking for answers to the question, "How can the world be like that?" is as
conducive to despair as Feynman suggests.

6.1 Properties, Experimental Questions, and the Dispersion Principle
Recall from Chapter 2 that, in classical mechanics, a pair $(A,\Delta)$ can be
thought of as a property of a system. Associated with a system there are
physical quantities (observables in our terminology); the values of these
observables change over time as the system's state changes, but at any time
a measurement of any quantity will (ideally) yield a value within any desired
range of accuracy. A specification of the state gives us these values; as
we saw, the classical state $\omega$ acts as a two-valued function on the set of
pairs $(A,\Delta)$: when $\omega(A,\Delta) = 1$, the system possesses the property
in question; when $\omega(A,\Delta) = 0$, it does not.
In this way classical mechanics allows us to preserve certain elements of
the ontological structure of the world first enunciated in Aristotle's Catego-
ries. * Where Aristotle had talked of "substance" and "quantity," in classical
mechanics we speak of "system" and "property." The question this chapter
addresses is whether these categorial elements can be preserved in an inter-
pretation of quantum theory.
In the discussion of quantum theory in Chapter 2, a pair $(A,\Delta)$ was
described as an "experimental question." But what exactly does such a question
ask? In classical mechanics too, the pair can be thought of as a question:
it asks of the system whether it has the property $(A,\Delta)$, to which the
state gives the answer yes or no. The functions defined by the states of quantum
theory, however, are not two-valued; their values lie anywhere in the interval
[0,1]. Nor do classical states - states, that is, which assign to every
question either a yes or a no - emerge as special cases. In any theory which
uses the full representational capacity of a Hilbert space, there will be
questions represented by incompatible subspaces to which no state
simultaneously assigns the limiting values 1 or 0. Thus there will be no
dispersion-free states. This is easily seen geometrically. Consider, for
example, $(S_z,+)$ and $(S_x,+)$. As we saw in Section 4.2, we can represent
these experimental questions, together with a selection of states (including
$z_+$, $z_-$, $x_+$, and $x_-$), in $\mathbb{R}^2$ (Figure 4.3). Clearly, any
vector lying in, or at right angles to, the $(S_z,+)$ ray will be at 45° to the
$(S_x,+)$ ray. But these are the only vectors which assign limiting values to
$(S_z,+)$, and they all assign a probability of ½ to $(S_x,+)$. In fact, imagine
the state vector $v$ moving round the representation space $\mathbb{R}^2$. Then
$p_v(S_z,+) = \cos^2\psi$, but $p_v(S_x,+) = \cos^2(\psi - \pi/4)$, and we see
that each probability approaches a limiting value only when the other
approaches ½ (Figure 6.1). This holds even if we move to $\mathbb{C}^2$, for none
of the additional states representable in $\mathbb{C}^2$ but not in
$\mathbb{R}^2$ assigns a limiting value to either question.
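
The trade-off between the two probabilities can be tabulated directly. In the sketch below, in plain Python, the quantity `dispersion(p) = p(1 - p)` is my own shorthand for the spread of a yes-no question answered "yes" with probability `p`; it is not a definition taken from the text. For the two questions considered here, the two dispersions always add up to 1/4, so they can never vanish together.

```python
import math

def p_z(psi):
    """Probability assigned to (S_z, +) by the vector at angle psi."""
    return math.cos(psi) ** 2

def p_x(psi):
    """Probability assigned to (S_x, +) by the same vector."""
    return math.cos(psi - math.pi / 4) ** 2

def dispersion(p):
    """Variance of a yes-no question answered 'yes' with probability p."""
    return p * (1 - p)

# Sweep the state vector through half a turn in one-degree steps.
angles = [k * math.pi / 180 for k in range(181)]
total_dispersion = [dispersion(p_z(a)) + dispersion(p_x(a)) for a in angles]

print(min(total_dispersion), max(total_dispersion))
print(p_x(0.0), p_x(math.pi / 2))
```

At the angles where $(S_z,+)$ receives a limiting value (0 and $\pi/2$), the second line of output shows the probability of $(S_x,+)$ sitting at one half, up to rounding.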
* In Categories 6 Aristotle suggests that the only quantities of substance are position, length,
area, and volume, but in Physics IV.14 locomotion (speed) also appears as a quantity. These
works are included in Aristotle (1984), among many other editions.

Figure 6.1 Probabilities of $(S_z,+)$ and $(S_x,+)$ for the state $\alpha_+$, where
$\alpha = (\phi,0)$, as $\phi$ varies from 0 to $\pi$.

For observables with a continuous spectrum, the situation is even more
striking. Consider the noncommuting observables position ($Q$) and
momentum ($P$). If the system is, as we say, localized in a finite interval
$[a,b]$, that is, if it is in a state $v$ such that

$p_v(Q,[a,b]) = 1$

then the only $\Delta \subseteq \mathbb{R}$ such that

$p_v(P,\Delta) = 1$

is the set $\mathbb{R}$ itself (Busch and Lahti, 1985; see also Section 9.1).
In quantum theory the dispersion principle holds: there are no
dispersion-free states (see Section 9.1). But neither the claim that the pairs
$(A,\Delta)$ represent properties nor the claim that individual systems possess
a full range of such properties is necessarily at odds with this principle.
Imagine the following hypothetical situation. At all times each observable for a
system has a well-defined value. Thus, for any putative property $(A,\Delta)$ at
any juncture,
either the system has that property or it does not. Our present theory,
however, can only predict the probability that a given system has the prop-
erty in question; as a description of reality the theory is incomplete. If, in this
situation, we were to rest content with the theory we had, then there would
be serious and systematic limitations to our knowledge of the world. On
Einstein's view, this is just the situation in which we are placed by quantum
mechanics.

6.2 The EPR Argument


Einstein's reservations about quantum theory are well known. It was not
that he rejected the theory; rather, he declined to regard any theory which
just yielded probabilities as a candidate for an ultimate, complete account of
the world. His remark chiding Born for believing in "the God who plays
dice" is now proverbial (letter of September 7, 1944, reprinted in French,
1979, pp. 275 - 276; for an interesting analysis of Einstein's views, see Fine,
1984).
But these reservations went beyond expressions of distaste for probabilistic
theories. With Podolsky and Rosen, in 1935 Einstein coauthored a
remarkable paper, now often referred to simply as "EPR." The title asks, "Can
Quantum Mechanical Description of Reality Be Considered Complete?"
The answer given is that it cannot; surprisingly, the argument uses results
obtained from the theory itself.
Einstein, Podolsky, and Rosen sometimes talk of the completeness of a
theory, sometimes of the completeness of the description of physical reality
given by a theory. They use the former as an abbreviation for the latter; the
assumption of the paper, that a physical theory should provide a represen-
tation of physical reality, is explicitly stated: "The physical concepts with
which the theory operates . . . are intended to correspond with the objec-
tive reality" (EPR, p. 777).
The relation their account suggests between physical reality, on the one
hand, and its mathematical representation by a theory, on the other, is this.
Theoretical physics employs mathematical models. Of these models only
certain elements represent existing features of the physical world. Ptolemaic
astronomy, to take a historical example, used a complex array of rotating
circles mounted one on another. Yet (for Ptolemy at any rate) not all the
points on these circles represented elements of reality, but only those points
which represented the Sun, the Moon, Mercury, Venus, and so on. EPR
looks at the mathematical model supplied by quantum theory and gives us a
sufficient condition for an element of that model to represent an element of
reality:

If, without in any way disturbing a system, we can predict with certainty (i.e., with
probability equal to unity) the value of a physical quantity, then there exists an element of
physical reality corresponding to this physical quantity. (P. 777)

I will call this the EPR criterion for physical reality. The quotation above
makes it clear that the "elements of physical reality" they are concerned
with are values of physical quantities. These are thought of as properties
$(A,a)$ of systems, as in classical mechanics, and (on our account) are
representable by subspaces of a Hilbert space. A necessary condition for the
completeness of a theory, EPR says, is that "every element of the physical
reality must have a counterpart in the physical theory" (p. 777).
What Einstein, Podolsky, and Rosen now claim about position and momentum
applies equally well to the two noncommuting observables $S_z$ and
$S_x$ for the spin-½ particle:
If both of them had simultaneous reality - and thus definite values - these values
would enter into the complete description, according to the condition of complete-
ness. If then the wave function provided such a complete description of reality, it
would contain these values . .. (P. 778)

As we have seen, the spin state vector cannot "contain" the values of $S_z$ and
$S_x$ simultaneously. However, the fact that they both can't enter at one time
into the kind of description which the state vector provides may just indicate
that they cannot have simultaneous reality. We could say, for instance, that
in the state $z_+$ the particle has the property $(S_z,+)$; the value of $S_z$ is
predictable with certainty, and so there is an element of reality corresponding
to it. However, we could also say that, in this state, the particle has neither
the property $(S_x,+)$ nor the property $(S_x,-)$, that neither of these
properties constitutes an element of reality.
Einstein, Podolsky, and Rosen saw that the fact that quantum mechanics
admits no dispersion-free states does not, on its own, tell us whether the
theory is complete or not. As they write,
From [the dispersion principle] it follows that either (1) the quantum-mechanical
description of reality given by the wave function [in our terminology the state vector] is
not complete or (2) when operators corresponding to two physical quantities do not
commute the two quantities cannot have simultaneous reality. (P. 778)
Now it may be surprising that, by using the theory itself, one could ever be
led to embrace alternative (1) of this disjunction. Although the EPR criterion
is only a sufficient condition for the ascription of reality, if this is the only
criterion we have, then what we regard as real will be limited by what we
can predict with certainty. But these predictions are provided by the theory.
How can a theory fail to predict with certainty something which it predicts
with certainty?

6.3 Bohm's Version of the EPR Experiment


The EPR strategy is to describe an experimental arrangement involving
correlated pairs of particles. These particles interact and then separate;
thereafter measurements made on one particle can be used, via the
correlations, to generate predictions about the other. These predictions have
probability one, and so, according to the EPR criterion, properties of the
second particle acquire the status of elements of reality. Furthermore, since we
may choose what measurement to carry out on the first particle, such predictions
can be made about either of two incompatible observables. But it is
implausible that the reality of a property of the second particle depends on
what measurement is carried out on the first; hence values of both of these
observables should be considered elements of reality. Since this contradicts
alternative (2) of the EPR disjunction, we are therefore led to alternative (1):
the quantum-mechanical description of reality is not complete.
In the thought experiment the paper describes, the incompatible
observables in question are position and momentum. I will describe an analogous
experiment suggested in 1951 by Bohm, in which the observables are
different components of spin of the spin-½ particle.
It is possible to prepare pairs of particles, such as an electron-positron
pair, whose total spin in any direction is zero. If the pair then separates,
theory suggests that if, for instance, an Sz experiment is carried out on each
system, then the results will always be opposite in sign: if the result of
measuring Sz on the electron is +, then on the positron it will be -, and vice
versa. The same holds for all directions in space (that is, for Sx, Sy, and so on),
provided that both experiments measure the same component of spin.
It's worth sketching the formalism by which quantum mechanics reaches
this result; the general result, Equation (6.1), will be important later. We
represent the spin state of a single spin-½ particle on a two-dimensional
complex space; call it $\mathcal{H}$. States of the composite system, electron +
positron, will be represented in the tensor-product space
$\mathcal{H}^e \otimes \mathcal{H}^p$ of two such spaces (see Section 5.7). Now
let $v_+$ and $v_-$ be the eigenvectors for some component of spin $S_\alpha^e$
for the electron, and let $u_+$ and $u_-$ be the eigenvectors of the same
component of spin, $S_\alpha^p$, for the positron. The singlet spin state in
which the system is prepared is given by

$\Psi = \frac{1}{\sqrt{2}}(v_+ \otimes u_- - v_- \otimes u_+)$

The intriguing thing about this state is that it is independent of the direction
$\alpha$; that is, we get the same vector in
$\mathcal{H}^e \otimes \mathcal{H}^p$ no matter what component of spin we choose
to work with, provided only that we choose the same component for both systems.
Compare this with the single system, for which

$\Psi$ gives us a specification of the state of the composite system. To measure
an observable on the composite system we can perform an experiment on
each of the component systems; for instance, we may measure $S_\alpha^e$ on the
electron and $S_\beta^p$ on the positron. Such a (joint) observable is
represented by the operator $S_\alpha^e \otimes S_\beta^p$ on
$\mathcal{H}^e \otimes \mathcal{H}^p$. The probabilities computed by using the
standard quantum-mechanical algorithm on the tensor-product space are
joint probabilities, the probability, for instance, that a measurement of
$S_\alpha^e$ on the electron will yield + and that a measurement of $S_\beta^p$
on the positron will also yield +. It turns out that, for the singlet spin
state, this joint probability is given by

(6.1) $p_\Psi[(S_\alpha^e,+),(S_\beta^p,+)] = \frac{1}{2}\sin^2\frac{\widehat{\alpha\beta}}{2}$ (**)

where $\widehat{\alpha\beta}$ is the angle between the directions $\alpha$ and
$\beta$.
Notice that when $\widehat{\alpha\beta} = 0$ (when $\alpha$ and $\beta$ coincide)
there is zero probability that both measurements will yield +; this is exactly
in line with what was said earlier, that if the result of measuring $S_\alpha^e$
(say) is +, then the result of measuring $S_\alpha^p$ must be -. In fact we
have, for any direction $\alpha$,

(6.2) $p_\Psi[(S_\alpha^e,+),(S_\alpha^p,-)] = \frac{1}{2} = p_\Psi[(S_\alpha^e,-),(S_\alpha^p,+)]$

Effectively, in these cases $\widehat{\alpha\beta} = 180^\circ$.
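
These singlet-state probabilities can be reproduced in a few lines. The sketch below, in plain Python, makes a simplifying assumption of my own: all spin directions lie in the x-z plane, so that the eigenvectors can be taken to be real. The function names and the sample angles are likewise illustrative. It checks that the singlet vector is the same whichever direction is used to build it, and compares the computed joint probabilities with the values quoted above.

```python
import math

def spin_up(theta):
    """'+' eigenvector for spin along the direction at angle theta from z."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def spin_down(theta):
    """'-' eigenvector for the same direction."""
    return [-math.sin(theta / 2), math.cos(theta / 2)]

def tensor(v, u):
    """Components of the product vector v (x) u."""
    return [vi * uj for vi in v for uj in u]

def singlet(theta):
    """(v+ (x) u- - v- (x) u+) / sqrt(2), built from eigenvectors along theta."""
    r = 1 / math.sqrt(2)
    plus_minus = tensor(spin_up(theta), spin_down(theta))
    minus_plus = tensor(spin_down(theta), spin_up(theta))
    return [r * (a - b) for a, b in zip(plus_minus, minus_plus)]

def joint_probability(state, v, u):
    """Squared projection of the (real) state onto the product vector v (x) u."""
    amplitude = sum(x * y for x, y in zip(tensor(v, u), state))
    return amplitude ** 2

psi = singlet(0.0)
alpha, beta = 0.4, 1.7          # two sample directions, in radians

p_plus_plus = joint_probability(psi, spin_up(alpha), spin_up(beta))
expected = 0.5 * math.sin((beta - alpha) / 2) ** 2
p_opposite = joint_probability(psi, spin_up(alpha), spin_down(alpha))
invariance_gap = max(abs(x - y) for x, y in zip(singlet(0.0), singlet(1.3)))

print(abs(p_plus_plus - expected) < 1e-12)
print(abs(p_opposite - 0.5) < 1e-12)
print(invariance_gap < 1e-12)
```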

The argument now runs as follows. Assume that we perform an $S_\alpha$
measurement on the electron of a given pair. Then, without disturbing the
positron, we will be able to predict with certainty what value of $S_\alpha$ a
measurement would reveal for it. Thus, according to the EPR criterion, the value
of $S_\alpha$ for the positron is an element of reality. But we could as easily
have measured $S_\beta$ for the electron (where $\beta$ is distinct from
$\alpha$), and thereby been able to predict with certainty the value of
$S_\beta$ for the positron. It follows that the value of $S_\beta$ for the
positron is equally an element of reality. The representation furnished by the
state vector for a single particle is therefore incomplete, since it does not
contain elements which are counterparts of both these elements of reality.
The crucial moves in the argument are these. After the interaction, the
second particle (the positron in our example) is regarded as physically
independent of the first. (This condition is sometimes known as the locality
condition.) Because of the correlations resulting from the interaction we may
obtain information about the properties of the positron by means of
experiments performed on the electron, but these properties are assumed to exist
independently of what happens to the electron once the pair has separated.
In particular they are assumed to exist independently of the fact that we
perform measurements upon it. Notice that although certainty of prediction
is a sufficient condition for ascription of reality, what exists is not to be
identified with what we can predict. This lifts the paradox we met at the end
of Section 6.2: there is no suggestion that we can predict with certainty the
values, for example, of both $S_\alpha^p$ and $S_\beta^p$ at the same time. For
any given pair, we can choose to perform either an $S_\alpha^e$ or an
$S_\beta^e$ experiment. Each of these experiments would reveal an element of
reality associated with the positron. It is because (if locality obtains) our
choice will not disturb the positron in any way that we can claim that both
these elements of reality exist simultaneously. In the words of EPR, to make
"the reality [of $S_\alpha^p$ and $S_\beta^p$] depend on the process of
measurement carried out on the first system, which does not disturb the second
system in any way" is something that "no reasonable definition of reality could
be expected to permit" (EPR, 1935, p. 780).
The summary I have given departs from EPR, not only by reworking the
argument in terms of spin components as Bohm suggested, but also by
putting it in terms of incompatible properties of the second particle, whereas
EPR assigns it two distinct states. (I discuss EPR in terms of states in Chapter
8; see also Beltrametti and Cassinelli, 1981, pp. 69-72.) I have rewritten it in
this way partly to emphasize that the argument, if valid, does not convict
quantum theory of internal inconsistency. Nor was that its aim. As will
appear, there are other deep problems which the EPR experiment raises, but
here I have been concerned to bring out the thesis argued by the original
authors, that we can regard quantum mechanics as complete only at the cost
of abandoning a particular - and appealing - account of physical reality.*

* For a detailed analysis of EPR, see Hooker (1972); for a full account of responses to it, see
Jammer (1974, chap. 6).

6.4 The Statistical Interpretation


Einstein's realism about the properties of systems went hand in hand with a
specific interpretation of quantum theory, now generally called the
statistical interpretation. At one time the phrase referred to any account of
quantum theory which accepted Born's rule for deriving probabilities from the
squares of projections of the state vector (or "wave-function," as it was
generally called); in fact von Neumann (1932, p. 210) used the phrase in just
this sense. Now, however, the Born rule is effectively part of quantum
theory, and we understand by the "statistical interpretation" an
interpretation of quantum theory which views the state description provided by
the state vector or density operator as applicable to an ensemble of similarly
prepared systems, rather than to an individual system (Ballentine, 1970).
The term ensemble is borrowed from statistical thermodynamics; it refers
to a conceptual entity: a set of similarly prepared particles. As Ballentine
(1970, p. 361) points out, this should not be confused with a beam of
particles, whose individual members may well interact with each other.
On this interpretation, the state description provides statistical informa-
tion about such ensembles; a natural, though not necessary, concomitant of
this is the view that quantum mechanics is a classical statistical theory, in
that the probabilities yielded by the state vector give the relative frequencies
of occurrence of properties among the members of the ensemble. If, for
example, an ensemble of spin-½ particles were in the $z_+$ state, so that
$p(S_x,+\tfrac{1}{2}) = p(S_x,-\tfrac{1}{2}) = \tfrac{1}{2}$, then half of the
members of the ensemble would have the property $(S_x,+\tfrac{1}{2})$ and half
the property $(S_x,-\tfrac{1}{2})$. Which property any particular system had
would be revealed upon measurement.
It is clear that, on this interpretation, the description of individual systems
offered by quantum mechanics is invariably less than complete.
The view I have sketched here has three components, which can be called
the Precise Value Principle (PVP), the Relative Frequency Principle (RFP),
and the Faithful Measurement Principle (FMP). (I use the nomenclature of
Healey, 1979, here, and the general direction of this chapter is closely
aligned with that of his paper. RFP is implicit in his account, though not
explicitly stated.) According to PVP, whatever the state of a system (or, more
properly, of the ensemble containing the system), each observable has a
precise value for the individual system. According to RFP, the quantum-
mechanical statistics represent the relative frequency of occurrence of these
values within the ensemble. FMP suggests that every successful measure-
ment reveals the (preexisting) value of that observable for the particular
system under test. FMP thus tells us that, if the value a of an observable A
occurs in an ensemble with relative frequency n, then (ideal) measurements
of A will yield that value with the same frequency.* Thus the measured
frequencies coincide with the existing frequencies of particular values, pro-
vided, that is, that the measured sample can be thought of as a genuine
ensemble.
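
The way the three principles fit together can be made concrete with a toy model. What follows is only a minimal sketch under stated assumptions: the ensemble of 100 systems, the choice of S_x as the observable, and the function names are all hypothetical illustrations, not anything drawn from Healey or from quantum theory itself.

```python
from fractions import Fraction

# PVP (assumed toy model): every system in the ensemble already possesses
# a precise value of S_x, here +1/2 or -1/2.
ensemble = [Fraction(1, 2)] * 50 + [Fraction(-1, 2)] * 50


def relative_frequency(values, value):
    """RFP: the fraction of ensemble members that possess `value`."""
    return Fraction(values.count(value), len(values))


def faithful_measurement(system_value):
    """FMP: an ideal measurement simply reveals the value already possessed."""
    return system_value


outcomes = [faithful_measurement(v) for v in ensemble]
existing = relative_frequency(ensemble, Fraction(1, 2))
measured = relative_frequency(outcomes, Fraction(1, 2))
```

In this sketch the measured frequency coincides with the existing one only because `faithful_measurement` returns the value the system already has; without FMP nothing would tie the two together.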
Elements of this view are to be found in the work of Einstein and of
Popper. Certainly, both believed that the quantum-mechanical formalism
applied to ensembles of systems, and both espoused PVP. (See, for example,
Einstein, 1948; Popper, 1982; Ballentine, 1972.) And, as Healey points out,
without FMP, PVP has little empirical content. Note, however, that Popper
(1982, pp. 64-74) did not interpret probabilities as relative frequencies,
preferring instead a propensity interpretation.

* In an acidulous footnote Fine (1979, p. 152) disputes this correlation, but his rejection of it
seems, instead, to be a rejection of FMP.

Independently of any cachet bestowed by its pedigree, the statistical
interpretation is prima facie a very plausible and attractive view of quantum
theory. Unfortunately it cannot be maintained-at least, not in the simple
form in which I have presented it.

6.5 Kochen and Specker's Example


The statistical interpretation, as presented in the previous section, will be
threatened by any counterexample to PVP. Such a counterexample is of-
fered by Kochen and Specker (1967); if their result holds, then we cannot
regard the properties of systems in the way that the statistical interpretation
suggests.
The example they use involves a spin-1 system. Whereas for the spin-1/2
particle there are only two possible values, +1/2 and -1/2, of any component of
spin, for a spin-1 system there are three: +1, 0, and -1. Thus the square S_α^2
of any component of spin can take as values only +1 and 0. Kochen and
Specker show, first, that, if we take any triple of these squares, S_α^2, S_β^2, and
S_γ^2, corresponding to three mutually perpendicular directions in space, α, β,
and γ, then for all states of the system a measurement will show two of them
to have value 1 and the third 0. PVP would then require us to assign 1 or 0 to
each direction in space, and to do so in such a way that, of any three
mutually perpendicular axes, α, β, γ, two receive value 1 and the third 0. By a
geometrical argument, Kochen and Specker show that this cannot be done.
This is a very remarkable result; how remarkable can be seen by com-
paring this situation with that of the components of spin of the spin-1/2
particle, whose possible values are just +1/2 and -1/2. In this case, PVP sug-
gests that each direction in space must receive a value different from that
given to the diametrically opposed direction. Clearly, one elementary way
to do this is to imagine a sphere split into two; to one hemisphere we assign
+1/2, and to the other we assign -1/2. Whether or not we could ever generate
the quantum-mechanical statistics from such an assignment of values is, of
course, a very different question. The point is that Kochen and Specker's
example shows that, for certain systems, even that trivial kind of assignment
is denied us. Recall, in this connection, that Gleason's theorem applies only
to a space of dimensionality three or greater. (See Section 5.6.)
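
The "sphere split into two" can be written down explicitly. The sketch below is one possible assignment, offered as an illustration only: the function names are hypothetical, and the rule for directions lying exactly on the dividing equator (decide by the sign of y, then of x) is my own assumption, added so that antipodal directions always receive opposite values.

```python
def hemisphere_value(direction):
    """Assign +1/2 or -1/2 to a direction (x, y, z) in space.

    The hemisphere with z > 0 gets +1/2. Directions with z == 0 are decided
    by the sign of y, and then of x, so that a direction and its antipode
    never receive the same value.
    """
    x, y, z = direction
    for component in (z, y, x):
        if component > 0:
            return 0.5
        if component < 0:
            return -0.5
    raise ValueError("the zero vector is not a direction")


def antipode(direction):
    """The diametrically opposed direction."""
    return tuple(-c for c in direction)
```

Such an assignment satisfies the one constraint PVP imposes in the spin-1/2 case; it makes no attempt to reproduce the quantum-mechanical statistics.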
Let us look at Kochen and Specker's argument in more detail. Since, for
the spin-1 particle, there are three possible values of each component of
spin, a three-dimensional vector space is needed to represent the spin states
of such a system. We use the space C^3, on which operators are given by
3 × 3 matrices of complex numbers; the rules for manipulating them are
natural extensions of those used for the 2 × 2 matrices of C^2 (see Section
1.6). The analogues of the Pauli spin matrices for the x-, y-, and z-compo-
nents of spin are

\[
S_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad
S_y = \begin{pmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \quad
S_z = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

We see that

\[
S_x^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
S_y^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
S_z^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

The operators S_x, S_y, and S_z do not commute with each other; like the Pauli
matrices, they obey a cyclic commutation relation (see Section 1.7). The
operators S_x^2, S_y^2, and S_z^2, on the other hand, commute with each other. Each
of them has eigenvalues 0 and 1, and so these are the possible values of the
observables they represent. Their sum is given by

\[ S_x^2 + S_y^2 + S_z^2 = 2I \]

Since all vectors in C^3 are eigenvectors of 2I, with eigenvalue 2, it follows
that measurements of the sum of S_x^2, S_y^2, and S_z^2 (strictly, of the observable
represented by their sum) will always yield value 2. Thus, of the trio of
observables, S_x^2, S_y^2, S_z^2, two have value 1 and the third 0. By symmetry this is
true for any trio S_α^2, S_β^2, S_γ^2, provided that α, β, and γ are mutually perpendic-
ular directions in space.
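
These algebraic claims are easy to check by direct multiplication. The following is a minimal sketch, assuming one standard choice of spin-1 matrices (the representation (S_k)_{jl} = -i ε_{kjl}, in units where ħ = 1); that choice, and the helper names `matmul` and `combine`, are assumptions of the illustration and may differ by a change of basis or sign convention from the matrices printed in the text.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(a, b, coeff_b=1):
    """Entrywise a + coeff_b * b."""
    return [[x + coeff_b * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Assumed representation: (S_k)_{jl} = -i * epsilon_{kjl}, with hbar = 1.
Sx = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
Sy = [[0, 0, 1j], [0, 0, 0], [-1j, 0, 0]]
Sz = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]

Sx2, Sy2, Sz2 = matmul(Sx, Sx), matmul(Sy, Sy), matmul(Sz, Sz)

# Sum of the squares: expected to be twice the identity.
total = combine(combine(Sx2, Sy2), Sz2)

# Cyclic commutation relation: S_x S_y - S_y S_x should equal i S_z.
commutator_xy = combine(matmul(Sx, Sy), matmul(Sy, Sx), coeff_b=-1)
i_Sz = [[1j * entry for entry in row] for row in Sz]

# The squares, unlike the components themselves, commute.
squares_commute = matmul(Sx2, Sy2) == matmul(Sy2, Sx2)
```

In this basis each squared matrix is diagonal with entries 0 and 1, which is why the three squares can be given values simultaneously.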
Two assumptions are being made here (compare Healey, 1979; Stairs,
1983b). The first is that there are (unique) observables which are repre-
sented by S_x^2, S_y^2, S_z^2, and I. Second, we assume that when one operator is
written as a function of others, as when we write

\[ S_x^2 + S_y^2 + S_z^2 = 2I \]

then the possible values of the corresponding observables are functionally
related in just the same way, so that we can add the values of S_x^2, S_y^2,
and S_z^2 to obtain a value for 2I. (I is the "observable" whose value for any system is
always 1.)
As for this second assumption, there seems little reason to doubt it, at least
when, as here, the observables are compatible, and the functions involved
are simple sum and product functions. For, as was shown in Section 3.7,
such functional relations among compatible operators are defined on just
this basis. Kochen and Specker address the first assumption by proposing an
experiment which would yield values to the observable represented by

\[ K = aS_x^2 + bS_y^2 + cS_z^2 \]

The system they consider is an atom of orthohelium. Thus they establish not
only that 2I represents a genuine observable (when a = b = c), but also that
S_x^2, S_y^2, and S_z^2 are actually commeasurable as well as being compatible. For
the possible values of K (its eigenvalues) are a + b, b + c, and c + a, which
will be distinct provided that a, b, and c are. From our second assumption,
these values correspond to the cases when S_z^2, S_x^2, and S_y^2, respectively, have
value 0.
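
The point about commeasurability can be illustrated numerically. This is a sketch under stated assumptions: K is taken to be aS_x^2 + bS_y^2 + cS_z^2, the three squares are taken in a basis where they are diagonal with diagonals (0,1,1), (1,0,1), and (1,1,0), and the coefficients 1, 2, 4 are hypothetical values chosen only because they are distinct.

```python
# Hypothetical distinct coefficients for K = a*Sx^2 + b*Sy^2 + c*Sz^2.
a, b, c = 1, 2, 4

# Diagonals of the squared spin matrices in their common eigenbasis (assumed).
SX2 = (0, 1, 1)
SY2 = (1, 0, 1)
SZ2 = (1, 1, 0)

# K is diagonal in the same basis, so its eigenvalues are its diagonal entries.
K_diagonal = tuple(a * sx + b * sy + c * sz
                   for sx, sy, sz in zip(SX2, SY2, SZ2))

# Each eigenvalue of K picks out one triple of values (Sx^2, Sy^2, Sz^2).
value_table = dict(zip(K_diagonal, zip(SX2, SY2, SZ2)))
```

Because the three eigenvalues b + c, c + a, and a + b are distinct, a single measurement of K fixes the values of all three squares at once; in particular it tells us which of them has value 0.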
There remains the question of the uniqueness of the observables repre-
sented by the S matrices (by S_x^2, for example), but I will defer discussion of
this until Section 6.8.
I will give the impossibility proof in an elegant version due to Friedberg
(first published in Jammer, 1974, p. 325).
Let us assume (A): We can assign a value of 0 or 1 to each point on a sphere
in such a way that, of any orthogonal triple of points, just one receives value
0. Call such an assignment an A-assignment. We then show: (I) There is an
angle ρ such that, if any point p on the sphere receives value 0 on an
A-assignment, then so does any point q at an angular distance ρ from p. (II) If
one point on the sphere receives value 0 on an A-assignment, then, from (I),
so do all the others. But (II) contradicts our original assumption; it follows
that no A-assignment exists.
In what follows, our notation shows A-assignments assigning values to
vectors rather than to points on the unit sphere; for example, we understand
by v(x + y) the value given by an A-assignment to the point q on the sphere
where it is pierced by the vector x + y (in its positive direction).
To show (I): Consider an orthonormal triple of vectors, {x,y,z}, from the
center of the sphere. From this triple we generate two more orthogonal (but
not normalized) triples of vectors: {x + y, x - y, z}, {x + z, y, x - z}. We
now show that there is no A-assignment v such that,

(6.3) v(x + y) = 1 = v(x - y)

(6.4) v(x + z) = 1 = v(x - z)

For such an assignment would yield, from (6.3), v(z) = 0 and, from (6.4),
v(y) = 0, thus violating assumption (A).
Now consider the vectors (y + z) - x and (y + z) + x. It is easy to show
by vector geometry that

(x + y) ⊥ [(y + z) - x] ⊥ (x + z)

(x - y) ⊥ [(y + z) + x] ⊥ (x - z)

Since no two perpendicular vectors can both be assigned 0 by an A-assign-
ment, it follows that there is no A-assignment v such that

v[(y + z) + x] = 0 = v[(y + z) - x]

since this would yield

1 = v(x + y) = v(x + z) = v(x - y) = v(x - z)

and we have already proved that such assignments are forbidden.
In this way we have found two vectors, (y + z) - x and (y + z) + x,
which cannot both be assigned 0 by an A-assignment. By taking their inner
product (see Section 1.4) we see that the angle α between them is
cos^{-1}(1/3) ≈ 70°. Since the choice of the basis {x,y,z} was arbitrary, it follows
that no two points on the sphere whose angular separation is α can be
assigned 0 by an A-assignment.
Now let w be a normalized vector in the x-y plane, lying between x and y,
and making an angle α with y. Then w makes an angle ρ with x, where

\[ \rho = 90^\circ - \alpha \approx 20^\circ \]

If v is any A-assignment for which v(w) = 0, then v(y) = 1, and, since
w ⊥ z, v(z) = 1. It follows that v(x) = 0.
Again, this may be generalized: if the angular separation of two points p
and q on the sphere is ρ, then for any A-assignment v, v(p) = 0 implies
v(q) = 0. This proves (I).
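
The vector geometry used in step (I) can be verified directly. The sketch below assumes the standard orthonormal basis for {x,y,z}; the helper names are hypothetical, and the symbols `alpha` and `rho` stand for the two angles of the argument (roughly 70° and 20°).

```python
import math


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    return math.degrees(math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))))


x, y, z = (1, 0, 0), (0, 1, 0), (0, 0, 1)

u_minus = sub(add(y, z), x)   # (y + z) - x
u_plus = add(add(y, z), x)    # (y + z) + x

alpha = angle_deg(u_minus, u_plus)   # angle between the two vectors
rho = 90 - alpha                     # angle that w makes with x
```

The four orthogonality relations reduce to vanishing inner products, and the angle between the two vectors comes out as the inverse cosine of 1/3.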
To show (II), let p and q be any two distinct points on the sphere. We show
that there is a finite sequence of points (p_1, p_2, ..., p_n), where n ≥ 2, such
that p_1 = p, p_n = q, and the angular separation between any pair of succes-
sive points, p_i and p_{i+1}, is the angle ρ of (I). Then, from (I), any A-assignment assigning 0 to
p also assigns 0 to q. Starting at p, we mark on the great circle through p and q
a sequence of points, so that the angular separation of each from its prede-
cessor is ρ, and such that the last, p_i say, has an angular separation from q
less than or equal to ρ. Clearly, if the angular separation of p_i and q is equal to
ρ, then the required sequence is {p, ..., p_i, q}. If the angular separation of
p_i from q is less than ρ, then, by continuity, there is a p_k whose angular
separation from both p_i and q is equal to ρ (see Figure 6.2), and the required
sequence is {p, ..., p_i, p_k, q}.

Figure 6.2

This concludes the proof.
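
The construction in step (II) can be carried out explicitly. What follows is a sketch under assumptions of my own: p is placed at (1,0,0), q lies on the equator at angular distance theta from p, the step angle is taken to be 90° minus the inverse cosine of 1/3, and the extra point p_k is found on the great circle that perpendicularly bisects the arc from p_i to q. The function names and the tolerance are hypothetical.

```python
import math

# Step angle from (I): 90 degrees minus arccos(1/3), in radians.
RHO = math.pi / 2 - math.acos(1 / 3)


def angle(u, v):
    """Angular separation of two unit vectors, in radians."""
    d = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, d)))


def chain(theta, rho=RHO, tol=1e-12):
    """Points from p = (1, 0, 0) to q at angular distance theta from p,
    with every successive pair separated by exactly rho (0 < theta <= pi)."""
    if not 0 < theta <= math.pi:
        raise ValueError("theta must lie in (0, pi]")
    # March along the great circle until the remaining gap is at most rho.
    steps = max(math.ceil(theta / rho - tol) - 1, 0)
    points = [(math.cos(i * rho), math.sin(i * rho), 0.0)
              for i in range(steps + 1)]
    gap = theta - steps * rho
    if abs(gap - rho) > tol:
        # Gap smaller than rho: leave the great circle to find p_k, which
        # is at distance rho from both the last marked point and q.
        mid = steps * rho + gap / 2
        lift = math.acos(math.cos(rho) / math.cos(gap / 2))
        points.append((math.cos(lift) * math.cos(mid),
                       math.cos(lift) * math.sin(mid),
                       math.sin(lift)))
    points.append((math.cos(theta), math.sin(theta), 0.0))
    return points
```

For example, `chain(1.0)` returns a sequence starting at p and ending at q in which each consecutive pair is separated by the step angle, so that a value of 0 at p would propagate along the chain to q.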

6.6 Generalizing the Problem


The formalism of quantum mechanics entered the argument of the previous
section in one place only: it was used to establish that the sum of the values
of S_x^2, S_y^2, and S_z^2 must be equal to 2. However, we can use an extension of the
impossibility proof to show that PVP cannot hold in any physical theory
that uses the full representational capacity of a Hilbert space of three or
more dimensions, that is, in which there is a one-to-one correspondence
between experimental questions pertaining to a certain class of observables
and the set of subspaces of such a space.
Let us assume that we have, as we may say, a full set, {A_i}, of observables,
each with n values (n ≥ 3), representable on a Hilbert space ℋ. ℋ will have
n dimensions, and to any orthogonal n-tuple of rays will correspond
the values a_1, ..., a_n of some observable A_i in the set. (I assume that
there is no degeneracy.) These rays will represent the properties (A_i,a_1),
(A_i,a_2), ..., (A_i,a_n).
If PVP held we should be able to assign a value to every observable
simultaneously. That is, of each orthogonal n-tuple of rays, exactly one
would be given the value 1 ["The system has property (A_i,a_k)," say] and the
others the value 0 ["The system does not have the properties (A_i,a_1), (A_i,a_2),
etc."]. The impossibility proof in the previous section showed that in a
three-dimensional real space we cannot assign the values 0, 1, and 1 consist-
ently to each orthogonal triple of rays; trivially, we cannot assign the values
1, 0, and 0, either. The proof extends straightforwardly to complex spaces
within which there are orthogonal triples of vectors, that is, to any space of
dimension three or greater. The crucial condition on assignments, the con-
dition impossible to fulfill, is that, of any mutually orthogonal set of rays
spanning the space, exactly one be assigned the value 1 while the others are
all assigned 0. In this extended proof, we replace talk of angular separation
of points on the sphere by formulations involving the inner product of two
vectors. (Recall that, in ℝ^3, ⟨x|y⟩ = |x| · |y| · cos θ.)
Alternatively, the (generalized) impossibility proof can be viewed as a
corollary of Gleason's theorem (see Section 5.6). For assume a function f
exists mapping all rays of a Hilbert space ℋ onto {0,1}, which has value 1 for
exactly one ray of each set of mutually orthogonal rays which span ℋ. Such
a function would, in Gleason's terminology, be a frame function, and from
his theorem it follows that, provided ℋ has dimensionality higher than two,
there exists a density operator D on ℋ such that

\[ f(\alpha) = \mathrm{Tr}(D P_\alpha) \]

for each ray α (and associated projection operator P_α).
Now consider the spectral decomposition of D: D = Σ_j b_j P_j. We can find a
set of mutually orthogonal rays spanning ℋ, such that each P_j of this
decomposition projects onto one member of the set. (However, if not all the
coefficients b_j are distinct, this set will not be uniquely specified by D; see
Section 1.14.) The function f represented by D will take value 1 for exactly
one member of this set: call this ray i. Then

\[ f(i) = 1 = \mathrm{Tr}(D P_i) = \mathrm{Tr}\Big(\sum_j b_j P_j P_i\Big) \]

Since P_j P_i = 0 except when i = j, and P_i^2 = P_i, it follows that

\[ 1 = \mathrm{Tr}(b_i P_i) = b_i \mathrm{Tr}(P_i) = b_i \]

But D is a density operator; we have b_j ≥ 0 for all j, and Σ_j b_j = 1. Thus D is
the projection operator P_i, and hence, for any ray α in ℋ distinct from i,
P_α ≠ P_i, and so

\[ f(\alpha) = \mathrm{Tr}(P_i P_\alpha) < 1 \]

(Recall from Section 5.4 that Tr(P_i P_α) = ⟨v_α|P_i v_α⟩, where v_α is a normalized
vector in α.)
Hence i is the only ray in ℋ assigned 1 by f, contrary to our assumptions. It
follows that no such function exists.
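
The final step, that the trace of a product of two projectors onto distinct rays is strictly less than 1, can be checked for a particular case. The sketch below is an illustration only; the three-dimensional vectors chosen and the helper names are hypothetical.

```python
import math


def normalize(v):
    """Rescale a complex vector to unit length."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def projector(v):
    """Projection operator onto the ray containing v, as a nested list."""
    u = normalize(v)
    return [[a * b.conjugate() for b in u] for a in u]


def trace_of_product(p, q):
    """Real part of Tr(PQ) for square matrices P and Q."""
    n = len(p)
    return sum(p[r][c] * q[c][r] for r in range(n) for c in range(n)).real


P_i = projector([1, 0, 0])
P_alpha = projector([1, 1j, 0])      # a ray distinct from, but not orthogonal to, i
P_orthogonal = projector([0, 1, 0])  # a ray orthogonal to i

f_alpha = trace_of_product(P_i, P_alpha)
f_same = trace_of_product(P_i, P_i)
f_orthogonal = trace_of_product(P_i, P_orthogonal)
```

Here the value obtained for the intermediate ray is one half, which is neither 0 nor 1; a function of the form Tr(P_i P_α) therefore cannot be a two-valued assignment of the kind PVP requires.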
All the proofs given here are open to the following objection. They make
the assumption that a full (and hence nondenumerable) set of observables
exists for the space ℋ (as does the version given by Bell; see Bub, 1974, pp.
69-70). Now, as we saw in Section 3.9, while this assumption may be
well-founded for the space C^2 of the spin-1/2 particle, it may not be true in
general in quantum theory. Kochen and Specker's own proof, on the other
hand, makes no such assumption. They show that the required mapping
fails in three-dimensional space for a set of triples involving only 117 points.
As they point out, this avoids the objection that "it is not meaningful to
assume that there are a continuum number of quantum mechanical proposi-
tions" (Kochen and Specker, 1967, p. 70).

6.7 The Bell-Wigner Inequality


In 1964, J. S. Bell dealt another blow to the straightforward statistical inter-
pretation outlined in Section 6.4, by taking the discussion of EPR a step
further; he showed that the assumptions made by Einstein, Podolsky, and
Rosen did not simply show that the quantum-mechanical formalism was
incomplete, but led to results which were actually at odds with quantum-
mechanical predictions. Here is his argument in the form in which it was
later presented by Wigner (1970).
Let us assume that, for each particle in a (Bohm-style) EPR experiment,
the values of three arbitrary components of spin are all elements of reality.
Call these components S_a^1, S_b^1, S_c^1, and S_a^2, S_b^2, S_c^2, where the superscript
labels the particle. Then we can write the
values of these three components for the pair of particles in the form
(i,j,k;l,m,n), where i, j, k represent the values of S_a^1, S_b^1, S_c^1, and l, m, n those of
S_a^2, S_b^2, S_c^2, respectively. Each of i, j, k, l, m, n can have two values (+ or -),
and these are anticorrelated, so that if j = + then m = -, and so on. Hence
(+,-,+;-,+,-) is a possible assignment of values, whereas (+,-,+;+,-,-) is
not. There are then only eight possible assignments of values which can
have a nonzero probability of occurrence; we can label these assignments
1-8, and write:
p(1) = p(+,+,+;-,-,-)    p(5) = p(-,+,+;+,-,-)
p(2) = p(+,+,-;-,-,+)    p(6) = p(-,+,-;+,-,+)
p(3) = p(+,-,+;-,+,-)    p(7) = p(-,-,+;+,+,-)
p(4) = p(+,-,-;-,+,+)    p(8) = p(-,-,-;+,+,+)

Now

\[ p[(S_a^1,+);(S_b^2,+)] = p(3) + p(4) \]

\[ p[(S_b^1,+);(S_c^2,+)] = p(2) + p(6) \]

\[ p[(S_a^1,+);(S_c^2,+)] = p(2) + p(4) \]

Since all these probabilities are nonnegative, it follows that

\[ p[(S_a^1,+);(S_c^2,+)] \le p[(S_a^1,+);(S_b^2,+)] + p[(S_b^1,+);(S_c^2,+)] \tag{6.5} \]

This relation is known as "the Bell-Wigner inequality."
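
That the inequality holds for every classical distribution over the eight assignments can be spot-checked numerically. The following is a sketch, not a proof: the representation of assignments by the particle-1 values alone (particle 2 being the mirror image), the function names, the random seed, and the number of trials are all assumptions of the illustration.

```python
import itertools
import random

# The eight admissible assignments, listed by the particle-1 values (i, j, k);
# anticorrelation fixes the particle-2 values as the opposite signs.
assignments = list(itertools.product("+-", repeat=3))
AXIS = {"a": 0, "b": 1, "c": 2}


def joint(weights, first, second):
    """Probability that particle 1 is + along `first` and particle 2 is +
    along `second` (equivalently, particle 1 is - along `second`)."""
    return sum(w for w, s in zip(weights, assignments)
               if s[AXIS[first]] == "+" and s[AXIS[second]] == "-")


def bell_wigner_holds(weights):
    """Check p(a,c) <= p(a,b) + p(b,c), allowing for rounding error."""
    lhs = joint(weights, "a", "c")
    rhs = joint(weights, "a", "b") + joint(weights, "b", "c")
    return lhs <= rhs + 1e-12


rng = random.Random(0)


def random_distribution():
    """A randomly chosen probability distribution over the eight assignments."""
    raw = [rng.random() for _ in assignments]
    total = sum(raw)
    return [r / total for r in raw]


all_hold = all(bell_wigner_holds(random_distribution()) for _ in range(1000))
```

No choice of nonnegative weights can violate the inequality, since the left-hand side is p(2) + p(4) and the right-hand side contains both of those terms together with two further nonnegative ones.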
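As we saw in Section 6.3, quantum mechanics gives us a formula for
computing these joint probabilities. We have, from Equation (6.1),

\[ p[(S_a^1,+);(S_b^2,+)] = \tfrac{1}{2}\sin^2\tfrac{1}{2}(\angle ab) \]

But consider the case when a, b, and c are coplanar, ∠ac = 120°, and the
direction b bisects the angle between a and c. In this case,

\[ p[(S_a^1,+);(S_c^2,+)] = \tfrac{1}{2}\sin^2\tfrac{1}{2}(\angle ac) = \tfrac{1}{2}\sin^2 60^\circ = \tfrac{3}{8} \]

\[ p[(S_a^1,+);(S_b^2,+)] = p[(S_b^1,+);(S_c^2,+)] = \tfrac{1}{2}\sin^2 30^\circ = \tfrac{1}{8} \]

And, contrary to what the Bell-Wigner inequality requires,

\[ \tfrac{3}{8} > \tfrac{1}{8} + \tfrac{1}{8} \]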
We see that, although the derivation of the Bell-Wigner inequality given
here starts from the anticorrelations predicted by quantum theory, its con-
clusion conflicts with other predictions that the theory makes. How does
this happen? If we examine the derivation, we find that a cluster of assump-
tions, largely unacknowledged, does most of the work within it. This cluster
of assumptions, therefore, is responsible for the divergency. The assump-
tions are (1) that the principles PVP, RFP, and FMP characteristic of the
statistical interpretation all hold, and (2) that the properties of one system
are unaffected by measurements conducted on the other. All of these would
be congenial to Einstein, Podolsky, and Rosen; in fact most are either as-
sumed (explicitly or implicitly) or entailed by their argument. Collectively
we can refer to them as the assumption of "local realism."
Bell's result gives a surprising turn to discussions of EPR. It was never
suggested by Einstein, Podolsky, and Rosen that quantum theory was wrong
in its predictions, but rather that it failed to satisfy a particular criterion of
completeness. But it now appears that to accept their conclusion is to make
certain assumptions which are actually inconsistent with quantum theory.
Thus, if we test the theory's predictions for coupled systems, we are also,
surprisingly, testing a cluster of metaphysical assumptions. For, should the
theory's predictions be confirmed, and the Bell-Wigner inequality be vio-
lated, this would offer a severe challenge to these assumptions; one might
even be tempted to say that they were falsified.
I return to this topic in Chapter 8, but a few preliminary remarks are in
order. Since the difference between what quantum theory predicts and
what the Bell-Wigner inequality demands was first pointed out, a number of
experiments have been performed to see whether the inequality holds (see
Clauser and Shimony, 1978; d'Espagnat, 1979). The results, though not
unanimous, have largely borne out the predictions of quantum theory; we
may take the evidence of those favorable to quantum theory as particularly
significant, since the requirement that certain predictions are precisely real-
ized is more stringent than the requirement that a certain inequality obtains.
The consensus of opinion is that these results have been a remarkable test of
the theory, which it has survived.

6.8 Hidden Variables


The theorems of Bell and of Kochen and Specker make it clear that, if the
quantities (A,Δ) appearing in the statistical algorithm are indeed properties
of a system, then these properties don't attach to the system in a straightfor-
wardly classical way. However, the two papers in which these theorems
were originally presented addressed a different, though related, question
(Bell, 1966; Kochen and Specker, 1967), the question of whether a hidden-
variable reconstruction of quantum mechanics is possible.
A "hidden-variable" theory, as the name implies, postulates that along-
side (or, more graphically, beneath) the measurable quantities dealt with by
the theory (position, momentum, spin, and so on) there are further quanti-
ties inaccessible to measurement, whose values determine the values
yielded by individual measurements of the observables. The quantum-
mechanical statistics are to be obtained by "averaging" over the values of
the hidden variables. The inaccessibility of these variables may be a contin-
gent and temporary matter, to be remedied as we develop new experimental
procedures, or these quantities may be in principle inaccessible (see Jammer,
1974, p. 267).
The suggestion that there may be such "hidden variables" is as old as the
probabilistic interpretation of the state vector. It was made by Born (1926b,
p. 825) a few months after he first proposed that interpretation: "Anyone
dissatisfied with these ideas may feel free to assume that there are additional
parameters not yet introduced into the theory which determine the individ-
ual event." But almost as old is the denial that such hidden variables can
exist. By considering sequences of experiments like the sequence VH, VHV,
and so on described in the Introduction, von Neumann was led to believe
that the existence of hidden variables would contradict quantum theory.
For, on a natural account of hidden variables, these experiments would act
as quantum theory tells us they cannot, that is, as a sequence of filters which
would eventually yield a homogeneous beam; the value of the hidden
variable would be the same for all its members, and it would be incapable of
being split further (see Jammer, 1974, p. 267).
Von Neumann's book, The Mathematical Foundations of Quantum Theory
(1932, chap. 4), contains the first "no-go" theorem for hidden-variable
theories (henceforth "HV theories"). A "no-go" theorem is a theorem to
show that no HV theory which satisfies certain constraints can reproduce
the quantum mechanical statistics.
The constraints suggested by von Neumann have since been challenged
as overly stringent, and the theorems of Kochen and Specker and of Bell are
now considered much more decisive. Although a survey of HV theories
would take us too far afield, I will indicate the kinds of HV theories which
these two theorems disallow. (For a survey see Belinfante, 1973, or Jammer,
1974, chap. 7; Bub, 1974, has a good discussion of certain no-go theorems.)
Kochen and Specker ask whether it is possible to construct a classical
phase space Ω, involving hidden variables, which allows a "reconstruction"
of the quantum statistics. Recall from Chapter 2 that, in a classical theory, a
physical quantity A is represented as a real-valued function f_A : Ω → ℝ on
the phase space. Kochen and Specker require that the algebraic relations
obtaining among quantum-mechanical observables are preserved in the
algebra of these real-valued functions on Ω.
The relations they consider are just those involving compatible (they
write "commeasurable") operators on the quantum-mechanical Hilbert
space ℋ; to use a term we shall meet in Chapter 7, they require that the
partial algebra of Hermitian operators on ℋ be embeddable in the set ℝ^Ω of
functions from a classical phase space Ω to the reals. It turns out that a
necessary condition for this embedding is that a mapping exists of the rays
of ℋ (equivalently, the projectors onto these rays) onto {0,1} such that, of
any mutually orthogonal set of rays spanning ℋ, exactly one ray receives
value 1. But, as we saw in Sections 6.5 and 6.6, there are no such mappings.
Hence no HV theory satisfying their requirements is possible.
To see exactly what kind of HV theory this rules out, we need to examine
the assumptions Kochen and Specker make. I drew attention to these as-
sumptions in Section 6.5. One of them in particular might be questioned,
namely the assumption that a Hermitian operator on a Hilbert space repre-
sents a unique observable. The proof rests on the requirement that, if α, β,
and γ are any three mutually perpendicular directions in physical space, then
of the observables S_α^2, S_β^2, and S_γ^2, two must be given value 1 and the
third 0. It is then assumed that
if we assign to S_z^2, say, the value 0 when we encounter it as a member of the
triple S_x^2, S_y^2, S_z^2 (for example, when we measure S_x^2 + S_y^2 + S_z^2), then it
must also be given value 0 when it is viewed as a member of the triple S_x'^2,
S_y'^2, S_z^2, where x' and y' are directions in space different from x and y. It is
assumed, in other words, that the value to be assigned to S_z^2 is not contextual.
A contextualist HV theory would not require this consistency of assign-
ment to S_z^2. On such a theory, a Hermitian operator which belonged to more
than one set of mutually compatible operators would not be taken to repre-
sent one single observable. Gudder (1970) has shown that (provided we
restrict ourselves to a single system) we can always, as it were, piece together
HV theories, each dealing with a mutually compatible set of Hermitian
operators, and thus produce a contextual HV theory.
Gudder's theorem shows no more than the mathematical possibility of
producing such a theory, nor did he claim more for it. It gives no physical
grounding for one, and indeed one may think that the move to a contextual
theory has sapped the project of much of its motivation.
This reservation apart, it's important to note the restriction to a single
system. For if Kochen and Specker's result limits us to contextualist HV
theories, then Bell's theorem limits us to nonlocal ones (as does a result by
Stairs, 1983b, which applies an argument like Kochen and Specker's to
coupled systems). A local theory is one in which the hidden variables de-
scribing spatially separated systems are independent of one another. How-
ever, as soon as we seek an HV theory to deal with composite systems, we
are faced with the correlations typical of EPR-type experiments. By aban-
doning Einstein's assumption that spatially separated systems are indepen-
dent of each other, and appealing to interactions between the systems
concerned, it may be possible to reproduce the quantum-mechanical predic-
tions for these experiments. However, it's not possible to reproduce them by
recourse to a classical probability space, and a fortiori not by recourse to a
classical probability space wherein such frequencies appear as relative fre-
quencies of classical states.
The theorem has implications extending beyond the topic of HV
theories, and I discuss these implications in Chapter 8. The conclusion to be
drawn from it in this section is that no local HV theory for quantum me-
chanics is possible.
To sum up, any HV theory that reproduces the quantum-mechanical
statistics must be both contextual and nonlocal.

6.9 Interpreting Quantum Theory: Statistical States and Value States


It seems that quantum mechanics cannot, via an appeal to hidden variables,
be reformulated as a theory whose underlying phase space is classical.
Furthermore, a straightforwardly classical interpretation of quantum theory
itself is ruled out. Where, then, are we to look for another? Come to that,
armed with thimbles and care, what exactly are we seeking? To obtain a
more precise idea of what is involved in interpreting a theory, let us return to
a suggestion made in Section 2.8, that to interpret quantum mechanics is to
see what kind of world is representable within the class of models the theory
employs.
Recall that, on the semantic view of theories, a scientific theory provides a
representation, or model, of a certain domain. Thus geometrical optics pro-
vides a geometrical representation of the transmission, reflection, and re-
fraction of light, the Bohr theory of the atom a model of atomic structure.
Sometimes these models have a physical representation, sometimes they
are wholly abstract mathematical structures, but in both cases they supply
representations of the phenomena, or, as in the case of the Bohr model, of
the structures postulated as underlying the phenomena. The Hilbert spaces
of quantum theory are, obviously, of the second, abstract kind.
We interpret the theory by recognizing, in the models the theory provides,
elements of a particular conceptual scheme. For example, in the Hamilton-
Jacobi theory of classical mechanics for a single particle, the element ω of
the phase space is interpreted as an encapsulated summary of the pri-
mary qualities of the particle, and the mathematical expression -∇H(ω)
[= (-∂H/∂x, -∂H/∂y, -∂H/∂z), where H is the Hamiltonian function
for the system] is interpreted as the force acting on the particle, such forces
being the efficient causes responsible for the processes the theory describes.
Thus the theory is interpreted within a particular categorial framework. I
borrow the phrase from Körner (1969, pp. 192-210); a categorial frame-
work is a set of fundamental metaphysical assumptions about what sorts of
entities and what sorts of processes lie within the theory's domain. The loci
classici for the articulation of the categorial framework of classical me-
chanics are Kant's Metaphysical Foundations of Natural Science and his Cri-
tique of Pure Reason. This categorial framework was well established prior to
the appearance of the Hamilton-Jacobi theory; correspondingly, the task of
interpreting the theory was that of looking for familiar sorts of things. If the
fit between the categorial framework and the models that the Hamilton-
Jacobi theory provided had been less than perfect-if, for example, there
had been nothing in the model to correspond to the concept of a primary
quality (or objective property), or if what was identified as an efficient cause
had allowed a multiplicity of effects (or of what were identified as effects)
-then the Hamilton-Jacobi theory would not have been classical me-
chanics.*
However, in the case of quantum mechanics, a very different situation
obtains. The theory uses the mathematical models provided by Hilbert
spaces, but it's not clear what categorial elements we can hope to find
represented within them, nor, when we find them, to what extent the
quiddities of these representations will impel us to modify the categorial
framework within which these elements are organized. To interpret the
theory is to articulate the categorial framework whose elements have their
images within it; we obtain an interpretation by the dialectical process of
bringing to the theory a conceptual scheme, and then seeing how this
conceptual scheme needs to be adjusted to fit it. Because there are several
solutions to this problem, there can be competing interpretations of the
same theory. (Compare Holdsworth and Hooker, 1983, who talk of one
"quantum mechanics" but several "quantum theories.")
The concept of a property can serve to illustrate this rather abstract discussion.
Does quantum mechanics allow us to say that a system "has properties"?
Certainly we can find represented in Hilbert space values of physical
quantities: the subspace L^A_a (equivalently the projector P^A_a) represents the
value a of the observable A. But if these subspaces are to be interpreted as
properties, then, in addition to the now familiar state represented by a
density operator (and called variously the statistical state [Kochen, 1978] or
the dynamical state [van Fraassen, 1981b]), a value-state λ (alternatively, a
micro-state [Hardegree, 1980]) must be attributed to the system. Regardless
of whether the statistical state is thought of as applying to individual sys-
tems or to ensembles of systems, the value-state must be thought of as
applying to individual systems. The value-state will be purely descriptive;
whereas the statistical state assigns a probability to each pair (A, a) (regarded
as an experimental question), the value-state will specify at any juncture

* "Classical mechanics" is here identified with a class (T₁, I₁), (T₂, I₂), . . . of theories and
interpretations.

which of these pairs can be regarded as the system's properties. A value-state
will thus map pairs (A,a) into 1 or 0, depending on whether the system
possesses the property in question or not, and so will resemble a classical
state.
Two remarks need to be made about this value-state. In the first place, the
attribution of properties it provides is over and above the work done by the
theory simpliciter. We use it to yield an interpretation of the theory which
accommodates the notion of the properties of a system, but another alternative
is always open to us, that of finding a categorial framework in which the
notion does not appear. Second, even if we hang on to properties, the
concomitant value-states cannot be just like their classical counterparts. For
Kochen and Specker's theorem tells us that, for most quantum systems,
there can be no function λ mapping all pairs (A,a) onto 1 or 0 in accordance
with PVP; in other words, so that for each A there is exactly one value a for
which λ(A,a) = 1. Any workable account of a value-state must therefore be
modified away from adherence to PVP. Different modifications will yield
different interpretations of quantum theory.
A number of these interpretations can best be explicated using the vocabulary
of "quantum logic"; partly for that reason the next chapter is devoted
to that topic.
7
Quantum Logic

Various enterprises are subsumed under the heading quantum logic. Two
useful introductions to the topic, Mittelstaedt (1981) and van Fraassen
(1981a), appear in the same volume; more extended accounts are given by
Beltrametti and Cassinelli (1981) and Holdsworth and Hooker (1983).
Common to all quantum-logical enterprises is the aim of giving, or utilizing,
an algebraic account of quantum theory.
In Sections 7.2-7.4 I define the algebraic structures that quantum logic
makes use of (Boolean algebras, partial Boolean algebras, and orthomodular
lattices), and show how these structures can be found embedded within
Hilbert spaces. In Section 7.5 I look at the work of a group of writers
(Mackey, Maczinski, Finkelstein, Jauch, and Piron) who have sought to
recapture the Hilbert-space formalism of quantum theory by looking at the
algebraic constraints to which the event structure of any theory must con-
form. Finally, in Sections 7.6-7.9 I show how quantum logic can be thought
of as a logic, in the sense in which that word is used when we speak of
"deductive logic," and I discuss whether a "quantum-logical" interpreta-
tion of quantum mechanics will allow us to salvage the notion of a property
of a system.
To illustrate how all these enterprises hang together, I start by examining
a very simple classical system, showing, first, how the algebraic structure of
a field of sets can coincide with the structure of the set of properties the
system can possess and, second, how this structure can also be viewed as a
logical structure.

7.1 The Algebra of Properties of a Simple Classical System


Consider a simple classical "system" consisting of a box with a transparent
lid; the box contains a penny and a quarter and is large enough for the coins
to rattle around inside it. At any juncture, each of the coins can be either
heads up or tails up. We represent this on a two-dimensional classical
phase space. (Hughes, 1981, presents this space in living color.) Using
standard Cartesian coordinates, let the set P of points such that y ≥ 0 represent
the experimental question (penny, heads-up), and the set P̄ such that
y < 0 represent (penny, tails-up). Similarly, let the set Q of points such that
x ≥ 0 represent (quarter, heads-up), while the set Q̄ such that x < 0 represents
(quarter, tails-up): see Figure 7.1. Since the system is classical, these
experimental questions are also possible properties of the system.
The state of the system is represented by a point ω in the phase space; ω
specifies which face of each coin is uppermost. For example, if ω lies in the
upper left segment of the phase space, P ∩ Q̄, then the penny is heads-up
and the quarter is tails-up. The phase space is classical, not (obviously) in the
sense that it involves position or momentum coordinates, but because experimental
questions are represented by subsets of the phase space. Note
that not all questions are maximally specific; the question P ∪ Q, for example,
receives the answer yes when the system is in any configuration except
(penny, tails-up; quarter, tails-up).
We can represent relations between various subsets of this phase space by
drawing a network, in which each node represents a subset. Part of this
network is shown in Figure 7.2; below the nodes representing P and Q is the
node representing P ∩ Q, and above them is the node representing P ∪ Q.
We now embed this in a diagram which displays all possible subsets of the
space obtainable by union and intersection from P, Q, P̄, and Q̄ (Figure 7.3).

Figure 7.1 [phase-space diagram; legible labels: P (penny, heads-up), P̄ (penny, tails-up)]

Figure 7.2

At the top of the diagram is the whole space, and at the bottom is the empty
set ∅.
If any point on the diagram can be reached from another by traveling
upwards along the lines of the diagram, then the subset represented by the
higher node properly contains the subset represented by the lower. Thus a
line running upwards between two points (possibly passing through others
en route) represents the relation of set inclusion.
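
The construction of the diagram can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the phase space is collapsed to the four coin configurations, and the names STATES, close, and covers are assumptions introduced here. The sketch closes the family {P, Q, P̄, Q̄} under union and intersection and then lists the "lines" of the diagram, that is, the pairs in which one subset lies immediately above another.

```python
from itertools import product

# The four configurations of the two coins, written (penny, quarter).
STATES = frozenset(product(("heads", "tails"), repeat=2))

P = frozenset(s for s in STATES if s[0] == "heads")   # penny heads-up
Q = frozenset(s for s in STATES if s[1] == "heads")   # quarter heads-up
P_BAR = STATES - P                                    # penny tails-up
Q_BAR = STATES - Q                                    # quarter tails-up


def close(generators):
    """Close a family of sets under pairwise union and intersection."""
    family = set(generators)
    while True:
        new = {a | b for a in family for b in family}
        new |= {a & b for a in family for b in family}
        if new <= family:
            return family
        family |= new


nodes = close([P, Q, P_BAR, Q_BAR])


def covers(upper, lower):
    """True when `upper` lies immediately above `lower` in the diagram."""
    return lower < upper and not any(lower < mid < upper for mid in nodes)


# Each line of the diagram joins a node to one lying immediately above it.
lines = [(lo, hi) for lo in nodes for hi in nodes if covers(hi, lo)]
print(len(nodes), "nodes and", len(lines), "lines")
```

Proper set inclusion (the `<` comparison on frozensets) plays the role of "can be reached by traveling upwards"; the empty set and the whole space appear as the bottom and top nodes.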

Figure 7.3

Figure 7.4

Each of these subsets, and hence each node, represents a possible property
of the system and so the diagram also displays the relations between
these properties. Now, associated with each property is a sentence expressing
the fact that the system has the property in question. In fact, the sentence
ω ∈ P (where ω is the state of the system) expresses the fact that the penny
is heads-up.
Let p be synonymous with ω ∈ P, and q with ω ∈ Q. Then, using the
standard logical connectives & and v for "and" and "or," we can write p & q
for ω ∈ P ∩ Q, and p v q for ω ∈ P ∪ Q.* Clearly, to each node on the
diagram we can attach the corresponding sentence: to the nodes representing
P and Q we attach p and q, respectively, and to the nodes representing P̄
and Q̄ we attach -p and -q, where - is to be read as, "It is not the case
that . . ." Let LC be the set of sentences which can be formed from p and q
by using the connectives &, v, and -. These three connectives mimic the
set-theoretic operations ∩, ∪, and complementation (the overbar), as Figure 7.4 shows.

* Throughout this chapter sentences and sentence schemata of the logical language will not
be marked off by quotation marks or quasi-quotation marks. Quelle horreur.

The lowest node on this diagram represents p & -p, which is always
false, while the highest point represents the sentence p v -p, which is
always true. The lines on the diagram represent the relation of entailment
between sentences. Thus, for example, p & q entails p. (We write p & q ⊢ p.)
Notice that more than one sentence gets attached to a given node. To the
lowest node, for example, we attach not only p & -p, but also q & - q,
(p & q) & (- p & - q), and so on. Thus, strictly, each node represents a class
of sentences, each of which is logically equivalent to all the others in the
class. We may say that each node represents a proposition.
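
A small Python sketch may make the step from sentences to propositions concrete. It is an illustration under stated assumptions, not the author's construction: each sentence is evaluated to the set of coin configurations in which it is true, and the helper names neg, conj, disj, and entails are hypothetical. Logically equivalent sentences then evaluate to the same set, which is the node they are attached to.

```python
# Configurations are two-letter strings: penny face, then quarter face.
STATES = frozenset({"HH", "HT", "TH", "TT"})

p = frozenset(s for s in STATES if s[0] == "H")   # the penny is heads-up
q = frozenset(s for s in STATES if s[1] == "H")   # the quarter is heads-up


def neg(a):
    """The connective -, mirrored by set complementation."""
    return STATES - a


def conj(a, b):
    """The connective &, mirrored by intersection."""
    return a & b


def disj(a, b):
    """The connective v, mirrored by union."""
    return a | b


def entails(a, b):
    """Entailment, mirrored by set inclusion."""
    return a <= b


# Three different sentences, all attached to the lowest node.
contradictions = [
    conj(p, neg(p)),
    conj(q, neg(q)),
    conj(conj(p, q), conj(neg(p), neg(q))),
]
print(all(c == frozenset() for c in contradictions))
print(entails(conj(p, q), p))
```

Because a proposition is identified here with a set, the equality test on sets doubles as the test for logical equivalence of the sentences that express it.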
Thus the same diagram can show (1) the set-theoretic relations among the
members of a family of sets, (2) the conceptual relations between the mem-
bers of a set of properties of a system, and (3) the logical relations holding
between the propositions in a certain set. These sets are isomorphic one to
another; they all share a common structure. Our next move is to give an
abstract characterization of that structure, of the kind discussed in Section
1.8.

7.2 Boolean Algebras


The structure shown in Figure 7.3 is an example of a Boolean algebra.

(7.1) We say that ℬ is a Boolean algebra if ℬ = ⟨B,∨,∧,⊥,0,1⟩, where B is a
set containing at least two elements, 0 and 1 are designated elements
of B, ∨ and ∧ are binary operations and ⊥ a singulary operation on B,
satisfying the identities, for all a, b, c in B,

(7.1a) a ∨ b = b ∨ a    a ∧ b = b ∧ a
(7.1b) a ∨ (b ∨ c) = (a ∨ b) ∨ c    a ∧ (b ∧ c) = (a ∧ b) ∧ c
(7.1c) a ∨ (a ∧ b) = a    a ∧ (a ∨ b) = a
(7.1d) a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c)    a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
(7.1e) a ∨ (b ∧ b⊥) = a    a ∧ (b ∨ b⊥) = a

This axiomatization is due to Sikorski (1964). At the cost of some redundancy,
it displays neatly the symmetry between ∨ and ∧; we say that the
axioms on the right are the duals of those on the left (and vice versa). Clauses
(7.1a) and (7.1b) say that both ∨ and ∧ are commutative and associative;
(7.1c) is known as an "absorption" axiom; (7.1d) tells us that ∧ is distributive
over ∨ and conversely, and (7.1e) gives us the properties of the complementation
operation ⊥. The operations ∨ and ∧ are known, respectively,
as "join" and "meet."

From these axioms it follows that, for all a and b in B,

(7.2) a ∨ a = a    a ∧ a = a
(7.3) a ∨ a⊥ = b ∨ b⊥    a ∧ a⊥ = b ∧ b⊥

In view of (7.3), there are elements of B, namely a ∨ a⊥ and a ∧ a⊥, which,
although they are obtained from a single element a by Boolean operations,
do not depend on the choice of a. These are the designated elements 1 and 0
respectively. We have then, by definition,

(7.4) a ∨ a⊥ = 1    a ∧ a⊥ = 0

We also find that, for all a and b in B,

(7.5) (a⊥)⊥ = a
(7.6) (a ∨ b)⊥ = a⊥ ∧ b⊥    (a ∧ b)⊥ = a⊥ ∨ b⊥

The id entities (7.6) are known as "De Morgan's laws."


An important, though elementary, Boolean algebra Z₂ has just two elements,
0 and 1, as the subscript suggests. In all Boolean algebras, and hence
in Z₂,

(7.7) 0 ∨ 1 = 1 = 1 ∨ 0    0 ∧ 1 = 0 = 1 ∧ 0
      0 ∨ 0 = 0 = 0 ∧ 0
      1 ∨ 1 = 1 = 1 ∧ 1

These equations completely characterize the Boolean operations on Z₂.


Any Boolean algebra ℬ can be homomorphically mapped onto Z₂ (Bell
and Slomson, 1969); that is, there are mappings from ℬ onto Z₂ which, as
we say, "preserve the operations" ∨, ∧, and ⊥. Formally:

(7.8) for any Boolean algebra ℬ = ⟨B,∨,∧,⊥,0,1⟩, there exist functions
h: B → {0,1} such that for all a and b in B,

h(a ∨ b) = h(a) ∨ h(b)    h(a ∧ b) = h(a) ∧ h(b)    h(a⊥) = [h(a)]⊥

The operations ∨, ∧, and ⊥ on the right-hand sides of these equations are
operations on Z₂.

The importance of this mapping for classical logic is clear. Consider the
Boolean algebra ℬ₁₆ pictured in Figure 7.4; this is the algebra of the set Π₁₆ of
propositions expressible using just two atomic sentences, p and q, together
with the usual connectives. If we think of the two elements, 1 and 0, of Z₂ as
true and false, respectively, then each of the homomorphisms of ℬ₁₆ onto Z₂
offers a systematic way of assigning truth-values to the propositions of Π₁₆
(or, more precisely, to the sentences expressing them). On these assignments,
as we shall see, the connectives &, v, and - are the familiar truth-functional
connectives given by truth tables in any introductory logic text
(such as Kleene, 1967, p. 9). In the case of ℬ₁₆, there are just four such
homomorphisms, each corresponding to a possible assignment of truth-values
to p and q.
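
The count of four can be checked by brute force. The following Python sketch is an illustration only; the encoding is an assumption made here, not the book's: each element of the sixteen-element algebra is a 4-bit mask (a subset of the four truth-value assignments), join, meet, and complement are bitwise OR, AND, and XOR with 1111, and a candidate map h into {0,1} is itself packed into a 16-bit integer H. The search keeps exactly those maps that preserve all three operations.

```python
ELEMENTS = range(16)            # bitmask a stands for a subset of four points
TOP = 0b1111                    # the whole space
ATOMS = [0b0001, 0b0010, 0b0100, 0b1000]


def is_homomorphism(H):
    """Check whether the map h(a) = bit a of H preserves join, meet, complement."""
    def h(a):
        return (H >> a) & 1

    for a in ELEMENTS:
        if h(TOP ^ a) != 1 - h(a):              # complement is preserved
            return False
        for b in ELEMENTS:
            if h(a | b) != (h(a) | h(b)):       # join is preserved
                return False
            if h(a & b) != (h(a) & h(b)):       # meet is preserved
                return False
    return True


# Try every one of the 2**16 maps from the algebra into {0, 1}.
homs = [H for H in range(2 ** 16) if is_homomorphism(H)]
print(len(homs))

# Each surviving map sends exactly one atom to 1.
for H in homs:
    print([(H >> atom) & 1 for atom in ATOMS])
```

Most candidate maps fail on the first few checks, so the exhaustive search finishes quickly even though it ranges over 65,536 maps.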
Each homomorphism is associated with one of the four atoms of ℬ₁₆, that
is, with one of the points immediately above 0 in the diagram. Each homomorphism
maps just one of these atoms onto 1, together with all the points
lying above that atom. The set of these elements is said to form an ultrafilter
on ℬ₁₆. Figure 7.5 shows the elements of ℬ₁₆ which are mapped onto 1 by
the homomorphism associated with the atom a. The remainder are mapped
onto 0.

Figure 7.5 A typical ultrafilter on ℬ₁₆.


The generalization of this example to any atomic Boolean algebra is
straightforward, but some preliminary work is required.
In the discussion of ℬ₁₆, I have talked of the atoms as points "immediately
above" 0. We need an algebraic specification of that relation. Note first
that, for all a and b in B,

(7.9) b = a ∨ b if and only if a = a ∧ b

We use this biconditional to define a relation R on B: we say that

(7.10) aRb if and only if b = a ∨ b (if and only if a = b ∧ a)

The relation R is reflexive, transitive, and antisymmetric. That is, for all a,
b, and c in B,

(7.11a) aRa
(7.11b) aRb and bRc together imply aRc
(7.11c) aRb and bRa together imply a = b

Such a relation is known as a partial ordering. We write a ≤ b when aRb; ≤ is
the relation represented by the lines of the Hasse diagram, as it is called, of
ℬ₁₆ in Figure 7.3.

(7.12) We say that a is an atom of ℬ if a ≠ 0 and, for all b in B, b ≤ a implies
b = 0 or b = a.

While all finite Boolean algebras are atomic (that is, they contain atoms),
some infinite ones do not. I will restrict present discussion to atomic Boolean
algebras, although, in fact, (7.13) and (7.14) below are perfectly general
results.
An ultrafilter U on an atomic Boolean algebra ℬ is a set of elements of B
containing just one atom a and all points b such that a ≤ b. We find that, if U
is an ultrafilter on a Boolean algebra ℬ, then, for all a and b in B,

(7.13a) a ∨ b ∈ U if and only if either a ∈ U or b ∈ U (or both);
(7.13b) a ∧ b ∈ U if and only if both a ∈ U and b ∈ U;
(7.13c) a⊥ ∈ U if and only if a ∉ U.

There is a one-to-one correspondence between the set of ultrafilters on ℬ
and the set of homomorphisms of ℬ onto Z₂ such that, if U is an ultrafilter on
ℬ and h_U is the corresponding homomorphism, then, for all a in B,

(7.14) h_U(a) = 1 if and only if a ∈ U
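
Clauses (7.13a-c) can be verified mechanically for the sixteen-element algebra. The Python sketch below is illustrative and rests on an assumed encoding (elements as 4-bit masks, with bitwise OR, AND, and XOR with 1111 standing in for join, meet, and complement); the function names leq and ultrafilter are introduced here. It builds the ultrafilter above each atom from the ordering (7.10) and checks the three clauses for every pair of elements.

```python
ELEMENTS = range(16)            # bitmask a stands for a subset of four points
TOP = 0b1111
ATOMS = [0b0001, 0b0010, 0b0100, 0b1000]


def leq(a, b):
    """The ordering of (7.10): a <= b if and only if b = a v b."""
    return (a | b) == b


def ultrafilter(atom):
    """The atom together with every element lying above it."""
    return {b for b in ELEMENTS if leq(atom, b)}


def satisfies_7_13(U):
    """Check clauses (7.13a), (7.13b), and (7.13c) for the set U."""
    for a in ELEMENTS:
        if ((TOP ^ a) in U) != (a not in U):                # (7.13c)
            return False
        for b in ELEMENTS:
            if ((a | b) in U) != (a in U or b in U):        # (7.13a)
                return False
            if ((a & b) in U) != (a in U and b in U):       # (7.13b)
                return False
    return True


for atom in ATOMS:
    U = ultrafilter(atom)
    print(format(atom, "04b"), len(U), satisfies_7_13(U))
```

Read with (7.14) in mind, membership in U is the homomorphism h_U, so the three clauses are just the truth tables for "or," "and," and "not."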

From (7.13) and (7.14) we can see why, if we have a Boolean algebra of
propositions, the connectives of the language expressing them behave
truth-functionally.
The definition of a Boolean algebra given by (7.1) is purely structural, and
so the theorems (7.2)-(7.12) are completely general; no interpretation of ∨,
∧, and ⊥ is assumed, nor are there restrictions on what B may contain. To
emphasize the general nature of a Boolean algebra, and to provide an
example which will be useful in the next section, let us look at an interpretation
of ℬ₁₆ very different from those we have considered.
Let 𝒜 be the algebra ⟨A,LCM,HCF,COMP,1,210⟩, such that A contains
the sixteen numbers 1, 2, 3, 5, 7, 6, 10, 14, 15, 21, 35, 30, 42, 70, 105, 210,
while the two binary operations on A yield the lowest common multiple
(LCM) and the highest common factor (HCF) of any two numbers in A, and
COMP(a) = 210/a, for all a in A. This algebra is isomorphic to ℬ₁₆, that is,
we can attach each number to a node on Figure 7.3; the maximum element
(1) of this algebra is 210, the minimum element (0) is 1, and the atoms are the
primes 2, 3, 5, and 7.
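
The claim that this arithmetical structure satisfies (7.1) can be checked exhaustively. The Python sketch below is offered as an illustration; the function names join, meet, comp, and satisfies_axioms are choices made here, with LCM as join, HCF as meet, and 210/a as complement. It tests every clause of (7.1a-e) on all triples of divisors and then picks out the atoms according to (7.12), using divisibility as the ordering.

```python
from itertools import product
from math import gcd

TOP = 210
A = [d for d in range(1, TOP + 1) if TOP % d == 0]   # the sixteen divisors


def join(a, b):
    """Lowest common multiple (LCM)."""
    return a * b // gcd(a, b)


def meet(a, b):
    """Highest common factor (HCF)."""
    return gcd(a, b)


def comp(a):
    """COMP(a) = 210/a."""
    return TOP // a


def satisfies_axioms():
    """Test clauses (7.1a)-(7.1e) for all a, b, c in A."""
    for a, b, c in product(A, repeat=3):
        checks = (
            join(a, b) == join(b, a), meet(a, b) == meet(b, a),              # (7.1a)
            join(a, join(b, c)) == join(join(a, b), c),                      # (7.1b)
            meet(a, meet(b, c)) == meet(meet(a, b), c),
            join(a, meet(a, b)) == a, meet(a, join(a, b)) == a,              # (7.1c)
            meet(a, join(b, c)) == join(meet(a, b), meet(a, c)),             # (7.1d)
            join(a, meet(b, c)) == meet(join(a, b), join(a, c)),
            join(a, meet(b, comp(b))) == a, meet(a, join(b, comp(b))) == a,  # (7.1e)
        )
        if not all(checks):
            return False
    return True


# (7.12): a is an atom if a is not the minimum element and nothing lies
# strictly between the minimum element and a. Here b <= a means b divides a.
atoms = [a for a in A if a != 1 and all(b in (1, a) for b in A if a % b == 0)]

print(len(A), satisfies_axioms(), atoms)
```

The check succeeds only because 210 is a product of distinct primes; for a number such as 12 the divisors would still form a distributive lattice, but 12/a would fail to be a complement.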
Nonetheless, among all the possible realizations of Boolean algebras, one
type of realization has a privileged status: we know from a representation
theorem due to M. H. Stone that every Boolean algebra is isomorphic to a
field of sets (Bell and Slomson, 1969).
The significance of this theorem for our present purposes is this. The
presentation in Section 7.1 may suggest that, because the propositions of a
classical theory are represented by subsets of a phase space, their algebraic
structure, or logic, is Boolean; however, it is more accurate to say that,
because their logic is Boolean, they can be represented by the subsets of a
phase space.

7.3 Posets and Lattices


Quantum logic deals with a wider class of structures than that of Boolean
algebras. Accordingly, in this section we look at the effect of applying
successive constraints to a very basic sort of structure, a partially ordered set,
or poset. The effect of these constraints is shown in Figure 7.6, which shows
Hasse diagrams of structures which get eliminated at each step.

Figure 7.6 Some finite posets and lattices. A is a poset with no maximum element: (7.18)
fails. (7.18) holds for B, but B is not complemented: (7.19) fails. (7.19) holds for C, but C is
not orthocomplemented: (7.20) fails. (The arrows show how complementation works.)
(7.20) holds for D, but D is not orthomodular: (7.22) fails. E is a poset with maximum and
minimum elements, but it is not a lattice. (Nor is A.) F is an orthocomplemented distributive
lattice; it is a Boolean algebra. Compare Figure 7.3 and Figure 7.8.

(7.15) 𝒜 = ⟨A,≤⟩ is said to be a partially ordered set (poset) if A is a nonempty
set and ≤ is a reflexive, transitive, and antisymmetric relation on A
[see (7.11)].

We do not require that, for all a and b in A, either a ≤ b or b ≤ a. (A set for
which this holds is said to be totally ordered by ≤.) In the rest of this section,
𝒜 is taken to be the poset ⟨A,≤⟩.
If a and b are elements of A, then there may exist an element c such that

(7.16a) a ≤ c and b ≤ c;

(7.16b) if a ≤ d and b ≤ d, then c ≤ d.

Element c is then known as the supremum of {a,b}: c = sup{a,b} =
a ∨ b. Likewise an element e may exist such that

(7.17a) e ≤ a and e ≤ b;

(7.17b) if f ≤ a and f ≤ b, then f ≤ e.

In this case e is the infimum of {a,b}: e = inf{a,b} = a ∧ b.


The supremum of {a,b} is also known as the least upper bound of {a,b}, and
the infimum of {a,b} as the greatest lower bound of {a,b}. As intuitive examples
of these bounds, think of LCM and HCF in the Boolean algebra of
numbers given in the previous section (in which a ≤ b provided that a is a
factor of b). Note, however, that though we use the symbols ∨ and ∧ for
sup and inf, we cannot (yet) identify them with the binary operations on a
Boolean algebra.
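
The definitions (7.16) and (7.17) translate directly into a search over a finite poset. The Python sketch below is illustrative: the functions sup and inf follow the two clauses literally and return None when no such element exists. The second poset, {1, 2, 3, 12, 18, 36} ordered by divisibility, is a hypothetical example chosen here (it is not one of the posets of Figure 7.6) to show a poset with maximum and minimum elements in which some pairs have no supremum.

```python
def sup(a, b, elements, leq):
    """Least upper bound of {a, b}, or None if there is none; see (7.16)."""
    uppers = [c for c in elements if leq(a, c) and leq(b, c)]          # (7.16a)
    least = [c for c in uppers if all(leq(c, d) for d in uppers)]      # (7.16b)
    return least[0] if least else None


def inf(a, b, elements, leq):
    """Greatest lower bound of {a, b}, or None if there is none; see (7.17)."""
    lowers = [e for e in elements if leq(e, a) and leq(e, b)]          # (7.17a)
    greatest = [e for e in lowers if all(leq(f, e) for f in lowers)]   # (7.17b)
    return greatest[0] if greatest else None


def divides(a, b):
    """The ordering used in the text: a <= b provided that a is a factor of b."""
    return b % a == 0


divisors_of_210 = [d for d in range(1, 211) if 210 % d == 0]
print(sup(6, 10, divisors_of_210, divides), inf(6, 10, divisors_of_210, divides))

# A hypothetical poset with maximum 36 and minimum 1 that is not a lattice:
# 12 and 18 are both upper bounds of {2, 3}, but neither divides the other.
not_a_lattice = [1, 2, 3, 12, 18, 36]
print(sup(2, 3, not_a_lattice, divides), inf(12, 18, not_a_lattice, divides))
```

Antisymmetry guarantees that the list of least (or greatest) candidates never has more than one member, which is why returning its first element is safe.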
A poset may have a maximum element, 1, or a minimum element, 0, or
both, such that, for all a in A,

(7.18) a ≤ 1    0 ≤ a

A poset is said to be complemented if it has a maximum and a minimum
element and if, for all a in A, there exists an element a⊥ in A such that

(7.19) a ∨ a⊥ = 1    a ∧ a⊥ = 0

These equations should be read, "Sup{a,a⊥} exists and is equal to 1," and
"Inf{a,a⊥} exists and is equal to 0."
𝒜 is said to be orthocomplemented if it is complemented and, for all a and b
in A,

(7.20a) (a⊥)⊥ = a;

(7.20b) a ≤ b implies b⊥ ≤ a⊥.

For an orthocomplemented poset, De Morgan's laws hold for sup and inf
wherever they are defined; see (7.6). Notice that, if 𝒜 is orthocomplemented,
then sup{a,b} is defined if and only if inf{a,b} is.
We can define a relation of orthogonality on an orthocomplemented poset
by the following condition:

(7.21) a ⊥ b if and only if a ≤ b⊥

(7.20) guarantees that this relation is symmetric; in other words, that a ⊥ b
implies b ⊥ a.
An important constraint, which will get more attention in the next section,
is that of orthomodularity. The following is known as the orthomodular
identity.

(7.22) a ≤ b implies b = a ∨ (b ∧ a⊥)

To define an orthomodular poset in a way applicable to infinite posets we
need also the notion of orthocompleteness. We first extend the definitions of
supremum and infimum in an obvious way to countably infinite sets {a_i} of
elements of A. Then,

(7.23) 𝒜 is said to be orthocomplete if it is orthocomplemented and every
pairwise orthogonal countable subset of A has a supremum.
(7.24) 𝒜 is said to be an orthomodular poset if 𝒜 is orthocomplete and the
orthomodular identity holds.

It may be that sup{a,b} and inf{a,b} are defined for all pairs, {a,b}, of
elements of A. In that case, 𝒜 is said to be a lattice. We can now regard ∨ and
∧ as binary operations on A and refer to them (unsurprisingly) as "join" and
"meet."
Notice that the lattice condition is independent of conditions (7.18)-
(7.20) and (7.23)-(7.24). We can apply these constraints to the class of
lattices to obtain, successively, lattices with maximum and minimum elements,
complemented lattices, orthocomplemented lattices, orthocomplete
lattices, and orthomodular lattices.
It is easy to show that, for all lattices, clauses (7.1a-c) of the definition of a
Boolean algebra hold (that is, commutativity, associativity, and absorption),
as do (7.2) (idempotence) and (7.10), which now appears as a theorem
rather than a definition. (7.1e) and (7.4) hold for complemented, and (7.5)-
(7.6) for orthocomplemented lattices.
A lattice for which (7.1d) holds is known as a distributive lattice. An
orthocomplemented distributive lattice is a Boolean algebra.
Now let 𝒜 be a complemented distributive lattice. Take a, b ∈ A such that
a ≤ b. Then

b = b ∧ (a ∨ a⊥) [(7.1e)]
  = (b ∧ a) ∨ (b ∧ a⊥) [(7.1d)]
  = a ∨ (b ∧ a⊥) [(7.10)]

Thus the modular identity is a special case of distributivity. Hence all orthocomplete
distributive lattices are orthomodular. However, the converse is
not true: in the next section I describe a lattice (Figure 7.8) that is orthomodular
but not distributive.

7.4 The Structure of S(ℋ)


We now have a vocabulary in which to give an algebraic account of the set of
experimental questions in quantum mechanics; in conformity with standard
usage, I shall call this set the set of quantum events, or just events. Since each
quantum event is representable by a (closed) subspace of a Hilbert space,
quantum logic involves giving an algebraic characterization of the set S(ℋ)
of these subspaces.
S(ℋ) forms a lattice ℒ(ℋ); it is partially ordered by inclusion, and for any
pair of subspaces, L and M, there is a greatest subspace which is common to
both and a least subspace which contains them both. We may define meet
and join on ℒ(ℋ) by:

(7.25a) L ∧ M = L ∩ M
(7.25b) L ∨ M = ∩{N: N ∈ S(ℋ) and L ⊆ N, M ⊆ N}

Notice that the latter is not the union of two subspaces, but their span. The
union of two rays, for example, contains just the vectors in the two rays;
since it does not contain all linear superpositions of these vectors, it is not a
subspace. The span of two rays is the plane containing them both and it is
this which, in lattice-theoretic terms, is the join of the two rays.

Figure 7.7 Some subspaces of ℝ³.

Figure 7.8 The lattice G₁₂.

ℋ is the maximum element and {0} the minimum element of ℒ(ℋ). The
closure (see Section 1.16) of the set of vectors orthogonal to L forms a
subspace L⊥, which is the orthocomplement of L, obeying (7.19) and (7.20).
Thus the set of subspaces of ℋ forms an orthocomplemented lattice. It is
not distributive, however, as we can see by considering a selection of subspaces
of ℝ³. Consider the subspaces shown in Figure 7.7. These are the subspaces
generated by two triples of orthogonal vectors, {x,y,z} and {u,v,z}. We use
an obvious notation: L_x is the ray spanned by x, L_xy the plane spanned by x
and y, and so on. Note that four of the vectors, x, y, u, and v, lie in one plane;
thus L_xy = L_uv = L_xu = L_yv, and so on.
The lattice G₁₂ of these subspaces (named after Greechie; see Beltrametti
and Cassinelli, 1981, p. 102) is shown in Figure 7.8. In this diagram, each
one-dimensional subspace is shown immediately below its orthocomplement.
Now consider the subspace L_x ∧ (L_u ∨ L_v). Since L_u ∨ L_v = L_uv = L_xy, we
have L_x ∧ (L_u ∨ L_v) = L_x ∧ L_xy = L_x. On the other hand, since L_x ∧ L_u = {0},
and L_x ∧ L_v = {0}, we have (L_x ∧ L_u) ∨ (L_x ∧ L_v) = {0} ∨ {0} = {0}. It follows
that,

L_x ∧ (L_u ∨ L_v) ≠ (L_x ∧ L_u) ∨ (L_x ∧ L_v)

and so G₁₂ is not distributive, and, a fortiori, neither is ℒ(ℝ³). However, the
orthomodular identity (7.22) holds of ℒ(ℝ³), and, indeed, of the lattice
ℒ(ℋ) of subspaces of any Hilbert space ℋ. Such lattices are orthomodular.
Our first characterization of S(ℋ), then, is that it has the structure of an
orthomodular lattice.
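
The failure of distributivity in G₁₂, and the survival of the orthomodular identity, can be confirmed by exact computation. The Python sketch below is an illustration under stated assumptions: x, y, z are the standard basis of ℝ³, and u and v are taken (a choice made here, not fixed by the text) along (1,1,0) and (1,-1,0). A subspace is labelled by the reduced row-echelon form of any spanning set, join is span, the orthocomplement is the null space, and meet is obtained from join and orthocomplement by De Morgan's law. The function names rref, join, perp, meet, and ray are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over the rationals (nonzero rows only).

    Two spanning sets give the same subspace of R^3 exactly when they have
    the same reduced row-echelon form, so the result is a canonical label.
    """
    rows = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(3):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        lead = rows[rank][col]
        rows[rank] = [x / lead for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return tuple(tuple(row) for row in rows[:rank])


def join(a, b):
    """The span of two subspaces; see (7.25b)."""
    return rref(list(a) + list(b))


def perp(sub):
    """Orthocomplement: the null space of the matrix whose rows span `sub`."""
    pivots = [next(c for c, x in enumerate(row) if x != 0) for row in sub]
    free = [c for c in range(3) if c not in pivots]
    null = []
    for f in free:
        vec = [Fraction(0)] * 3
        vec[f] = Fraction(1)
        for row, p in zip(sub, pivots):
            vec[p] = -row[f]
        null.append(vec)
    return rref(null)


def meet(a, b):
    """Intersection of two subspaces, computed as (a-perp v b-perp)-perp."""
    return perp(join(perp(a), perp(b)))


def ray(*vector):
    """The one-dimensional subspace spanned by a nonzero vector."""
    return rref([vector])


ZERO = rref([])
Lx, Ly, Lz = ray(1, 0, 0), ray(0, 1, 0), ray(0, 0, 1)
Lu, Lv = ray(1, 1, 0), ray(1, -1, 0)        # assumed directions for u and v

lhs = meet(Lx, join(Lu, Lv))                # equals Lx
rhs = join(meet(Lx, Lu), meet(Lx, Lv))      # equals the zero subspace
print(lhs == Lx, rhs == ZERO)

# The twelve elements: zero, the five rays, and their orthocomplements.
G12 = [ZERO, Lx, Ly, Lu, Lv, Lz]
G12 += [perp(s) for s in G12]

# Orthomodular identity (7.22): a <= b implies b = a v (b ^ a-perp).
orthomodular = all(
    join(a, meet(b, perp(a))) == b
    for a in G12 for b in G12 if join(a, b) == b
)
print(len(set(G12)), orthomodular)
```

Rational arithmetic is used so that equality of subspaces can be tested exactly; a floating-point version would need a tolerance when deciding rank.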
I noted in Section 7.3 that all distributive lattices are orthomodular, and
it's also true that within any orthomodular lattice we can find sublattices
which are distributive. In particular, the set of subspaces which can be
generated from any set of mutually orthogonal rays spanning a Hilbert
space ℋ by join, meet, and (ortho)complementation forms a distributive
sublattice of ℒ(ℋ).
G₁₂, for example, contains two distributive sublattices of eight elements,
each isomorphic to the lattice shown in Figure 7.6(F); one is generated by L_x,
L_y, and L_z and the other by L_u, L_v, and L_z. These two sublattices are, so to
speak, "pasted together" (the term is Bub's) at the points {0}, ℝ³, L_z, and L_z⊥.
Each of them is a complemented, distributive lattice (in other words, a
Boolean algebra), the elements of which are mutually compatible subspaces
of ℝ³.
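
The "pasting" can be pictured with a very small Python sketch. It is only a schematic model, with labels chosen here for illustration: each element of a block is represented by the set of generating rays it contains, and the one geometric fact used is that u and v span the same plane as x and y, so any element containing both u and v is relabelled accordingly. The two eight-element blocks then share four elements, and their union has twelve.

```python
from itertools import combinations


def powerset(atoms):
    """All subsets of a set of mutually orthogonal generating rays."""
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def canonical(element):
    """Relabel the plane spanned by u and v as the plane spanned by x and y."""
    if {"u", "v"} <= element:
        return (element - {"u", "v"}) | {"x", "y"}
    return element


block_xyz = {canonical(e) for e in powerset("xyz")}   # generated by L_x, L_y, L_z
block_uvz = {canonical(e) for e in powerset("uvz")}   # generated by L_u, L_v, L_z

shared = block_xyz & block_uvz    # the points at which the blocks are pasted
g12 = block_xyz | block_uvz       # all twelve elements


def compatible(a, b):
    """Two elements are compatible when some single block contains both."""
    return any(a in block and b in block for block in (block_xyz, block_uvz))


print(len(block_xyz), len(block_uvz), len(shared), len(g12))
print(sorted("".join(sorted(e)) for e in shared))
print(compatible(frozenset("x"), frozenset("z")),
      compatible(frozenset("x"), frozenset("u")))
```

In this labelling the shared elements are the empty set, {z}, {x, y}, and {x, y, z}, corresponding to {0}, L_z, its orthocomplement, and ℝ³.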
This gives an alternative way to characterize algebraically the structure of
the set of subspaces of a Hilbert space. Rather than describe it as an orthomodular
lattice, we may describe it as a partial Boolean algebra (PBA). (The
definition given here is equivalent to that in Kochen and Specker, 1965; for a
review of work on PBAs, see Hughes, 1985a.)
Consider an indexed family ℬ = {ℬ_i : i ∈ I} of Boolean algebras: ℬ_i =
⟨B_i,∨_i,∧_i,⊥_i,0_i,1_i⟩. (I is a set, possibly infinite, of convenient indices.)
ℬ is said to be a Boolean manifold (Hardegree and Frazer, 1981) if

(7.26a) if i,j ∈ I, then there is a k ∈ I such that B_i ∩ B_j = B_k;

(7.26b) for all i,j ∈ I, 0_i = 0_j and 1_i = 1_j;

(7.26c) if a,b ∈ B_i ∩ B_j, then

a ∨_i b = a ∨_j b

ℬ is said to be a partial Boolean algebra if ℬ is a Boolean manifold and,

(7.27) for all a,b,c ∈ ∪{B_i}, if there are i,j,k ∈ I such that

a,b ∈ B_i    b,c ∈ B_j    c,a ∈ B_k

then there is an m ∈ I such that a,b,c ∈ B_m.

Let ℬ be a partial Boolean algebra. We define partial operations, ∨ and ∧,
on ℬ by:

(7.28) If, for some i ∈ I, a,b ∈ B_i, then

a ∨ b = a ∨_i b    a ∧ b = a ∧_i b

A complementation operation is defined on ℬ by:

(7.29) If, for some i ∈ I, a ∈ B_i, then a⊥ = a^(⊥_i)

A partial Boolean algebra is thus a set of Boolean algebras pasted together
in a consistent way, so that, where two or more Boolean algebras overlap,
their operations agree with each other. This consistency is assured by (7.26).
The condition (7.27) is sometimes called the coherence condition (Hardegree
and Frazer, 1981, p. 57).
The set S(ℋ) of subspaces of a Hilbert space constitutes a partial Boolean
algebra ℬ(ℋ), within which each maximal Boolean algebra ℬ_i is generated
by a set of mutually orthogonal rays spanning ℋ.
We have, it seems, two ways to characterize S(ℋ), as an orthomodular
lattice and as a PBA. What exactly is the relation between these two structures?
And, further, does either of them fully characterize S(ℋ)?
The first question has been answered by two theorems due to Finch and
Gudder. It turns out that any orthomodular poset which satisfies a coherence
condition is a PBA (Finch, 1969) and, conversely, that any PBA satisfying
a transitivity condition is an orthomodular poset (Gudder, 1972). (The
coherence condition for posets is given by (7.31) below; a PBA is transitive if
a ≤ b and b ≤ c together imply a ≤ c, for all a, b, and c in the algebra. For an
example of an intransitive PBA, see Hughes, 1985b, p. 444, n. 11.) The class
of coherent orthomodular posets thus coincides with the class of transitive
PBAs. The difference between the lattice and the PBA is this: whereas the
lattice operations ∨ and ∧ are defined for all pairs of points on the lattice,
the operations ∨ and ∧ on a PBA are partial operations, defined only for
pairs of points, both of which are in the same Boolean subalgebra of the
PBA. We call such points compatible, noting that in the PBA ℬ(ℋ) two
points are compatible in this sense if and only if the subspaces they correspond
to are compatible in the sense of (3.8). But now notice that (3.8),
suitably rewritten, gives a purely algebraic definition of compatibility:

(7.30) If a and b are elements of a poset, we say that a is compatible with b
(a$b), if there are mutually orthogonal elements a₀, b₀, and c in the
poset such that a = a₀ ∨ c and b = b₀ ∨ c.

[The orthogonality relation here is, of course, the algebraic relation defined
by (7.21).] This definition allows the coherence condition mentioned above
to be simply stated:

(7.31) An orthomodular poset 𝒜 is said to be coherent if, for all a, b, and c in
A, a$b, b$c, and c$a together imply (a ∨ b)$c.

This condition on posets does the work of (7.27) (Hardegree and Frazer,
1981).
It turns out that the feature we noted in the case of G₁₂ is perfectly general:
every maximal set of mutually compatible elements of a coherent orthomodular
lattice is a Boolean algebra. Thus, to obtain a PBA from a coherent
orthomodular lattice L, we just define partial operations on L which are the
restrictions of lattice join and meet to pairs of compatible elements within L.
Conversely, there is a natural ordering definable on the transitive PBA
ℬ(ℋ), and there are unique extensions of the partial operations on ℬ(ℋ) to
meet and join with respect to that ordering; the resulting structure is a
coherent orthomodular lattice.
The second question remains open. It is not known whether there is a
purely algebraic way to specify those partial algebras (or those orthomodular
lattices) which are isomorphic to S(ℋ). The sorts of considerations at
work in Chapter 4 suggest that the most promising approach would be to
consider PBAs on which groups of transformations were definable which
reproduced the symmetry groups within Hilbert spaces. These transformations
would map one Boolean subalgebra of the PBA onto another; recall
that a selection of subspaces of ℝ³ giving rise to G₁₂ was obtained by taking one
orthogonal triple in ℝ³ and rotating it about the z-axis to yield another. (See
Gudder, 1973, for work along these lines; see also Holdsworth and Hooker,
1983, pp. 135-136, for further references.)

7.5 The Algebra of Events


To the extent that the structure of a Hilbert space can be given algebraically,
an algebraic reconstruction of quantum mechanics is possible. The question
arises, what is gained by such a reformulation? One attractive possibility is
that we can thereby achieve more insight into the way in which the structure
of quantum mechanics relates, on the one hand, to that of predecessor
theories like classical mechanics, and, on the other, to that of possible
successor theories. But, from where we stand now, can anything useful be
said about the structure of as yet unformulated theories?
One approach to this project is to consider a linked pair of problems. First,
are there a priori algebraic constraints which the set of events dealt with by
any physical theory must satisfy? Second, what further constraints, peculiar
to individual theories, lead us to the Boolean algebra of events characteristic
of classical mechanics, or to the non-Boolean structure of S(ℋ) we find in
quantum theory?

These problems are similar to those broached in Chapter 3. However, that
chapter did not set out to deduce the algebraic structure of the set of
experimental questions (events) of a theory from an analysis of what constitutes
an experimental procedure. Rather, it addressed the question of whether the
algebra of events could always be embedded into a Hilbert space, and
sought the differences between classical and quantum theory in the extent
to which each utilized the machinery that Hilbert-space models made avail-
able. In other words, it started with the algebraic models with which the
present project, if successful, would conclude. It displayed the structure of
Hilbert space and looked at its suitability for representing a physical theory;
it did not deduce that structure from pretheoretical considerations.
My aim is to produce a formal specification of the algebra ℰ of events of a
theory; however, I will preface this with some discussion of the operational
procedures the algebra is to model and of the problems the approach en-
counters.
As in Chapter 3, I start with a schematic account of a preparation-measurement
procedure. (For a very careful account of a σ-algebra of events
along similar lines, see Stein, 1972, pp. 374-378.) Let us divide measurements
into two kinds. Those of the first kind yield results on the continuum
of real numbers, or within a small range of the reals. Typically we write "i =
2.21 ± 0.02 A" as a measurement of current. The ranges involved may
overlap: 2.21 ± 0.02 A overlaps with 2.20 ± 0.02 A. Experiments of the
second kind yield mutually exclusive outcomes, as when the spin component
of a fermion is measured as being either up or down. In each case the
set of possible outcomes is exhaustive. We take as the elements of the
algebra we are constructing, not outcomes, but events: as in Section 3.2, an
event is a set (possibly empty) of outcomes associated with one specific
measurement device.
The set ℰ_A of events associated with a specific measurement A then forms
a field of sets, that is, a Boolean algebra ℬ_A, whose operations are (as usual)
union, intersection, and complementation and whose minimum and maximum
elements are, respectively, the null event (the empty set) and the
certain event (the set of all possible outcomes of A). If we temper operationalism
with idealization, we can say that each event in ℰ_A will receive the
answer yes or no when A is performed. Note that, particularly in the continuous
case, we may want to extend ℬ_A to a Boolean σ-algebra, on which
infinitary versions of union and intersection are defined. For simplicity,
however, I will confine myself to the finite case from now on.
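
The field of sets associated with a single measurement can be sketched for a finite case. The Python fragment below is only an illustration: the three outcome labels are hypothetical, and the names `events`, `E_A`, and `complement` are introduced here, not taken from the text.

```python
from itertools import combinations


def events(outcomes):
    """Return every event (every subset of the outcomes) as a frozenset."""
    items = sorted(outcomes)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


# Hypothetical outcomes of one measurement procedure A.
outcomes = frozenset({"low", "middle", "high"})
E_A = events(outcomes)

null_event = frozenset()    # minimum element: receives the answer "no"
certain_event = outcomes    # maximum element: receives the answer "yes"


def complement(event):
    """Complement of an event within the measurement that contains it."""
    return outcomes - event


# The set of events is closed under union, intersection, and complement.
closed = all(
    (a | b) in E_A and (a & b) in E_A and complement(a) in E_A
    for a in E_A
    for b in E_A
)
```

With three outcomes the algebra has eight events; a second measurement procedure would contribute its own algebra of the same kind, built in the same way from its own outcomes.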
So far, each measurement procedure has been treated independently.
The whole set of events-the set, that is, of events associated with all
possible measurement procedures - has been carved up into Boolean algebras,
but no relations have been assumed to exist between events associated
with different procedures. Note, however, that a complementation operation
is everywhere defined, since each event a has a complement a⊥ in the
(unique) Boolean algebra which contains it. In addition, we may plausibly
identify the null events of all measurement procedures, and also the certain
events. The intuition at work here is that two events are identical if no
preparation will yield a different result for one than for the other. This rather
vague criterion will be made more precise shortly, but for the present it will
serve. For since we stipulated that for each measurement procedure the set
of outcomes was to be exhaustive, it follows that, whatever preparation
procedure is used, the null event, 0, will receive the answer no, and the
certain event, I, the answer yes, no matter what measurement we carry out.
Thus ℰ is a family of Boolean algebras in one-to-one correspondence with
the set of measurement procedures. This family is pasted together at top and
bottom; it is an example of a very elementary kind of structure known as an
orthoalgebra. I defer discussion of such structures until Section 8.1; the
present question is, what further constraints can we lay upon ℰ? In particular,
in order to relate events associated with different measurement procedures,
can we make precise the criterion used just now, when we identified
all the null events associated with different measurements, on the grounds
that no preparation yielded a different result for one than for another?
Well, every preparation gives a certain probability to the various events of
ℰ. We associate with each preparation a state, w, which assigns a probability
w(a) to each event a of ℰ. This enables us to define a relation ≤ on ℰ:

(7.32) We say that a ≤ b if, for all states w, w(a) ≤ w(b).

If, further, we identify two events a and b whenever, for all states w, w(a) =
w(b), then ≤ is a partial ordering on ℰ.
It seems that, without significant loss of generality, we have shown that ℰ
must be a poset-a poset, moreover, with maximum and minimum ele-
ments and on which a complementation operation has been defined. Alas,
dancing in the streets would be premature; without making some significant
assumptions we can't expect the complementation operation to mesh prop-
erly with the ordering relation. Consider, for instance, the experiment
shown diagrammatically in Figure 7.9, which consists of coupling together
two Stern-Gerlach apparatuses, one to measure Sz and the other to measure
Sy, so that just one of the beams emerging from the Sz apparatus, the z- beam,
say, passes through the Sy apparatus. (This example comes from Beltrametti
and Cassinelli, 1981, p. 145; see also Cooke and Hilgevoord, 1981.) If we
consider the coupled apparatuses as one experiment, then there will be
three possible outcomes, z+, y+, and y-.

Figure 7.9 Coupled Stern-Gerlach devices.

In this case we will find that, for all states w, w(y+) = w(y-). It follows
that, if a = {z+, y+} and b = {y-}, then b ≤ a. Hence b ∨ a = a. But since a and
b are mutually exclusive and jointly exhaustive, a = b⊥. It follows that b ∨
b⊥ = b ∨ a = a ≠ I, contrary to (7.19). Thus the operation ⊥ is not an
orthocomplementation with respect to ≤.
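
The anomaly can be checked numerically. In the sketch below, which is an illustration only, a state enters solely through the probability it gives to z+; the remaining probability is split equally between y+ and y-, as quantum mechanics requires when Sy is measured on the z- beam. The function names and the grid of eleven states are assumptions made for the example.

```python
from fractions import Fraction


def outcome_probs(p_zplus):
    """Outcome probabilities for the coupled apparatus, given w(z+)."""
    rest = 1 - p_zplus
    return {"z+": p_zplus, "y+": rest / 2, "y-": rest / 2}


def prob(event, probs):
    """Probability a state assigns to an event (a set of outcomes)."""
    return sum(probs[o] for o in event)


a = frozenset({"z+", "y+"})
b = frozenset({"y-"})

# A grid of states, indexed by the probability each gives to z+.
states = [outcome_probs(Fraction(k, 10)) for k in range(11)]

# b <= a in the sense of (7.32): w(b) <= w(a) for every state.
b_below_a = all(prob(b, w) <= prob(a, w) for w in states)

# Yet a, which would have to serve as the join of b and its complement,
# is not the certain event: most states give it probability less than 1.
a_is_certain = all(prob(a, w) == 1 for w in states)
```

Here `b_below_a` comes out true while `a_is_certain` comes out false, which is the conflict with (7.19) described in the text.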
How might one outlaw such experimental arrangements? One strategy is
to make explicit the assumption that we are dealing with measurement
procedures: we may demand that each event have an internal conceptual
structure and be recognizable as an experimental question (A,Δ). Then
anomalous cases like this one are ruled out, on the grounds that the apparatus
does not measure a specific observable. In doing so, however, we lose
some of the generality we sought; we confine discussion to possible theories
couched in terms of observable quantities and their values. We also assume
that we can recognize which experimental devices provide measurements
of these quantities and which do not. To the extent that our project is that of
prescribing a logical form for the event structure of all successor theories,
these constraints seem unduly restrictive. Nonetheless, the approach is still
general enough to accommodate theories like quantum theory and classical
mechanics.
With these general considerations in mind, let us move to a more formal
mode. (The exposition essentially follows Mackey, 1963; see also Maczynski,
1967.) We take as primitive notions those of observable and state; we
also use the resources of number theory, in particular, the notion of a Borel
set of the reals. (All physically significant sets of reals, and many others, are
Borel sets; see Fano, 1971, p. 215.) Let O be the set of observables, S the set of
states, B(ℝ) the set of Borel subsets of the reals. A pair (A,Δ), where, as usual,
A ∈ O and Δ ∈ B(ℝ), we call an experimental question.
Each state w defines a probability function on the set of questions, such
that for all A ∈ O and for all Δ, Γ ∈ B(ℝ), w(A,Δ) ∈ [0,1], w(A,∅) = 0,
w(A,ℝ) = 1, and w(A,Δ ∪ Γ) = w(A,Δ) + w(A,Γ) provided Δ ∩ Γ = ∅.
We identify two states (w1 = w2) if they give the same probability to each
question (A,Δ), and we identify two observables A and B if, for all w ∈ S and
for all Δ ∈ B(ℝ), w(A,Δ) = w(B,Δ). Respectively, these identifications state
that the set of questions and the set of states are complete.
We say that two questions are equivalent, (A,Δ) ~ (B,Γ), if, for all w ∈ S,
w(A,Δ) = w(B,Γ). Each equivalence class of questions, [(A,Δ)], contains all
and only questions equivalent to (A,Δ). Modifying our previous usage, we
refer to an equivalence class of questions as an event; this modification does
not affect the substance of what is said. As before, let ℰ be the set of events.
Clearly, any state w can also be thought of as a function on the set of
events such that, for all a in ℰ, if a = [(A,Δ)], then w(a) = w(A,Δ). We define a
relation of orthogonality on ℰ as follows.

(7.33) For all a, b ∈ ℰ, we say that a is orthogonal to b (a ⊥ b) if, for all w ∈ S,
w(a) + w(b) ≤ 1.

Now consider the following postulate (Postulate M).

(7.34) If {aj} is a pairwise orthogonal set of events of ℰ, then there is an
event b in ℰ such that, for all states w ∈ S, w(b) + w(a1) +
w(a2) + ... = 1.

The Mackey-Maczynski theorem (see Beltrametti and Cassinelli, 1981,
chap. 13.6) tells us that

(7.35) If Postulate M holds, then ℰ is an orthomodular poset with respect to
the ordering ≤ defined by (7.32).

[The ≤ relation, remember, defined in (7.32) is such that a ≤ b if, for all w ∈
S, w(a) ≤ w(b).] Orthocomplementation on this poset is defined by:

(7.36) a = b⊥ if, for all w ∈ S, w(a) = 1 - w(b).

The existence of the orthocomplement of any event is guaranteed by Postulate
M; clearly the strength of this postulate is considerable. How might one
justify it?
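
Definitions (7.32), (7.33), and (7.36) all quantify over states, and in a finite classical case they can be evaluated directly. The following sketch assumes a single hypothetical observable with three values and a small, hand-picked set of states that includes every point mass; none of these names or numbers comes from the text.

```python
from fractions import Fraction
from itertools import combinations

OUTCOMES = ("x1", "x2", "x3")  # hypothetical values of one observable

# Each state is a probability distribution over the outcomes.
STATES = [
    {"x1": Fraction(1), "x2": Fraction(0), "x3": Fraction(0)},
    {"x1": Fraction(0), "x2": Fraction(1), "x3": Fraction(0)},
    {"x1": Fraction(0), "x2": Fraction(0), "x3": Fraction(1)},
    {"x1": Fraction(1, 2), "x2": Fraction(1, 3), "x3": Fraction(1, 6)},
]

EVENTS = [
    frozenset(c)
    for r in range(len(OUTCOMES) + 1)
    for c in combinations(OUTCOMES, r)
]


def w(state, event):
    """Probability that a state assigns to an event."""
    return sum((state[o] for o in event), Fraction(0))


def leq(a, b):
    """(7.32): a <= b if w(a) <= w(b) for all states."""
    return all(w(s, a) <= w(s, b) for s in STATES)


def orthogonal(a, b):
    """(7.33): a is orthogonal to b if w(a) + w(b) <= 1 for all states."""
    return all(w(s, a) + w(s, b) <= 1 for s in STATES)


def is_orthocomplement(a, b):
    """(7.36): a is the orthocomplement of b if w(a) = 1 - w(b) for all states."""
    return all(w(s, a) == 1 - w(s, b) for s in STATES)
```

In this classical toy case the state-defined ordering coincides with set inclusion, orthogonality with disjointness, and orthocomplementation with set complement; the point masses are what make the set of states rich enough for this to hold.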
Consider the case when, for some A ∈ O, a = [(A,Δ)] and b = [(A,Γ)], and Δ
is disjoint from Γ. In this case, a ⊥ b. It is also plausible to assume that a
converse relation holds: that, if a ⊥ b, then there exists a single observable, A,
and disjoint Borel sets, Δ and Γ, of real numbers, such that a = [(A,Δ)] and
b = [(A,Γ)]. This would be true, for instance, if a were the null event, since in
that case a = [(A,∅)] for every observable A. Note that if neither event is the
null event, then the plausibility of the assumption is increased by the existence
of certain states in S. Observe, for example, what happens if there
exists a state wu such that wu(a) = wu(A,Δ) = 1. Then for any event b = [(B,Γ)]
such that a ⊥ b, we have wu(b) = 0, and, to use the language of Sections 3.7
and 3.8, B can be neither independent of A nor (in the quantum-mechanical
sense) incompatible with A. But if B is either functionally dependent on A or
otherwise compatible with A, then the assumption holds. Of course, in a
successor theory, this may not exhaust the list of relations between observables,
but it is hard to envision a relation that would produce a counterexample.
Given two orthogonal events, a = [(A,Δ)] and b = [(A,Γ)], associated with
a single observable A, we may reasonably postulate the existence of others,
specifically of the events c = [(A,Δ ∪ Γ)] and d = [(A,ℝ - (Δ ∪ Γ))], such that,
for all w ∈ S, w(c) = w(a) + w(b) and w(c) + w(d) = 1. Considerations of this
kind do not compel assent to Postulate M, but nevertheless they do give it
plausibility.
Notice in this regard the effect of defining each event in ℰ as an equivalence
class of questions, and thereby giving it an internal structure. Although
the specification of the structure of ℰ contained in (7.32)-(7.36) is
independent of this definition, we look to the internal structure of events, on
the one hand, for a criterion for distinguishing well-behaved events from
impostors, and, on the other, for arguments to motivate Postulate M.
Let us now look back at the problem we started with, whether a priori we
can specify any algebraic constraints that the set of events dealt with by a
theory must satisfy. We see (1) that, if we think of these events purely
experimentally, then the event structure of any theory will be an orthoalge-
bra, and (2) that, given certain assumptions, the event structure of a theory
whose expression involves reference to observables and their values will be
an orthomodular poset.
A stronger claim than (1) has sometimes been made (for example, by
Finkelstein, 1969; Jauch, 1968, chap. 5; and Piron, 1972) that the set of
experimentally specifiable events of any theory must form an orthocomplemented
lattice. As these authors point out, classical mechanics and quantum
theory both conform to this requirement; the lattice for classical mechanics
is characterized by the additional assumption of distributivity, and that of
quantum theory by the weaker assumption of modularity. However, there
are (to my mind) serious inadequacies in their accounts. In particular, to
claim that the set of all events has the structure of a lattice is to claim that, for
every pair of events a and b, there exist events a ∧ b and a ∨ b which are the
infimum and supremum, respectively, of {a,b} with respect to a particular
ordering of events. For these authors, events are specified in operational
terms; thus to make good their claim they need to give a general prescription
whereby, from two recipes - one for asking a and the other, possibly using
a totally different experimental arrangement, for asking b - there can be
generated two more, for a ∧ b and a ∨ b, with the required properties. It is
this problem, of giving experimental definitions of the lattice-theoretic
operations, which resists adequate solution.*
The question arises: what further assumptions guarantee that the orthomodular
poset (ℰ,≤) suggested by the Mackey-Maczynski approach will be
a lattice? We find (Beltrametti and Cassinelli, 1981, pp. 118, 152, and
297-298) that

(7.37) If (a) (ℰ,≤) is a separable orthomodular poset,
       (b) S is a sufficient (σ-)convex set of states,
       (c) for all a, b ∈ ℰ, if, for some w ∈ S, w(a) = w(b) = 1, then there
           exists c ∈ ℰ such that c ≤ a, c ≤ b, and w(c) = 1,
       then (ℰ,≤) is an orthomodular lattice.

Briefly, (ℰ,≤) is separable if every set of mutually orthogonal events in it is at
most countably infinite; S is sufficient if, for all a ∈ ℰ except the null event,
there is a w ∈ S such that w(a) = 1; for an account of convexity, see Section
5.4. Although an assumption like (b) above was at work in our informal
justification of Postulate M, the trio (a), (b), and (c) are, to put it politely,
nontrivial. Indeed, assumption (c) virtually posits the existence of a lower
bound of the pair of events {a,b}.
Quantum mechanics conforms to the antecedent conditions of (7.37), and
so does classical mechanics, though, in the latter case, some work has to be
done to show that (ℰ,≤) is indeed separable. What then distinguishes the
two theories, algebraically speaking?
In Section 3.9, the principle of superposition and the uncertainty principle
(there glossed as the existence of incompatible observables) were put for-
ward as peculiar to quantum mechanics. Each of these has an algebraic
counterpart, as follows. The superposition principle states that

(7.38) If r1 and r2 are nonnegative real numbers such that r1 + r2 = 1, then, if
w1, w2 are pure states in S, there exists a pure state w3 in S such that,
for all events a in ℰ, w3(a) = r1w1(a) + r2w2(a).

* For detailed criticisms, see Hughes, 1982; note that the relations on lines 27 and 31 of page
249 of that article should read "0 < w(q · q⊥) < 1" and "T ≤ (q · q⊥)⊥," respectively. See also
Holdsworth and Hooker, 1983, pp. 136-141.

A pure state is defined as an extreme point in the convex set S. Incompatibility
is defined thus:

(7.39) Let a and b be any two orthogonal elements of ℰ distinct from the null
event; then there exists a non-null event c in ℰ, distinct from both a
and b, such that c < a ∨ b.

No lattice conforming to assumptions (a), (b), and (c) of (7.37) can be
distributive if either the superposition principle or the incompatibility principle
holds. Neither principle is true of the event structure of classical mechanics.
There is no doubt that algebraic reformulations of these principles add
something to our understanding of quantum theory. But no interpretive
work is being done by such reformulations. Indeed, no such work can be
done by the algebraic approach as long as its aim is seen as that of recapturing
algebraically the Hilbert-space formalism of the theory. Furthermore,
any algebraic reformulation remains a partial reformulation of quantum
mechanics, for two reasons. The first is the gap, already remarked on in
Section 7.4, between algebraically specifiable structures and the structure of
S(ℋ). The second, related, reason is the absence of a dynamical principle
from the reformulation. Although, as we saw in Section 3.10, under certain
assumptions the set of mappings It: S → S describing the dynamical evolution
of a system forms a group, it is only when these states are representable
in a Hilbert space that we can apply Stone's theorem to show that all these
mappings are functions of a single observable. To date, quantum logic has
provided no equivalent to Schrödinger's equation.

7.6 A Formal Approach to Quantum Logic


"Quantum logic" can refer just to the study of certain algebraic structures
and the probability measures definable on them. But traditionally logic has
been the science which investigates a family of notions - consistency, validity,
entailment, and the like - all of which pertain to (sets of) sentences of
a language. Thus a set of sentences can be consistent, one sentence may be
entailed by another, and so on. In the remainder of this chapter I look at
quantum logic from this viewpoint, and I will use the phrase "quantum
logic" in this sense from now on.
We saw in Section 7.1 that the set Lc of sentences of a simple language can
be mapped onto the set of elements of the Boolean algebra ℬ16, and that the
logical relations between the sentences can be "read off" from the algebraic
relations between the elements of ℬ16. The connectives of the language are
& (conjunction), v (disjunction), and - (negation), and the mapping f taking
sentences of Lc into elements of ℬ16 is such that, for all sentences
A, B ∈ Lc,

(7.40) f(A & B) = f(A) ∧ f(B)
       f(A v B) = f(A) ∨ f(B)
       f(-A) = [f(A)]⊥

As in Section 7.2, the elements of ℬ16 represent the propositions expressed
by the sentences of Lc.
A full algebraic treatment of classical logic would consider every mapping
f of the (syntactically defined) set Lc into an arbitrarily chosen Boolean
algebra ℬ which conformed to (7.40). Here we confine ourselves to a specific
algebra and a single mapping, and so talk of consistent sets of propositions,
and of one proposition entailing another, without doing violence to
these logical notions. Note, however, that the results (7.41)-(7.43) below
hold both in general and (a fortiori) for the particular mapping f we choose.
We found that the natural ordering ≤ of the elements of the algebra
corresponded to a relation ⊨ of entailment among sentences of Lc: for
sentences A, B ∈ Lc,

(7.41) A ⊨ B if and only if f(A) ≤ f(B)

The following purely algebraic theorem also holds. Let ℬ be a Boolean
algebra; then for all a, b ∈ ℬ,

(7.42) a ≤ b if and only if every ultrafilter on ℬ containing a also contains b.

Whence we obtain,

(7.43) A ⊨ B if and only if every ultrafilter on ℬ16 containing f(A) also
contains f(B).
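
For a finite Boolean algebra the theorem (7.42) can be verified by brute force. The sketch below builds a sixteen-element algebra as the power set of four atoms and uses the fact that, in the finite case, every ultrafilter is generated by an atom. The atom labels (read as penny and quarter faces) are hypothetical, as are the function names.

```python
from itertools import combinations

# Four atoms of a sixteen-element Boolean algebra; the labels are illustrative.
ATOMS = ("HH", "HT", "TH", "TT")

ELEMENTS = [
    frozenset(c)
    for r in range(len(ATOMS) + 1)
    for c in combinations(ATOMS, r)
]


def ultrafilter(atom):
    """The ultrafilter generated by an atom: every element lying above it."""
    return {x for x in ELEMENTS if atom in x}


ULTRAFILTERS = [ultrafilter(atom) for atom in ATOMS]


def leq(a, b):
    """The natural ordering of the algebra, here set inclusion."""
    return a <= b


def ultrafilter_criterion(a, b):
    """True if every ultrafilter containing a also contains b."""
    return all(b in U for U in ULTRAFILTERS if a in U)


# (7.42): the two relations agree on every pair of elements.
agree = all(
    leq(a, b) == ultrafilter_criterion(a, b)
    for a in ELEMENTS
    for b in ELEMENTS
)
```

Each ultrafilter here has eight members, and the characteristic function of any one of them is a two-valued homomorphism, that is, a classical truth-assignment.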

Recall from Section 7.2 that the ultrafilters of ℬ16 play a special role: they
represent maximal consistent sets of propositions. Each possible truth-assignment
to the sentences of Lc is associated with a homomorphism of ℬ16
onto Z2, that is, with a function that maps all and only the members of some
ultrafilter of ℬ16 onto the element 1 of Z2. Only the propositions lying in the
ultrafilter are assigned the value "True" by the associated truth assignment.
Thus (7.43) is the algebraic equivalent of:

(7.44) A ⊨ B if and only if B is true on every truth-assignment to Lc on
which A is true.

The algebra of propositions of classical logic is Boolean. The question now
is this: what are the characteristics of a logic, the algebra of whose propositions
has the non-Boolean structure of S(ℋ)?
In the present section I will look only at the formal characteristics of such a
logic; no prior interpretation of the propositions of this logic is assumed.
This contrasts with what we did in Section 7.1, where it was always clear
what propositions we were dealing with: each node of ℬ16 represented the
fact that the penny-quarter system had a certain property - that the penny
was tails-up, for example.
With regard to the connectives, the situation is a little different. Since they
are logical connectives they derive their interpretation from their formal
behavior. But again, in contrast to the classical case, no prior interpretation
is assumed. Whereas in the example used in Section 7.1 the connectives
were assumed to be the truth-functional connectives of classical logic, no
such assumption is at work here.
Classically, entailment is usually defined by (7.44), in terms of truth-assignments.
Given this definition, (7.41) and (7.43) appear as theorems,
capable of proof. In quantum logic, however, comparable statements appear
as definitions of logical relations. Any interpretation of the connectives
is to be "read off" the algebraic structure; no independent route to it is
available.
I will present results quite generally, using the structure G12 for illustration.
From this one example, we can see straight away that there are two
approaches open to us. Like the set of subspaces of a Hilbert space, G12 can
be considered either as an orthomodular lattice or as a partial Boolean
algebra. In this section a lattice-theoretic quantum logic will be described.
This may be called orthomodular quantum logic. I will indicate later how
this account needs to be qualified on the PBA approach.
Consider a language LQ containing a set ℒQ of sentences. These sentences
are mapped by a function f onto the elements of an orthomodular lattice L.
Assume, for example, that LQ contains the atomic sentences p, q, r, s, and t,
which are mapped onto the atoms of G12, as shown in Figure 7.10. LQ also
contains two binary connectives, ∧ and ∨, and a singulary connective, ~.
We impose a condition analogous to (7.40), thus establishing a connection
between the connectives of LQ and the operations on the lattice. For all
sentences A, B ∈ ℒQ,

(7.45) f(A ∧ B) = f(A) ∧ f(B)
       f(A ∨ B) = f(A) ∨ f(B)
       f(~A) = [f(A)]⊥

Figure 7.10 Mapping of sentences of ℒQ onto G12.

We can see from Figure 7.10 that, in our example, f(~p) = f(q ∨ r), and
hence that ~p is equivalent to q ∨ r under the mapping f. As in the classical
case, I will confine myself to a single mapping; thus, in what follows, I will
omit the phrase "under the mapping f" which, ideally, should accompany
all statements about logical relations between the members of ℒQ. As before,
the restriction to a single mapping licenses talk of logical relations
between propositions, in this case, quantum propositions.
The orthomodular lattice L(ℋ) of the set of subspaces of a Hilbert space is
atomic; that is, there are elements of L(ℋ), to wit the rays of ℋ, immediately
above the zero of L(ℋ). [(7.12) provides a formal definition.] In what
follows I restrict myself to atomic orthomodular lattices.
As in the case of an atomic Boolean algebra, an ultrafilter U on such a
lattice can be simply defined:

(7.46) U is said to be an ultrafilter on L if there is an atom a of L such that
U = {b: a ≤ b}.

U is then the ultrafilter generated by a. In Figure 7.11 I show a typical
ultrafilter on G12.
Now (7.42) holds for L just as for a Boolean algebra; for all a, b ∈ L,

(7.47) a ≤ b if and only if every ultrafilter on L containing a also contains b.

We can use each ultrafilter U on L to define a truth-assignment u (or its
analogue) to quantum propositions: for any a ∈ L,

(7.48) We say that a holds under the assignment u if and only if a is in the
ultrafilter U.

The function u: L → {0,1} is the characteristic function of U; we write
u(a) = 1 if a ∈ U, and u(a) = 0 if a ∉ U. (7.47) now tells us that the semantic
entailment relation on the set of quantum propositions will coincide with
the ordering relation on the lattice; for all a, b ∈ L,

(7.49) a ⊨Q b if and only if b holds whenever a holds if and only if a ≤ b.

As Putnam (1969, p. 233) pointed out, in many ways the behavior of
quantum connectives resembles that of their classical counterparts. The
lattice structure of L guarantees that, for any sentences A, B, and C of LQ,

(7.50a) A ⊨Q A ∨ B
        if A ⊨Q C and B ⊨Q C, then A ∨ B ⊨Q C
(7.50b) A, B ⊨Q A ∧ B
        A ∧ B ⊨Q B
(7.50c) A ⊨Q ~~A and ~~A ⊨Q A
        ⊨Q A ∨ ~A
        ⊨Q ~(A ∧ ~A)

Figure 7.11 Typical ultrafilter on G12.

[We write ⊨Q A if A holds on all truth-assignments; note also that the upper
line of (7.50b) involves a modest extension of our notation.] There are of
course casualties among the theorems of classical logic. Notoriously, A ∧
(B ∨ C) ⊭Q (A ∧ B) ∨ (A ∧ C), since, in general, an orthomodular lattice is
not distributive. (Friedman and Glymour, 1972, provide an axiomatization
of orthomodular quantum logic which was proved complete by Hughes,
1979; see also Dalla Chiara, 1986, and Gibbins, 1987, chap. 9.)
More fundamentally, we may think, the assignments provided by the
ultrafilters on L do not behave truth-functionally, as classical truth-assignments
do. That is to say, the truth-values of compound sentences are not
uniquely determined by the truth-values of their components.
Consider, for example, the assignment u determined by the ultrafilter Uq
on G12 which contains the atom q (see Figure 7.11). On this assignment q
holds, but the other atoms do not. The proposition p ∨ q lies in the ultrafilter
Uq and therefore holds on this assignment. But p ∨ q is identical with the
proposition s ∨ t. Hence, on this assignment, we have

(7.51) u(s) = 0, u(t) = 0, and u(s ∨ t) = 1

but we also have

(7.52) u(p) = 0, u(r) = 0, and u(p ∨ r) = 0


Algebraically, the fact that the truth-assignments of orthomodular quan-
tum logic are not truth-functional appears as the absence of two-valued
homomorphisms on nondistributive lattices. Kochen and Specker's
theorem tells us that there are none such on L(ℋ); Jauch and Piron (1963)
have shown that the existence of such mappings from an orthomodular
lattice L onto Z2 implies the distributivity of L.
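
The valuations recorded in (7.51) and (7.52), and the failure of distributivity in an orthomodular lattice, can both be checked on a small model of G12. The sketch below is an illustration under stated assumptions: the two blocks {p, q, r} and {s, t, r} are inferred from the facts that ~p is equivalent to q ∨ r and that p ∨ q is identical with s ∨ t, and the notation x' for the orthocomplement of an atom x, together with all function names, is introduced here.

```python
# Atoms of G12 and its two Boolean blocks (orthogonal triples sharing r).
# The block structure is an assumption inferred from the text.
ATOMS = ["p", "q", "r", "s", "t"]
BLOCKS = [{"p", "q", "r"}, {"s", "t", "r"}]

# Coatoms are written x' for the orthocomplement of atom x, e.g. p' = q v r.
ELEMENTS = ["0"] + ATOMS + [x + "'" for x in ATOMS] + ["1"]


def orthogonal(x, y):
    """Two distinct atoms are orthogonal when they share a block."""
    return x != y and any(x in blk and y in blk for blk in BLOCKS)


def leq(a, b):
    """The lattice ordering on the twelve elements."""
    if a == b or a == "0" or b == "1":
        return True
    if a in ATOMS and b.endswith("'"):
        return orthogonal(a, b[0])  # x <= y' exactly when x is orthogonal to y
    return False


def join(a, b):
    """Least upper bound, found by search over the finite lattice."""
    uppers = [c for c in ELEMENTS if leq(a, c) and leq(b, c)]
    return next(c for c in uppers if all(leq(c, d) for d in uppers))


def meet(a, b):
    """Greatest lower bound, found by search over the finite lattice."""
    lowers = [c for c in ELEMENTS if leq(c, a) and leq(c, b)]
    return next(c for c in lowers if all(leq(d, c) for d in lowers))


def u(x):
    """Valuation given by the ultrafilter generated by the atom q, as in (7.48)."""
    return 1 if leq("q", x) else 0


# (7.51) and (7.52): equal component values, unequal values for the joins.
values_751 = (u("s"), u("t"), u(join("s", "t")))
values_752 = (u("p"), u("r"), u(join("p", "r")))

# Failure of distributivity with A = p, B = s, C = t.
lhs = meet("p", join("s", "t"))
rhs = join(meet("p", "s"), meet("p", "t"))
```

On this model `values_751` and `values_752` differ only in their last entry, and `lhs` is the atom p while `rhs` is the zero of the lattice, so the two sides of the distributive law come apart.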
We can mimic some of the idiosyncrasies of orthomodular quantum logic
within a classical modal logic. This is done by "translating" the propositions
of quantum logic into modal propositions. The "translations" all use the
modal operator □, which can be read as, "It is necessary that . . . " (For an
introduction to modal logic, see Hughes and Cresswell, 1968.) We now
"translate" a given sentence in LQ by rewriting it, with its quantum connectives
replaced by classical, and with the necessity operator added at the front
of the sentence; thus A ∧ B is translated as □(α & β) (α and β are A and B
rewritten, with classical connectives replacing quantum connectives).
G12 provides illustrations of the nonclassical features of quantum logic
which find analogues in these modal translations. Let u be the quantum-logical
truth-assignment determined by the ultrafilter Uq, as before (Figure
7.11), and let v be a truth-assignment to a classical modal logic (S4, say). We
have already seen that u(s) = 0 = u(t), but that u(s ∨ t) = 1. Similarly, we
may have v(□α) = 0 = v(□β), but v[□(α v β)] = 1, as in the case when α is a
contingent proposition and β = -α. Again, from Figure 7.11 we see that
u(t) = u(~t) = 0. Likewise, if α is any contingent proposition we have
v(□α) = v[□(-α)] = 0.
These remarks bring to mind Gödel's (1933) demonstration that intuitionistic
logic can be translated into the classical modal system S4. In fact,
Dalla Chiara (1986) has shown that a comparable result holds for quantum
logic and a modified Brouwer system. The modal translation she uses is,
however, more complex than the one given above, and nothing as precise or
as comprehensive as that result is being claimed here; I have merely pointed
out some formal affinities between orthomodular quantum logic and the
logic of a particular class of modal sentences.

7.7 An Unexceptionable Interpretation of Quantum Logic


In Section 7.6 we saw that, within the lattice of quantum propositions,
ultrafilters can be used to define functions which are the analogues of
truth-assignments. Let us call these functions "valuations" to avoid making
unjustified assumptions. Each valuation u is the characteristic function of
some ultrafilter U [see (7.48)] on the lattice.
If L is atomic, as we assume, then each ultrafilter contains just one atom.
Thus, for each valuation u there is exactly one atom a such that u(a) = 1, and
for all b ∈ L,

(7.53) u(b) = 1 if and only if a ≤ b
Let us now cash this out in terms of quantum systems and their states, and so
obtain an interpretation of the propositions of a logic based on the lattice
L(ℋ). We need first to distinguish three kinds of things: quantum events,
quantum propositions, and subspaces of a Hilbert space. The subspaces of a
Hilbert space act as mathematical representations both of quantum events
and of quantum propositions. A quantum proposition is whatever is expressed
by a sentence of quantum logic: just what this is we rely on our
interpretation to tell us. A quantum event (also called an "experimental
question") is a pair (A,Δ). The fact that we are not at this stage giving any
further account of these entities does not mean that none is needed; on the
contrary, we are still engaged in the project, announced at the beginning of
Chapter 6, of gaining more insight into their nature, and, indeed, one reason
for seeking an interpretation of quantum logic is that it may help us to do so.
Propositions, events and subspaces are in one-to-one-to-one correspondence;
I will use lowercase italic letters a, b, c, . . . for propositions, Ea, Eb,
Ec, . . . for the corresponding quantum events, and La, Lb, Lc, . . . for the
corresponding subspaces of ℋ. Strictly, the three sets form three isomorphic
lattices, but I will refer to all three structures indiscriminately as L(ℋ),
relying on context to make clear what the elements of the lattice in question
are.
Each atom La of ℒ(ℋ) is a one-dimensional subspace of ℋ and so repre-
sents a pure state of a system. Thus the set of pure states is in one-to-one
correspondence with the set of valuations of our quantum logic. Now let Pa
be the projector onto the atom La, and for any element Lb of ℒ(ℋ) (that is,
any subspace of ℋ), let Pb be the projector onto Lb. As we know, each
subspace Lb (alternatively, each projector Pb) represents a quantum event Eb,
and every such event is assigned a probability by the state; if the state is Pa
this probability is given by

p(Eb) = Tr(PaPb) = ⟨v|Pb v⟩,

where v is a normalized vector in La. We know that

(7.54) Tr(PaPb) = 1 if and only if v ∈ Lb if and only if La ⊆ Lb

An event Eb is assigned probability 1 by a pure state Pa if and only if the
subspace Lb includes La. But the latter holds if and only if the proposition b
lies in the ultrafilter defined by a. We see that p(Eb) = 1 provided that u(b) =
1, where u is the valuation corresponding to the (pure) state of the system.
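The equivalence stated in (7.54) can be checked numerically in a small case. The following is only a minimal sketch, assuming a real three-dimensional space in place of ℋ and subspaces specified by orthonormal bases; the helper names (projector, probability) are illustrative and do not come from the text.

```python
import math

def outer(u, w):
    """Outer product |u><w| of two real vectors, as a nested list."""
    return [[ui * wj for wj in w] for ui in u]

def mat_add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def mat_mul(x, y):
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def trace(x):
    return sum(x[i][i] for i in range(len(x)))

def projector(orthonormal_basis):
    """Projector onto the subspace spanned by an orthonormal basis."""
    dim = len(orthonormal_basis[0])
    p = [[0.0] * dim for _ in range(dim)]
    for e in orthonormal_basis:
        p = mat_add(p, outer(e, e))
    return p

def probability(v, basis_b):
    """Tr(Pa Pb), where Pa projects onto the ray containing the unit vector v."""
    return trace(mat_mul(projector([v]), projector(basis_b)))

e1, e2, e3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
s = 1 / math.sqrt(2)
v = (s, s, 0.0)                      # unit vector in the plane spanned by e1, e2

p_plane = probability(v, [e1, e2])   # La is included in Lb
p_ray = probability(v, [e1])         # La is not included in Lb
p_perp = probability(v, [e3])        # La is orthogonal to Lb
```

In this sketch the probability equals 1 (up to rounding) only in the first case, where the ray through v lies inside the subspace; the other two cases give intermediate and zero probability respectively.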
A straightforward and unexceptionable interpretation of quantum logic
now presents itself. Let us unpack each quantum event E, so that E = (A,Δ);
the corresponding quantum proposition may be read as, "A measurement
of A will yield a result within Δ with probability 1." The truth or falsity of
this statement is determined by the state.
Given the possibility of interpreting quantum logic in this way, its resem-
blance to a logic of modal sentences is not surprising, since the sentential
operator, "There is probability 1 that . . . ," is the probabilistic equivalent
of the necessity operator □.
I have called this interpretation of quantum logic "unexceptionable." It is
also unambitious. On the proposed reading of quantum propositions, these
propositions are just a subset of the predictions quantum mechanics makes
about the probabilities of quantum events, and quantum logic offers merely
a partial reformulation of quantum theory in the formal mode, that is, a
reformulation expressed in terms of sentences and the relations between
them. But many devotees of quantum logic were after bigger game. In
particular, they took the logico-algebraic approach to quantum theory to
offer a way, or various ways, to talk of the properties of systems. The next
section looks at one such proposal.

7.8 Putnam on Quantum Logic


For an example of a nontrivial logic based on orthomodular lattices, we turn
to Hilary Putnam. Though his 1969 paper, "Is Logic Empirical?" only
sketched the outlines of such a logic, it presented with splendid vigor some
of the most ambitious claims made on quantum logic's behalf. The claims
made are these.

(1) Logic is an empirical science; some of the "necessary truths" of
classical logic could turn out to be false for empirical reasons (Putnam,
1969, pp. 216, 226).
(2) Just as the general theory of relativity requires us to move to a non-
Euclidean geometry, so our best interpretation of quantum mechanics
requires the adoption of a nonclassical logic (p. 234).
(3) By adopting a quantum logic we can retain a strong account of the
properties of a system (p. 229).

I will discuss (1) in Section 7.9; it turns out to be rather less revolutionary a
thesis than one might think. The analogy proposed in (2) is suggestive, and I
will myself make use of it in Section 8.9; however, the sense in which I will
use the term "quantum logic" is some distance from Putnam's. This section
and the next will largely be occupied with claim (3), which I take to be false.
Indeed, in a correspondence quoted by Stairs (1983b, p. 588),* Putnam has
written that he no longer subscribes to it. In these sections "Putnam" will
refer to the Putnam of 1969, continuous with, but epistemically distinct
from, his present counterpart.
In formal respects the logic Putnam advocates is that presented in Section
7.6; we are to "read the logic off from the Hilbert space ℋ" (p. 222), and the
set of subspaces of ℋ is to be regarded as a lattice. This lattice is nondistrib-
utive (see Section 7.3); thus in the corresponding logic the distributive law is
not valid, and the inference from A ∧ (B ∨ C) to (A ∧ B) ∨ (A ∧ C) fails.
Putnam boldly asserts that "all so-called 'anomalies' in quantum mechanics
come down to the non-standardness of the logic" (his emphases); once these
are given up, he assures us, "every single anomaly vanishes" (pp. 222, 226).

* I am much indebted to this paper.

The propositions represented by the subspaces of ℋ are, for Putnam,
property ascriptions, and among the anomalies which will disappear with
the adoption of quantum logic are, presumably, those associated with such
ascriptions. As examples Putnam uses the values of position and momen-
tum. Observables like these, which have continuous spectra, are indepen-
dently problematic (see Teller, 1979), and so I will restate his position in
terms of two noncommuting observables A and B, each of which has three
possible outcomes: respectively, a1, a2, a3, and b1, b2, b3. I will use these
lowercase letters to refer to the lattice points corresponding to (A,a1), et
cetera, and also as sentences, "The system has the property (A,a1)." I assume
further that the operators corresponding to A and B share no eigenvectors,
so that the lattice we are dealing with is the 14-element lattice shown in
Figure 7.12.

Figure 7.12 Fourteen-element orthomodular lattice.


In this lattice,

a1 ∨ a2 ∨ a3 = 1 = b1 ∨ b2 ∨ b3.

Since Putnam reads 'a1 ∨ a2 ∨ a3' as "The system has an A-value," he
regards the conjunction "The system has an A-value and the system has a
B-value" as logically true. It is thus a truth of (quantum) logic that every
observable for a system has a value (at all times). Note that a1 ∨ a2 ∨ a3 is
true even if the system is in a state which makes b1 (say) true. However, in
that state the sentence

(I) b1 ∧ (a1 ∨ a2 ∨ a3)

is true, but the sentence

(II) (b1 ∧ a1) ∨ (b1 ∧ a2) ∨ (b1 ∧ a3)

is false, since, in the lattice,

b1 ∧ a1 = b1 ∧ a2 = b1 ∧ a3 = 0,

reflecting the fact that no state makes either b1 and a1, or b1 and a2, or b1 and
a3 simultaneously true. The fact that we cannot infer (II) from (I) is, of
course, one example of the failure of the (classical) distributivity law.
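The failure of distributivity can be exhibited by direct computation. What follows is a sketch under a stated assumption: that the fourteen-element lattice of Figure 7.12 consists of two eight-element Boolean blocks, one for A and one for B, which share only 0 and 1. The encoding of elements as (block, set of outcome indices) and the names elem, meet and join are illustrative choices, not the text's. Sentence (I) is taken to be b1 ∧ (a1 ∨ a2 ∨ a3) and sentence (II) to be (b1 ∧ a1) ∨ (b1 ∧ a2) ∨ (b1 ∧ a3).

```python
FULL = frozenset({1, 2, 3})
ZERO = ("0", frozenset())   # bottom element, shared by both blocks
ONE = ("1", FULL)           # top element, shared by both blocks

def elem(block, outcomes):
    """Lattice element for a set of outcome indices of observable 'A' or 'B'."""
    s = frozenset(outcomes)
    if not s:
        return ZERO
    if s == FULL:
        return ONE
    return (block, s)

def meet(x, y):
    if x == ONE:
        return y
    if y == ONE:
        return x
    if x == ZERO or y == ZERO or x[0] != y[0]:
        return ZERO          # elements of different blocks meet only at 0
    return elem(x[0], x[1] & y[1])

def join(x, y):
    if x == ZERO:
        return y
    if y == ZERO:
        return x
    if x == ONE or y == ONE or x[0] != y[0]:
        return ONE           # elements of different blocks join only at 1
    return elem(x[0], x[1] | y[1])

a1, a2, a3 = (elem("A", {i}) for i in (1, 2, 3))
b1 = elem("B", {1})

# All elements of the assumed lattice: 0, 1, six atoms and six coatoms.
elements = {elem(blk, s) for blk in "AB"
            for s in [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]}

sentence_I = meet(b1, join(join(a1, a2), a3))
sentence_II = join(join(meet(b1, a1), meet(b1, a2)), meet(b1, a3))
```

On this encoding sentence (I) evaluates to b1 itself, so it is true in any state that makes b1 true, while sentence (II) evaluates to the zero of the lattice and is true in no state.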
Consider now the objection to Putnam made by Harrison (1983). (He too
talks of "position" and "momentum," and in the quotations below I have
replaced these words by "B-value" and "A-value," respectively.) Harrison
suggests first that, according to quantum theory, if a system has a determi-
nate B-value, it is false that it has any of the A-values specified in the second
conjunct of (I). He continues:
Hence, if quantum theory is true, the truth of the first conjunct in (I) implies the
falsity of the second, and (I) itself must be false. Thus the very circumstance, that a
particle cannot have a determinate B-value and A-value, which implies the falsity
of (II), also implies the falsity of (I), and the difficulty for classical logic is removed.
(P. 84)

But his argument from the falsity of a1, a2, and a3 to the falsity of a1 ∨ a2 ∨
a3 relies entirely on his treating ∨ as classical truth tables prescribe. Clearly,
it is no objection to Putnam's system just to say that truth-table analysis tells
us that the truth of a disjunction requires the truth of at least one of the
disjuncts. This merely tells Putnam something he already knows: that his
system is not classical.
Again, consider Harrison's objection to Putnam's claim that a1 ∨ a2 ∨ a3
is a logical truth:
If the second conjunct in (I) is a logical truth, then quantum theory must be false, for
quantum theory just asserts that a particle does not have to have an A-value a1, or an
A-value a2, etc. for all the A-values there are. (P. 84)
And, of course, if quantum theory were false, then quantum logic would be
unnecessary. But once more, and for the same reason, the criticism fails.
Nonetheless, an important question emerges from Harrison's paper. Even
if we accept a1 ∨ a2 ∨ a3 as true, why should we read it as "The system has
an A-value"? Indeed, what is the content of the claim that the system has an
A-value if it can be accompanied by the four statements that (i) this A-value
is not a1, (ii) nor is it a2, (iii) nor is it a3, and (iv) these three values of A are all
the A-values there are? (See Stairs, 1983b, sec. IV.)
Certainly, Putnam runs into trouble when he makes the further (indepen-
dent) claim that, not only does the system have an A-value in all states, but
that "if I measure I will find it" (Putnam, 1969, p. 230). Assume, for the sake
of argument, that the system is prepared in a state which makes b1 true, and
that an A-measurement now yields a3. If I have simply "found" the A-value
of the system, then surely a3 was true of the system before the measurement,
along with b1. But, as we have noted, on Putnam's quantum logic the
conjunction b1 ∧ a3 is always false.
We may relinquish the claim about measurement, even though it carries
away with it Putnam's purported resolution of the measurement problem,
and therewith much of the motivation of his project, but then we are left
with the odd notion of a "disjunctive property." Apparently the system can
have the property (A, a1 ∨ a2 ∨ a3) while having none of the "atomic"
properties (A,a1), (A,a2), or (A,a3). Disjunctive properties are not wholly
implausible; in fact Teller (1979) has argued that all properties involving
continuous quantities are disjunctive, since quantum mechanics never spec-
ifies a sharp value for, say, momentum, but at most an interval within which
it lies. Nevertheless, in the case of an observable with a discrete spectrum,
the acceptance of disjunctive properties seems to dilute to insipidity the
claim that, for any system, every observable has a value at all times. The
package we have bought seems markedly less attractive than the product
which was advertised.

7.9 Properties and Deviant Logic


Let us review the situation. Section 7.6 gave a formal account of quantum
logic. A set of sentences, closed under the logical connectives ~, ∨, and ∧, is
supplied with a semantics which maps them systematically onto the ele-
ments of a lattice. The purported logical relations between the sentences are
then read off from the algebraic relations which hold between these ele-
ments. But the logic that results constitutes an alternative to classical logic
only when the sentences of this formal language are given a specific inter-
pretation; as Section 7.7 showed, an unexceptionable, if unadventurous,
interpretation of the lattice elements as modal propositions, □(A,Δ), is avail-
able. Under this interpretation quantum logic formalizes a particular ac-
count of necessity; it supplements but does not supplant classical logic.
When Putnam says that the rules of quantum logic "conflict with classical
logic" and that the lesson to be drawn is that "we must change our logic"
(p. 221), he has another interpretation of the formal system in mind. As we
saw in Section 7.8 he reads the propositions of quantum logic as indicative
propositions ascribing properties to microsystems.
It is this interpretation that has given quantum logic that hint of philo-
sophical perversity (delicious or detestable according to taste) conveyed
in the phrase "deviant logic." On the one hand, these are propositions of a
kind to which, prima facie, we would expect classical logic to apply; on the
other, they are just the statements which results like the Kochen and
Specker theorem tell us behave in a nonstandard way: given an exhaustive
list of the possible values of each observable for a system, at no time can we
truly ascribe exactly one of these values to each observable.
Now this problem is going to be faced by anyone who offers an interpre-
tation of quantum mechanics which involves ascribing properties to sys-
tems. And no matter what kind of account is given of why the properties
behave as they do, this account will always have a counterpart in the formal
mode. Assume, for example, that the account posits states of affairs which
can or cannot obtain. Then, corresponding to each of these states of affairs
there will be a statement which may or may not be truly asserted. Con-
straints on possible states of affairs will appear in the formal mode as
restrictions on what may be truly said about them. It follows that anyone
who talks of the properties of systems is committed to some version of
"quantum logic."
Witness Harrison, whom we met inveighing "Against Quantum Logic" in
Section 7.8. He writes, "I had always supposed that, according to quantum
theory, . . . [if] a particle's position is determinate, it is false that it has any
of the velocities specified in [an exhaustive list of velocities]" (Harrison,
1983, p. 84). In other words, in quantum mechanics the truth of one atomic
proposition (an ascription of position) entails the falsity of another:
any specific ascription of velocity. Now any systematic account of such
entailment constitutes a logic; further, since no classical conjunction of
atomic propositions is a contradiction, this logic will be nonclassical. Thus
the arguments Harrison presents do not speak against quantum logic, but in
favor of one system rather than another.
In fact, the quantum logic proposed by Reichenbach (1944, secs. 29-33)
was concerned with this very question: how should we formalize the rela-
tion of mutual exclusivity, or, as he called it, complementarity between
ascriptions of precise values to incompatible observables. Reichenbach's
solution was to move to a three-valued truth-functional logic. Sentences
could be true, false, or indeterminate; sentences expressing complementary
propositions were such that, if one received the value true (or false), then the
other received the value indeterminate. Conjunctions of such sentences
were perfectly well formed, but they could never receive the value true.
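The behaviour of such conjunctions can be tabulated mechanically. The sketch below rests on an assumption the passage does not spell out: that the three-valued conjunction takes the lesser of its two arguments under the ordering false < indeterminate < true. The names conj and complementary are illustrative only.

```python
from itertools import product

F, I, T = 0, 1, 2     # false < indeterminate < true

def conj(p, q):
    """Three-valued conjunction, assumed here to take the lesser value."""
    return min(p, q)

def complementary(p, q):
    """If either sentence is true or false, the other must be indeterminate.

    This is equivalent to requiring that at least one of them is indeterminate.
    """
    return p == I or q == I

# Every joint assignment permitted to a pair of complementary sentences.
admissible = [(p, q) for p, q in product((F, I, T), repeat=2)
              if complementary(p, q)]

# The values their conjunction can take over those assignments.
conjunction_values = {conj(p, q) for p, q in admissible}
```

Under these assumptions the conjunction of two complementary sentences is always defined, and so well formed, yet the value true never appears among its possible values.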
Reichenbach contrasted his three-valued logic, not with the algebraic
analysis of Birkhoff and von Neumann (1936) (the ancestor of all algebraic
approaches), but with the Copenhagen interpretation of quantum theory,
or, as he termed it, "the Bohr-Heisenberg interpretation" (Reichenbach,
1944, p. 139). The account of property ascriptions offered by this interpre-
tation has a markedly operationalist flavor. According to Bohr, one may
ascribe properties to a system, but the concepts involved (position, momen-
tum, and so on) are not applicable to the system at all times. Each becomes
applicable only when certain experimental conditions are realized:
Closer examination reveals that the procedure of measurement has an essential
influence on the conditions on which the very definition of the physical quantities
rests. (Bohr, 1935b, p. 65)
Note that, as so often, Bohr is here making a point about the conditions of
meaningful discourse. These conditions are contextual; if we are dealing
with an experimental procedure designed to measure, say, momentum,
then we cannot talk meaningfully of the position of a system. Bohr writes of
"essentially different experimental arrangements and procedures which are
suited either for an unambiguous use of the idea of space location, or for a
legitimate application of the conservation theorem of momentum" (Bohr,
1935a, p. 699).
Bohr's account is amenable to formal presentation (though this runs
contrary to his own views on semantics; see MacKinnon, 1984). Bub (1979,
p. 118) suggests that
. . . Bohr regards the notion of truth as meaningful only in the context of a Boolean
possibility structure, i.e., to ascribe a property to a system only makes sense with
respect to a structure of possible properties which form a Boolean algebra. In the case
of a quantum mechanical system this possibility structure is non-Boolean. The ap-
plication of the classical notion of truth, or the attribution of physical properties to
such a system, requires reference to a classical measuring system, which fixes a
particular Boolean algebra in the non-Boolean possibility structure.

The resulting logic of property ascriptions is strongly reminiscent of one
proposed by Kochen in 1978.
On Kochen's account, the set of possible properties of a system is subdi-
vided into Boolean subalgebras, each of which comprises a set of available
properties, as we may call them. Which properties are available at a given
time is determined by the interaction the system has most recently under-
gone; each interaction will leave the system with a set of available proper-
ties, and this set has the structure of an interaction algebra (Kochen's term).
The spin-½ particle is a particularly simple case; the interaction algebras are
each associated with a direction α in physical space, and have just four
elements apiece: {∅, (Sα,+½), (Sα,−½), (Sα,±½)}. Hence any such set is a set of
available properties.
Among the available properties, only some (at most a half) will be actual;
typically, a spin-½ particle may have actual properties {(Sx,−½), (Sx,±½)}. The
property ∅ is never actual; it is available only in a purely technical sense. To
use the terminology of Section 6.9, the system has, at any time, a value-state,
λ. When this is maximally specific (and sometimes it is not, as in the case of
the completely unpolarized electron; see Section 8.6), λ picks out an ultra-
filter in the (Boolean) interaction algebra.
The value-state in turn determines the statistical state. A new interaction
will leave the system with properties in a specific new interaction algebra,
but the transitions are not deterministic; each of these (new) available prop-
erties is assigned a probability of occurrence by the (old) statistical state. The
statistical state assigns a probability to every possible property, in other
words, to every property in every interaction algebra. Statistically, the var-
ious Boolean algebras all hang together in a familiar way. The family of
these algebras forms the partial Boolean algebra characteristic of S(ℋ), each
property is representable by a subspace of ℋ, and, as usual, the (pure)
statistical states are represented by normalized vectors or by projection
operators onto rays of ℋ.
On this interpretation the descriptive and the dispositional aspects of
states are distinguished; these two functions of a classical state are per-
formed by two distinct kinds of state. In this division of labor the value-state
gives us information about present properties and the statistical state tells us
what we may expect from future interactions.
The logic of property ascriptions that emerges is nonclassical; the set of
propositions which ascribe properties to systems forms a partial Boolean
algebra. Within the language we use to express these propositions, not all
sentences can be meaningfully connected; the connectives ∧ and ∨ are thus
"partial connectives" in the same sense that the operations on a PBA are
partial operations. At any time only one maximal Boolean subalgebra of
propositions applies to the system. The ultrafilters on that subalgebra act as
two-valued truth-assignments to the propositions within it, and to each
ultrafilter corresponds a value-state. Among the propositions within this
subalgebra the laws of classical logic obtain. The propositions that lie out-
side it may conveniently be given some third truth-value, neither true nor
false, to indicate that neither they nor their negations are true. (Hughes,
1985b, gives a detailed account of the semantics of this logic.)
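The partial character of the connectives can be illustrated for the spin-½ case. This is a rough sketch, not the semantics of Hughes (1985b): property ascriptions are encoded as a direction label paired with a set of outcomes, the null and certain properties are assumed to be common to every interaction algebra, and the third truth-value is represented by None. All names (prop, conj, value) are hypothetical.

```python
UP, DOWN = "+", "-"
BOTH = frozenset({UP, DOWN})

def prop(direction, outcomes):
    """Property ascription (S_direction, outcomes), e.g. prop('x', {'+'}).

    The null and certain properties carry no direction, since they are
    assumed to belong to every interaction algebra.
    """
    s = frozenset(outcomes)
    if not s or s == BOTH:
        return (None, s)
    return (direction, s)

def conj(p, q):
    """Partial conjunction: None when p and q share no interaction algebra."""
    if p[0] is not None and q[0] is not None and p[0] != q[0]:
        return None
    direction = p[0] if p[0] is not None else q[0]
    return prop(direction, p[1] & q[1])

def value(p, state):
    """Truth-value of p in a maximally specific value-state (direction, outcome).

    Returns True or False inside the current Boolean subalgebra and None
    (the third value) for propositions lying outside it.
    """
    direction, outcome = state
    if p[0] is not None and p[0] != direction:
        return None
    return outcome in p[1]

state = ("x", DOWN)                       # the system has the property (Sx, -1/2)
x_down, x_up = prop("x", {DOWN}), prop("x", {UP})
x_any, z_up = prop("x", {UP, DOWN}), prop("z", {UP})

values = [value(x_down, state), value(x_up, state),
          value(x_any, state), value(z_up, state)]
mixed = conj(x_up, z_up)                  # ascriptions from different algebras
same = conj(x_up, x_any)                  # ascriptions from the same algebra
```

Within the x-algebra the assignment is two-valued and classical, whereas the z-ascription receives the third value and the conjunction of an x-ascription with a z-ascription is simply undefined.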
Algebraically, this is precisely the quantum logic that Bub finds implicit in
Bohr's writings. This is not to say that Bohr and Kochen share a common
interpretation of quantum theory. Rather, they offer interpretations which
differ both in detail and in the metaphysical attitudes they express. In the
first place, whereas on Kochen's account the Boolean interaction algebras
are selected by any kind of interaction, on Bohr's view the classical nature of
measuring instruments gives measurement interactions a special status.
Secondly, the ontological commitment urged by Kochen is not shared by
Bohr, who indeed took pains to distance himself from others (also associated
with the Copenhagen tradition) who held that physical attributes were
"created by measurement" (Bohr, 1949).
Nonetheless, formally the logics are exactly the same; on the partial
Boolean semantics they employ, sentences conjoining a position ascription
and a momentum ascription are not well formed, and hence are meaning-
less. Thus, although this algebraic logic can be made to collapse to a three-
valued semantics, what results is very unlike the logic Reichenbach pro-
posed as an alternative to the Bohr-Heisenberg interpretation. As we have
noted, on Reichenbach's logic, conjunctions of complementary propositions
are perfectly well formed, though never true; furthermore, unlike the col-
lapsed algebraic logic, Reichenbach's is truth-functional.
These analogies and disanalogies, however, serve only to underscore our
previous conclusion: that much of the debate between advocates of quan-
tum logic and their opponents has been misdirected. If Kochen, on the one
hand, and Bub, acting on Bohr's behalf, on the other, can start from radically
different interpretations of quantum theory and yet produce formally iden-
tical quantum logics, then this adds strong support to the view that, what-
ever interpretation we adopt, the logic of property ascriptions to quantum
systems will be nonclassical. The choice we confront is not between adopt-
ing, for example, the Copenhagen interpretation and embarking on "the
heroic course" of changing our logic (Putnam, 1969, p. 222); it is between
adopting a deviant logic and eschewing the notion of a property.
Though it is flattering to believe that talk of properties makes heroes of us
all, we may well enquire what work is being done by this notion in any of the
proffered interpretations. The answer is, surely, very little. Rather, a meta-
physical nostalgia is prompting various responses to the question, how can
we make room for the notion of a property within quantum mechanics? If,
for example, a particular interpretation-cum-logic either yielded something
resembling the Precise Value Principle or resolved the measurement prob-
lem, then there would be clear-cut reasons, not only for preferring it to the
others, but for accepting it. But none do so.
The most we can say is this. If we retain the notion of a property, then
either (a) the possession of properties associated with an observable A rules
out the simultaneous possession of properties associated with observables
incompatible with A; or (b) we have to make sense of the notion of a
disjunctive property, so that, for example, a particle can have the property
(Sx,±½) but neither the property (Sx,+½) nor the property (Sx,−½). Kochen,
Bohr, and Reichenbach adopt alternative (a), though for different reasons;
Putnam, along with other advocates of a lattice-theoretic approach, is
forced to alternative (b). Neither alternative, however, is very enticing.
In Section 6.9 I described the task of interpreting quantum theory as that
of finding, within the models the theory provides, images of the elements of
a categorial framework. The search for properties has yielded only pallid,
scarcely recognizable variants of these creatures. Perhaps we should call off
the hunt, acknowledge that properties are the unicorns of quantum theory,
and confess that none of us is innocent enough to capture one. In doing so
we need not condemn all of quantum logic, specifically algebraic quantum
logic, as misguided. Even if an emphasis on sentential quantum logics may
have proved unhelpful, a more general algebraic program remains. And,
just as earlier we distinguished between formal sentential logic and the
interpretation of the sentences it manipulated, so now we can distinguish
the core of the quantum-logical program from our interpretation of that core
(this distinction is due to Stairs, 1983b, p. 578). The core is the idea that the
non-Boolean algebraic structures appearing in quantum theory provide the
key to our understanding of the quantum world. This core can be retained
even when we jettison the interpretation which regards the elements of
these structures as properties of systems, the promise of which has proved
illusory. On another interpretation, quantum logic provides, in Bub's terms,
a non-Boolean possibility structure for quantum events. This interpretation
is the subject of the next chapter.
8
Probability, Causality,
and Explanation

The term probability has, up to now, been treated as though it were entirely
unproblematic. Surely this is too optimistic by far. There is, for instance, the
problem of the interpretation of probability: does it represent a degree of
belief, or a relative frequency, or a mysterious propensity, or something else
again? The view taken in this book is that nearly all the probabilities ap-
pearing in theoretical quantum mechanics are objective probabilities. That is
to say, they inhere in the world and do not simply reflect the degrees of
belief of an observer; rather, they determine what this degree of belief
should ideally be: if an event E is assigned an objective probability of, say,
0.1, then a fully informed observer should assign a subjective probability of
0.1 to E and place her bets accordingly (see Lewis, 1980). I wrote just now
that "nearly all" quantum-theoretic probabilities are objective. The possible
exceptions occur when a system is in a mixed state. If we adopt the igno-
rance interpretation of a given mixture, then we assign a subjective proba-
bility to each of the pure states represented in it, and each of these in turn
assigns objective probabilities to events. Heisenberg, for one, suggested that
the interplay between objective and subjective components of probability
assignments could be made to do interpretive work, and I discuss his sug-
gestions in Section 9.5. Note, however, that, as we saw in Section 5.8, not all
mixtures can be given the ignorance interpretation.
Leaving aside the possible exception of mixtures, I will assume that quan-
tum theory deals with objective probabilities. However, I will not discuss
how the concept of objective probability is to be interpreted (see, for exam-
ple, Giere, 1973, 1976; Skyrms, 1980, chap. IA; van Fraassen, 1980, chap.
6), but will instead focus on a problem raised by quantum mechanics for the
mathematical theory of probability. Quantum mechanics requires us to
modify this theory, or rather to generalize the mathematical account of it
given by Kolmogorov (1933). But, surprisingly, this revision yields remark-
able benefits; it helps us to provide explanations of the "causal anomalies"
which beset quantum theory. Or so I shall suggest.
Running through this chapter, in what I hope will be a euphonious
counterpoint, are three main themes: (1) the generalization of probability
theory, (2) the "causal anomalies" of quantum mechanics, and (3) the reso-
lution of these anomalies in terms of generalized probability theory. A
discussion of scientific explanation appears as a coda.

8.1 Probability Generalized

The classical presentation of probability theory was given by Kolmogorov
(1933). On this account, probabilities are assigned to sets. In Kolmogorov's
original presentation, these sets were said to be subsets of a set E of "elemen-
tary events." These "elementary events," however, played no further part
in the discussion; following standard practice, I will use the term event to
refer to any subset of a set E to which a probability is assigned. If a probabil-
ity is assigned to two events A and B, we also require it to be defined for their
union, A ∪ B, for their intersection, A ∩ B, and for their complements, E − A
and E − B. That is to say, a probability function is defined on a field ℱ of
subsets of E.

(8.1) We say that the triple (E,ℱ,p) is a classical probability space if ℱ is a
field of subsets of E and p is a function p: ℱ → [0,1] satisfying
(8.1a) p(E) = 1 and p(∅) = 0;
(8.1b) p(A ∪ B) = p(A) + p(B), for all A,B ∈ ℱ such that A ∩ B = ∅.

In fact it's now usual to define the measure on a σ-field of sets, that is, one
which is closed not only under finite union and intersection, but also under
(denumerably) infinite union and intersection. In this case (8.1b) becomes:

(8.1b*) If {Ai} is any denumerable family of pairwise disjoint members of ℱ
(that is, if Ai ∩ Aj = ∅ whenever i ≠ j), then p(∪i Ai) = Σi p(Ai).

Of course, if (8.1b*) is confined to finite families {Ai}, then it reduces to
Kolmogorov's original axiom.
We see that a classical probability measure is a (countably) additive real-
valued set function.
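For a finite set E the conditions on a classical probability space can be verified exhaustively. The sketch below assumes, purely for illustration, that E is the set of six outcomes of a die throw, that the field is the full power set of E, and that p is the uniform measure; none of these choices comes from the text.

```python
from fractions import Fraction
from itertools import chain, combinations

E = frozenset(range(1, 7))            # illustrative set of elementary events

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

field = powerset(E)                   # a field of subsets of E
field_set = set(field)

def p(event):
    """Uniform measure: each elementary event has probability 1/6."""
    return Fraction(len(event), len(E))

# Closure of the field under union, intersection and complement.
closed = all((a | b) in field_set and (a & b) in field_set
             and (E - a) in field_set
             for a in field for b in field)

# p maps every event into the interval [0, 1].
in_unit_interval = all(0 <= p(a) <= 1 for a in field)

# Normalization: p(E) = 1 and the empty set has probability 0.
normalized = p(E) == 1 and p(frozenset()) == 0

# Finite additivity on disjoint pairs of events.
additive = all(p(a | b) == p(a) + p(b)
               for a in field for b in field if not (a & b))
```

Exact rational arithmetic is used so that the additivity check is not disturbed by rounding.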
Now the "probabilities" defined by quantum-mechanical states are not
defined on sets but on quantum events (A,Δ) ("experimental questions").
Thus, in one obvious way, they don't conform to Kolmogorov's definition.
This would be trivial if the algebraic structure of the set of quantum events
were isomorphic to a field of sets, that is, if the algebra of quantum events
were Boolean. For, as we noted, the fact that Kolmogorov defines events as
sets of "elementary events" plays no part in the ensuing mathematical
theory. What is important in his account is that the algebraic structure of the
set of events is that of a σ-field of sets, that it is a Boolean σ-algebra. In fact,
from the point of view of classical probability theory, by defining a probabil-
ity field in terms of a field of sets rather than a Boolean σ-algebra, Kolmo-
gorov loses no generality (contra Popper, 1959, app. *iv; see Bub, 1975),
since, by Stone's theorem, any Boolean algebra is isomorphic to some field
of sets. (See Section 7.2.)
As we saw in Section 7.4, however, the algebraic structure of the set of
quantum events is non-Boolean; the set of subspaces of a Hilbert space can
be regarded either as an orthomodular lattice or as a transitive partial Bool-
ean algebra, within which not all pairs of elements are compatible. It seems
that the functions assigning probabilities to quantum events are, paradoxi-
cally, not probability functions at all, at least, not in Kolmogorov's sense.
The importance of this was pointed out by Suppes (1966); clearly, we need
to generalize the concept of a probability function so that it is defined on a
wider class of algebraic structures than the class of Boolean σ-algebras.
Within this wider class, a σ-field of sets, on the one hand, and the set S(ℋ) of
subspaces of a Hilbert space, on the other, should appear as special cases.
I will confine myself here to a generalization of finitely additive probabil-
ity functions, defined on orthoalgebras.

(8.2) (A, ⊥, ⊕, ^⊥, 0, 1) is said to be an orthoalgebra if A is a set containing
designated elements 0 and 1, ⊥ is a binary relation on A, ⊕ is a partial
binary operation on A such that a ⊕ b exists if and only if a ⊥ b, ^⊥
is a singulary operation on A, and, for all a,b in A,
(8.2a) if a ⊥ b, then b ⊥ a, and a ⊕ b = b ⊕ a;
(8.2b) a ⊥ 0 and a ⊕ 0 = a;
(8.2c) a ⊥ a^⊥ and a ⊕ a^⊥ = 1;
(8.2d) a ⊥ a^⊥ ⊕ b only if b = 0;
(8.2e) a ⊥ a ⊕ b only if a = 0;
(8.2f) if a ⊥ b, then a ⊥ (a ⊕ b)^⊥ and b^⊥ = a ⊕ (a ⊕ b)^⊥.

These axioms are due to Hardegree and Frazer (1981). From them we may
derive the following theorems:
(8.3a) $0^{\perp} = 1$ and $1^{\perp} = 0$;
(8.3b) $(a^{\perp})^{\perp} = a$;
(8.3c) $a \oplus b = a \oplus c$ only if $b = c$;
(8.3d) $a \oplus b = 1$ only if $b = a^{\perp}$.

The symmetric relation $\perp$ is known as the orthogonality relation, the operation
$\oplus$ is known as the operation of orthogonal sum, and ${}^{\perp}$ is the complementation
operation. Note that 0 is the only element orthogonal to itself.
We have already met one example of an orthoalgebra in Section 7.5, and it
will be useful to review that account here. (As then, the reader is referred to
Stein, 1972, pp. 374-378, for a more careful account.) Assume that we can
conduct any one of a number of experiments, each of which has a number of
mutually exclusive possible outcomes. The set $E$ of events is then generated
from the set of possible outcomes of all experiments, to form an orthoalgebra,
as follows.
An event is any set of outcomes associated with a single experiment. Two
events are orthogonal ($e \perp f$) if they are disjoint sets of outcomes associated
with the same experiment. For any pair of orthogonal events, $e$ and $f$,
their orthogonal sum, $e \oplus f$, is defined as the union of the two events. Note
that this operation is not defined for two events associated with different
experiments; $\oplus$ is thus a partial operation on $E$. The set of all possible
outcomes associated with a particular experiment is the certain event for that
experiment. The complement $e^{\perp}$ of an event $e$ is the set-theoretic complement
of $e$ relative to the certain event for the experiment in question. The empty
set, $\emptyset$, is the null event, and is common to all experiments; it is orthogonal to
all events, and is the zero, 0, of the orthoalgebra. The certain event for any
experiment is also identified with the certain event for all others; it is the
unit, 1, of the orthoalgebra.
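
The construction just described can be checked mechanically on a small case. The following Python sketch builds the events for two hypothetical two-outcome experiments and tests the orthoalgebra axioms by brute force, reading (8.2d) and (8.2e) with the orthogonal sum grouped first and (8.2f) as asserting that the complement of b equals the sum of a with the complement of a (+) b. The experiment labels and all function names are illustrative assumptions, not part of the text.

```python
from itertools import combinations

# Hypothetical toy data: two experiments with disjoint sets of outcome labels.
EXPERIMENTS = {
    "spin_z": frozenset({"z+", "z-"}),
    "spin_x": frozenset({"x+", "x-"}),
}
ZERO = "0"  # null event, common to all experiments
ONE = "1"   # certain event, identified across all experiments


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def normalize(name, outcomes):
    """Identify every empty set with 0 and every full outcome set with 1."""
    if not outcomes:
        return ZERO
    if outcomes == EXPERIMENTS[name]:
        return ONE
    return (name, outcomes)


EVENTS = {normalize(n, s) for n, full in EXPERIMENTS.items() for s in subsets(full)}


def views(e):
    """All (experiment, outcome set) readings of an event."""
    if e == ZERO:
        return [(n, frozenset()) for n in EXPERIMENTS]
    if e == ONE:
        return list(EXPERIMENTS.items())
    return [e]


def osum(e, f):
    """Orthogonal sum of e and f, or None when they are not orthogonal."""
    for n1, s1 in views(e):
        for n2, s2 in views(f):
            if n1 == n2 and not (s1 & s2):
                return normalize(n1, s1 | s2)
    return None


def orth(e, f):
    return osum(e, f) is not None


def comp(e):
    n, s = views(e)[0]
    return normalize(n, EXPERIMENTS[n] - s)


def check_axioms():
    for a in EVENTS:
        assert orth(a, ZERO) and osum(a, ZERO) == a            # (8.2b)
        assert orth(a, comp(a)) and osum(a, comp(a)) == ONE    # (8.2c)
        for b in EVENTS:
            if orth(a, b):
                s = osum(a, b)
                assert orth(b, a) and s == osum(b, a)          # (8.2a)
                assert orth(a, comp(s))                        # (8.2f), first clause
                assert comp(b) == osum(a, comp(s))             # (8.2f), second clause
                if orth(a, s):
                    assert a == ZERO                           # (8.2e)
            if orth(comp(a), b) and orth(a, osum(comp(a), b)):
                assert b == ZERO                               # (8.2d)
    return True


z_up = ("spin_z", frozenset({"z+"}))
x_up = ("spin_x", frozenset({"x+"}))
print(check_axioms())    # every axiom holds on this toy structure
print(osum(z_up, x_up))  # events from different experiments have no orthogonal sum
```

The last line shows the partiality of the operation: the two events belong to different experiments, so no sum is returned.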
Though the elements of this particular algebra are all sets, the structure
$\mathcal{E} = \langle E, \perp, \oplus, {}^{\perp}, 0, 1 \rangle$ is clearly an orthoalgebra and not, in general, a field of
sets. But it does have some properties not shared by all orthoalgebras. For
instance, within $\mathcal{E}$, the operation $\oplus$ is associative: for all $e$, $f$, and $g$ in $E$,

$e \oplus (f \oplus g) = (e \oplus f) \oplus g$

whenever these operations are defined. Successive constraints on orthoalgebras
yield a hierarchy of algebraic structures. (See Hardegree and Frazer,
1981; for a summary, see Hughes, 1985a.) A Boolean algebra is an associative
orthoalgebra in which all sets of elements are jointly compatible: a set $B$
of elements of an orthoalgebra $\mathcal{A}$ is said to be jointly compatible if there
exists a set $C$ of pairwise orthogonal members of $\mathcal{A}$ such that each member $b$
of $B$ is the orthogonal sum of some subset of $C$; in other words, for each
$b \in B$, there exist $c_1, c_2, \ldots, c_n \in C$ such that $b = \bigoplus_i c_i$. When $B$ is the pair
$\{a, b\}$, this condition reduces to the familiar definition (7.30).
In the "operational" orthoalgebra $\mathcal{E}$ sketched above, we could regard all
events as compatible if all the possible experiments could be performed
simultaneously without interfering one with the other. In that case the
algebra of events would be embeddable within a Boolean algebra, in fact
within a field of sets.
Less stringent constraints than the requirement of universal joint compat-
ibility yield the transitive partial Boolean algebras (equivalently, coherent
orthomodular posets) of quantum logic.
We now define a generalized probability function p.

(8.4) A function $p: A \to [0,1]$ is said to be a generalized probability function
if the set $A$ forms an orthoalgebra $\mathcal{A} = \langle A, \perp, \oplus, {}^{\perp}, 0, 1 \rangle$ and
(8.4a) $p(0) = 0$, $p(1) = 1$;
(8.4b) for all $a$ and $b$ in $A$, if $a \perp b$, then $p(a \oplus b) = p(a) + p(b)$.
An infinitary version of this is not problematic (see Gudder, 1976). It re-
quires us to define an operation of infinitary orthogonal sum on an orthoal-
gebra, defined for countable sets of pairwise orthogonal elements; implicit
in this definition is the condition that the orthoalgebra be associative.
Any orthoalgebra $\mathcal{A}$ contains Boolean algebras as substructures. The
restriction of a generalized probability function $p$ on $\mathcal{A}$ to a Boolean subalgebra
of $\mathcal{A}$ is a Kolmogorov probability function. In fact:

(8.5) If $\mathcal{B}$ is a partial Boolean algebra, any function $p: \mathcal{B} \to [0,1]$ whose
restriction to each Boolean subalgebra of $\mathcal{B}$ is a Kolmogorov probability
function is a generalized probability function on $\mathcal{B}$.
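
A minimal Python sketch of this idea, assuming a structure made of just two experiments, each supplying one Boolean block: the function p is given one ordinary probability distribution per experiment, and the check confirms that its restriction to each block satisfies the finite Kolmogorov conditions. The weights and names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights: one classical distribution per experiment.
WEIGHTS = {
    "spin_z": {"z+": Fraction(1, 2), "z-": Fraction(1, 2)},
    "spin_x": {"x+": Fraction(9, 10), "x-": Fraction(1, 10)},
}


def p(experiment, outcomes):
    """Probability of an event, i.e. of a set of outcomes of one experiment."""
    return sum((WEIGHTS[experiment][o] for o in outcomes), Fraction(0))


def is_kolmogorov_on(experiment):
    """Check p(0) = 0, p(1) = 1 and additivity on disjoint events of one block."""
    outcomes = sorted(WEIGHTS[experiment])
    events = [frozenset(c) for r in range(len(outcomes) + 1)
              for c in combinations(outcomes, r)]
    if p(experiment, frozenset()) != 0 or p(experiment, frozenset(outcomes)) != 1:
        return False
    return all(p(experiment, e | f) == p(experiment, e) + p(experiment, f)
               for e in events for f in events if not (e & f))


print({name: is_kolmogorov_on(name) for name in WEIGHTS})
```

Because the null event gets 0 and the certain event gets 1 in both blocks, the two distributions agree on the elements the blocks share, so together they define a single function on the whole structure.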

8.2 Two Uniqueness Results


The probability functions we have dealt with throughout this book are
functions $p: S(\mathcal{H}) \to [0,1]$ mapping the set $S(\mathcal{H})$ of closed subspaces of a
Hilbert space $\mathcal{H}$ into the interval [0,1]. Since $S(\mathcal{H})$ forms a partial Boolean
algebra, but not a Boolean algebra, these functions are generalized probability
functions rather than Kolmogorov probability functions. Within this
PBA, however, there are (maximal) Boolean subalgebras. In fact, any set of
subspaces which can be generated from a set of mutually orthogonal rays
spanning $\mathcal{H}$ by span, intersection, and orthocomplementation forms a
Boolean algebra, and the restriction of any generalized probability function
(GPF) to this subalgebra is a Kolmogorov probability function. In the terminology
of Section 5.5, any GPF on $S(\mathcal{H})$ is a frame function, and we have a
representation theorem for all functions of this kind.
Gleason's theorem tells us that the set of GPFs on a Hilbert space $\mathcal{H}$ of
dimensionality three or higher is in one-to-one correspondence with the set
of density operators on $\mathcal{H}$; to the GPF $p$ there corresponds exactly one
density operator $D$ such that, for every subspace $L$ of $\mathcal{H}$ and associated
projection operator $P$, we have

(8.6) $p(L) = \mathrm{Tr}(DP)$

Note that if $\mathcal{H}$ has dimension two, while each density operator on $\mathcal{H}$ yields a
GPF, the converse does not hold, witness the probability function on $\mathbb{R}^2$
which assigns 1 to points in the first and third quadrants and 0 to points in
the second and fourth.
Gleason's theorem is a very strong result; the measures supplied by the
density operators on $\mathcal{H}$ are the only natural extensions of classical probability
functions to the non-Boolean structure of the set of quantum-mechanical
propositions.
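
The easy direction of this correspondence, that a density operator yields a GPF through (8.6), can be illustrated numerically. The sketch below assumes a real three-dimensional space and a hypothetical diagonal density operator; it computes Tr(DP) for the rays of two different orthogonal frames and checks that in each frame the probabilities sum to one. It is an illustration under those assumptions, not a proof of Gleason's theorem.

```python
def projector(v):
    """Projector onto the ray spanned by a real vector v."""
    norm2 = sum(x * x for x in v)
    return [[vi * vj / norm2 for vj in v] for vi in v]


def trace_of_product(a, b):
    """Tr(ab) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def prob(D, v):
    """p(L) = Tr(DP) for the ray L spanned by v, as in (8.6)."""
    return trace_of_product(D, projector(v))


# Hypothetical density operator on R^3: diagonal, non-negative, unit trace.
D = [[0.5, 0.0, 0.0],
     [0.0, 0.3, 0.0],
     [0.0, 0.0, 0.2]]

# Two different sets of mutually orthogonal rays spanning the space.
frame_1 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
frame_2 = [(1, 1, 0), (1, -1, 0), (0, 0, 1)]

probs_1 = [prob(D, v) for v in frame_1]
probs_2 = [prob(D, v) for v in frame_2]
print(probs_1, sum(probs_1))
print(probs_2, sum(probs_2))
```

Each frame generates its own Boolean subalgebra, and the same operator D assigns a normalized, additive distribution to both.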
Bub (1977) pointed out another highly significant result, that the
non-Boolean structure of $S(\mathcal{H})$ also necessitates a revised account of conditional
probability. In classical probability theory every Kolmogorov probability
function $p$ defines a conditional probability measure $\mathbb{P}$; the probability
$\mathbb{P}(A|B)$ of event $A$ conditional on event $B$ is given by

$\mathbb{P}(A|B) = \frac{p(A \cap B)}{p(B)}$   [provided $p(B) \neq 0$]

For any given nonzero event $B$, the function $\mathbb{P}(X|B)$ (where $X$ is any event in
$E$) is itself a classical probability measure. In fact, it is the only classical
probability measure on the set $E$ of events such that, for all $A$ in $E$,

If $A \subseteq B$, then $\mathbb{P}(A|B) = \frac{p(A)}{p(B)}$

Thus, in the classical case, for events $A$ contained in $B$, conditionalizing on $B$
just involves a renormalization of $p$ to $p'$, where $p'(B) = 1$.
Now let $p$ be a generalized probability function on $S(\mathcal{H})$, with corresponding
density operator $D$, and let $L_B$ be a subspace such that $p(L_B) \neq 0$.
Then there exists a unique GPF $\mathbb{P}(X|L_B)$ on $S(\mathcal{H})$ such that, whenever
$L_A \subseteq L_B$,

$\mathbb{P}(L_A|L_B) = \frac{p(L_A)}{p(L_B)}$

The proof of this is given in Appendix B.

By Gleason's theorem, this GPF is representable by a density operator $D_B$.
In Appendix B it is shown that

$D_B = \frac{P_B D P_B}{\mathrm{Tr}(P_B D P_B)}$

where $P_B$ projects onto $L_B$.


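The denominator is just a normalizing factor, to ensure that $D_B$ has unit
trace. By the properties of the trace [see (5.6)] and idempotence, we obtain

$\mathrm{Tr}(P_B D P_B) = \mathrm{Tr}(D P_B P_B) = \mathrm{Tr}(D P_B)$

From (8.6) it follows that, if $L_A$ and $L_B$ are subspaces of $\mathcal{H}$ with projection
operators $P_A$ and $P_B$, then

(8.7) $\mathbb{P}(L_A|L_B) = \mathrm{Tr}(D_B P_A) = \frac{\mathrm{Tr}(P_B D P_B P_A)}{\mathrm{Tr}(D P_B)}$   [Lüders' rule]

Note that in (8.7) there is no restriction on $L_A$; we do not require that $L_A \subseteq L_B$.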
However, we see that, as in the classical case, the Lüders rule gives the only
probability measure that, for events $L_A \subseteq L_B$, just involves a renormalization
of the GPF given by the operator $D$. This offers strong grounds for regarding
it as the appropriate conditionalization rule for GPFs on $S(\mathcal{H})$. Additional
grounds for thinking of it as the natural extension of the classical conditionalization
rule appear from its behavior in two special cases (see also Bub,
1977, and Section 9.3).
First of all, consider the case when $L_A$ and $L_B$ are compatible. In this case
we have

$P_A P_B = P_B P_A = P_C$

where $P_C$ projects onto $L_A \cap L_B$. (If $L_A \perp L_B$, then $P_C$ is the zero operator.)
Using (8.6) we obtain

(8.8) $\mathbb{P}(L_A|L_B) = \frac{\mathrm{Tr}(D P_C)}{\mathrm{Tr}(D P_B)} = \frac{p(L_A \cap L_B)}{p(L_B)}$

By taking compatible subspaces $L_A$ and $L_B$ we remain in one Boolean subalgebra
of $S(\mathcal{H})$; in this case, whatever our approach to quantum logic, $L_A \wedge L_B$
is well defined, and is equal to $L_A \cap L_B$. For such subspaces the Lüders rule
reduces to classical conditionalization.
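
A small numerical sketch may make the contrast concrete. It takes the Lüders rule in its standard form, Tr(P_B D P_B P_A) / Tr(D P_B), on a real three-dimensional space with a hypothetical diagonal density operator, and evaluates it once for a compatible pair of subspaces and once for an incompatible pair. Exact fractions are used so that the comparison with the classical ratio is not blurred by rounding; all names are illustrative.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def projector(*vectors):
    """Projector onto the span of mutually orthogonal integer vectors."""
    n = len(vectors[0])
    P = [[Fraction(0)] * n for _ in range(n)]
    for v in vectors:
        norm2 = sum(x * x for x in v)
        for i in range(n):
            for j in range(n):
                P[i][j] += Fraction(v[i] * v[j], norm2)
    return P


def luders(D, PA, PB):
    """Probability of L_A conditional on L_B by the Lueders rule."""
    numerator = trace(matmul(matmul(matmul(PB, D), PB), PA))
    return numerator / trace(matmul(D, PB))


# Hypothetical density operator: diagonal with weights 1/2, 3/10, 1/5.
D = [[Fraction(1, 2), 0, 0], [0, Fraction(3, 10), 0], [0, 0, Fraction(1, 5)]]

# Compatible case: two coordinate planes, whose projectors commute.
P_B = projector((1, 0, 0), (0, 1, 0))
P_A = projector((0, 1, 0), (0, 0, 1))
P_C = projector((0, 1, 0))                       # projector onto the intersection
compatible = luders(D, P_A, P_B)
classical = trace(matmul(D, P_C)) / trace(matmul(D, P_B))

# Incompatible case: two distinct, non-orthogonal rays.
incompatible = luders(D, projector((1, 1, 0)), projector((1, 0, 0)))

print(compatible, classical, incompatible)
```

In the compatible case the two numbers agree. In the incompatible case the rays intersect only in the zero subspace, so a ratio formed from the intersection would be zero, yet the Lüders rule returns a nonzero value.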
Let us look at another kind of situation where we can meaningfully speak
of the conjunction of two quantum events. (To reduce the number of sym-
bols floating around, I will use projection operators to represent these
events.)
Consider a composite system with two components $a$ and $b$; the states of
the composite system will be represented in the tensor-product space
$\mathcal{H}^a \otimes \mathcal{H}^b$. Let $P^a$ be a projector on $\mathcal{H}^a$ representing a quantum event associated
with system $a$, and $P^b$ a projector on $\mathcal{H}^b$ representing a quantum event
associated with system $b$.
Assume that the density operator $D$ on $\mathcal{H}^a \otimes \mathcal{H}^b$ represents the state of
the composite system. Then the joint probability of $P^a$ and $P^b$ is given, in
accordance with (8.6), by

$p(P^a, P^b) = \mathrm{Tr}[D(P^a \otimes P^b)]$

The probabilities of the individual events are given by

$p(P^a) = \mathrm{Tr}[D(P^a \otimes I^b)]$ and $p(P^b) = \mathrm{Tr}[D(I^a \otimes P^b)]$

where $I^a$ and $I^b$ are the identity operators on $\mathcal{H}^a$ and $\mathcal{H}^b$, respectively.
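Now by Lüders-rule conditionalization,

$\mathbb{P}(P^a|P^b) = \frac{\mathrm{Tr}[(I^a \otimes P^b) D (I^a \otimes P^b)(P^a \otimes I^b)]}{\mathrm{Tr} D'}$

where $D' = D(I^a \otimes P^b)$.

Using the properties of the trace, operator multiplication on $\mathcal{H}^a \otimes \mathcal{H}^b$,
and idempotence, we see that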

$\mathbb{P}(P^a|P^b) = \frac{\mathrm{Tr}[D(I^a \otimes P^b)(P^a \otimes I^b)(I^a \otimes P^b)]}{\mathrm{Tr} D'} = \frac{\mathrm{Tr}[D(P^a \otimes P^b)]}{\mathrm{Tr} D'}$

But $\mathrm{Tr} D' = \mathrm{Tr}[D(I^a \otimes P^b)] = p(P^b)$, and it follows that

(8.9) $\mathbb{P}(P^a|P^b) = \frac{p(P^a, P^b)}{p(P^b)}$

exactly as in the classical case.
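
The same conclusion can be checked on a small example. The sketch below assumes two real two-dimensional component spaces and a hypothetical pure state on their product; it computes the Lüders conditional probability of an event on system a given an event on system b, and compares it with the ratio of the joint probability to the probability of the conditioning event. The helper names and the particular state are illustrative assumptions.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]


def projector(v):
    """Projector onto the ray spanned by an integer vector v."""
    norm2 = sum(x * x for x in v)
    return [[Fraction(vi * vj, norm2) for vj in v] for vi in v]


I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]

D = projector((1, 2, 2, 0))   # hypothetical pure state on the product space
Pa = projector((1, 0))        # event on system a
Pb = projector((1, 1))        # event on system b

Pa_I = kron(Pa, I2)
I_Pb = kron(I2, Pb)

joint = trace(matmul(D, kron(Pa, Pb)))
p_b = trace(matmul(D, I_Pb))
luders = trace(matmul(matmul(matmul(I_Pb, D), I_Pb), Pa_I)) / p_b
classical_form = joint / p_b

print(luders, classical_form)
```

The agreement reflects the fact that the two extended projectors commute, so the pair of events lies in a single Boolean subalgebra even though the state itself is not a product state.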


Before we leave this formal development of generalized probability
theory, one thing should be emphasized. The conditional probability given
by the Lüders rule is a probability of a quantum event $Q$ given another
quantum event $P$. Though each event can be regarded as a pair $(A, \Delta)$, this
internal structure of events is irrelevant to the generalized probability
theory given here. In particular, nothing in this discussion of quantum
conditionalization bears directly on the question of whether the expression
$p(A, \Delta)$ should itself be regarded as a conditional probability and be read as
"The probability that a result in the Borel set $\Delta$ will occur, given that a
measurement of $A$ takes place" ($p(R_\Delta | M_A)$, for short). I postpone this question
to Section 10.3.

8.3 The Two-Slit Experiment: Waves and Particles


A discussion of quantum theory which made no mention of the two-slit
experiment would not quite be Hamlet without the Prince; nonetheless, it
might be thought an eccentric departure from tradition. However, I include
the experiment here not from a desire to preserve ancestral pieties, but
because of its relevance to our present concerns, a relevance which will
appear in Section 8.4.
In the experiment - or rather in the idealized version of an experiment* -
a source E emits electrons at a steady rate toward a sensitive screen S.
Between the source and the screen is a diaphragm, in which there are two
slits, A and B. Three experiments are performed. In the first, a, only slit A is
open; in the second, b, only slit B is open; in the third, c, both A and B are
open. The time of each experiment is long enough for averaging effects to
come into play, and in each case the distribution of "hits" on the screen is
recorded. The distribution pattern for c (shown at the far right of Figure 8.1)
is not just the sum of the patterns for a and b, as it would be if the electrons
behaved like classical particles. Instead, it resembles the interference patterns
characteristic of waves that have passed through two small apertures
(see, for example, PSSC, 1960, pp. 286-294).

* A neutron interference experiment which is the exact analogue of the two-slit experiment
has been performed by a group led by Summhammer. It is simply described in Leggett (1986).

Figure 8.1 The two-slit experiment; curves show distribution of "hits" in experiments a
and b and (far right) experiment c. (From Feynman, Leighton, and Sands, 1965.)

That is, if we take a small area
X of the screen and write

$N_A$ = number of hits on X per unit time in experiment a
$N_B$ = number of hits on X per unit time in experiment b
$N_{AB}$ = number of hits on X per unit time in experiment c

we find that

$N_{AB} \neq N_A + N_B$

On a wave interpretation of the interaction between source and screen
this is perfectly explicable. If two waves spread out from A and B, only at a
few places on S will they arrive in phase; for the most part they will arrive
somewhat out of step; in fact, when the "crest" of one exactly coincides
with the "trough" of the other the two will cancel each other out.
Each of the two classical models of causal processes, the particle model
and the wave model, offers a partial description of the source-screen inter-
action, but neither is fully adequate. Either model, taken on its own, leaves
us with a "causal anomaly." (The term is Reichenbach's, 1944, secs. 6, 7.)
Anomalous on the wave account is the fact that electrons are individually
detectable at the screen; for this "collapse of the wave packet" the account
offers no explanation. But the particle account fares no better. Let us assume
that, when both apertures are open, a number $N'_A$ of particles reach X per
unit time after passing through A, and that a number $N'_B$ reach X per unit
time by passing through B. Then, since on a wholehearted particle analysis
each particle reaching X must pass through exactly one aperture,

$N_{AB} = N'_A + N'_B$

But we know that

$N_{AB} \neq N_A + N_B$

and so, either $N_A \neq N'_A$ or $N_B \neq N'_B$.


On this account, the causal anomaly lies in the fact that the opening of B
either affects the number of electrons passing through A or else affects the
propensity of these electrons to strike the region X on the screen; the parti-
cles passing through A mysteriously "know" whether B is open or not.
Each model accounts adequately for some of the phenomena, but neither
accommodates them all.
One response to this, and similar anomalies, has been to say that quantum
mechanics requires us to forswear a unified description of nature. According
to Hanson (1967, p. 43), "The Copenhagen interpretation of quantum me-
chanics is the view that fundamental nature is indivisibly bipartite-the
wave-particle duality." Despite the fact that it has been held by some of the
theory's most distinguished practitioners, this view turns out to have
slender justification. Let us look at it as it appears in the writings of Niels
Bohr. (He discusses the two-slit experiment in Bohr, 1949, pp. 216-218.)
Bohr took the necessity of using two seemingly incompatible descriptions
of phenomena as a general epistemological principle and called it the Princi-
ple of Complementarity. In the case of the wave-particle duality, the princi-
ple takes this form:

(8.10) As a description of microentities and microprocesses, neither a particle
description nor a wave description is fully adequate. Between
them, however, they form a complete, complementary description.

Underlying this principle is a doctrine which we may summarize as follows.

(1) Conditions for the applicability of scientific concepts are determined
by the experimental situation (see, for example, Bohr, 1935a, p. 699).
(2) Experiments can be unambiguously described only in classical terms
(see Bohr, 1949, p. 209).
(3) "Any given application of classical concepts precludes the simulta-
neous use of other classical concepts which in a different connection
are equally necessary for the elucidation of the phenomena" (Bohr,
1934, p. 10).

The limitations on classical concepts announced in (3) are due to the inde-
terminacies associated with the quantum of action: there is always a "finite
and uncontrollable interaction between the objects and the measuring in-
struments in the field of quantum theory" (Bohr, 1935a, p. 700), and this
precludes simultaneous ascription of, for example, position and momentum
to a particle. I discuss Bohr's interpretation of these indeterminacy relations
in Section 9.2. (For a full and sympathetic discussion of Bohr's views, see
Hooker, 1972.)
We have met (1) already, in Section 7.9; on Bohr's account, the use of a
particular concept (such as momentum or position) presupposes the exis-
tence of a particular physical situation; only when that situation obtains is
discourse involving that concept meaningful. Similarly with the wave-par-
ticle duality. The language associated with a particle model of physical
processes acquires meaning in specific experimental contexts. Further, con-
cepts are readily linked to particular models-momentum to the wave
model (as the wave number of the wave), and position to the particle model.
Bohr's position is elegantly summarized by Petersen (1963, p. 12):
In the language of physics there are various sets of concepts such as space and time,
and the so-called dynamical concepts like momentum and energy. Corresponding to
these different sets of concepts are different types of measuring tools. For example,
to determine the position of the object, one must use rulers firmly attached together
to form a reference frame. On the other hand, to measure an object's momentum one
may let it collide with a freely movable body of known mass, and then measure the
resultant velocity of the test body . . .
In quantum physics we use the same concepts [as in classical physics] and thus the
same measuring tools, but . . . the dissimilarity between the measuring tools be-
comes crucially important. Here we cannot use the different types of instruments in
combination. We cannot combine the information about the system that we get from
one type of instrument with the information we get from another. Therefore a
quantum physical phenomenon is characterized by the type of measuring instru-
ment we use. Two phenomena obtained by observing the same system with two
different types of instruments are mutually exclusive. Bohr called this logical rela-
tion of exclusion complementarity.
The lesson to be drawn is that if we use the same set of concepts in
quantum physics as in classical, then we can never obtain a unified description
of the behavior of a system. But this prompts the obvious question: why
should we accept (2)? Why are the concepts of classical physics to be accorded
privileged status? Bohr writes (and italicizes), "However far the phenomena
transcend the scope of classical physical explanation, the account of all
evidence must be expressed in classical terms." And he argues that only by a
"suitable application of the terminology of classical physics" can we describe
our experiments to others without ambiguity (1949, p. 209).
This seems wholly implausible. It may be that we need to use a classical
categorial framework to describe experiments, and hence that such a frame-
work is implicit in the formulation of quantum theory; in fact, I will argue as
much in Chapter 10. Nonetheless, this is a far cry from saying (i) that the
only vocabulary we can meaningfully employ is that of classical physics, a
vocabulary familiar to physicists at the end of the nineteenth century, and
(ii) that its operational meaning must forever remain unchanged.
One could challenge this (and with it much of what Bohr says) by denying
that there were such things as "operational meanings" (see Hempel, 1954,
for a discussion). Similarly, Bohr's views would be instantly rejected by
anyone who subscribed to a principle of meaning incommensurability, the
view that with a move from one theory to another all terms suffer a radical
change in meaning. (This view is discussed by Hacking, 1983, chap. 5.) For
example, of the move in question, from classical physics to quantum mechanics,
Schrödinger (1935, p. 155) remarked gloomily, "Does not one get
the impression that here one deals with fundamental properties of new
classes of characteristics, that keep only the name in common with classical
ones?" But one can take less extreme views and still disagree with Bohr.
Without inconsistency, indeed with considerable plausibility, one could
maintain (a) that to know what a term means is, among other things, to
know the experimental contexts in which its use would be appropriate, (b)
that there is some preservation of meaning between theories, and (c) that
new theories both bring new concepts into physics and modify the mean-
ings of the concepts that are retained.
According to Rosenfeld, "The classical concepts to which Bohr appeals
directly . . . are (in the last resort) not formalizable, but immediately given
(as part of common experience)" (pers. com.; see Daneri, Loinger, and
Prosperi, 1962, p. 298). Hooker (1972, p. 135) too suggests that classical
concepts are "regarded by Bohr as being refined versions of our ordinary,
everyday concepts" and that this is the reason Bohr accords them a privileged
status. But this position (though not its ascription to Bohr) is untenable.
Take, for instance, as ordinary and everyday an instrument as an
ammeter. I can, without further elaboration, use the term ammeter in describing
an experiment to any living physicist. This is because we share a
common theoretical vocabulary which includes electric current. But the concept
of electric current is not a "refined version" of any concept at all that
was available to, say, Galileo. Still less can Bohr's claim be made on behalf of
magnetic flux density or electrostatic potential, both perfectly good classical
concepts.
My point is simply this, that the vocabulary of the experimenter and that
of the theoretical physicist (to make a dubious distinction) have always been
intertwined; as terms for radically new concepts enter the theoretician's
vocabulary, so they will enter the experimenter's. This process did not stop
at the stroke of midnight on December 31, 1899.
What holds for the concepts used in the theory holds also for the models
in which they appear. One problem which a new theory should not be
called upon to answer is why it makes only partial use of the models used by
its predecessors. Given the historical matrix from which quantum mechanics
emerged, it is not surprising that a great deal of early quantum
theory was expressed in terms of wave and particle concepts. For every
physicist at the turn of the century, these were ready-to-hand pieces of
theoretical equipment. For sound pragmatic reasons physicists were loath to
discard them. In 1900, however, with Planck's attribution of particle properties
to electromagnetic waves, they began to be used in unorthodox ways;
Planck's move was mirrored twenty-five years later by de Broglie's attribution
of wave properties to electrons.
What, then, of the so-called wave-particle duality that results? I say more
about this duality in Section 10.2; however, we can agree with Bohr that
each model, while proving heuristically valuable, offers only a partial anal-
ogy to the behavior of light and matter. Further we can agree (how could we
not?) that the two models are mutually at odds. We can deny, however, that
there is any radical epistemological lesson to be drawn from all this.
These episodes in the prehistory of quantum theory do not teach us to
abjure a unified understanding of quantum phenomena in favor of a doc-
trine of epistemological complementarity, according to which we are com-
pelled to move to and fro between two incompatible ways of picturing the
world. They teach us merely that neither of these ways is fully adequate. We
can draw a different conclusion than did Bohr, even while agreeing with
him that "The two views on the nature of light are rather to be considered as
different attempts at an interpretation of experimental evidence, in which
the limitations of the classical concepts are expressed in complementary
ways" (1934, p. 56).

8.4 The Two-Slit Experiment: Conditional Probabilities


Schrödinger (1953), with whom the concept of the "wave function" originated,
maintained to the end of his life that it should be thought of as a
mathematical representation of a physical wave, even if this compelled us to
a dualist ontology. In contrast, in 1926 Born pointed out that one could
interpret wave functions in probabilistic terms. In a discussion of collisions
between electrons and atoms he proposed that the function $\Phi(\alpha, \beta, \gamma)$ gave
"the probability that the electron will be thrown out in the direction given
by the angles $\alpha, \beta, \gamma$," and added in a celebrated footnote, "More careful
consideration shows that the probability is proportional to the square of the
quantity $\Phi$" (Born, 1926a, p. 865; Wheeler and Zurek, 1983, p. 54).
Effectively, this is the interpretation of the wave function (or state func-
tion) used in this book. For Born it was the key to providing a unified particle
interpretation of quantum theory: "If one wants to understand [collision
processes] in corpuscular terms, only one interpretation [the probabilistic
interpretation of $\Phi$] is possible" (Born, 1926a, p. 865). But, initially at least,
the use of probabilities still leaves the particle interpretation with the anom-
aly we met in the last section.
Let A be the event that the electron passes through aperture A, B the event
that the electron passes through aperture B, and X the event that the electron
strikes the region X of the screen. Then A v B is the event that the electron
passes through either A or B, A & X is the event that the electron passes
through A and strikes region X, and so on. We can write down three condi-
tional probabilities:

(8.11) $p(X|A) = \frac{p(X \& A)}{p(A)}$

(8.12) $p(X|B) = \frac{p(X \& B)}{p(B)}$

(8.13) $p(X|A \lor B) = \frac{p[X \& (A \lor B)]}{p(A \lor B)}$

By expanding (8.13) we obtain

(8.14) $p(X|A \lor B) = \frac{p[(X \& A) \lor (X \& B)]}{p(A \lor B)}$

and, since

(8.15) X & A and X & B are mutually exclusive events,

it follows that

(8.16) $p(X|A \lor B) = \frac{p(X \& A) + p(X \& B)}{p(A \lor B)}$

For simplicity we consider the case when

(8.17) $p(A) = p(B)$

Again, since A and B are mutually exclusive,

(8.18) $p(A \lor B) = p(A) + p(B) = 2p(A)$

and so

(8.19) $p(X|A \lor B) = \frac{p(X \& A)}{2p(A)} + \frac{p(X \& B)}{2p(B)} = \frac{1}{2}\,p(X|A) + \frac{1}{2}\,p(X|B)$

If we cash out (8.19) in terms of the relative frequencies with which the
event X occurs in experiments a, b, and c (see Section 8.3), we get

(8.20) $N_{AB} = N_A + N_B$

(The factor $\frac{1}{2}$ disappears because we deal with twice as many electrons in
experiment c as in a and b.) But, as noted previously, (8.20) is at odds both
with quantum theory and with experiment.
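
The classical derivation can be replayed on a toy distribution. The sketch below assumes a hypothetical classical joint distribution over which slit is passed and whether region X is hit, with equal probability for the two slits, and confirms that conditioning on "A or B" gives the equal-weight average of the two single-slit conditional probabilities, as in (8.19). The numbers are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical classical joint distribution over (slit passed, region X hit).
JOINT = {
    ("A", True): Fraction(1, 10), ("A", False): Fraction(4, 10),
    ("B", True): Fraction(3, 10), ("B", False): Fraction(2, 10),
}


def p(slits, hit=None):
    """Probability of passing one of `slits`, optionally jointly with hitting X."""
    return sum(w for (slit, x), w in JOINT.items()
               if slit in slits and (hit is None or x == hit))


def cond(slits):
    """p(X | slits) by the classical ratio rule."""
    return p(slits, True) / p(slits)


assert p({"A"}) == p({"B"})                 # the simplifying assumption (8.17)

lhs = cond({"A", "B"})                      # p(X | A v B)
rhs = Fraction(1, 2) * cond({"A"}) + Fraction(1, 2) * cond({"B"})
print(lhs, rhs)
```

Any classical joint distribution satisfying (8.17) passes this check, which is why the additive pattern is forced on every classical probability space.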
One thing that the derivation of (8.19) reveals is that the problem is not
just a problem for any particular model of causal processes. For equations
(8.11)-(8.19) were established without mention of the particle model; they
dealt solely with the probability of an event X conditional on other events A,
B, and A v B. They give us a purely probabilistic analysis, albeit one that can
be supplemented by a causal story involving particles.
Where might the derivation of (8.19) be challenged? First, forsaking the
particle model, one might deny (8.15); if A and B (and hence A & X, B & X) are
not mutually exclusive, then the additivity law appealed to in (8.16) and
(8.18) doesn't hold.
Attempts to check (8.15) experimentally have distinctly odd effects. Let us
assume, for example, that counters are set up immediately behind the apertures
A and B to register whether both events can take place simultaneously.
Then we find that each electron arriving at the screen has triggered exactly
one of the counters, and exclusivity seems to be verified. But the presence of
the counters also destroys the pattern at the screen; when they are present
(8.20) holds (see Feynman, Leighton, and Sands, 1965, vol. 3, pp. 1.6-1.9).
This effect is certainly peculiar, and, as with many quantum effects, it is
tempting to see it as symptomatic of a deep-seated epistemological recalcitrance
at the quantum level. But I think this temptation should be resisted.
The experiment with counters is designed to answer a specific question: are
the events A and B mutually exclusive? The answer it gives is unambiguous:
they are.
Of course, our interest in this particular question is a by-product of our
search for an account of the interference pattern at the screen, and a remark-
able effect of the experiment is that this pattern is replaced by another.
Nonetheless, although we would like to know why this effect takes place,
that's a separate problem. Even if we couldn't solve it there would be no
obvious reason why we shouldn't take the evidence the counters supply at
face value; they show that A and B are indeed mutually exclusive. (A similar
point is made by Fine, 1972, p. 25.)
If we accept this result, then we need to locate the problem in the derivation
of (8.19) elsewhere. Putnam (1969) suggested that the illicit move in
this derivation is that from (8.13) to (8.14) (see Section 7.8). As he pointed
out, this move is an application of the distributive law, which doesn't hold
within quantum logic. It is certainly true that if we reject the distributive law,
then the inference from (8.13) to (8.14), and hence the derivation of (8.19), is
blocked. The trouble is, this is a purely negative result. It merely tells us that
the additive pattern is not guaranteed at the screen. It gives us no reason
why, in general, the interference pattern occurs, nor why the interference
pattern becomes the additive pattern when the screen is moved close
enough to the diaphragm. (Compare Bub, 1977; see also Gibbins, 1981b,
and 1987, pp. 147-151.)
Clearly, the simple rejection of a particular law of logic will not supply
much in the way of an explanation of what goes on. And, from Putnam's
1969 paper, one might well think that the only important thing about
quantum logic was that it gave up the distributive law. However, as Putnam
has recognized (Friedman and Putnam, 1978), the quantum-logical ap-
proach can offer a much deeper analysis of the problem than this. This
alternative analysis suggests that, rather than sniffing suspiciously at individual
moves in the derivation of (8.19), we should reject the whole derivation.
For the probabilities we are dealing with are assigned, not by probability
functions on a classical probability space, as the derivation assumes, but
by generalized probability functions defined on a Hilbert space. In fact Bub
has shown that, by replacing classical conditional probabilities by quantum
conditional probabilities, and then allowing for one further factor affecting
the probability of event X, we obtain the quantum statistics. (See Bub, 1977,
and 1979, pp. 100-104; Beltrametti and Cassinelli, 1981, pp. 283-285.)
The further factor is the evolution of the system's state between the diaphragm
and the screen.
Since the calculations involve an observable with a continuous spectrum
(the y-coordinate of position), they look more complicated than those we are
used to. However, the principle behind them is very simple. We assume that
the electrons arrive at the diaphragm in a pure state $\Psi$. At the diaphragm a
quantum event occurs. In each of the three experiments this event is associated
with the y-coordinate of position; events $A = (y, \Delta_A)$, $B = (y, \Delta_B)$, and
$A \vee B = (y, \Delta_A \cup \Delta_B)$ occur in experiments a, b, and c, respectively (at least for
those electrons which make it to the screen). Conditionalization on any
event, using the Lüders rule, yields a new generalized probability function
in the Hilbert space, in other words, a new state. Conditionalizing on the
event A yields the pure state $\Psi_A$, while conditionalizing on B yields the pure
state $\Psi_B$. If at this point we appealed to classical probability theory, then
conditionalizing on $A \vee B$ would yield a mixture of $\Psi_A$ and $\Psi_B$, and the
resulting probabilities would be half of the sum of those obtained from $\Psi_A$
and $\Psi_B$, in accordance with (8.19). The surprising and nonclassical feature
of Lüders' conditionalization is this, that conditionalizing on $A \vee B$ yields not
a mixture of $\Psi_A$ and $\Psi_B$, but instead a superposition of the two, the pure state
$\Psi_{A \vee B}$.
Now, if the screen were very close to the diaphragm, the probability of X
would be given by $\Psi_A$ in experiment a, by $\Psi_B$ in experiment b, and by $\Psi_{A \vee B}$ in
experiment c. However, since an event at the screen occurs a time t after the
corresponding event at the diaphragm, we must use the Schrödinger equation
to see how each of the three states $\Psi_A$, $\Psi_B$, and $\Psi_{A \vee B}$ evolves in this time
and calculate the probabilities of X accordingly. It is this temporal evolution
which produces the effects which prompt a wave account of the phenomena,
and which we can refer to as the "diffraction" of the state function.
Indeed, as we have noted, when the screen and diaphragm are very close
together, there are no such effects, and the probabilities given by $\Psi_A$ and $\Psi_B$
add in a thoroughly classical way.
Let us run this formally. The spectral measure associated with the position
operator was discussed in Section 1.15. The projection operator corresponding
to event A is the operator $P_A$ such that, for the pure state $\Psi(y)$,

$$P_A \Psi(y) = \Psi(y) \quad \text{for } y \in \Delta_A$$

$$P_A \Psi(y) = 0 \quad \text{for } y \notin \Delta_A$$

Conditionalizing on A (using the Lüders rule) yields a "truncated" wave
function $\Psi_A$, which vanishes outside $\Delta_A$, and within it is just $\Psi$ renormalized.
Similarly, conditionalizing on B and $A \vee B$ yields $\Psi_B$ and $\Psi_{A \vee B}$, respectively.
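Now if

$$\int_{\Delta_A} |\Psi(y)|^2 \, dy = \int_{\Delta_B} |\Psi(y)|^2 \, dy$$

then A and B are equiprobable, and

$$\Psi_{A \vee B} = \frac{1}{\sqrt{2}} (\Psi_A + \Psi_B)$$

which is a superposition of $\Psi_A$ and $\Psi_B$.

In the case when A and B are not equiprobable, we still get a superposition
of $\Psi_A$ and $\Psi_B$, but one which is unequally weighted.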
Now let $U_t$ be the evolution operator which modifies the state between
the diaphragm and the screen, and let $\Delta_X$ be the range of y-values covered
by the region X. Then, in the experiment c, the state of the system at the
screen is $U_t \Psi_{A \vee B}$, and, using the usual statistical algorithm (2.1) together
with the definition of inner product in $L^2$ (Section 1.11), we obtain

(8.21) $p_c(X) = p_c(y, \Delta_X) = \langle U_t \Psi_{A \vee B} \mid P_X U_t \Psi_{A \vee B} \rangle$

$= |P_X U_t \Psi_{A \vee B}|^2$

$= \int_{\Delta_X} \left| U_t \tfrac{1}{\sqrt{2}} (\Psi_A + \Psi_B) \right|^2 dy$

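Within the expansion of (8.21) we find the so-called interference term:

$$\frac{1}{2} \int_{\Delta_X} \left[ (U_t \Psi_A)^* (U_t \Psi_B) + (U_t \Psi_B)^* (U_t \Psi_A) \right] dy$$

This term gives the difference between $p_c(X)$ and $\tfrac{1}{2}[p_a(X) + p_b(X)]$. It only
vanishes for all $\Delta_X$ when t = 0, that is, when the screen is very close to the
diaphragm.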
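
The dependence of the interference term on the evolution time can be checked numerically. What follows is only a minimal sketch under stated assumptions, not part of the derivation above: it assumes free evolution in one dimension with units in which ħ = m = 1, a Gaussian incoming state, and illustrative slit positions. The grid and the names `luders`, `evolve`, and `interference` are hypothetical choices made for the example.

```python
import numpy as np

# Assumed discretization: hbar = m = 1, free evolution between diaphragm and
# screen. Slit positions and widths are illustrative, not taken from the text.
N, L = 4096, 80.0
y = np.linspace(-L, L, N, endpoint=False)
dy = y[1] - y[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dy)

def normalize(psi):
    return psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dy)

def luders(psi, mask):
    """Lueders conditionalization on a position event: project, then renormalize."""
    return normalize(np.where(mask, psi, 0.0))

def evolve(psi, t):
    """Free evolution operator U_t, applied in momentum space."""
    return np.fft.ifft(np.exp(-0.5j * k ** 2 * t) * np.fft.fft(psi))

psi = normalize(np.exp(-y ** 2 / 50.0).astype(complex))  # incoming pure state
in_A = (y > -1.5) & (y < -0.5)
in_B = (y > 0.5) & (y < 1.5)

psi_A, psi_B = luders(psi, in_A), luders(psi, in_B)
psi_AB = luders(psi, in_A | in_B)

def interference(t, in_X):
    """p_c(X) - (p_a(X) + p_b(X)) / 2 for a screen region X at time t."""
    dens = lambda phi: np.sum(np.abs(evolve(phi, t)[in_X]) ** 2) * dy
    return dens(psi_AB) - 0.5 * (dens(psi_A) + dens(psi_B))

# Equiprobable slits: conditionalizing on A-or-B gives the equal superposition.
superposition_gap = np.max(np.abs(psi_AB - (psi_A + psi_B) / np.sqrt(2)))

term_at_0 = interference(0.0, in_A)              # screen at the diaphragm
term_later = interference(2.0, np.abs(y) < 0.25) # region midway between slits
```

Under these assumptions `term_at_0` vanishes to rounding error, since the truncated wave functions have disjoint supports at t = 0, whereas `term_later` is clearly nonzero once the two states have spread into one another.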
Finally, consider the case when there are counters present. Assume, for
example, that, with both apertures open, the counter beside the A-aperture
registers an electron. Then, after the event $A \vee B$, another event A has
occurred, to wit, the restriction of the electron to the region round the
A-counter. Provided that the counter is sufficiently close to the aperture for
no significant evolution of the state to occur between the two, the effect will
be a two-stage transition,

$$\Psi \rightarrow \Psi_{A \vee B} \rightarrow \Psi_A$$

With the counters present, the ensemble of electrons will be divided into
two subensembles in states $\Psi_A$ and $\Psi_B$. At the screen this will give the
statistics of a mixture of the states $U_t \Psi_A$ and $U_t \Psi_B$, and the additive pattern
characteristic of classical particles will appear. What was previously an odd
and inexplicable effect drops out quite naturally from the analysis.
How does this analysis relate to previous chapters, and where does it
leave us? The lesson of Chapters 6 and 7 was that we might be better off if
we dispensed with talk of the "properties" of a quantum system. Probably
the hardest property to free ourselves from conceptually is that of the
system's position in space. For if we stop attributing a position to a system at
all times, we will no longer be able to describe the electron in experiment c as
passing either through aperture A or through aperture B. Thus we will no
longer be able to regard it as mediating a causal process, at least insofar as we
require such processes to be characterized by spatio-temporal continuity.
We will be left with a story told, not in terms of causal processes, but in terms
of quantum events and their probabilities conditional on other quantum
events.
Of course, similar stories can be told for classical processes involving
probabilities. The difference is that in the classical case, when the probabili-
ties are Kolmogorov probabilities, causal supplements of these stories are
available; for an example, consider the way in which the derivation of (8.19)
earlier in this section could be supplemented by a causal account in terms of
particles. However, when quantum probabilities defined on a Hilbert space
are involved, no such causal supplementation is possible. Nevertheless,
contra Kant, this doesn't make a quantum story unintelligible. And, for all its
unfamiliarity, the account of the two-slit experiment outlined above has one
great merit: it tallies with the facts.

8.5 The Bell-Wigner Inequality and Classical Probability


Like the two-slit experiment, the Bohm version of the EPR experiment raises
questions both about causality and about probabilities in quantum me-
chanics. In Chapter 6, the problem posed by Bell's theorem was presented as
a problem for a version of local realism, the thesis that (1) quantum propositions
$(A, \Delta)$ represent properties possessed by individual systems and which
measurement reveals, and that (2) the properties of one system cannot be
affected by what is done to a second system spatially separated from the
first. Although Wigner's formulation of the theorem, in terms of probabilities,
was used, these probabilities were interpreted as the statistical interpretation
suggests, that is, as relative frequencies of the occurrence of properties
within an ensemble.
However, we can now redescribe Wigner's result in terms of probability
theory alone. His proof demonstrates that no probability function on a
certain kind of classical probability space can yield the probability assignments
of quantum theory.
As we saw, Wigner considers assignments of probabilities to sextuples
(i,j,k;l,m,n). Each member of a sextuple is either + or -; i, j, and k represent
values of certain components of spin, $S^1_a$, $S^1_b$, and $S^1_c$, for particle 1, and l, m,
and n values for the same components of spin for particle 2, $S^2_a$, $S^2_b$, and $S^2_c$.
These $2^6$ sextuples provide a partition of a classical probability space, that is,
a set of mutually exclusive and jointly exhaustive events. It turns out that no
classical probability assignment to this partition can yield the quantum
statistics for $S^1_a$, $S^2_a$, et cetera.
Bub (1974, chap. 6) has argued that this version of Bell's theorem just
provides further evidence that quantum mechanics requires a nonclassical
account of probability. Indeed, the problem with the postulated probability
space is not far to seek. Effectively, the sextuples defining the members of
the partition are assumed to be sixfold classical conjunctions; thus the set
$\{(S^1_a,+), (S^1_b,+), (S^1_c,+)\}$, for instance, is assumed jointly compatible. In the
event structure of quantum mechanics, however, this is just not so. Nor,
crucially, can the quantum Hilbert-space structure be embedded into a
classical (Boolean) structure on which this partition might be defined; we
know this from Kochen and Specker's (extended) theorem. The postulated
classical probability space was therefore doomed to inadequacy, indepen-
dently of considerations involving coupled systems. Bub concludes that the
Bell argument "has nothing whatsoever to do with locality" and empha-
sizes the point by generating a similar inequality for a single particle, using a
classical partition with $2^3$ members, each of form (i,j,k) (p. 83).
Accardi and Fedullo (1982) have done likewise. Nonetheless, as Bub now
acknowledges, more can be said about the two-particle version of Bell's
inequality, in particular about the problems it raises for our concept of
causality.
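
The claim that no classical probability assignment reproduces the quantum statistics can be illustrated with a small numerical check. The sketch below is offered under stated assumptions: it uses one common form of the Bell-Wigner inequality, p(a+, b+) ≤ p(a+, c+) + p(c+, b+), which is assumed here for illustration rather than quoted from the derivation in Chapter 6, and it builds the strict anticorrelation between the particles into the classical model, so that a partition into triples (i,j,k) suffices. The function names are hypothetical.

```python
import itertools
import math
import random

triples = list(itertools.product("+-", repeat=3))  # values (i, j, k) for particle 1

def classical_gap(weights):
    """Right side minus left side of  p(a+, b+) <= p(a+, c+) + p(c+, b+)
    for a classical distribution over triples, particle 2 being anticorrelated."""
    total = sum(weights)
    p = {t: w / total for t, w in zip(triples, weights)}
    # Particle 2 shows + along a direction exactly when particle 1 has - there.
    p_ab = sum(q for (i, j, k), q in p.items() if i == "+" and j == "-")
    p_ac = sum(q for (i, j, k), q in p.items() if i == "+" and k == "-")
    p_cb = sum(q for (i, j, k), q in p.items() if k == "+" and j == "-")
    return p_ac + p_cb - p_ab

def quantum_gap(a, b, c):
    """Same quantity, using the singlet-state value (1/2) sin^2(angle / 2)."""
    q = lambda x, y: 0.5 * math.sin((x - y) / 2) ** 2
    return q(a, c) + q(c, b) - q(a, b)

random.seed(0)
worst_classical = min(classical_gap([random.random() for _ in triples])
                      for _ in range(10_000))
quantum = quantum_gap(0.0, 2 * math.pi / 3, math.pi / 3)  # c bisects a and b
```

Every sampled classical assignment leaves the gap non-negative, whereas for coplanar directions with c midway between a and b at 60° from each, the quantum value of the gap is −0.125.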

8.6 Bell Inequalities and Einstein-Locality


Let us review and amplify the account of the Bohm version of the EPR
experiment given in Section 6.3.
Electron-positron pairs are produced, with the composite system in the
so-called singlet spin state $\Psi$. This is a pure state in the tensor product space
$\mathcal{H}^e \otimes \mathcal{H}^p$:

$$\Psi = \frac{1}{\sqrt{2}} (v_+ \otimes u_- - v_- \otimes u_+)$$

where $v_+$ and $v_-$ are the eigenvectors of some component of spin, $S^e_\alpha$ for the
electron, and $u_+$ and $u_-$ are the eigenvectors of the same component of spin
for the positron, $S^p_\alpha$.
To the vector $\Psi$ corresponds the density operator $D_\Psi$ on $\mathcal{H}^e \otimes \mathcal{H}^p$:

This yields the two reduced states $D^e$ and $D^p$ for the two components of the
coupled system; however, these are mixed states rather than pure states (see
Section 5.8). In fact, they are mixed states without unique orthogonal
decompositions; we have:

$$D^e = \frac{1}{2} P^e_{\alpha+} + \frac{1}{2} P^e_{\alpha-}$$

$$D^p = \frac{1}{2} P^p_{\beta+} + \frac{1}{2} P^p_{\beta-}$$

where $\alpha$ and $\beta$ are any directions in physical space. If we represent possible
states of a spin-$\tfrac{1}{2}$ particle by points on or within the unit sphere of $\mathbb{R}^3$, as in
Section 5.3, then $D^e$ and $D^p$ both lie exactly in the center of the sphere. That
is to say, the individual particles are completely unpolarized: whatever
component of spin is measured on, say, the electron, the probability of the
result $+\tfrac{1}{2}$ is exactly equal to the probability of the result $-\tfrac{1}{2}$.
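However, as we saw in Section 6.3, a strong anticorrelation exists: for any
direction $\alpha$,

$$p_\Psi(S^e_\alpha, +\tfrac{1}{2};\ S^p_\alpha, +\tfrac{1}{2}) = 0$$

and, in general,

(8.23) $p_\Psi(S^e_\alpha, +\tfrac{1}{2};\ S^p_\beta, +\tfrac{1}{2}) = \tfrac{1}{2} \sin^2 \tfrac{1}{2} \widehat{\alpha\beta}$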
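
These statements about the singlet state can be verified by direct matrix computation. The following is a minimal sketch, assuming that all directions lie in a single plane and are labeled by an angle, and using hypothetical helper names (`spin_projector`, `joint_plus_plus`); it checks the reduced state of the electron and the joint probability in (8.23).

```python
import numpy as np

def spin_projector(theta, sign):
    """Projector onto the +1/2 (sign=+1) or -1/2 (sign=-1) eigenvector of spin
    along a direction at angle theta from the z-axis, in the x-z plane."""
    sigma = np.array([[np.cos(theta), np.sin(theta)],
                      [np.sin(theta), -np.cos(theta)]])
    return 0.5 * (np.eye(2) + sign * sigma)

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)
D = np.outer(singlet, singlet)  # density operator of the pure singlet state

def joint_plus_plus(alpha, beta):
    """Probability of +1/2 for the electron along alpha and +1/2 for the
    positron along beta, in the singlet state."""
    event = np.kron(spin_projector(alpha, +1), spin_projector(beta, +1))
    return float(np.trace(D @ event))

# Reduced state of the electron: partial trace over the positron factor.
D_e = np.trace(D.reshape(2, 2, 2, 2), axis1=1, axis2=3)
```

In this sketch `D_e` comes out as half the identity, the center of the unit sphere, while `joint_plus_plus(alpha, beta)` agrees with half the squared sine of half the angle between the two directions and vanishes when they coincide.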
The problem is, how are we to explain these correlations? We can sort out
putative explanatory accounts into rough groups; interaction accounts suggest
that the correlations are due to interactions between the component
systems after they have separated, while preparation accounts trace the
correlations back to the original preparation, either of the composite system
(type S), or of the experimental set-up (type E). Each kind of account, it turns
out, runs counter to our basic beliefs about causality. (Note that a causal
preparation account would involve what Salmon, 1984, chap. 6, calls an
interactive fork. Ah, well.)
As an elementary example of an interaction account, let us hypothesize
that the performance of an experiment on one particle (the positron, say)
changes the state of the other. Assume, for the sake of argument, that the
$\alpha$-component of spin is measured for the positron and found to have value
$+\tfrac{1}{2}$. Then the probabilities assigned to measurement results on the electron
of the same pair will change. Whereas we had, for any direction $\beta$,
$p(S^e_\beta, +\tfrac{1}{2}) = 0.5$, the correlation now gives us $p(S^e_\beta, +\tfrac{1}{2}) = \sin^2 \tfrac{1}{2} \widehat{\alpha\beta}$. But these
are just the probabilities assigned to events $(S^e_\beta, +\tfrac{1}{2})$ when the electron is in
the $\alpha_-$ eigenstate of spin (see Chapter 4).
On our hypothesis, the measurement on the positron has effected a
change in the state of the electron. Prior to the measurement it was in the
mixed state $D^e$; subsequently it is in the pure state $P^e_{\alpha-}$. However, the
hypothesis seems to raise as many problems as it solves. In particular, how
can we account for this interaction without contravening the special theory
of relativity (STR)? For it is a fundamental result of that theory, variously
called the principle of Einstein-separability or Einstein-locality, that no causal
signals can propagate at a speed faster than light. And, in the first place,
most of the experimental tests confirming quantum-mechanical predictions
for coupled systems have looked not at spin correlations for an electron-
positron pair, but at polarization correlations between photons; these pho-
tons travel (of course) at the speed of light, and so only a signal traveling
faster than that could pass between them (see Clauser and Shimony, 1978;
d'Espagnat, 1979). Second, even if the interaction involved an electron-
positron pair (and some have been done using proton-proton pairs), it
should be possible to perform an experiment on particle 2 which, although
performed later than the experiment on particle 1 in the laboratory frame of
reference, is nevertheless space-like separated from it (Taylor and Wheeler,
1963), so that, according to STR, no causal transaction could take place
between the two.
STR is one of the most firmly established and best corroborated theories
of modern physics. We should be, at least, deeply suspicious of any account
of the EPR correlations which violates it. However, as Bell (1964, p. 199)
pointed out, it would not be a direct contravention of STR to postulate that
the setting of one measurement device affected the results obtained on the
other. Such interactions would violate locality in one sense, in that the
devices would not function independently of one another, but they would not
necessarily violate Einstein-locality; the postulated interactions could propagate
at a speed less than that of light and achieve their effects before the
actual measurements occurred. The proposed solution is, in effect, a preparation
account of type E, and traces the correlations to the experimental
set-up. It recalls Bohr's dictum: "The problem again emphasizes the necessity
of considering the whole experimental arrangement, the specification of
which is imperative for any well-defined application of the quantum-mechanical
formalism" (Bohr, 1949, p. 230). Though Bohr is (again) making a
point about the conditions for meaningful discourse, rather than offering a
causal account of the correlations, any experiment which puts this particular
causal account to the test will also tell us whether Bohr's holistic resolution
of the EPR problem is adequate (contra Leggett, 1986, p. 44; for Bohr's
treatment of EPR see Bohr, 1935a, 1949).
Such an experiment, using correlated polarizations of photons, was suggested
by Aspect (1976). His idea was to change "rapidly, repeatedly and
independently the orientations of the polarizers." Each change of orientation
of a polarizer was to be space-like separated from the corresponding
experiment carried out with the other. Aspect continued, "Thus one finds as
a consequence of the principle of separability [Einstein-locality] that the
response of one polarizer, when analyzing a photon, cannot be influenced
by the orientation of the other polarizer at the same time (when analyzing
the coupled photon)." The experiment was performed by Aspect, Dalibard,
and Roger. They reported that, "The result violates the generalized Bell
Inequality . . . and is in good agreement with [the quantum-mechanical
prediction]" (Wheeler and Zurek, 1983, p. 442n).
On the one hand, their result both undercuts Bohr's response to the EPR
paper and effectively rules out type-E preparation accounts of the statistical
correlations between the measurements. In order to avoid invoking super-
luminal signals, these accounts appeal to the prior configuration of the
apparatus; however, the statistical relations are the same even when there
is, so to speak, no prior configuration. On the other hand, the result also
confirms our earlier suspicions about interaction accounts. It suggests that
all interaction accounts of the EPR correlations, whether they trace these
correlations to interactions between the component systems or between the
measurement devices, will violate the principle of Einstein-locality.
A statement which is at the same time more general and more precise than
this has been proved by Hellman (1982a). He shows that, if any deterministically
Einstein-local theory gives anticorrelation results for two distinct
observables, so that, for example,

$$p[(S^e_a,+) \text{ and } (S^p_a,+)] = 0$$

and

$$p[(S^e_b,+) \text{ and } (S^p_b,+)] = 0$$

for distinct a and b, then the theory also yields a version of the Bell inequality
known as the CHSH inequality, and is inconsistent with quantum mechanics.
(The CHSH inequality was first derived by Clauser et al., 1969;
Hellman's proof uses a theorem by Eberhard, 1977.)
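
The inconsistency with quantum mechanics can be made concrete with a short calculation. The sketch below does not reproduce Hellman's proof; it assumes the standard form of the CHSH inequality, |E(a,b) + E(a,b') + E(a',b) − E(a',b')| ≤ 2, which is not written out in the text, together with the singlet correlation −cos of the angle between coplanar directions. It compares the largest value attainable when every outcome is preassigned with the quantum value at one conventional choice of settings.

```python
import itertools
import math

def correlation(alpha, beta):
    """Singlet-state expectation of the product of +/-1 outcomes along
    directions alpha and beta (angles in a common plane)."""
    return -math.cos(alpha - beta)

def chsh(E, a, a2, b, b2):
    """CHSH combination of four correlations (assumed standard form)."""
    return E(a, b) + E(a, b2) + E(a2, b) - E(a2, b2)

# Deterministic local assignment: each setting has a preassigned outcome +/-1.
local_max = max(abs(A1 * B1 + A1 * B2 + A2 * B1 - A2 * B2)
                for A1, A2, B1, B2 in itertools.product([1, -1], repeat=4))

quantum_value = abs(chsh(correlation, 0.0, math.pi / 2, math.pi / 4, -math.pi / 4))
```

With these settings the preassigned outcomes never exceed 2, while the quantum value is 2√2.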
Einstein-locality is here precisely defined in terms of models of the physi-
cal theory T. These are the possible worlds consistent with T. We assume
that T specifies a background of a four-dimensional Minkowski space-time
(see Taylor and Wheeler, 1963, chap. 1). Pairs (x,t) in space-time, where x is
a position vector and t a time-coordinate, are referred to as events; thus
"event" is not here to be taken as synonymous with "experimental ques-
tion." We can talk of two models agreeing at an event e (that is, at a particular
point e in space-time) if the same sentences are true at e in each model.
Consider any event e and any "slice" $S_e$ through the backward light cone
from e. $S_e$ is part of a plane of simultaneity for some observer; it is a set of
events, all with light-like or time-like separation from e, and all prior to e.
(See Figure 8.2.) Then T is said to be deterministically Einstein-local if, for

Figure 8.2 Light-cone of event e.


every event e and every $S_e$, any two models agreeing at all events in $S_e$ also
agree at e.
The intuition behind this definition is that if the definiens is satisfied, then
any differences at e are attributable to differences in events that could,
according to STR, be causally related to e.
Notice that Hellman's theorem does not merely sharpen the problem we
run into if we try to explain EPR correlations by an interaction hypothesis. It
also tells us, first, that quantum mechanics is not a deterministically Ein-
stein-local theory, and, second, that no such theory can generate the quan-
tum-mechanical predictions. In the language of Section 6.8, it rules out the
possibility of a noncontextual, deterministically Einstein-local hidden-variable
reconstruction of quantum theory.
As Hellman (1982b) emphasizes, his proof disbars deterministic Einstein-
local theories, but not stochastic Einstein-local theories. A stochastic Ein-
stein-local theory requires the probability of a particular measurement out-
come on particle 1 to be unaffected by whether or not a measurement is
conducted on particle 2. Hellman shows that the requirement of stochastic
Einstein-locality is not on its own sufficient to yield Bell-type inequalities.
As we shall see in Section 8.8, this is confirmed by the fact that quantum
theory is itself stochastically Einstein-local.
To generate Bell-type inequalities we need to supplement this condition
by another; Jarrett (1984) suggests that the condition most frequently (and
often implicitly) invoked is essentially a completeness condition. "Complete-
ness" here is not to be understood in the sense in which Einstein used it (see
Section 6.2); like stochastic Einstein-locality, it is a requirement of condi-
tional statistical independence, but whereas stochastic Einstein-locality re-
quires the probability of a particular outcome for particle 1 to be indepen-
dent of whether or not a measurement is conducted on particle 2,
completeness requires it to be independent of the outcome of such a mea-
surement, given that the measurement actually takes place. To bring out this
difference, Shimony (1986) refers to the two conditions as "Parameter
Independence" and "Outcome Independence," respectively. I will state the
completeness condition (Outcome Independence) in terms of measurements
of spin, though Jarrett's presentation is more general. To avoid clumsiness,
however, a change in notation is called for.*

* "Parameter Independence" is also known as "Surface Locality"; see Section 8.7. The
variety of names may seem unfortunate; each was chosen to bring out one feature of the
condition in question. The formulation of the completeness condition I use is taken not from
Jarrett (1984) but from his 1989 paper, which contains a particularly good, and accessible,
discussion of determinism, locality, and completeness.

We write $\alpha^e$ for "an $S_\alpha$-measurement is performed on the electron," and
$\beta^p$ for "an $S_\beta$-measurement is performed on the positron"; we also write $+^e$
for "the outcome of the electron-measurement is $+\tfrac{1}{2}$," and $+^p$ for "the
outcome of the positron-measurement is $+\tfrac{1}{2}$." Thus, when $\alpha^e$ is the case, $+^e$
is the event $(S^e_\alpha, +\tfrac{1}{2})$.
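Now let $\lambda$ be the conjunction of all statistically relevant information that
the theory supplies via the state description of the systems plus their source.
Then the theory is complete provided that, according to the theory,

(8.24a) $p(+^e \mid \lambda, \alpha^e, \beta^p, +^p) = p(+^e \mid \lambda, \alpha^e, \beta^p)$

(8.24b) $p(+^p \mid \lambda, \alpha^e, \beta^p, +^e) = p(+^p \mid \lambda, \alpha^e, \beta^p)$

And, in this notation, the condition of stochastic Einstein-locality appears as
the pair of equations

(8.24c) $p(+^e \mid \lambda, \alpha^e, \beta^p) = p(+^e \mid \lambda, \alpha^e)$

(8.24d) $p(+^p \mid \lambda, \alpha^e, \beta^p) = p(+^p \mid \lambda, \beta^p)$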

All these probabilities are classical conditional probabilities. The conditions
(8.24a) and (8.24b) tell us that, given a certain preparation of the system plus
environment, and given certain settings of the measurement apparatuses
(that is, given $\lambda$, $\alpha^e$, and $\beta^p$), the occurrence of a particular outcome of the
positron-measurement will not affect the probability of occurrence of a
particular outcome of the electron-measurement, and vice versa. To quote
Jarrett (1984, p. 588):
The point is, that if the state descriptions are complete in the relevant sense, then
conditionalization on the outcome of a measurement on [one] particle entails no
further restriction on the physical possibilities that would serve to better define the
probabilities for the outcome of possible measurements on [the other].
With justification, he regards this kind of completeness as characteristic of
classical theories.
The Bell inequality can be derived from the requirement of stochastic
Einstein-locality plus completeness (Jarrett, 1984, p. 582, though the proof
is not given). Since quantum mechanics contravenes the Bell inequality, but
is stochastically Einstein-local, it is therefore, in Jarrett's sense, incomplete.
But, as he says (p. 585),
Incomplete theories (e.g. quantum mechanics) are not ipso facto defective. On the
contrary, when the results of Bell-type experiments are taken into account, the truly
remarkable implication of Bell's Theorem is that incompleteness, in some sense, is a
genuine feature of the world itself.
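
The contrast between the two conditions can be exhibited for the singlet state itself. In the sketch below, offered only as an illustration, the state description is assumed to be nothing more than the singlet state, directions are taken to lie in one plane, and the names `projector` and `prob` are hypothetical. It computes the probability of a + outcome for the electron with no positron measurement, with a positron measurement whose outcome is ignored, and conditional on the positron outcome +.

```python
import numpy as np

def projector(theta, sign):
    """Projector onto the spin eigenvector with the given sign along the
    direction at angle theta from the z-axis, in the x-z plane."""
    sigma = np.array([[np.cos(theta), np.sin(theta)],
                      [np.sin(theta), -np.cos(theta)]])
    return 0.5 * (np.eye(2) + sign * sigma)

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)

def prob(electron_op, positron_op):
    """Expectation of electron_op (x) positron_op in the singlet state."""
    return float(singlet @ np.kron(electron_op, positron_op) @ singlet)

alpha, beta = 0.4, 1.7  # illustrative settings
I2 = np.eye(2)

# Parameter Independence: electron statistics without, and with, a positron
# measurement along beta whose outcome is not conditioned on.
p_e_alone = prob(projector(alpha, +1), I2)
p_e_with_beta = sum(prob(projector(alpha, +1), projector(beta, s)) for s in (+1, -1))

# Outcome Independence: condition on the positron outcome being +.
p_e_given_plus = (prob(projector(alpha, +1), projector(beta, +1))
                  / prob(I2, projector(beta, +1)))
```

On these assumptions the first two probabilities are both 0.5, as stochastic Einstein-locality requires, while the third depends on the angle between the settings, so that the completeness condition fails.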

To take stock of the various possible accounts of the EPR correlations, we
have so far seen that,

(8.25a) deterministic Einstein-local interactive accounts are ruled out by
Hellman's result;
(8.25b) type-E preparation accounts are ruled out by the Aspect experiments;
(8.25c) any kind of stochastic Einstein-local preparation account that invokes
complete state descriptions is ruled out by Jarrett's result.

In anticipation of Section 8.7 I might also add that the completeness
requirement (8.24) strongly resembles part of Reichenbach's and of
Salmon's specification of what it is for $\lambda$ to be the common cause of two
statistically governed events $+^e$ and $+^p$. As we shall see, it is not only
deterministic causality which is threatened by the violation of the Bell
inequalities.

8.7 Bell Inequalities and Causality


Bell's inequality, or, more accurately, Bell-type inequalities, have been used
in a variety of arguments. But, if we leave to one side the versions involving
just one particle, these arguments all have a common structure. (This was
pointed out by Shimony, 1981; for a comprehensive discussion of Bell's
inequality, see Cushing and McMullin, 1989.) The arguments all involve an
experimental situation in which pairs of particles are jointly prepared and
then separately tested. The inequality is then derived from two distinct sets
of premises. The first set, $P_{exp}$, consists of statements of correlations (or
anticorrelations) between experimental results on the two particles, the
second, $P_{met}$, of premises of a more metaphysical kind. From the union of
$P_{exp}$ and $P_{met}$ the inequality I is derived:

$$P_{exp} \cup P_{met} \vdash I$$

Quantum mechanics predicts the correlations of $P_{exp}$ but also predicts results
at odds with I. Experimental results which bear out the quantum-mechanical
predictions thus tell us that I does not hold but that the premises in $P_{exp}$
do. It follows that some or all of the premises in $P_{met}$ must be discarded.
The usual Duhemian reservations of course apply. We might, for instance,
consider the allegedly theory-neutral correlation experiments to be
so infected with theoretical assumptions that $P_{met}$ could be rescued (see
Shimony, 1981). But in the cases to hand there seems little doubt that we are
genuinely, and remarkably, putting metaphysical theses to experimental
test.
Furthermore, it follows that as many different theses are being tested as
there are sets of premises from which Bell-type inequalities can be derived.
New derivations of I are thus interesting insofar as they start from different
premises and make explicit the set $P_{met}$ of assumptions at work. For example,
we have already seen tested (a) the thesis that the quantum statistics may be
reconstructed on a classical probability space (Wigner), and (b) the thesis
that quantum mechanics (and the world) is deterministically Einstein-local
(Hellman).
As with both of these examples, negative results do two related things.
They rule out certain kinds of reconstructions or amplifications of quantum
theory (hidden-variable theories), and they also rule out the possibility of
explaining the EPR correlations in certain kinds of ways. Hellman's result,
for instance, tells us that we will look in vain for a deterministically Ein-
stein-local account of them, Jarrett's that we should not accept any stochas-
tically Einstein-local account involving complete state descriptions.
A particularly striking derivation of the inequality, by van Fraassen
(1982), is closely related to Jarrett's. It tells us that the correlations are not to
be explained by reference to a common cause, and threatens any preparation
account which invokes that concept. In van Fraassen's derivation, $P_{exp}$ contains,
together with the usual anticorrelation statements, premises which he
calls statements of "Surface Locality." We have already met them as statements
of Parameter Independence (stochastic Einstein-locality): they state
that the probability of a particular outcome of an experiment on one particle
is not affected by the fact that a measurement is being performed on the
other, whatever the latter experiment may be. Van Fraassen makes the point
that these premises, like all the others in $P_{exp}$, are, indeed, obtainable by
induction from experiment.
$P_{met}$ contains three different kinds of premises, labeled "Causality,"
"Hidden Locality," and "Hidden Autonomy." The notion of a common
cause which these premises are designed to capture is due to Reichenbach
(1956, pp. 160-161; see also Salmon, 1984, chap. 6, especially pp. 158-163).
He sought an account of a causal mechanism which affected probabilities,
and so would be appropriate in a nondeterministic setting. In particular,
he wished to supply a causal account of statistical correlations.
He suggested that a correlation between events A and B is attributable to a
common cause C provided that C precedes A and B in time, and

(8.26a) $p(A \,\&\, B \mid C) = p(A \mid C) \cdot p(B \mid C)$

(8.26b) $p(A \,\&\, B \mid \bar{C}) = p(A \mid \bar{C}) \cdot p(B \mid \bar{C})$

(8.26c) $p(A \mid C) > p(A \mid \bar{C})$, $p(B \mid C) > p(B \mid \bar{C})$

(Here $\bar{C}$ is the negation of C.)
Condition (8.26a) may be rewritten as

(8.27) $p(A \mid C) = \dfrac{p(A \,\&\, B \mid C)}{p(B \mid C)} = p(A \mid B \,\&\, C)$

(8.28) $p(B \mid C) = \dfrac{p(A \,\&\, B \mid C)}{p(A \mid C)} = p(B \mid A \,\&\, C)$


(8.27) tells us that, given C, the probability of A is unaffected by the
occurrence of B, and (8.28) that, given C, the probability of B is unaffected by the
occurrence of A. Similarly for condition (8.26b). Thus, like Jarrett's completeness
condition, conditions (8.26a) and (8.26b) are requirements of conditional
statistical independence: given C, the two events A and B are independent;
likewise they are independent given $\bar{C}$. These can be intuitively
justified as part of the specification of a common cause, either by the arguments
used in the last section on behalf of the completeness condition, or in
epistemic terms. We may think that two events are independent if knowledge
concerning one does not affect our estimate of the probability of the
other; condition (8.26a) then says that no extra information would be available,
should A occur, which would affect the probability of event B, other
than that contained in the common cause C.
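
A small numerical example shows how conditions of this kind generate a correlation between A and B even though the two events are independent given C and given its negation. The probabilities below are hypothetical values chosen only to satisfy (8.26a) through (8.26c); nothing in the example is drawn from Reichenbach's or van Fraassen's own presentations.

```python
# Hypothetical numbers satisfying (8.26a)-(8.26c).
p_C = 0.3
p_A_given = {True: 0.8, False: 0.2}  # p(A|C), p(A|not-C)
p_B_given = {True: 0.7, False: 0.1}  # p(B|C), p(B|not-C)

def weight(c):
    """Probability of C (c=True) or of its negation (c=False)."""
    return p_C if c else 1 - p_C

# Marginals and joint probability, built so that A and B are independent
# given C and independent given not-C.
p_A = sum(weight(c) * p_A_given[c] for c in (True, False))
p_B = sum(weight(c) * p_B_given[c] for c in (True, False))
p_AB = sum(weight(c) * p_A_given[c] * p_B_given[c] for c in (True, False))

# Positive whenever C raises the probability of both A and B.
correlation_excess = p_AB - p_A * p_B
```

With these values the joint probability of A and B exceeds the product of their separate probabilities by 0.0756, which equals p(C)·p(not-C) multiplied by the two differences p(A|C) − p(A|not-C) and p(B|C) − p(B|not-C).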
Van Fraassen's postulated common cause is represented by a "hidden
variable," $\lambda$. The Causality premises are statements of conditional statistical
independence, like (8.26a) above, and Hidden Locality and Hidden Auton-
omy are designed to ensure the temporal priority of the common cause and
to locate it in the preparation procedure, rather than in, say, the orientation
of the measurement devices, which (as in the Aspect experiments) may be
established after the preparation.
From these premises van Fraassen derives the Bell inequality. It follows
that another casualty must be recorded in the list begun at the end of Section
8.6:

(8.29) Type-S preparation accounts invoking a common cause are ruled out
by van Fraassen's result.

However, Reichenbach's analysis is the best, arguably the only, causal
account we have of statistical correlations between separated events. No
preparation account, it seems, can both save the quantum phenomena and
explain them in terms of causal processes. But, from (8.25a), any interaction
account which does so will need to invoke superluminal causal signals.
Either way, the prospects for a causal explanation of these correlations look
bleak.

8.8 Coupled Systems and Conditional Probabilities


As was noted in Section 8.7, van Fraassen derives a Bell-type inequality
from five (sets of) premises. Three of these (Causality, Hidden Locality, and
Hidden Autonomy) constitute the set P met of "metaphysical" premises cap-
turing the notion of a common cause; the experimental premises of Pexp are
the assumptions of Perfect Correlation and Surface Locality. Van Fraassen
presents the latter assumptions within a very general experimental context,
but we can reformulate them without loss in terms of the electron-positron
pair that has served as our example of a coupled system. As before, we take
the system to be in the singlet spin state. Each of the assumptions is a set of
statements about the probability, under certain conditions, of an event
$(S^e_\alpha,+)$, where $\alpha$ is an arbitrary direction in space. There are three such
probabilities involved: (a) the probability $p(S^e_\alpha,+)$ simpliciter; (b) the
probability of $(S^e_\alpha,+)$, conditional on an $S_\beta$ measurement being performed on the
system p; (c) the probability of $(S^e_\alpha,+)$, conditional on the event $(S^p_\alpha,+)$. We
denote these probabilities by $p_a$, $p_b$, and $p_c$, respectively. Perfect Correlation
tells us that $p_c = 0$ (for all directions $\alpha$ in space), and Surface Locality that
$p_a = p_b$ for all directions $\alpha$ and $\beta$ in space [compare (8.24c)].
On van Fraassen's account these statements are to be justified empirically:
"These probability statements are directly testable by observed
frequencies" (1982, p. 30). The problem is, how are they to be explained?
I will offer a two-part answer to this. In this section I will show that the
probability statements are obtainable by straightforward application of the
Lüders rule for quantum conditionalization, and then (in Section 8.9) I will
justify the claim that this fact alone constitutes an explanation of them.
Throughout this discussion I will talk of a Hilbert space $\mathcal{H}$ as a probability
space; strictly, I should talk of the probability space isomorphic to the set
$S(\mathcal{H})$ of subspaces of $\mathcal{H}$.
As a preliminary, let me deal with a possible objection to applying the
Lüders rule in this context. It might seem that in doing so one would be
guilty of an equivocation, for the Lüders rule gives conditional probabilities
on a non-Boolean set of quantum events whereas, when we regard Perfect
Correlation and Surface Locality as empirical principles, the conditional
probabilities appearing in them are thought of as classical conditional
probabilities. (Note, in this regard, that van Fraassen's analysis is entirely in
terms of a classical probability space.) However, in the cases we are dealing
with here, it turns out that the two conditionalizations coincide. It was
shown in Section 8.2 that, if A and B are quantum events associated with the
two components of a composite system, then the Lüders rule reduces to its
classical counterpart (see also Appendix C); we have

$$\mathbb{P}(A \mid B) = \frac{p(A \;\&\; B)}{p(B)}$$

Let us then return to the electron-positron pair in the singlet spin state $D_\Psi$
(see Section 8.6). This state of the composite system yields the two reduced
states, $D^e$ and $D^p$, for the components. The probability $p_a$ is given by

$$p_a = p(S^e_\alpha,+) = \mathrm{Tr}(D^e P^e_{\alpha+})$$

and for these states, for all $\alpha$ and $\beta$,

$$p(S^e_\alpha,+) = \tfrac{1}{2} = p(S^p_\beta,+)$$

$$p(S^e_\alpha,+ \;\&\; S^p_\beta,+) = \tfrac{1}{2}\sin^2\frac{\widehat{\alpha\beta}}{2}$$

Here, and in what follows, the function p is the union of three generalized
probability functions; it takes as arguments events associated with the elec-
tron, events associated with the positron, and conjunctions of an electron
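event and a positron event, giving them the values assigned by the states $D^e$,
$D^p$, and $D_\Psi$, respectively.
Perfect Correlation follows trivially from (8.9):

$$\mathbb{P}[(S^e_\alpha,+) \mid (S^p_\beta,+)] = \frac{p(S^e_\alpha,+ \;\&\; S^p_\beta,+)}{p(S^p_\beta,+)} = \sin^2\frac{\widehat{\alpha\beta}}{2} \tag{8.30}$$

And when $\alpha = \beta$ we obtain

$$p_c = \mathbb{P}[(S^e_\alpha,+) \mid (S^p_\alpha,+)] = 0 \tag{8.31}$$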
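The singlet-state probabilities quoted above can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes natural units, restricts the directions $\alpha$ and $\beta$ to the x-z plane (so that each is fixed by its angle to the z-axis), and orders the basis of the composite space as (++, +-, -+, --). The names `spin_up`, `prob`, and `conditional` are illustrative.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product; the electron factor comes first."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def spin_up(theta):
    """Projector onto spin-up along a direction at angle theta to the z-axis
    (assumption: directions lie in the x-z plane)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(1 + c) / 2, s / 2], [s / 2, (1 - c) / 2]]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Singlet state D_Psi: projector onto (v+- - v-+)/sqrt(2).
SINGLET = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 0.5, -0.5, 0.0],
           [0.0, -0.5, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.0]]

def prob(event):
    """Statistical algorithm: p(event) = Tr(D_Psi event)."""
    return trace(matmul(SINGLET, event))

def conditional(alpha, beta):
    """P[(S^e_alpha,+) | (S^p_beta,+)] computed as p(A & B) / p(B)."""
    joint = prob(kron(spin_up(alpha), spin_up(beta)))
    return joint / prob(kron(IDENTITY, spin_up(beta)))
```

Under these assumptions, `conditional(alpha, beta)` agrees with $\sin^2$ of half the angle between the two directions, and it vanishes when the two directions coincide, which is Perfect Correlation.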

Note in passing that the probabilities $p(S^e_\alpha,+)$ and $p(S^p_\beta,+)$ are not
statistically independent; quantum mechanics, as we expected, violates Jarrett's
completeness condition (8.24a-b).
The account of Perfect Correlation given above starts from (8.9), and this
in turn is derived by Lüders-rule conditionalization on a tensor-product
space. Thus the electron-positron pair is treated as a whole even when the
two components are spatially separated. The correlation is not predictable
from the states $D^e$ and $D^p$ of the two components, but from the state $D_\Psi$ of
the composite system; the system e + p is therefore not reducible to the sum
of its parts. Indeed, it is a consequence of the way that quantum mechanics
constructs the probability space $\mathcal{H}^e \otimes \mathcal{H}^p$ for e + p from the probability
spaces $\mathcal{H}^e$ and $\mathcal{H}^p$ of the components that this is so. In this respect the tensor
product of two quantum probability spaces differs radically from the
product of two classical probability spaces. Stairs (1984, p. 357; see also 1983a)
puts the point admirably:

Because of the way Boolean algebras (or, more importantly, classical probability
spaces) combine, every measure on the product space will either render the systems
statistically independent or else will be a statistical mixture of such measures. On the
other hand, if the systems are associated with quantum logical fields of propositions,
then their product need not exhibit this feature. That is, there may be propositions
about the pair of systems which are neither equivalent to nor implied by conjunctions
such as a & b, and there may be measures which are not decomposable into
measures which render the subsystems independent.
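The claim that the correlation is not recoverable from the states of the components can be illustrated with a small calculation. The sketch below is an illustration under stated assumptions rather than anything in the text: the basis of the composite space is ordered (++, +-, -+, --) along the z-axis, and the helper names `electron_state` and `positron_state` (partial traces) are hypothetical. It compares the singlet state with the product of its own reduced states.

```python
def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product; the electron factor comes first."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

SINGLET = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 0.5, -0.5, 0.0],
           [0.0, -0.5, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.0]]

def electron_state(d):
    """Reduced state of the electron: partial trace over the positron factor."""
    return [[d[2 * i][2 * j] + d[2 * i + 1][2 * j + 1] for j in range(2)]
            for i in range(2)]

def positron_state(d):
    """Reduced state of the positron: partial trace over the electron factor."""
    return [[d[k][l] + d[2 + k][2 + l] for l in range(2)] for k in range(2)]

d_e = electron_state(SINGLET)
d_p = positron_state(SINGLET)
product = kron(d_e, d_p)   # a different composite state with the same reductions

up_z = [[1.0, 0.0], [0.0, 0.0]]
both_up = kron(up_z, up_z)  # "electron up along z and positron up along z"

p_singlet = trace(matmul(SINGLET, both_up))
p_product = trace(matmul(product, both_up))
```

Both composite states reduce to the same pair of mixed states, yet they assign different probabilities to the conjunction: the product state treats the two events as independent, while the singlet state excludes the conjunction altogether.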

In general, more information is available when we specify a state $D$ for a
composite quantum system than when we specify the reduced states $D^a$ and
$D^b$ of its components. For, as noted in Chapter 5, while every state $D$ yields
unique states $D^a$ and $D^b$ for the components, the converse holds only when
$D^a$ and $D^b$ are projectors (pure states). Unless this is the case there is more
than one state $D$ of the composite system which will reduce to $D^a$ and $D^b$.
This "quantum holism" is at odds with Einstein's view that, once spatially
separated, the two systems could be regarded as independent, but, perhaps
surprisingly, it is entirely compatible with Surface Locality, that is, with
stochastic Einstein-locality. This principle, like Perfect Correlation, may
also be derived by straightforward application of the Lüders rule. It appears
as a result of a more general principle which applies both to simple and to
composite systems. Assume that, for a given system, a family of mutually
exclusive and jointly exhaustive compatible events exists, representable by
the set $\{P_i\}$ of projectors on $\mathcal{H}$. (We have $\sum_i P_i = I$.) We may think of $\{P_i\}$ as
the spectral decomposition of some observable $A$. Now let $Q$ be any event
compatible with all the $P_i$. Then, for any initial state $D$ of the system,

$$p_D(Q) = \sum_i p_D(P_i)\,\mathbb{P}(Q \mid P_i) \tag{8.32}$$

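This equation is provable, either as a corollary of (8.8) or (more directly) as
shown below.

$$\begin{aligned}
\sum_i p_D(P_i)\,\mathbb{P}(Q \mid P_i) &= \sum_i \mathrm{Tr}(DP_i)\,\frac{\mathrm{Tr}(P_i D P_i Q)}{\mathrm{Tr}(DP_i)} \\
&= \sum_i \mathrm{Tr}(P_i D P_i Q) \\
&= \sum_i \mathrm{Tr}(DQP_i)
\end{aligned}$$

(by properties of the trace, idempotence of $P_i$, and compatibility of $Q$ with
$P_i$).
But

$$\begin{aligned}
\sum_i \mathrm{Tr}(DQP_i) &= \mathrm{Tr}\sum_i (DQP_i) \qquad \text{[by (5.5)]} \\
&= \mathrm{Tr}\Big(DQ\sum_i P_i\Big) \\
&= \mathrm{Tr}(DQI) \\
&= \mathrm{Tr}(DQ) \\
&= p_D(Q)
\end{aligned}$$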

Equation (8.32) may be interpreted as follows. Consider an observable A
represented by an operator A and a quantum event represented by a projector
$Q$ compatible with A. Provided that we can treat each possible outcome
of a measurement of A as a quantum event $P_i$, the initial probability of $Q$
when the system is in state $D$ is equal to its probability conditional on a
measurement of A taking place. The latter is calculated as a weighted sum of
conditional probabilities $\mathbb{P}(Q \mid P_i)$, and the weight given to each of them is the
probability that the A-measurement will yield the $P_i$ in question.
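The role played by compatibility in (8.32) can be seen in a small numerical case. What follows is a sketch under assumptions of my own choosing: a hypothetical three-level system in the pure state (1, 1, 1)/√3, a family of three one-dimensional projectors, and two candidate events, one compatible with the family and one not. The function names are illustrative.

```python
def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(vec):
    """Projector onto the ray spanned by a real vector."""
    norm2 = sum(x * x for x in vec)
    return [[x * y / norm2 for y in vec] for x in vec]

def luders_conditional(d, p, q):
    """Lüders rule: P(Q | P) = Tr(P D P Q) / Tr(D P)."""
    return trace(matmul(matmul(matmul(p, d), p), q)) / trace(matmul(d, p))

def weighted_sum(d, family, q):
    """Right-hand side of (8.32): sum over i of p_D(P_i) P(Q | P_i)."""
    return sum(trace(matmul(d, p)) * luders_conditional(d, p, q) for p in family)

# Hypothetical state and family of mutually exclusive, jointly exhaustive events.
D = projector([1.0, 1.0, 1.0])
FAMILY = [projector([1.0, 0.0, 0.0]),
          projector([0.0, 1.0, 0.0]),
          projector([0.0, 0.0, 1.0])]

# Q_COMPATIBLE commutes with every P_i; Q_INCOMPATIBLE does not.
Q_COMPATIBLE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
Q_INCOMPATIBLE = projector([1.0, 1.0, 0.0])
```

For the compatible event the weighted sum reproduces $p_D(Q)$, as (8.32) requires. For the incompatible event the two sides differ in this example, which shows that the compatibility assumption is doing real work in the proof.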
In the case of a composite system, events associated with one system are
always compatible with events associated with the other, since all pairs of
operators $A \otimes I$ and $I \otimes B$ commute. The application of (8.32) to the
electron-positron system is thus straightforward. We take the set
$\{I^e \otimes P^p_{\beta+},\; I^e \otimes P^p_{\beta-}\}$ to be the family $\{P_i\}$ of mutually exclusive and jointly exhaustive
events. The observable $I^e \otimes S^p_\beta$ is then the observable A, and we measure A
by measuring $S_\beta$ for the positron. By taking $P^e_{\alpha+} \otimes I^p$ as the event $Q$, and
noting that

$$p(S^e_\alpha,+) = \mathrm{Tr}(D^e P^e_{\alpha+}) = \mathrm{Tr}[D_\Psi(P^e_{\alpha+} \otimes I^p)]$$

we obtain

$$p(S^e_\alpha,+) = p(S^p_\beta,+)\,\mathbb{P}[(S^e_\alpha,+) \mid (S^p_\beta,+)] + p(S^p_\beta,-)\,\mathbb{P}[(S^e_\alpha,+) \mid (S^p_\beta,-)] \tag{8.33}$$


Given our interpretation of (8.32), this yields Surface Locality.
Note, however, the proviso expressed in this interpretation, that each
possible outcome of a measurement of A be treated as a quantum event $P_i$.
The exact relation between quantum events and measurement outcomes
will be discussed in Section 9.4; for the present, we will treat their
identification as unproblematic.
The implication of Surface Locality is that no series of experiments
performed on the electrons of an ensemble of similarly prepared pairs could
give us information about whether any measurements had been performed
on the positrons of those pairs. For if we took such an ensemble and
performed an $S_\beta$-measurement on a subensemble of them, then the probability
of any event $(S^e_\alpha,+)$ would be the same for the subensemble as for the whole
ensemble. (Pagels, 1982, pp. 143-152, is very good on this.)
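The derivation of Surface Locality just given shows why this is the case.
Assume that the pairs are prepared in the singlet spin state $D_\Psi$. Then, within
the subensemble,

$$p(S^p_\beta,+) = \tfrac{1}{2} = p(S^p_\beta,-)$$

and

$$\mathbb{P}[(S^e_\alpha,+) \mid (S^p_\beta,+)] = \sin^2\frac{\widehat{\alpha\beta}}{2}, \qquad \mathbb{P}[(S^e_\alpha,+) \mid (S^p_\beta,-)] = \cos^2\frac{\widehat{\alpha\beta}}{2}$$

Thus the weighted sum of these conditional probabilities is given by

$$\tfrac{1}{2}\sin^2\frac{\widehat{\alpha\beta}}{2} + \tfrac{1}{2}\cos^2\frac{\widehat{\alpha\beta}}{2} = \tfrac{1}{2}$$

which is just the probability of $(S^e_\alpha,+)$ within the whole ensemble.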


In this illustration I have used the probabilities given by the singlet spin
state. Note, however, that Surface Locality (unlike Perfect Correlation)
obtains whatever the state of the pairs, as can be seen from the derivation of
(8.33).
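That Surface Locality holds whatever the state of the pairs can be checked directly. The sketch below makes several assumptions not found in the text: directions lie in the x-z plane, the composite basis is ordered (++, +-, -+, --), and `OTHER` is an arbitrary, hypothetical non-singlet pure state chosen only for contrast. It computes both sides of (8.33) by Lüders conditionalization.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product; the electron factor comes first."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def spin_up(theta):
    """Projector onto spin-up along the direction at angle theta to the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(1 + c) / 2, s / 2], [s / 2, (1 - c) / 2]]

def spin_down(theta):
    """Spin-down along theta is spin-up along the opposite direction."""
    return spin_up(theta + math.pi)

def pure_state(vec):
    """Density operator (projector) for a real state vector."""
    norm2 = sum(x * x for x in vec)
    return [[x * y / norm2 for y in vec] for x in vec]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SINGLET = pure_state([0.0, 1.0, -1.0, 0.0])
OTHER = pure_state([1.0, 2.0, 0.0, 1.0])   # hypothetical non-singlet state

def luders(d, p, q):
    """Lüders rule: P(Q | P) = Tr(P D P Q) / Tr(D P)."""
    return trace(matmul(matmul(matmul(p, d), p), q)) / trace(matmul(d, p))

def both_sides(d, alpha, beta):
    """Return (p_a, p_b): the left- and right-hand sides of (8.33)."""
    q = kron(spin_up(alpha), IDENTITY)
    family = [kron(IDENTITY, spin_up(beta)), kron(IDENTITY, spin_down(beta))]
    p_a = trace(matmul(d, q))
    p_b = sum(trace(matmul(d, p)) * luders(d, p, q) for p in family)
    return p_a, p_b
```

Calling `both_sides(SINGLET, 0.4, 1.3)` and `both_sides(OTHER, 0.4, 1.3)` returns pairs whose two entries agree to rounding error, although the common value is 1/2 only for the singlet state.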

In some respects, the account of the EPR correlations which this analysis
gives us resembles an interaction account, in others a preparation account. It
is a preparation account in that the source of these correlations is the
preparation of pairs of particles in the singlet spin state. It is an interaction account
in that an experiment performed on one particle effectively changes the
state of the other; conditionalization on an event $(S^p_\alpha,+)$ associated with the
positron "projects" the state of the electron into the eigenstate $\alpha_-$ of spin.
The proof of this last result, to be given shortly, provides a summary of this
section. And an examination of this proof will show the crucial respect in
which the present account of the EPR correlations differs from those
proposed in Section 8.6. There it was tacitly assumed that, whatever type of
account was forthcoming, whether interaction or preparation, it would tell a
causal story. But, as we saw in Section 8.7, there is good reason to think that
no causal explanation can yield the quantum-mechanical statistics. In
contrast, the present account has no causal component. To recapitulate, it traces
the EPR correlations to three nonclassical features of quantum probability
spaces. The first is that, in these spaces, probability measures and density
operators are in one-to-one correspondence, the second is that
conditionalization on these spaces is given by the Lüders rule, and the third is the way in
which the tensor-product spaces associated with composite quantum
systems are related to the spaces associated with their components.
As I commented earlier, this third feature tells us that the components of
such systems cannot be treated independently, even when they are spatially
separated from each other. But we can now see that this particular
"nonlocality" need not be thought to conflict with the Special Theory of
Relativity.* No superluminal transmission capable of carrying information is
involved. If we deal with an ensemble of pairs, this fact is shown by Surface
Locality. In the case of a single pair, the occurrence of, say, the event $(S^e_\alpha,+)$
associated with the electron could never on its own tell us that an event
$(S^p_\alpha,-)$ had taken place. True, it would tell us that, if an $S^p_\alpha$-event of any kind
had occurred (that is, if $S_\alpha$ had been measured for the positron), then it
must have been the event $(S^p_\alpha,-)$. But this fact on its own is not inconsistent
with relativity theory. It is easy to imagine unproblematic, everyday
examples involving pairs of billiard balls in which similar "superluminal signals"
are sent and received, as when, from a prior configuration, the red falls into
pocket a if and only if the black falls into pocket b.
It remains, then, to show how the event $(S^p_\alpha,+)$ "projects" the electron's
state from $D^e$ to $P^e_{\alpha-}$. Here I will give a purely formal account; the
interpretation of this "projection" is discussed in Chapters 9 and 10.

* See Shimony (1980, p. 4); Jarrett (1984, pp. 575-578); Cushing and McMullin (1989). But
see also Shimony (1986); and Section 10.2, below.

For any system, complex or composite, in state $D$, the probability of any
quantum event B conditional on an event A is that given (via the usual
statistical algorithm) by a state $D'$, obtainable from $D$ by the Lüders rule. We
may say that the event A "projects" the system's state into $D'$. In the case of
the composite system e + p we shall find that the event $(S^p_\alpha,+)$ projects the
singlet spin state $D_\Psi$ of the composite system into the state $D'$, where
$D' = P^e_{\alpha-} \otimes P^p_{\alpha+}$; hence, whereas $D_\Psi$ reduced to the mixed states $D^e$ and $D^p$
(of e and p, respectively), $D'$ reduces to the pure states $P^e_{\alpha-}$ and $P^p_{\alpha+}$. The
effect of the event $(S^p_\alpha,+)$ is to project the electron's state from $D^e$ to $P^e_{\alpha-}$.
To show that, indeed, $D' = P^e_{\alpha-} \otimes P^p_{\alpha+}$, we use $\alpha_+$ and $\alpha_-$, and $\alpha^p_+$ and
$\alpha^p_-$, for the eigenvectors of the $\alpha$-component of spin of the electron and the
positron, respectively, and define vectors on $\mathcal{H}^e \otimes \mathcal{H}^p$ as follows.

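$$v_{++} = \alpha_+ \otimes \alpha^p_+ \qquad v_{-+} = \alpha_- \otimes \alpha^p_+$$

$$v_{+-} = \alpha_+ \otimes \alpha^p_- \qquad v_{--} = \alpha_- \otimes \alpha^p_-$$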

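The set $\{v_{++}, v_{+-}, v_{-+}, v_{--}\}$ is an orthonormal basis for $\mathcal{H}^e \otimes \mathcal{H}^p$. (See
Section 5.7.)
The singlet spin state $D_\Psi$ for the composite system is the projector onto the
vector $\tfrac{1}{\sqrt{2}}(v_{+-} - v_{-+})$.
According to the Lüders rule, conditionalization on $(S^p_\alpha,+)$ projects $D_\Psi$
into $D'$, where

$$D' = \frac{(I^e \otimes P^p_{\alpha+})\,D_\Psi\,(I^e \otimes P^p_{\alpha+})}{\mathrm{Tr}[D_\Psi(I^e \otimes P^p_{\alpha+})]}$$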

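We now evaluate $D'v_{ij}$ for $i = \pm$, $j = \pm$. From (5.26), (5.25), and, for
(8.39), from Section 1.12,* we obtain

$$(I^e \otimes P^p_{\alpha+})\,v_{++} = v_{++} \tag{8.34}$$

$$(I^e \otimes P^p_{\alpha+})\,v_{+-} = 0 \tag{8.35}$$

$$(I^e \otimes P^p_{\alpha+})\,v_{-+} = v_{-+} \tag{8.36}$$

$$(I^e \otimes P^p_{\alpha+})\,v_{--} = 0 \tag{8.37}$$

$$D_\Psi v_{++} = 0 \tag{8.38}$$

$$D_\Psi v_{-+} = \tfrac{1}{2}(v_{-+} - v_{+-}) \tag{8.39}$$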

* Equation (8.39) is easily obtained using the Dirac notation for projectors, since
$D_\Psi = \tfrac{1}{2}\,|v_{+-} - v_{-+}\rangle\langle v_{+-} - v_{-+}|$; see Dirac (1930, p. 25).
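Whence

$$D'v_{ij} = 0 \tag{8.40}$$

unless $i = -$ and $j = +$, in which case

$$D'v_{-+} = \frac{\tfrac{1}{2}\,v_{-+}}{\tfrac{1}{2}} = v_{-+} \tag{8.41}$$

From (8.40) and (8.41), and the discussion at the end of Section 1.13, we
obtain

$$D' = P^e_{\alpha-} \otimes P^p_{\alpha+} \tag{8.42}$$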

This result is generalized in Appendix C.
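The projection of the singlet state can also be checked numerically. This is a minimal sketch under assumptions of mine: the direction $\alpha$ lies in the x-z plane at an arbitrarily chosen angle `ALPHA`, the composite basis is ordered (++, +-, -+, --) along the z-axis, and `luders_state` and `electron_state` are illustrative names.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product; the electron factor comes first."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def spin_up(theta):
    """Projector onto spin-up along the direction at angle theta to the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[(1 + c) / 2, s / 2], [s / 2, (1 - c) / 2]]

def spin_down(theta):
    return spin_up(theta + math.pi)

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SINGLET = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 0.5, -0.5, 0.0],
           [0.0, -0.5, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.0]]

def luders_state(d, p):
    """Lüders rule for states: D' = P D P / Tr(D P)."""
    pdp = matmul(matmul(p, d), p)
    weight = trace(matmul(d, p))
    return [[entry / weight for entry in row] for row in pdp]

def electron_state(d):
    """Reduced state of the electron: partial trace over the positron factor."""
    return [[d[2 * i][2 * j] + d[2 * i + 1][2 * j + 1] for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

ALPHA = 0.9   # arbitrary illustrative direction
projected = luders_state(SINGLET, kron(IDENTITY, spin_up(ALPHA)))
```

With these definitions, `projected` coincides with `kron(spin_down(ALPHA), spin_up(ALPHA))`, and the electron's reduced state changes from the mixed state I/2 to the projector `spin_down(ALPHA)`.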

8.9 Probability, Causality, and Explanation


The two experiments this chapter deals with, the two-slit experiment dis-
cussed in Sections 8.3 and 8.4 and the EPR-type experiment discussed in
Sections 8.5-8.8, have a number of things in common. Both give rise to
problematic quantum effects, and in each case the problem can be stated in
two ways. On the one hand these effects resist causal explanation; on the
other they involve probability assignments which cannot be embedded in a
classical probability space. However, straightforward analyses of both ex-
periments can be given in terms of generalized probability functions, func-
tions defined not on the subsets of a Kolmogorov event space, but on the
subspaces of a Hilbert space. In this section I will argue that such analyses
constitute genuine explanations of the effects in question.
This claim is open to an obvious objection. All that accounts in terms of
generalized probability functions do, the objection runs, is to deploy the
mathematical machinery of quantum theory in another guise. True, it is now
evident why Kolmogorov probability theory runs into trouble, but no expla-
nation is being offered, either of the interference pattern or of the EPR
correlations.
In particular, this objection might be made by someone who shared
Salmon's views on scientific explanation. In Scientific Explanation and the
Causal Structure of the World (1984), Salmon argues that we provide an
explanation of a phenomenon by tracing the causal processes that bring it
about. And, as he shows by a wealth of corroborative examples, this is just
what many explanations do. The consequence of this view, however, is that
certain quantum phenomena are simply inexplicable, despite the fact that
they are predicted by the best physical theory that we have. Satisfactory
causal accounts are available neither of the interference patterns character-
istic of the two-slit experiment, nor of the correlations observed in EPR-type
experiments. Further, these are not merely gaps in our knowledge, to be
repaired at some future date; we have good reason to believe that no such
accounts could ever be provided.
Notice that what we lack are accounts of the causal processes involved. It
seems quite natural to say, for instance, that the event $(S^p_\alpha,+)$ in the
electron-positron experiment causes the probability of the event $(S^e_\alpha,-)$ to rise to unity,
or even to say that it causes that event to happen. Indeed to say so would be
in line with the statistical analysis of causation, which says, roughly, that
event A causes event B if, within otherwise causally homogeneous
ensembles, $p(B \mid A) > p(B)$. (See Cartwright, 1983, pp. 22-26.) But it is one thing to
point to the positron-event $(S^p_\alpha,+)$ as the cause of the electron-event $(S^e_\alpha,-)$; it
is another to give an account of the causal process involved. Lacking the
latter, on Salmon's view we have no explanation of $(S^e_\alpha,-)$.
Salmon responds to this problem in two ways. One response is to suggest
that a future physics may bring a new conception of what constitutes a
causal process; the other is to voice "the suspicion that explanations of
quantum phenomena may be radically different from explanations of mac-
roscopic phenomena" (p. 253n; see pp. 252-259).
Neither response is very specific. I will offer three comments on them: the
first is to agree that accounts of quantum phenomena in terms of quantum
probability functions will certainly be different from accounts of macro-
scopic phenomena, if only because such functions are not classical; the
second is to claim that these accounts will, nevertheless, not only be expla-
nations of the phenomena but be explanations of a general type found
elsewhere in physics, and which I will call structural explanations;* the third
is to suggest that any revamped notion of causal processes that a future
physics provides will be, at bottom, such an explanation.
The idea of a structural explanation can usefully be approached via an
example from special relativity. Suppose we were asked to explain why one
particular velocity (in fact the speed of light) is invariant across the set of

* I now find that McMullin (1977) has already used this phrase; he has in mind something
closer to Cartwright's simulacrum account of explanation (see below) than to my structural
explanations.
inertial frames. The answer offered in the last decade of the nineteenth
century was to say that measuring rods shrank at high speeds in such a way
that a measurement of this velocity in a moving frame always gave the same
value as one in a stationary frame. This causal explanation is now seen as
seriously misleading; a much better answer would involve sketching the
models of space-time which special relativity provides and showing that in
these models, for a certain family of pairs of events, not only is their spatial
separation $x$ proportional to their temporal separation $t$, but the quantity $x/t$
is invariant across admissible (that is, inertial) coordinate systems; further,
for all such pairs, $x/t$ always has the same value. This answer makes no
appeal to causality; rather it points out structural features of the models that
special relativity provides. It is, in fact, an example of a structural explana-
tion.
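The structural feature appealed to here can be exhibited in a few lines. The sketch below assumes one spatial dimension and the standard Lorentz transformation between inertial frames in relative motion at velocity v; the function names `lorentz` and `ratio` are illustrative and not drawn from the text.

```python
import math

C = 299_792_458.0  # speed of light in metres per second

def lorentz(x, t, v):
    """Coordinates (x', t') of the event (x, t) in a frame moving at velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / C ** 2)

def ratio(x, t, v):
    """The quantity x/t for the pair (origin, event), evaluated in the moving frame."""
    x_moving, t_moving = lorentz(x, t, v)
    return x_moving / t_moving
```

For a pair of events with $x = ct$, `ratio(C * t, t, v)` returns `C` (to rounding error) for every admissible v. For a pair with $x = 0.5\,ct$ the ratio depends on the frame, and it is zero in the frame with v = 0.5 c. Only the one family of pairs has the invariance, and no shrinking rods are mentioned anywhere in the calculation.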
If one believes (as I do) that scientific theories-even those expressed in
highly abstract form-provide explanations, then one's account of expla-
nations will be tied to one's account of scientific theories. Consider, for
example, the view that an ideal scientific theory should be laid out axiomat-
ically, in the manner of Euclid's geometry, with particular results deducible
from general laws, and those in turn deducible from a few fundamental
axioms. This axiomatic view of theories ties in naturally with a "covering
law" account of explanation of the kind favored by Hempel (1965), who
suggested that one event, or set of events, could explain another if the
second could be deduced from the first, given the laws of nature. We may
contrast this view with the semantic view of theories appealed to in this
book. On this view a theory provides a set of models, and ground-level
explanation consists in exhibiting relevant features of these mathematical
structures.
The term ground-level is important. Explanation comes at many levels, as
does scientific theorizing. It is the foundational level which concerns us
here, since it is at this level that structural explanation occurs. Cartwright
(1983), who also distinguishes two levels of explanation, calls them
"causal" and "theoretical" (p. 75) and argues convincingly for a simulacrum
account of the former. She writes, "To explain a phenomenon is to find a
model that fits it into the basic framework of the theory" (p. 152). The
models she refers to here she calls "simulacra" to emphasize the partial
representation of phenomena which they provide. In Section 2.9 I distin-
guished models of this kind from the mathematical models which, on the
semantic view of theories, appear in the exposition of the "basic framework
of the theory." It is this second kind of model which is appealed to in a
theoretical explanation.
A related distinction, between different kinds of theories, was first made
by Einstein, and has been emphasized by Bub (1974, pp. viii, 143) and
Demopoulos (1976, p. 721). For these authors, quantum mechanics, like
special relativity, is a "principle theory." Such theories may be contrasted
with "constructive theories," like the kinetic theory of gases, which show
how one theory (such as the phenomenological theory of gases) can be
embedded in another (in this case Newtonian mechanics). Principle
theories, in contrast, are foundational. They "introduce abstract structural
constraints that events are held to satisfy" (Bub, 1974, p. 143). They do so by
supplying models which display the structure of a set of events. The four-
dimensional manifold postulated by special relativity models the structure
of the set of physically localizable events; the Hilbert spaces of quantum
mechanics are models of the possibility structure of the set of quantum
events. (Here I echo Bub, 1974, and, in particular, Stairs, 1982; see also
Stairs, 1984.)
Whenever we appeal to a principle theory to provide a theoretical expla-
nation, I claim, the explanation consists in making explicit the structural
features of the models the theory employs. In the same way that we explain
the constancy of one particular velocity with respect to all inertial frames by
appealing to the structure of Minkowski space-time, we explain paradoxical
quantum-mechanical effects by showing, first of all, how Hilbert spaces
provide natural models for probabilistic theories (as in Chapters 3 and 4),
and, second, what the consequences of accepting these models are (as in the
present chapter).
A theory of the kind Salmon looks forward to, which brings with it a new
conception of causality, will also, presumably, be a principle theory. And
should we identify certain processes as causal in this new theory, and appeal
to them within scientific explanations, it seems likely that these explana-
tions will effectively be structural explanations; that is to say, in providing
them we will isolate a particular class of elements and relations within the
representations the theory provides.
9
Measurement

In the last three chapters we have seen the pairs $(A,\Delta)$ treated variously as
properties of systems, as propositions in a quantum logic, and as quantum
events. The last interpretation seems the most promising: as we saw, talk of
the properties of quantum systems is problematic, and talk of the
propositions of a quantum logic is uninstructive unless these propositions are
themselves interpreted.
But these pairs were originally introduced as experimental questions, to
which the theory assigned probabilities and to which individual
experiments gave the answers yes or no. In Chapter 2 the question $(A,\Delta)$ was
glossed as, "Will the measurement of observable A yield a result in the set $\Delta$
of the reals?" During the course of this chapter I will clarify the relation
between quantum events and experimental questions, but the main topic
addressed is the measurement process itself and the account of it available
in quantum theory: can the theory tell us what goes on when "a
measurement of A yields a result in the set $\Delta$"? As a preliminary, I discuss a principle
which has often been taken to imply a constraint on possible measurements,
namely, the uncertainty principle.

9.1 Three Principles of Limitation


In this section I distinguish three principles of quantum mechanics, each of
which derives from the existence of incompatible, mutually transformable
observables (see Section 3.8). There is no uniform nomenclature for these
principles; I will refer to them as the dispersion principle, the support princi-
ple, and the indeterminacy principle. The first two we have already met in
Section 6.1, but both can be set out more precisely, using the vocabulary of
quantum logic. Each of the three principles takes a particularly strong form
when we deal with the Fourier-connected observables, position ($Q$) and
momentum ($P$), for a particle constrained to move in one dimension. Recall
that both these observables have a continuous spectrum.

The dispersion principle:


(9.1a) There is no quantum state which maps the totality of quantum events
into {1,0}.
(9.1b) If, for a state $D$, $p_D(P,\Delta) = 1$, where $\Delta$ is a bounded subset of the reals,
then $p_D(Q,\Gamma) < 1$, for every bounded subset $\Gamma$ of the reals.

The general principle (9.1a) follows from Gleason's theorem (alternatively,
from Kochen and Specker's theorem); (9.1b) follows from a theorem proved
by Busch and Lahti (1985, pp. 66-67).
The support of a state, with respect to a given observable, is, intuitively,
the set which contains just the values of that observable to which the state
assigns nonzero probability. More formally,

(9.2) $(A,\Delta)$ is said to be the support of $D$ with respect to A [we write:
$s_A(D) = (A,\Delta)$] if $p_D(A,\Delta) = 1$ and if, for each $(A,\Gamma)$ such that
$p_D(A,\Gamma) = 1$, $\Delta \subseteq \Gamma$.

Quantum-logically, the event which is the support of $D$ with respect to A is
the lowest A-event in the lattice of events which is assigned probability 1 by
$D$.

The support principle:


(9.3 a) If A and B are incompatible observables, then there is a pure state D
whose supports with respect to A and B are not both atoms of the
lattice of quantum events.
(9.3b) If the support of $D$ with respect to $Q$ is $(Q,\Delta)$, where $\Delta$ is any bounded
subset of the reals, then the support of $D$ with respect to $P$ is $(P,\mathbb{R})$,
and conversely (Gibbins, 1981a; Busch and Lahti, 1985). ($\mathbb{R}$ is the
whole real line.)

Apropos of (9.3a), it is typically the case that operators A and B representing
incompatible observables share no eigenvectors; in that case, there is no
pure state D whose supports with respect to A and B are both atomic. The
support principle entails the dispersion principle.
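For an observable with a discrete, nondegenerate spectrum, the support of a pure state is simply the set of eigenvalues to which the state assigns nonzero probability. The sketch below illustrates (9.3a) for spin components of a spin-1/2 system. It assumes natural units and the usual eigenvectors of $S_x$, $S_y$, and $S_z$ written in the $S_z$ basis; the names `support` and `is_atom`, and the numerical tolerance, are my own.

```python
import math

R = 1 / math.sqrt(2)

# Eigenvalue -> normalized eigenvector, written in the S_z basis (natural units).
SZ = {0.5: (1, 0), -0.5: (0, 1)}
SX = {0.5: (R, R), -0.5: (R, -R)}
SY = {0.5: (R, 1j * R), -0.5: (R, -1j * R)}

def probability(state, vec):
    """Probability |<vec|state>|^2 that the state assigns to the eigenvector vec."""
    amplitude = sum(complex(v).conjugate() * s for v, s in zip(vec, state))
    return abs(amplitude) ** 2

def support(state, observable, tol=1e-12):
    """Smallest set of eigenvalues to which the pure state assigns probability 1."""
    return {value for value, vec in observable.items()
            if probability(state, vec) > tol}

def is_atom(event):
    """An event is atomic here when it picks out a single nondegenerate eigenvalue."""
    return len(event) == 1

Z_PLUS = (1, 0)
X_PLUS = (R, R)
```

In the state `Z_PLUS` the support with respect to $S_z$ is the atom `{0.5}`, but the support with respect to $S_x$ is the whole spectrum `{0.5, -0.5}`; in the state `X_PLUS` the roles are reversed. Since these operators share no eigenvectors, no pure state makes both supports atomic.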
A few preliminaries are in order before we can formulate the principle of
indeterminacy. Consider an ensemble of systems, all in the same state.
Unless this state is an eigenstate of the observable A, measurements of A will
not yield the same value for each member of the ensemble, but a series of
different values, each occurring with a certain frequency. These values
scatter round a mean, the expectation value $\langle A\rangle$ of observable A. In the case
when A admits eigenvectors we obtain $\langle A\rangle$ as in Section 2.4, by weighting
each of the different eigenvalues by the probability of its occurrence. Thus,
in this case,

$$\langle A\rangle = \sum_i p(A,a_i)\,a_i \tag{9.4a}$$

When A has a continuous spectrum we write, for an ensemble in the state
$\Psi(x)$,

$$\langle A\rangle = \int_{-\infty}^{\infty} \Psi^* A \Psi \, dx \tag{9.4b}$$


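Note that the value of $\langle A\rangle$ depends on the state of the system. If, for
example, the state of an ensemble of spin-$\tfrac{1}{2}$ particles is $z_+$, then

$$\langle S_x\rangle = \tfrac{1}{2}\left(+\tfrac{1}{2}\right) + \tfrac{1}{2}\left(-\tfrac{1}{2}\right) = 0$$

Thus the average value of $S_x$ is zero. As we saw in Chapter 4,

$$p(S_\phi,+) = \cos^2\tfrac{1}{2}\phi, \qquad p(S_\phi,-) = \sin^2\tfrac{1}{2}\phi$$

and so

$$\langle S_\phi\rangle = \tfrac{1}{2}\cos^2\tfrac{1}{2}\phi - \tfrac{1}{2}\sin^2\tfrac{1}{2}\phi = \tfrac{1}{2}\cos\phi$$

($S_\phi$ is the component of spin in a direction at an angle $\phi$ to the z-axis.)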


Any given measurement of the observable may differ (and in the
examples above, does differ) from the mean. Writing $s_x$ for the actual value
obtained in a particular experiment to measure the x-component of spin, we
see that

$$s_x - \langle S_x\rangle = \pm\tfrac{1}{2}$$

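The mean value of the square of these deviations from the mean is given by

$$\langle (s_x - \langle S_x\rangle)^2\rangle = \tfrac{1}{2}\left(\tfrac{1}{2}\right)^2 + \tfrac{1}{2}\left(-\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$$

(We take the squares in order that the differences between the observed
values and the mean should effectively be regarded as positive.) Now we
define the variance, $\sigma_D(S_x) = \Delta S_x$, of $S_x$ in state $D$ as the square root of this
mean square deviation:

$$\Delta S_x = \sqrt{\langle (s_x - \langle S_x\rangle)^2\rangle}$$

Thus, for the system in state $z_+$ (in other words, such that $D = P_{z+}$),

$$\Delta S_x = \tfrac{1}{2}$$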

Similarly, $\Delta S_y = \tfrac{1}{2}$, whereas $\Delta S_z = 0$, since every measurement of $S_z$ yields
the same answer. In general, let $a$ be the result of a measurement of the
observable A. Then,

$$(\Delta A)^2 = \langle (a - \langle A\rangle)^2\rangle \tag{9.5}$$


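$\Delta A$ is sometimes called the uncertainty of A. Given the state of the system,
$\Delta A$ can be calculated by an extension of the method used for $\langle A\rangle$. For
example, for the system in the $z_+$ state:

either $s_\phi - \langle S_\phi\rangle = \tfrac{1}{2} - \tfrac{1}{2}\cos\phi$ (with probability $\cos^2\tfrac{1}{2}\phi$)

or $s_\phi - \langle S_\phi\rangle = -\tfrac{1}{2} - \tfrac{1}{2}\cos\phi$ (with probability $\sin^2\tfrac{1}{2}\phi$)

And so

$$\begin{aligned}
(\Delta S_\phi)^2 &= \tfrac{1}{4}\left[\cos^2\tfrac{1}{2}\phi\,(1 - \cos\phi)^2 + \sin^2\tfrac{1}{2}\phi\,(1 + \cos\phi)^2\right] \\
&= \tfrac{1}{4}\left[1 - 2\cos\phi\left(\cos^2\tfrac{1}{2}\phi - \sin^2\tfrac{1}{2}\phi\right) + \cos^2\phi\right] \\
&= \tfrac{1}{4}(1 - \cos^2\phi) \\
&= \tfrac{1}{4}\sin^2\phi
\end{aligned}$$

It follows that

$$\Delta S_\phi = \tfrac{1}{2}\sin\phi$$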
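The calculation of the mean and the uncertainty of $S_\phi$ in the state $z_+$ can be replayed numerically from the outcome probabilities alone. This is a minimal sketch, assuming natural units and an angle $\phi$ between 0 and $\pi$; the function names are illustrative.

```python
import math

def outcome_distribution(phi):
    """Outcomes of an S_phi measurement, with their probabilities, for a
    spin-1/2 system in the state z+ (phi is the angle to the z-axis)."""
    return [(0.5, math.cos(phi / 2) ** 2), (-0.5, math.sin(phi / 2) ** 2)]

def mean(dist):
    """Expectation value: each outcome weighted by its probability."""
    return sum(value * prob for value, prob in dist)

def uncertainty(dist):
    """Square root of the mean squared deviation from the mean, as in (9.5)."""
    m = mean(dist)
    return math.sqrt(sum(prob * (value - m) ** 2 for value, prob in dist))
```

For any such angle, `mean(outcome_distribution(phi))` agrees with `0.5 * math.cos(phi)` and `uncertainty(outcome_distribution(phi))` with `0.5 * math.sin(phi)`. Setting `phi` to a right angle recovers the values for $S_x$, and setting it to zero recovers those for $S_z$.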

Now let A and B be two Hermitian operators on a Hilbert space. We
define their commutator $[A,B]$ as follows:

$$[A,B] =_{\mathrm{df}} AB - BA = C$$

Note that $C$ is itself an operator (though not, in general, Hermitian); for
example, we saw in Section 1.7 that $[S_x,S_y] = iS_z$. It turns out that the
uncertainties in two observables A and B are related to the mean value of
their commutator. In fact, the product of the uncertainties in A and B is never
less than half the (absolute value of the) mean value of $[A,B]$ (Jordan, 1969,
pp. 84-85).

The principle of indeterminacy:

(9.6a) There exist observables A and B such that $[A,B] = C \neq 0$ and, for all
observables A and B, $\Delta A \cdot \Delta B \geq \tfrac{1}{2}\,|\langle C\rangle|$.

It is important to emphasize that all three quantities involved, $\Delta A$, $\Delta B$, and
$\langle C\rangle$, are state-dependent.
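In our special case, that of the spin-$\tfrac{1}{2}$ particle in state $z_+$, we saw that

$$\Delta S_x = \Delta S_y = \tfrac{1}{2}$$

and so

$$\Delta S_x \cdot \Delta S_y = \tfrac{1}{4}$$

Now $S_xS_y - S_yS_x = iS_z$ and, since every measurement of $S_z$ on a system in
state $z_+$ yields $+\tfrac{1}{2}$, it follows that

$$\langle S_z\rangle = \tfrac{1}{2}$$

Hence, in conformity with the principle, we have*

$$\Delta S_x \cdot \Delta S_y = \tfrac{1}{4} \geq \tfrac{1}{2}\,|\langle [S_x,S_y]\rangle| = \tfrac{1}{2}\,|i\langle S_z\rangle| = \tfrac{1}{4}$$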
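The spin example can be checked with explicit matrices. The sketch below assumes natural units and the standard matrix representations of $S_x$, $S_y$, and $S_z$ in the $S_z$ basis. It computes each uncertainty as $\sqrt{\langle A^2\rangle - \langle A\rangle^2}$, which equals the root mean square deviation of (9.5); the helper names are my own.

```python
import math

# Spin-1/2 operators in the S_z basis, natural units.
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def expect(op, psi):
    """Expectation value <psi|op|psi> for a normalized state vector psi."""
    return sum(complex(psi[i]).conjugate() * op[i][j] * psi[j]
               for i in range(2) for j in range(2))

def uncertainty(op, psi):
    """Delta A = sqrt(<A^2> - <A>^2) for a Hermitian operator in the state psi."""
    mean = expect(op, psi).real
    mean_square = expect(matmul(op, op), psi).real
    return math.sqrt(max(mean_square - mean ** 2, 0.0))

Z_PLUS = [1, 0]
product = uncertainty(SX, Z_PLUS) * uncertainty(SY, Z_PLUS)
bound = 0.5 * abs(expect(commutator(SX, SY), Z_PLUS))
```

In the state `Z_PLUS` the product of the uncertainties and the bound coincide, so the inequality of (9.6a) holds here as an equality.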

All the values of spin used in this example are in natural units, that is, they
are multiples of Planck's constant, $\hbar$. In the next paragraph, $\hbar$ has not been
"suppressed" in this way.
Consider the position and momentum observables Q and P, represented
by the operators x and - i h a/ax on the space of square-integrable functions
'JI(x) (see Section 1.11).
For any function '¥(x) we have

QP'¥(x) = x[ - ih a~;x) ]
But

PQ'JI(x) = -ih a[x'JI(x)]


ax
. a'JI(x). ax
= - lhx - - - lh'JI(x)-
ax ax

= - i hx a'JI(x) - ih 'JI(x)
ax

Clearly,

(PQ - QP)'JI(x) = - ih 'JI(x)

and so

[P,Q] = -ih
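The commutation relation just derived can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the author's method: it works in units where ħ = 1, approximates the derivative by a central difference, and uses a hypothetical test function `psi` (a Gaussian modulated by a plane wave). The names `momentum`, `position`, and `commutator_pq` are likewise illustrative.

```python
import cmath

HBAR = 1.0  # assumption: units in which hbar = 1

def psi(x):
    """Hypothetical test function: a Gaussian modulated by a plane wave."""
    return cmath.exp(-x * x / 2 + 1j * x)

def momentum(f, step=1e-5):
    """Return P f, with P = -i*hbar d/dx approximated by a central difference."""
    return lambda x: -1j * HBAR * (f(x + step) - f(x - step)) / (2 * step)

def position(f):
    """Return Q f, that is, the function x -> x * f(x)."""
    return lambda x: x * f(x)

def commutator_pq(f):
    """Return (PQ - QP) f as a function of x."""
    pq = momentum(position(f))
    qp = position(momentum(f))
    return lambda x: pq(x) - qp(x)

for x in (-1.0, 0.3, 2.0):
    lhs = commutator_pq(psi)(x)
    rhs = -1j * HBAR * psi(x)
    # The difference should be close to zero, up to discretization error
    print(x, abs(lhs - rhs))
```

Any smooth test function would serve; the check is only as accurate as the finite-difference step allows.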

* By a slight abuse of notation, we use "⟨[Sx,Sy]⟩" to signify the expectation value of the
observable represented by the operator [Sx,Sy].

Thus the product of the uncertainties in P and Q is given by

$$\Delta P\cdot\Delta Q \ge \tfrac{1}{2}\hbar$$

We see that, if for a certain state we can obtain a very small uncertainty in
the predictions made about momentum measurements, this will be accompanied
by a correspondingly large uncertainty in those we make about
position measurements, and vice versa. The product of these uncertainties
never falls below a certain value.
P and Q are incompatible operators with continuous spectra. They are
nontypical in that there is no state for which the product of their uncertainties
lies below a certain (nonzero) value. However, whenever one of a pair of
observables A and B admits eigenvectors, then the product ΔA·ΔB can be
made as small as we wish by a judicious choice of state. For, if $a_i$ is an
eigenvalue of A, and $v_i$ the corresponding eigenvector, then when the
system is in the state $v_i$, ΔA = 0, and so, for any observable B, ΔA·ΔB = 0.
For example, given an ensemble in the state x+,

$$\Delta S_y = \tfrac{1}{2}$$

Note that this does not violate the indeterminacy principle, since [Sx,Sy] =
iSz, and, for the state x+, ⟨Sz⟩ = 0. Thus, in general, the indeterminacy
principle does not tell us that the product of the variances associated with
incompatible observables has a least value greater than zero. (For a careful
discussion of this, see Beltrametti and Cassinelli, 1981, pp. 24-26.)
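The two spin examples, z+ and x+, can be compared side by side. The following is a minimal sketch using the standard spin-½ matrices in natural units; the helper names (`matmul`, `expectation`, `uncertainty`, `indeterminacy_sides`) are illustrative assumptions. For each state it returns the product ΔSx·ΔSy and the bound ½|⟨[Sx,Sy]⟩|.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expectation(op, v):
    """<v|op|v> for a normalized two-component state v."""
    op_v = [sum(op[i][k] * v[k] for k in range(2)) for i in range(2)]
    return sum(complex(v[i]).conjugate() * op_v[i] for i in range(2))

def uncertainty(op, v):
    """Root-mean-square deviation of op in state v."""
    mean = expectation(op, v).real
    mean_sq = expectation(matmul(op, op), v).real
    return math.sqrt(max(mean_sq - mean ** 2, 0.0))  # clamp rounding noise

# Spin-1/2 operators in natural units, written in the basis (z+, z-)
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]

def indeterminacy_sides(v):
    """Return (Delta Sx * Delta Sy, |<[Sx,Sy]>| / 2) for the state v."""
    xy, yx = matmul(SX, SY), matmul(SY, SX)
    comm = [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]
    product = uncertainty(SX, v) * uncertainty(SY, v)
    return product, 0.5 * abs(expectation(comm, v))

z_plus = [1.0, 0.0]
x_plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
for name, state in (("z+", z_plus), ("x+", x_plus)):
    product, bound = indeterminacy_sides(state)
    print(name, round(product, 6), round(bound, 6))
```

In the z+ state both sides equal ¼; in the x+ state both vanish (up to rounding), which is the point made in the text: the bound itself is state-dependent.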

9.2 Indeterminacy and Measurement


Of the three principles discussed in the last section, the principle of indeter-
minacy has received the most attention. Indeed, Busch and Lahti (1985,
p. 68) suggest that the support principle was not enunciated, and distin-
guished from the indeterminacy principle, until Ludwig did so in 1954.
Since both principles rest on the same fact-namely, the existence of mutu-
ally transformable incompatible operators in quantum theory-one may
wonder why it's important to distinguish them. The reason is that there has
been considerable conflict, not to say muddle, over the significance to be
attached to the quantities ΔA which appear in the indeterminacy principle.
ΔA is defined as the variance $\mathcal{V}_D(A)$ in measurements of A conducted on
an ensemble of particles in state D. For some writers (Popper, 1982, pp.
53-54, and Margenau, 1950, p. 375, for example), this is the only significance
to be attached to it. We may call this the statistical reading of ΔA, in
contrast with the ontic reading and the reading under which it expresses a
limitation on measurement. (Note also that there is a fourfold classification
proposed by McMullin and reported by Jammer, 1974, p. 79; see below.)
The ontic reading is more often used than mentioned. Very few writers
are as explicit as Davies (1984, p. 8):
It must not be supposed that the quantum uncertainty is somehow purely the result
of an attempt to effect a measurement - a sort of unavoidable clumsiness in probing
delicate systems. The uncertainty is inherent in the microsystem - it is there all the
time whether or not we actually choose to measure it.

Gibbins (1981a) offers some other examples. More typical is Bohr, who
oscillates between an implicit reliance on the ontic reading and an explicit
adherence to a reading in terms of constraints on measurements. Thus, in
his account of a thought-experiment involving a single slit in a diaphragm,
he writes (1949, pp. 213-214):
Consequently the description of the state involves a certain latitude Δp in the
momentum component of the particle and, in the case of a diaphragm with a shutter, an
additional latitude ΔE of the kinetic energy.
Since a measure for the latitude Δq in location of the particle in the plane of the
diaphragm is given by the radius a of the hole, and since θ ≈ 1/σa, we get . . .
just Δp ≈ θp ≈ h/Δq in accordance with the indeterminacy relation . . .

As Popper (1982, p. 53n38a) has pointed out, Bohr's analysis of thought-experiments
like this one is remarkable for its reliance on classical models. In
the example above, σ is the wave number of the "train of plane waves"
associated with the particle, and the "latitudes" Bohr speaks of are readily
identifiable as features of these waves. The passage continues,
. . . Due to the limited extension of the wave-field at the place of the slit, the
component of the wave-number parallel to the plane of the diaphragm will involve a
latitude Δσ ≈ 1/a ≈ 1/Δq.

However, five pages earlier, Bohr has introduced the indeterminacy principle
rather differently:
The commutation rule imposes a reciprocal limitation on the fixation of two conjugate
parameters q and p expressed by the relation

$$\Delta q\cdot\Delta p \approx h$$

where Δq and Δp are suitably defined latitudes in the determination of these variables.
(p. 209)
Here the latitudes are not in the quantities themselves, but in their "determination";
Bohr is explicitly endorsing Heisenberg's view that the indeterminacy
principle expresses a limitation on measurement.
This interpretation of the principle was for many years the dominant one.
In Robertson's words (1929, p. 163),
The principle, as formulated by Heisenberg for two conjugate quantum-mechanical
variables, states that the accuracy with which two such variables can be measured
simultaneously is subject to the restriction that the product of the uncertainties in the
two measurements is at least of order h.

It was Robertson who first derived the indeterminacy principle in the general
form in which we now have it, so that it applies to any pair of observables
representable in the same Hilbert space. The quotation above is from
his preamble to the derivation; the uncertainties are clearly identified with
the limits of accuracy obtainable in simultaneous measurements of these
observables. Yet, half a dozen lines into the derivation itself, we find Robertson
writing,
The "uncertainty" ΔA in the value A is then defined, in accordance with statistical
usage, as the root mean square of the deviation of A from [the] mean. (P. 163)

No account is given of why these uncertainties "defined in accordance with
statistical usage" should be identified with the accuracy to which a single
measurement can be carried out. Indeed, there seem to be two distinct
principles under discussion. The principle that Robertson announced his
intention of deriving (Heisenberg's principle, as we may call it) places limits
on simultaneous measurements. The principle that he in fact derived (the
indeterminacy principle) places limits on predictions.
These two principles may well be related, but before we can enquire into
that we need to know whether Heisenberg's principle is actually true. Is it
the case (1) that there are limits on the accuracy to which noncommuting
observables can be simultaneously measured, and (2) that the product of
these uncertainties has a lowest value of the order of Planck's constant, so
that, formally at least, Heisenberg's principle resembles the indeterminacy
principle?
Now (1) may be true, but (2) false. For example, it may be the case, as von
Neumann (1932, p. 230) suggested, that "simultaneous measurements [of
incompatible observables] are, in general, not possible." I will return to his
arguments for this in the next section. But, given his conclusion, it is a bit
surprising to find him, eight pages later, presenting and endorsing Heisenberg's
arguments to illustrate why simultaneous measurements of P and Q
cannot be performed "with arbitrarily high accuracy" (p. 238). Surely, given
von Neumann's own conclusions, the reason they cannot be performed
with arbitrarily high accuracy is that they cannot be performed at all.
Be that as it may, let us look at Heisenberg's arguments. These are plausibility
arguments, the best-known of which (and the one used by von Neumann)
involves "Heisenberg's microscope" (1927, p. 174; von Neumann,
1932, pp. 239-247). This is an idealized instrument similar in principle to an
optical microscope, but which uses radiations of short wavelengths, like
γ-rays, to form images of very small particles. If a small particle were in the
field of view of the microscope (see Figure 9.1) then it would be observed if a
photon (that is, a γ-ray particle) struck it and were deflected upward into the
aperture of the microscope.
We can estimate the coordinate of position of the particle under observation
by finding the position of the image formed by the instrument. Let θ be
the angle subtended at the aperture of the instrument by the particle. Then,
writing Δx for the uncertainty in our measurement of the x-coordinate of
position and λ for the wavelength of the radiation, we get

$$\Delta x \sim \frac{\lambda}{\theta}$$

Figure 9.1 Heisenberg's microscope.

Now, when the photon strikes the particle, a certain amount of momentum
is transferred to the particle by the collision; thus any estimate we make
of the particle's momentum will have to allow for this. By our conservation
laws, we expect the momentum transferred to the particle to be equal and
opposite to the change of momentum of the photon. The trouble is, we don't
know exactly how much this is. If we knew the path followed by the photon
through the microscope, then it would be easy to evaluate it, but we don't
know exactly where in the aperture the photon enters the instrument. In
fact, making θ large (in order to obtain a high resolution) has the effect of
increasing the uncertainty Δp in our estimate of momentum. We have

$$\Delta p \sim \frac{h\theta}{\lambda}$$

and so

$$\Delta x\cdot\Delta p \sim h$$

The product of these uncertainties is of the order of Planck's constant, as
Heisenberg's principle suggests.
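The trade-off in the microscope argument can be made concrete with a few numbers. The sketch below is only an order-of-magnitude illustration of the two estimates above; the function name `microscope_uncertainties` and the sample wavelengths and angles are assumptions chosen for illustration.

```python
PLANCK_H = 6.62607015e-34  # Planck's constant h in joule-seconds

def microscope_uncertainties(wavelength, theta):
    """Order-of-magnitude estimates from the microscope argument.

    wavelength: wavelength of the radiation, in metres
    theta: angle subtended at the aperture, in radians
    """
    delta_x = wavelength / theta             # resolution of the position estimate
    delta_p = PLANCK_H * theta / wavelength  # spread in the momentum transferred
    return delta_x, delta_p

for wavelength in (1e-12, 1e-10):
    for theta in (0.1, 0.5):
        dx, dp = microscope_uncertainties(wavelength, theta)
        # Last column: the product in units of h
        print(wavelength, theta, dx, dp, dx * dp / PLANCK_H)
```

Shortening the wavelength or widening the aperture sharpens the position estimate and blurs the momentum estimate by the same factor, so the product stays of order h whatever values are chosen.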
But does this example indeed illustrate what I have called Heisenberg's
principle? In the first place, it says nothing about simultaneous measure-
ments. It shows that the price of obtaining a sharp value of, say, position is
that we transfer an imprecisely known amount of momentum to the particle,
thus rendering any previous estimate of its momentum inaccurate. (This
is McMullin's fourth reading of the uncertainty relation; see Jammer, 1974,
p. 79.) Second, it's not clear what the significance is of the fact that the two
quantities are, in the technical sense, incompatible. The general lesson to be
drawn from it is that any measurement may involve a disturbance of the
object we are looking at, and that, because these disturbances cannot be
made small compared with the quantities to be measured, we cannot ideal-
ize them away, as we do in classical physics.
To take stock, an ontic reading of the indeterminacy relation relies on the
partial picture of quantum effects supplied by the wave model. As Gibbins
(1981a, pp. 123-125) points out, if any principle governing the localization
of a system in Q-space and P-space emerges from the theory itself, it is not
the indeterminacy principle but the support principle. Likewise, Heisen-
berg's arguments, which became part of the folklore of quantum theory,
offer slender grounds for reading the indeterminacy principle as a principle
which limits the accuracy with which incompatible observables can be
simultaneously measured. Grounds for suggesting that it does not do so are
provided by the fact, remarked on in Section 9.1, that for many pairs, A and
B, of incompatible observables, there are states such that the product
LlA . LlB is zero. It would be peculiar, but not I suppose wholly incredible, if
the very feasibility of certain (double) measurement processes were depen-
dent on the states of the systems under test.
Whether, as von Neumann claimed, there are independent reasons for
thinking that incompatible observables are not commeasurable is another
matter. Various authors have suggested otherwise; in fact Margenau, both
on his own and later in collaboration with Park, proposed a number of
experiments whereby values for position and momentum could be obtained
simultaneously with no more limitation of accuracy than one might expect if
they were measured individually (Margenau, 1950, p. 376; Park and Mar-
genau, 1968, 1971). These proposals, however, have not gone unchallenged
(see Busch and Lahti, 1984).
I return to the question of simultaneous measurability in the next section; I
suggest there that, where it is forbidden, it is forbidden by the support
principle. Again, it is this principle, rather than the indeterminacy principle,
which summarizes fundamental features of the theory.
Some recent work by Busch and Lahti (1985) offers an interesting foot-
note to this section. They point out that, in orthodox quantum mechanics,
no joint probability measures exist on the set of pairs of Q-events and
P-events. That is, no probability function exists that maps all conjunctions
of Q-events and P-events into [0,1] and that reduces to the usual
quantum-mechanical assignments of probabilities to Q-events when the P-event is
the certain event (P,ℝ), and vice versa (see Beltrametti and Cassinelli, 1981,
pp. 23-24). However, it is possible to define "unsharp" position and momentum
operators with respect to which such measures are well-defined
(Davies, 1976). The operators are the usual Q and P operators on L², modified
by functions f and g to become the operators $Q_f$ and $P_g$. (I omit the
technical details of the modifications.) The modifiers f and g are probability
density functions with mean values equal to zero and variances Δf and Δg;
they are designed to represent the fact that position and momentum measurements
are not sharp, that is, not localized at a point on the real number
line. An "uncertainty" relation now holds between Δf and Δg; we have

$$\Delta f\cdot\Delta g \ge \tfrac{1}{2}\hbar$$

The significance Busch and Lahti attach to this result depends on an
assumption which is intuitively plausible, but hard to justify conclusively:
that joint probability distributions for two observables exist if and only if the
two observables are simultaneously measurable. If this is true, then the
"unsharp" observables $Q_f$ and $P_g$ are simultaneously measurable, whereas
Q and P are not. But "in order to speak of simultaneous values (X,Y) of
position and momentum, one has to pay the price-that not both values
may be sharply defined" (1985, p. 73).
9.3 Projection Postulates


Von Neumann (1932, pp. 223-230) gave a general proof that two incompatible
observables are not simultaneously measurable with arbitrarily high
precision. A simplified version of this proof in four steps can be given for the
case when both observables have discrete spectra and there is no degeneracy.
(1) Assume that a particular experiment to measure the value of observable
A for a system gives result a. Then a second measurement of A performed
on the system immediately after the first will yield the same result.
(2) Immediately prior to the second measurement the result a has probability
1; thus the first experiment leaves the system in an eigenstate $v_a$ of A
(with eigenvalue a).
(3) Simultaneous precise measurements of observables A and B would
therefore leave the system in a state which was both an eigenvector $v_a$ of A
and an eigenvector $v_b$ of B.
Assume that A and B are incompatible. There are two cases: (a) A and B
have no eigenvectors in common, and (b) A and B share one or more
eigenvectors.
(4a) If A and B share no eigenvectors, then no state is an eigenvector of
both A and B. Hence no simultaneous precise measurements of A and B can
take place.
(4b) No incompatible operators A and B share all their eigenvectors; thus
there would be values of the observables A and B which could never be
obtained in any putative joint measurement process. Such a process should
therefore not properly be called a "measurement process"; rather the kind
of measurement available in case (4b) would be like the measure of time
provided by a stopped clock, which is correct twice a day.
Note that the principle appealed to in step (4) is the support principle.
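Case (4a) is the situation of the spin components Sx and Sz. The sketch below, which is an illustration and not part of von Neumann's proof, checks each eigenvector of one operator against the other; the helper names `apply` and `is_eigenvector` are assumptions. Since neither operator is degenerate, the four vectors listed exhaust their eigenvectors up to scalar multiples.

```python
import math

# Spin-1/2 operators in natural units, written in the basis (z+, z-)
SX = [[0.0, 0.5], [0.5, 0.0]]
SZ = [[0.5, 0.0], [0.0, -0.5]]

def apply(op, v):
    """Apply a 2x2 matrix to a two-component vector."""
    return [op[0][0] * v[0] + op[0][1] * v[1],
            op[1][0] * v[0] + op[1][1] * v[1]]

def is_eigenvector(op, v, tol=1e-12):
    """True if op v is parallel to the nonzero real vector v."""
    w = apply(op, v)
    return abs(w[0] * v[1] - w[1] * v[0]) < tol

r = 1 / math.sqrt(2)
candidates = {"z+": [1.0, 0.0], "z-": [0.0, 1.0], "x+": [r, r], "x-": [r, -r]}
for name, v in candidates.items():
    # Columns: state, eigenvector of Sx?, eigenvector of Sz?
    print(name, is_eigenvector(SX, v), is_eigenvector(SZ, v))
```

No row shows `True` in both columns, so no state could be left by a simultaneous precise measurement of both components in the sense of step (3).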
The first of the four steps is the one most often challenged. Von Neumann
sought to justify it by appealing to experiments by Compton and Simon on
"Compton scattering" (von Neumann, 1932, pp. 223-230), but whether
these experiments in fact offer much in the way of support for it is doubtful
(see, for example, van Fraassen, 1974a, p. 297). I will discuss the Compton
and Simon experiments in the next section; in the meantime I will enlist an
experiment suggested by Heitler (1949, p. 190) to give step (1) some plausi-
bility (see also Margenau, 1950, p . 3).
Suppose we pass electrons through a small hole in a diaphragm and then
allow them to hit a screen some distance away. Because diffraction occurs
(to use the vocabulary of the wave model), the probability distribution for
the electron will be spread out over a large area of the screen. Nonetheless
we can record the spot where any individual electron strikes the screen. If
we now replace the screen by two thin photographic plates, placed together
and parallel to each other, the electron will go through both, and the mark
where the electron strikes the second will be very close to the mark where it
struck the first. What we have here are two consecutive experiments, both of
which measure the position of the electron in a plane perpendicular to the
axis of the experiment. The second yields the same result as the first, as step
(1) requires. Of course, this second experiment must be "immediately after"
the first; that is, between the two measurements the system's state must
neither be changed discontinuously, by interactions with other devices, nor
must it evolve significantly according to Schrödinger's equation. In the
example given, the further apart the plates are, the further apart the two
marks may be, because diffraction occurs again after the first impact.
One of the problems with step (1), however, is that some measurements
- perhaps most measurements - do not allow a second look at the system;
the electron, photon, or whatever is effectively annihilated by the measure-
ment process. Additionally, Landau and Peierls (1931) suggested that,
among experiments which allow repetition, we can find, and distinguish
between, those which yield the same result the second time as the first and
those which do not (see Jammer, 1974, p. 487n). Following Pauli (1933), we
call the former "experiments of the first kind." These considerations restrict
the scope of von Neumann's proof. They show that incompatible observables
are not measurable to arbitrarily high precision by measurements of
the first kind. Thus, although von Neumann's result is consistent with the
stronger claim, that no possible measurement could do the job, Margenau
(1950, pp. 360-364) could accept the proof and still maintain that simultaneous
measurability of incompatibles is feasible. Note, however, that a
proof of the strong claim, resting on a particular account of the measure-
ment process, has been offered by van Fraassen (1974a, pp. 301-303).
Let us turn to step (2) of the argument, or rather its analogue for the case of
an observable with a continuous spectrum, like position. In the experiment
described just now, prior to striking the screen the electron behaves like a
wave; its probability distribution is spread out in space. The event of its
striking the screen is often called "the collapse of the wave packet." It is
sometimes described as a change in the properties of the electron - from
being spread out in space the electron becomes localized in a small region -
and sometimes as a passage from potentiality to actuality - of all the possible
events associated with small areas of the screen, just one is actualized.
Von Neumann postulates that this collapse (however regarded) is accompanied
by a change in the state of the electron: the state changes in such a way
that a repetition of the experiment will with certainty yield the same result as
before. Margenau (1950) dubbed this postulate the projection postulate, and
it is now generally referred to by that name. We can revert to the case of an
observable with a discrete spectrum to see a particularly simple instance of
the postulate.
Let us assume that there is no degeneracy and that the result of the first
measurement of observable A is $a_i$. Such an experiment is a maximal
measurement of A. If the original state is a pure state v, then the projection
postulate requires that the transformation

$$v \xrightarrow{(A,a_i)} v_i \qquad \left[\text{equivalently, } P_v \xrightarrow{(A,a_i)} P_{v_i}\right]$$

takes place, where $v_i$ is the eigenvector with eigenvalue $a_i$.


In this maximal case, $(A,a_i)$ is the support of $v_i$ with respect to A (see
Section 9.1). Von Neumann's demand, that in all cases a repetition of an
experiment will yield the same result as before, can be put in terms of a
support requirement, as follows. Assume that a measurement of A locates
the value of A within the Borel set Δ of the reals; we require that the resulting
state of the system have support (A,Δ) with respect to A. I will generalize the
term projection postulate to refer to any rule governing the state transition
induced by measurement which satisfies this requirement.
Effectively, von Neumann's (generalized) projection postulate was that
after a measurement of A had localized the value of A within Δ, the projection
operator $P^A_\Delta$ from the spectral decomposition of A would serve as the
state description for the system (von Neumann, 1932, p. 218; his particular
example involves a discrete spectrum with degeneracy). I write "serve as"
because only in the case when $P^A_\Delta$ projects onto a ray (which is to say, when
the measurement is maximal) is it of trace one, and hence a density operator.
When it projects onto a subspace of higher but finite dimensionality, we can
normalize it so that the transformation becomes

$$(9.7)\qquad D \xrightarrow{(A,\Delta)} D' = \frac{P^A_\Delta}{\mathrm{Tr}\,P^A_\Delta}$$
In the infinitely dimensional case-and so in any case where A has a
continuous spectrum - $P^A_\Delta$ cannot be normalized in this way; Bub (1979, pp.
73-74) suggests that in this case it would be consistent with von Neumann's
ideas to use the operator $P^A_\Delta$ to give the relative probabilities of
outcomes of future experiments.
Although this projection postulate meets the support requirement, it nevertheless
yields counterintuitive results. Consider, for example, a particle in
an initial state which assigns the probabilities $p_1(y)$ shown at the left side of
Figure 9.2 to measurements of position along the y-axis. Assume further
that a first coarse measurement of y locates the particle within the region AB.
Then, according to the von Neumann projection postulate, if a second, more
refined measurement is made, the probabilities would be given by $p_2(y)$,
shown at the right of the figure, rather than by $p_1(y)$ again. Note in particular
that, whereas there was initially virtually zero probability of detecting the
particle at point C, after the first measurement has localized it within AB
there is as high a probability of finding it at C as at anywhere else in the
region.

Figure 9.2 Probability transition according to the von Neumann projection postulate.
Now this is not inconceivable. Quantum mechanics teaches us so often
that the implausible happens that the unlikely becomes as likely as not.
Nonetheless, it is an odd feature of the von Neumann postulate that the
state of the system after a measurement is wholly independent of its initial
state, and in 1951 an alternative postulate was suggested. This takes the
form

$$(9.8)\qquad D \xrightarrow{(A,\Delta)} D' = \frac{P^A_\Delta D P^A_\Delta}{\mathrm{Tr}(P^A_\Delta D)}$$

According to this postulate, D' is dependent on the initial state D, and
nevertheless has support (A,Δ) with respect to A.
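The contrast between (9.7) and (9.8) can be seen in a three-dimensional toy model. The sketch below is an illustration under stated assumptions: two basis states lie within the region Δ (the second standing in for the point C, with small initial probability) and a third lies outside it; the amplitudes and the function names `von_neumann_update` and `luders_update` are hypothetical choices, not taken from the text.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def scale(a, s):
    return [[s * x for x in row] for row in a]

def von_neumann_update(proj):
    """Rule (9.7): D' = P / Tr(P); the initial state plays no role."""
    return scale(proj, 1.0 / trace(proj))

def luders_update(state, proj):
    """Rule (9.8): D' = P D P / Tr(P D)."""
    pdp = matmul(matmul(proj, state), proj)
    return scale(pdp, 1.0 / trace(matmul(proj, state)))

# Hypothetical pure state with probabilities 0.64, 0.01, 0.35 for the
# three basis states; the second plays the part of the point C.
amps = [0.8, 0.1, math.sqrt(0.35)]
D = [[x * y for y in amps] for x in amps]
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # projector for "value lies within Delta"

vn = von_neumann_update(P)
ld = luders_update(D, P)
print("von Neumann:", [round(vn[i][i], 4) for i in range(3)])
print("Lueders:    ", [round(ld[i][i], 4) for i in range(3)])
```

Under (9.7) the two states within Δ become equally probable, whatever the initial state was; under (9.8) they keep their original ratio of 64 to 1, while the state outside Δ receives probability zero under both rules.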
The form of (9.8) is familiar; it is the Lüders rule we met in Chapter 8.
There it appeared as a conditionalization rule for nonclassical probability
spaces, but Lüders (1951) first proposed it as an alternative to von Neumann's
projection postulate, that is, as a rule governing the changes of state
induced by measurements. In Section 8.2 we saw why it was a natural
generalization of the classical conditionalization rule. As a projection postulate
it is characterized by the following features. (Teller, 1983, pp. 415-418,
provides a valuable discussion of them.)
Let A and B be compatible observables.

(a) According to the rule, if a measurement of A is followed successively
by a measurement of B and a second measurement of A, then the two values
of A will coincide.
(b) Assume we prepare two ensembles of systems in state D. One ensemble
is just subjected to a B-measurement; the other is subjected to an A-measurement
followed by a B-measurement. Then, according to the rule, the
relative frequencies of the various B-outcomes will be the same for both
ensembles.
These results can be shown by using the arguments used in the proofs of
(8. 8) and (8.32), respectively. The support requirement appears as a special
case of (a), either by setting A = B and assuming that flip-flop results do not
occur, or by setting B = I (the trivial "measurement" which locates the value
of every observable within ℝ).
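Feature (b) can be illustrated in a small example. The following is a sketch under stated assumptions, not a proof: A and B are hypothetical compatible observables on a three-dimensional space, given by their spectral projectors, and the ensemble that undergoes the A-measurement is not sorted by outcome, so its state is the sum of the terms P D P over the projectors P of A (the weights Tr(PD) cancel the normalization in the Lüders rule). All names in the code are illustrative.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def outer(v):
    """Projector-like matrix |v><v| for a real vector v."""
    return [[x * y for y in v] for x in v]

# A has spectral projectors P1 (span of e0, e1) and P2 (span of e2).
P1 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
P2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
# B is compatible with A: Q_PLUS projects onto a ray inside the range of P1.
r = 1 / math.sqrt(2)
Q_PLUS = outer([r, r, 0.0])
Q_MINUS = [[(1.0 if i == j else 0.0) - Q_PLUS[i][j] for j in range(3)]
           for i in range(3)]

D = outer([0.6, 0.48, 0.64])  # hypothetical normalized pure state

def after_unsorted_measurement(state, projectors):
    """Ensemble state after a Lueders-rule measurement, outcomes not sorted."""
    result = [[0.0] * 3 for _ in range(3)]
    for p in projectors:
        result = add(result, matmul(matmul(p, state), p))
    return result

def b_probabilities(state):
    """Probabilities of the two B-outcomes in the given state."""
    return [trace(matmul(q, state)) for q in (Q_PLUS, Q_MINUS)]

direct = b_probabilities(D)
after_a = b_probabilities(after_unsorted_measurement(D, [P1, P2]))
print(direct, after_a)
```

The two lists agree, as (b) requires. The agreement depends on the projectors of B commuting with those of A; replacing `Q_PLUS` by a projector that straddles the ranges of `P1` and `P2` would in general break it.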
Stairs (1982, pp. 426-427) has shown that when the premeasurement
state is a pure state, then the Lüders rule is the only possible projection rule
for which (a) holds. As he points out, this means that this rule is the projection
postulate that best captures the classical ideal of a nondisturbing measurement;
in the quantum case it is the rule of minimal disturbance. Note
also that, if the transformation of states by measurement is governed by a
rule which guarantees, first, that (a) and (b) both hold and, second, that
measurement preserves the convex structure of the set of states (see Section
5.4), then the rule in question must be the Lüders rule.

9.4 Measurement and Conditionalization


In Chapter 8 the Lüders rule appeared as the natural extension of classical
conditionalization to the set of quantum events; in the last section it was
shown to be the projection postulate which most nearly approached a
classical account of measurement. In this section I will show how these two
uses of the rule may be brought together. For brevity I will use the term
"quantum event" sometimes to refer to a particular kind of event-as in
"the event (A,Δ)" -and sometimes to refer to a specific occurrence of that
event; context will disambiguate between the type and the token.
Von Neumann's views on measurement, though not of course his choice
of projection postulate, suggest a natural account of these events. On this
account, a quantum event (A,Δ) involves the localization of the observable A
within the set Δ of the reals. The event is realized by an interaction between
a quantum system and a macroscopic apparatus. This interaction may be
one which we would call a measurement, but it need not be. For example,
assume that we seek to bring about a localization of the y-coordinate of
position of a particle within a small region Δ. We may either use a photographic
plate and wait until one particle from an ensemble strikes the region
Δ of the plate, or we may use a diaphragm with a small aperture in it and
wait until a particle is detected on the far side of the diaphragm. In each case
the quantum event (y,Δ) has taken place, and the probability that it would
occur is no different in the photographic plate experiment than in the other.
But unless the photographic plate is very thin (as in the Heitler thought-
experiment described in Section 9.3), the particle will not pass through it but
will be absorbed; to use Pauli's terminology, the experiment will not be a
measurement of the first kind, and the projection postulate will not apply.
Thus not all quantum events are events on which we can conditionalize.
On the other hand, we can and do conditionalize on the event (y,Δ)
occurring in the diaphragm experiment, witness the discussion of the two-
slit experiment in Section 8.4. However, we may be justifiably reluctant to
call what occurs in the experiment a measurement. Certainly the diaphragm
alone does not measure the position of the particle; only when an additional
detection device is placed on the far side of the diaphragm will we know that
a particle was ever around being "measured." As Margenau has empha-
sized, in the absence of such a detector, the most we can say is that if a
particle passed through the diaphragm, then at the diaphragm its y-coordinate
was localized within Δ. Thus not all quantum events are measurements.
For Margenau (1936, 1963) there is a crucial distinction between mea-
surements and preparations. Measurements yield a value for a particular
observable, while preparations produce an ensemble of particles in the
same state. According to Margenau, it was a defect of von Neumann's
analysis that it confused the two; the projection postulate suggested that an
ideal measurement on a system would not only yield a precise value for an
observable, but would also project that system's state into the correspond-
ing eigenstate of that observable.
It is a virtue of the "quantum event" interpretation of the theory-about
which more will be said in Chapter 10 - that it allows some reconciliation of
these views. The phrase "the localization of the observable A within Δ"
conceals an ambiguity that is fruitful rather than fatal. Quantum events may
be measurements; they may also, via conditionalization, serve as preparations.
On the one hand, the observable may be being measured and the
result found to lie within Δ; on the other, the system may be being prepared
in a state D with support (A,Δ) with respect to A. In a type-1 measurement
both would happen together, but there may in fact be no such events. In
their absence, could any quantum event serve both as a measurement and as
a preparation?
Surprisingly, the answer is yes. As we saw in Section 8.8, if we have a
coupled system of the kind used in EPR-type experiments, then an event
associated with one subsystem may be both a measurement of an observable
for that subsystem and an event which projects the other system into an
eigenstate of that observable. The Compton-Simon experiment to which
von Neumann (1932, p. 212) appealed for evidence in support of the projection
postulate is similar in kind.
In this experiment light was scattered by electrons and the scattering process was
controlled in such a way that the scattered light and the scattered electrons were
subsequently intercepted, and their energy and momentum measured.
Given the initial trajectories of a photon and an electron,
the measurement of the path of the light quantum or of the electron after collision
suffices to determine the position of the central line of the collision. The Compton-
Simons [sic] experiment now shows that these two observations give the same
result. (P. 213)
The two observations need not occur simultaneously; if they do not, we
can infer the result of the second from the result of the first. Prior to the first
observation, we could only make statistical predictions about the second,
whereas after the first one has been made, the second is "already determined
causally [sic] and uniquely" (p. 213).
As an argument for the projection postulate, this has recently come under
heavy fire. Van Fraassen (1974a, p. 297), for example, writes,
Upon what slender support dogma may be founded! In the experiment described,
measurements are made directly on two objects .. . which have interacted and
then separated again. The observables directly measured are ones which have be-
come correlated by the interaction . . . And on the basis of this, an inference is
made about what would happen if a single experiment could be immediately re-
peated on the same object!
Indeed, one wonders why von Neumann chose this particular experiment
for his purposes. Einstein, in contrast, was content to illustrate the projection
postulate by two polarizers $P_1$ and $P_2$; if their axes of polarization are
parallel, then any photon passing the first will also pass the second (Einstein,
in correspondence with Margenau; see Jammer, 1974, p. 228).
One motivation was von Neumann's desire to use the Compton-Simon
experiment to make a further point. The experiment shows that, contrary to
a suggestion made by Bohr, Kramers, and Slater (1924; see Jammer, 1966,
pp. 183 -188), the principles of conservation of energy and momentum
hold in individual cases and are not merely statistical laws. As von Neu-
mann (1932, p. 213) pointed out, this implies that the quantum world lies
somewhere between a purely statistical world and a wholly determined
world; for him the projection postulate was an expression of this intermediate
"degree of causality." With hindsight, we can reread von Neumann's
argument as an argument not for his version of the projection postulate, but
for the Lüders rule viewed as a rule of probability conditionalization. Both
rules indicate where, within a statistical theory, deterministic correlations
may obtain.
That said, in the remainder of this chapter I will leave aside the connection
between conditionalization and measurement, and look solely at the latter.
In particular, I postpone a discussion of Teller's views until Section 10.1.

9.5 The Measurement Problem and Schrödinger's Cat


By the projection postulate, wrote von Neumann (1932, p. 217), "we have
then answered the question as to what happens in the measurement of a
quantity. To be sure, the 'how' remains unexplained for the present." And,
fifty years later, this second question is still with us.
On the "orthodox view" of measurement, as von Neumann's account has
come to be called, a system's state can evolve in two different ways. It can
change continuously through time: we may have, in accordance with the
Schrödinger equation,

(9.9)    $D_0 \to D_t = V_t D_0 V_t^{-1}$

where $V_t$ is a unitary operator. It may also change discontinuously in
accordance with the Lüders rule:

(9.10)    $D \to D' = \dfrac{P^A_\Delta D P^A_\Delta}{\mathrm{Tr}(D P^A_\Delta)}$

as a result of the quantum event (A,Δ). The "quantum event" interpretation


of quantum mechanics I propose in Chapter 10 accepts this "strange dualism"
(Wigner, 1963, p. 7) within quantum theory; however, many theorists
have found it unacceptable. In particular, one may ask what it is that physically
distinguishes the kinds of interactions governed by Schrödinger's
equation from those in which discontinuous changes (allegedly) occur.
The latter kind of state transition is, of course, not only discontinuous but
often nonunique. Assume, for example, that a system is in a pure state v
which is a superposition of eigenvectors $\{v_i\}$ of some observable represented
by A: $v = \sum_i c_i v_i$. Then a maximal measurement of A will yield, as a special
case of (9.10),

(9.10*)    $v \to v_i$ with probability $|c_i|^2$

and a transition to any one of the $v_i$ for which $c_i \neq 0$ has a nonzero
probability of occurrence.
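The nonuniqueness of the discontinuous transition can be illustrated numerically. What follows is a minimal sketch in Python with NumPy, not part of the formal account. It assumes that the Lüders rule takes its standard form, D → PDP/Tr(PD); the three-dimensional space, the amplitudes, and the function name `luders` are illustrative choices introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# Amplitudes c_i of a pure state v = sum_i c_i v_i, written in the
# eigenbasis {v_i} of A. The particular values are illustrative.
c = np.array([0.6, 0.8j, 0.0])
D = np.outer(c, c.conj())        # density operator of the pure state v

def luders(D, P):
    """Conditionalize D on the event with projector P: P D P / Tr(P D).

    This is the standard form of the Lüders rule (an assumption here).
    """
    p = np.trace(P @ D).real
    if p == 0:
        raise ValueError("event has probability zero in this state")
    return (P @ D @ P) / p

probs = np.abs(c) ** 2           # |c_i|^2, the probability of v -> v_i
probs = probs / probs.sum()      # guard against rounding error
i = rng.choice(len(c), p=probs)  # which transition occurs is not fixed by v

P_i = np.zeros((3, 3))
P_i[i, i] = 1.0                  # projector onto the ray containing v_i
D_after = luders(D, P_i)         # state after the quantum event
```

Only outcomes with a nonzero amplitude can be drawn, and whichever one is drawn, `D_after` is the projector onto the corresponding ray.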
As this shows, the problem of the projection postulate is just one element
of another, larger problem confronting quantum theory, the problem of
measurement. What account can quantum mechanics offer of the statistically
governed but individually undetermined events characteristic of measurement
processes? The problem has two aspects. First, whatever theoretical
account we give, the processes it describes may have more than one
possible outcome. Second, this account, though couched in quantum-theoretical
terms, must include some treatment of the classical measuring device.
Apropos of the second point, Schrödinger (1935, pp. 156-157) has
pointed out that we are led to bizarre conclusions if we try to apply the
quantum-mechanical formalism to a macroscopic object. He instances the
case of a radioactive atom and a detector. An alpha-particle within a radioactive
nucleus evolves into a superposition of states, so that as time goes on
there is an increasing probability of its being detected outside the nucleus. (It
"tunnels through" the potential barrier which the nucleus provides; Bohm,
1951, pp. 240-242.) Schrödinger (1935, pp. 156-157) writes colloquially of
the state being "blurred":
The state of a radioactive nucleus is presumably blurred in such a degree and fashion
that neither the instant of decay nor the direction in which the emitted α-particle
leaves the nucleus is well-established. Inside the nucleus, blurring doesn't bother us.
The emerging particle is described, if one wants to explain intuitively, as a spherical
wave that continuously emanates in all directions from the nucleus and that im-
pinges continuously on a surrounding luminescent screen over its full expanse.

But while we may accept this "blurred" picture of the microscopic system,
we cannot accept a similar picture of the macroscopic measurement apparatus.
Schrödinger continues,
The screen however does not show a more or less constant uniform surface glow, but
rather lights up at one spot-or, to honor the truth, it lights up now here, now there,
for it is impossible to do the experiment with only a single radioactive atom.

And, as a further illustration, he introduces the legendary creature who now
appears in every philosophical bestiary, Schrödinger's cat.*
One can even set up quite ridiculous cases. A cat is penned up in a steel chamber,
along with the following diabolic device (which must be secured against direct
interference by the cat); in a Geiger counter there is a tiny bit of radioactive sub-
stance, so small, that perhaps in the course of one hour one of the atoms decays, but
also, with equal probability, perhaps none; if it happens, the counter tube discharges
and through a relay releases a hammer which shatters a small flask of hydrocyanic
acid. If one has left this entire system to itself for one hour, one would say that the cat
still lives if meanwhile no atom has decayed. The first atomic decay would have
poisoned it. The ψ-function of the entire system would express this by having in it
the living and the dead cat (pardon the expression) mixed or smeared out in equal
parts.

* Tierliebhaber (1939) discusses the relationship of this animal to Buridan's ass.

As a specification of indicator states of a measurement apparatus, "cat
alive" and "cat dead" may seem a trifle outré. But, as the previous paragraph
makes clear, Schrödinger's point is that, however these indicator states are
chosen, no superposition of them can exist, since they are states of a classical
measuring apparatus.
With Schrödinger's cat in mind, let us review the measurement problem.
We are looking for an account of a particular kind of process, whereby a
system S interacts with a measurement apparatus M; during the interaction
M evolves to a state indicating a value of some observable A associated with
S. We may call such an account an "internal account" of the measurement
process if the evolution is governed by Schrödinger's equation.
For simplicity let us consider an observable A with a discrete spectrum
$\{a_1, a_2, \ldots\}$. Then the account must satisfy the following requirements.

(9.11a) M must have a set $\{u_0, u_1, u_2, \ldots\}$ of possible states; $u_0$ is the
ground state of the apparatus, and $u_1, u_2, \ldots$ correspond to the
outcomes of the measurement (pointer readings) associated with
values $a_1, a_2, \ldots$ of A. Since M is classical, the indicator states
$\{u_0, u_1, u_2, \ldots\}$ must be pairwise orthogonal, and no (nontrivial)
superposition of these states can be a state of M.
(9.11b) As a result of a measurement of A, the state of M must evolve from $u_0$
to one of $u_1, u_2, \ldots$
(9.11c) The probability that the transition $u_0 \to u_j$ takes place must equal the
probability assigned to $(A, a_j)$ by the state v of the system S.

The projection postulate appears as a further requirement, independent of
requirements (9.11a-c).

(9.11d) Whenever the evolution takes M to state $u_j$, then it takes S to the
eigenstate $v_j$ of A which has eigenvalue $a_j$.

From (9.11c) it follows that, if S is in the eigenstate $v_j$ of A which has
eigenvalue $a_j$, then M must evolve to $u_j$ during a measurement of A on S. In
general (9.11b) and (9.11c) require that at the end of the measurement
process M be in a state $u_i$ with probability $p_v(A, a_i)$. Let $p_i = p_v(A, a_i)$; then the
general requirement can be expressed by saying that, after the measurement,
M must be in the mixed state $D^M = \sum_i p_i P^M_i$ (where $P^M_i$ projects onto $u_i$).
Prima facie this does not violate (9.11a) since, in contrast to superpositions,
mixtures of classical states are perfectly respectable.
Along these lines Heisenberg (1958, p. 53) wrote that
The probability function [of quantum mechanics] combines objective and subjective
elements . . . In ideal cases the subjective element . . . may be practically negligible
as compared with the objective one. The physicists then speak of a "pure case."

Although in this passage Heisenberg doesn't use the term, we may add that,
conversely, a mixture is a probability function within which a subjective
element, "our incomplete knowledge of the world," may be represented.
Any measurement process, says Heisenberg (p. 54), produces an interplay
between these two elements:
After the interaction has taken place, the probability function contains the objective
element of tendency and the subjective element of incomplete knowledge, even if it
has been in a "pure case" before.

In other words, during the measurement process the apparatus evolves
into a mixture of indicator states; of these one will be actualized, but which
one we cannot predict.
Heisenberg shows how the state of the classical measurement apparatus
can be described in quantum-mechanical terms. The question now is, can
we give a quantum-theoretical account of the process he describes? In
particular, how can this process start with S in a pure state v and with M in
the pure state $u_0$, finish with M in the mixed state $\sum_i p_i P^M_i$, and yet be
governed by Schrödinger's equation?

9.6 Jauch's Model of the Measurement Process


An account of such a process was given by J. M. Jauch (1968, chap. VI. 9).
On his account, while the measurement is being performed the system S and
the measurement apparatus M form a coupled system S + M, whose states
are represented in a tensor-product space as follows.
Assume that the observable being measured is representable by the operator
A on $\mathcal{H}^S$, and that there are just two values of the observable, eigenvalues
of the eigenvectors $v_+$ and $v_-$ of A. The measurement device is then
assumed to have (at least) three mutually orthogonal possible states, a
ground state and two indicator states. Let $u_0$ be the state before any measurement
takes place (the ground state), $u_+$ the state when the device registers
a positive value for A, and $u_-$ the state when it registers a negative value
for A. We assume that the quantum mechanical formalism can be applied to
M, and that these three states are representable by vectors $u_0$, $u_+$, and $u_-$,
respectively, in a Hilbert space $\mathcal{H}^M$. No assumption is made that superpositions
of $u_0$, $u_+$, and $u_-$ are also possible states of M.
We represent the states of the coupled system S + M in the tensor-product
space $\mathcal{H}^S \otimes \mathcal{H}^M$. Assume that, before the measurement begins, the system S
is in the pure state v, where $v = c_+ v_+ + c_- v_-$, and that M is in the state $u_0$;
then the original state of S + M will be $\Psi_0 = v \otimes u_0$. During the course of the
measurement interaction this state will evolve continuously, according to
Schrödinger's equation. Accordingly, at the end of the interaction, S + M
will again be in a pure state $\Psi \in \mathcal{H}^S \otimes \mathcal{H}^M$, where $\Psi = U\Psi_0$, and U is some
unitary operator on $\mathcal{H}^S \otimes \mathcal{H}^M$. U must obey the following constraints: when
$v = v_+$ (that is, when $c_- = 0$), we require that $\Psi = \Psi_+ = v_+ \otimes u_+$; when
$v = v_-$ (that is, when $c_+ = 0$), we require that $\Psi = \Psi_- = v_- \otimes u_-$.
In each of these two cases U takes $\Psi_0$ into a state of S + M reducible into a
pure state of S and the corresponding pure state of M. By the linearity of U
we obtain, for the general case,

$\Psi = c_+(v_+ \otimes u_+) + c_-(v_- \otimes u_-)$

However, this state, although a pure state of S + M, is in general not
reducible to pure states of S and of M (see Section 5.8). In fact, using the density
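operator notation, we have, for the state of the composite system,

$D = P_\Psi$

where

$D^S = |c_+|^2 P^S_+ + |c_-|^2 P^S_-$

and

$D^M = |c_+|^2 P^M_+ + |c_-|^2 P^M_-$

The operators $P^S_+$, $P^S_-$, $P^M_+$, $P^M_-$ project onto rays in $\mathcal{H}^S$ and $\mathcal{H}^M$ containing,
respectively, $v_+$, $v_-$, $u_+$, $u_-$.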
This seems to give precisely what we want. The measurement process
evolves according to Schrödinger's equation, but the final state of the measurement
device is a weighted sum of the indicator states. These weights are
exactly the probabilities which quantum theory assigns to the corresponding
outcomes (see Section 5.8).
Moreover, as Jauch points out, we can also show that indicator states are
correlated with final states of S. For assume that we carry out a measurement
$P^S_+ \otimes P^M_-$ on the composite system in the state $\Psi$. That is, we test for the joint
event $[(A,+);(A^M,-)]$, where $A^M$ is the act of observing M. In this case,

$(P^S_+ \otimes P^M_-)\Psi = (P^S_+ \otimes P^M_-)[c_+(v_+ \otimes u_+) + c_-(v_- \otimes u_-)]$
$= c_+(v_+ \otimes 0) + c_-(0 \otimes u_-)$
$= 0$

(0 is here the zero vector in $\mathcal{H}^S \otimes \mathcal{H}^M$.) It follows that

$\langle \Psi | (P^S_+ \otimes P^M_-)\Psi \rangle = 0$

and the joint event has zero probability of occurrence.
The consistency of any further measurements with the one that has been
carried out (whether these further measurements are conducted on S or on
M) is thus assured; the projection postulate has appeared within the analysis
as an added bonus.
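The features of Jauch's model described above can be checked in a small numerical sketch. It is offered only as an illustration under stated assumptions: S is given a two-dimensional space and M a three-dimensional one, the basis vectors and the amplitudes are arbitrary choices, and the reduced states are computed by the usual partial trace.

```python
import numpy as np

# Illustrative bases: rows of the identity serve as v+, v- and u0, u+, u-.
v_plus, v_minus = np.eye(2)
u_0, u_plus, u_minus = np.eye(3)

# Illustrative amplitudes with |c+|^2 + |c-|^2 = 1.
c_plus, c_minus = np.sqrt(0.3), 1j * np.sqrt(0.7)

# Final state of S + M after the measurement interaction.
psi = c_plus * np.kron(v_plus, u_plus) + c_minus * np.kron(v_minus, u_minus)
D = np.outer(psi, psi.conj())      # projector onto the ray containing psi

# Reduced states by partial trace; the index order is (s, m, s', m').
D4 = D.reshape(2, 3, 2, 3)
D_S = np.einsum("imjm->ij", D4)    # trace out M
D_M = np.einsum("imin->mn", D4)    # trace out S

# Probability of the joint event [(A,+);(A^M,-)].
P_S_plus = np.outer(v_plus, v_plus)
P_M_minus = np.outer(u_minus, u_minus)
joint = np.vdot(psi, np.kron(P_S_plus, P_M_minus) @ psi).real

purity_total = np.trace(D @ D).real      # equals 1 for a pure state
purity_M = np.trace(D_M @ D_M).real      # less than 1 for a proper mixture
```

In this sketch `D_S` and `D_M` come out diagonal with weights 0.3 and 0.7, the joint event has probability zero, and the composite state remains pure while the state of M does not.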
Alas, elegant as the treatment is, as an account of the transition from the
possible to the actual it won't do. The interpretation of mixed states which
motivates it cannot be applied to the mixtures which appear within it. What
we would like to say, when we speak of the measurement device being in a
mixture of $P^M_+$ and $P^M_-$, is that it actually is in one of these pure states but we
don't know which; in other words we would like to use the ignorance
interpretation of mixtures. But, as we saw in Section 5.8, this interpretation
cannot be used for those mixtures which arise from a reduction of a pure
state in a tensor-product space. (This argument is due to Feyerabend, 1962.)
For, to return to our example, the mixed state of S after the measurement
interaction has taken place is given by $|c_+|^2 P^S_+ + |c_-|^2 P^S_-$ and that of M by
$|c_+|^2 P^M_+ + |c_-|^2 P^M_-$. On the ignorance interpretation of mixtures, this means
that the system is actually either in the state $v_+$ or $v_-$, and that M is actually
in the correlated state $u_+$ or $u_-$. The state of the composite system S + M is
then either $v_+ \otimes u_+$ or $v_- \otimes u_-$, each of these having a certain probability.
But this means that S + M is in a mixture, contrary to our claim that it is in the
superposition $\Psi$. It is crucial to the analysis that the final state of S + M is
indeed pure; if it is not, then the evolution of the composite system has not
accorded with Schrödinger's equation, and no internal account of the measurement
process has been given.
The move to an analysis of the composite system in terms of a tensor-product
space has not, therefore, done what was hoped of it; the "collapse
of the wave packet" remains as anomalous as ever.

9.7 A Problem for Internal Accounts of Measurement


As was pointed out in Section 9.5, the measurement problem remains a
problem whether or not measurement interactions are required to conform
to the projection postulate (9.11d). The problem is crucially one of describing
in quantum-mechanical terms an evolution of the state of the apparatus
M (or of the combined system S + M) which conforms to requirements
(9.11a-c). A promising candidate was presented in the last section, and was
shown to fail. In this section I will show that there is good reason to believe
that no internal account of such a process can be given.
Requirement (9.11a) stipulates that all the admissible states of M,
$\{u_0, u_1, u_2, \ldots\}$, must be pairwise orthogonal, so that M will behave classically.
If this requirement is accepted, there are then two reasons for demanding
that a corresponding requirement should hold for S + M.
In the first place, if being a classical system is a matter of scale, then the
classical nature of M will impose itself on any system of which M is a
subcomponent. Second, it seems plausible to require that the only admissible
states D of S + M be those which reduce into admissible states of S and of
M. Now nontrivial superpositions of admissible states of M are prohibited; it
also seems justifiable to allow, as mixed states of M, only those mixtures
which can be interpreted classically, that is, only those which can be given
an ignorance interpretation. As we saw in Section 9.6, this would rule out as
states of M all nontrivial mixed states arising through reduction of a pure
state of S + M. The only admissible pure states of S + M would then have
the form $v \otimes u_i$ (where $v \in \mathcal{H}^S$ and $u_i$ is an indicator state or the ground state
of M). If this is so, then any two admissible pure states of S + M, $v \otimes u_i$ and
$v' \otimes u_j$, where $i \neq j$, will inherit the orthogonality of $u_i$ and $u_j$. The admissible
pure states of S + M will fall within a set of pairwise orthogonal subspaces
of $\mathcal{H}^S \otimes \mathcal{H}^M$, indexed by the admissible pure states of M.
This restriction on the pure states of S + M implies that no evolution of the
state of S + M which is governed by the Schrödinger equation can ever
involve a transition of the state of M from its ground state to one of its
indicator states. More precisely, let $\Psi_i = v \otimes u_i$ and $\Psi_j = v' \otimes u_j$ be two
orthogonal admissible pure states of S + M; then, although there exists a
unitary operator U on $\mathcal{H}^S \otimes \mathcal{H}^M$ such that $U\Psi_i = \Psi_j$, this operator is not a
member of a continuous one-parameter group of unitary operators mapping
admissible pure states of S + M into each other. That is, if $\Psi$ is to be restricted
to the set of admissible pure states, we cannot write $i(d\Psi/dt) = H\Psi$, as this
would require $\Psi$ to pass through the "no-man's-land" between admissible
pure states.
Since an internal account of the measurement process is, by definition,
one that conforms to Schrödinger's equation, it would seem that no internal
account conforming to (9.11a) can be given.
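The "no-man's-land" can be made concrete with a single example, which illustrates the claim but does not prove it. The sketch below, in Python with NumPy, considers only the factor belonging to M and takes two orthogonal admissible states as basis vectors. The generator (the Pauli matrix for y, with ħ = 1) and the function name `U` are assumptions made for the illustration.

```python
import numpy as np

u_i = np.array([1.0, 0.0])   # one admissible state of M
u_j = np.array([0.0, 1.0])   # an orthogonal admissible state

def U(t):
    """One-parameter unitary group exp(-iHt) with H = sigma_y.

    For this generator exp(-iHt) is a real rotation in the plane
    spanned by u_i and u_j.
    """
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

end = U(np.pi / 2) @ u_i     # the group carries u_i to u_j ...
midway = U(np.pi / 4) @ u_i  # ... but only by way of a superposition
```

At the midpoint the state has equal components along `u_i` and `u_j`, so it is a nontrivial superposition of the kind that requirement (9.11a) excludes.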
A way out is suggested by Beltrametti and Cassinelli (1981, chap. 8) and
independently by Wan (1980). Beltrametti and Cassinelli's strategy is to
distinguish between the mathematical account of the time-evolution of the
state vector and the interpretation of this as the evolution of a particular
kind of state. On their account of the measurement process, the state vector
$\Psi$ of S + M evolves according to the Schrödinger equation. However, only
when $\Psi$ has the form $v \otimes u_i$ (where $v \in \mathcal{H}^S$ and $u_i$ is an indicator state of M)
does $\Psi$ represent a pure state; when it does not, it is interpreted as a (classical)
mixture of such states.
Before assessing this account, let us see how quantum theory treats situations
in which not every normalized vector in the relevant Hilbert space can
represent a pure state of a system.
A rule forbidding us to form a pure state by the superposition of other
pure states is called a superselection rule. Such a rule restricts pure states to
those representable by vectors in orthogonal subspaces $L_0, L_1, \ldots$ of the
Hilbert space $\mathcal{H}$ for the system; $L_0, L_1, \ldots$ are known as the superselection
subspaces (sometimes the coherent subspaces) of $\mathcal{H}$. In the presence of superselection
rules, not every Hermitian operator on the space can represent an
observable (see Jordan, 1969, sec. 28; Beltrametti and Cassinelli, 1981, chap.
5). In fact a Hermitian operator A on $\mathcal{H}$ can represent an observable only if
each superselection subspace $L_i$ of $\mathcal{H}$ reduces A; in other words, only if
$A\Psi \in L_i$ whenever $\Psi \in L_i$. This condition holds if and only if every projector
in the spectral decomposition of A projects onto a subspace of some superselection
subspace $L_i$ of $\mathcal{H}$. It follows that, in the presence of superselection
rules, (1) any function of an observable A is reduced by every superselection
subspace, and (2) every projector $P_E$ representing a quantum event E projects
onto a subspace of some superselection subspace (or is the sum of such
projectors); hence $P_E$ is also reduced by every superselection subspace.
Now consider a normalized vector $\Psi$ which is a nontrivial superposition
of two normalized vectors $\Psi_1$ and $\Psi_2$ in distinct superselection subspaces $L_1$
and $L_2$ of $\mathcal{H}$: $\Psi = c_1\Psi_1 + c_2\Psi_2$. Note that $\Psi_1 \perp \Psi_2$, and $|c_1|^2 + |c_2|^2 = 1$. Let $P_\Psi$
be the projector onto the ray containing $\Psi$, and $P_1$ and $P_2$ the projectors onto
the rays containing $\Psi_1$ and $\Psi_2$, respectively.
In the absence of superselection rules the superposition $\Psi = c_1\Psi_1 + c_2\Psi_2$
would not be statistically equivalent to the mixture $D = |c_1|^2 P_1 + |c_2|^2 P_2$.
That is, there would be a quantum event E for which $p_\Psi(E) \neq p_D(E)$. For let E
be represented by the projector $P_E$ on $\mathcal{H}$. Then $p_D(E) = \mathrm{Tr}(P_E D)$; using an
orthonormal basis which includes $\Psi_1$ and $\Psi_2$, we obtain

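$p_D(E) = \langle\Psi_1|P_E D\Psi_1\rangle + \langle\Psi_2|P_E D\Psi_2\rangle$
$= |c_1|^2\langle\Psi_1|P_E\Psi_1\rangle + |c_2|^2\langle\Psi_2|P_E\Psi_2\rangle$

On the other hand,

$p_\Psi(E) = \mathrm{Tr}(P_E P_\Psi) = \langle\Psi|P_E\Psi\rangle$
$= \langle c_1\Psi_1 + c_2\Psi_2 | P_E(c_1\Psi_1 + c_2\Psi_2)\rangle$
$= |c_1|^2\langle\Psi_1|P_E\Psi_1\rangle + |c_2|^2\langle\Psi_2|P_E\Psi_2\rangle$
$\quad + c_1^* c_2\langle\Psi_1|P_E\Psi_2\rangle + c_2^* c_1\langle\Psi_2|P_E\Psi_1\rangle$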
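In the presence of superselection rules, however, the cross terms vanish,
because (a) $L_1$ and $L_2$ both reduce $P_E$, and (b) $L_1$ and $L_2$ are mutually
orthogonal, and we obtain

$p_\Psi(E) = |c_1|^2\langle\Psi_1|P_E\Psi_1\rangle + |c_2|^2\langle\Psi_2|P_E\Psi_2\rangle = p_D(E)$
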
We see that, in the presence of superselection rules, $\Psi$ and D are statistically
equivalent. (Recall, in this connection, the discussion in Section 3.9.)
Thus although, in accordance with the superselection rule, $\Psi$ may not
represent a pure state of the system, we may use it to represent a mixture; D
and $\Psi$ become two mathematically equivalent ways to represent the same
state.
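The statistical equivalence just derived, and its failure when superselection rules are absent, can be checked numerically. The following sketch makes several illustrative assumptions: a four-dimensional space, superselection subspaces spanned by the first two and the last two basis vectors, arbitrary vectors and amplitudes, and hypothetical helper names `prob` and `projector`.

```python
import numpy as np

# Superselection subspaces (illustrative): L1 = span(e0, e1), L2 = span(e2, e3).
psi1 = np.array([1, 1, 0, 0]) / np.sqrt(2)     # normalized vector in L1
psi2 = np.array([0, 0, 1, 1j]) / np.sqrt(2)    # normalized vector in L2
c1, c2 = 0.6, 0.8                              # |c1|^2 + |c2|^2 = 1

psi = c1 * psi1 + c2 * psi2
P_psi = np.outer(psi, psi.conj())              # the superposition
D = (abs(c1) ** 2 * np.outer(psi1, psi1.conj())
     + abs(c2) ** 2 * np.outer(psi2, psi2.conj()))   # the mixture

def projector(vec):
    """Projector onto the ray containing vec."""
    vec = np.asarray(vec, dtype=complex)
    vec = vec / np.linalg.norm(vec)
    return np.outer(vec, vec.conj())

def prob(P_E, state):
    """Probability Tr(P_E state) of the event represented by P_E."""
    return np.trace(P_E @ state).real

# Reduced by L1 and L2: a sum of a projector within L1 and one within L2.
P_reduced = projector([1, 0, 0, 0]) + projector([0, 0, 1, 1])
# Not reduced: projects onto a ray that straddles L1 and L2.
P_straddling = projector([1, 0, 1, 0])

same = np.isclose(prob(P_reduced, P_psi), prob(P_reduced, D))
differ = not np.isclose(prob(P_straddling, P_psi), prob(P_straddling, D))
```

For the reduced projector the two states assign the same probability; only a projector that no superselection subspace reduces, and which therefore represents no quantum event when the rule is in force, tells them apart.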
Let us now return to Beltrametti and Cassinelli's account of the measurement
process. They too argue that S + M inherits the superselection rules
characteristic of M, and that the superselection subspaces of $\mathcal{H}^{S+M}$ are
$\mathcal{H}^S \otimes L^M_0$, $\mathcal{H}^S \otimes L^M_1$, and so on, where $L^M_0, L^M_1, \ldots$ are the rays in $\mathcal{H}^M$ containing
the indicator states $u_0, u_1, \ldots$ of M (Beltrametti and Cassinelli, 1981,
p. 84, though their argument to this conclusion is not the same as the one
given here).
As in Section 9.6, we consider the case when the admissible pure states of
M are $u_0$, $u_+$, and $u_-$, and $u_+$ and $u_-$ correspond to the two values of an
observable A associated with eigenstates $v_+$ and $v_-$ of S, respectively. We
take the initial state of S + M to be $\Psi_0 = v \otimes u_0$, where $v = c_+v_+ + c_-v_-$.
Like Jauch, Beltrametti and Cassinelli suggest that $\Psi_0$ evolves during the
measurement process in accordance with the Schrödinger equation, so that

$\Psi_0 \to \Psi = c_+\Psi_+ + c_-\Psi_-$

where $\Psi_+ = v_+ \otimes u_+$, and $\Psi_- = v_- \otimes u_-$. Unlike Jauch, however, they do
not assume that the vector $\Psi$ must represent a pure state of S + M. If neither
$c_+ = 0$ nor $c_- = 0$, then $\Psi$ does not lie within a superselection subspace of
$\mathcal{H}^{S+M}$. When it does not, Beltrametti and Cassinelli interpret it as a classical
mixed state. That is, they regard it as a mixture of $\Psi_+$ and $\Psi_-$, and interpret
this mixture according to the ignorance interpretation. The pure state $\Psi_+$
(respectively, $\Psi_-$) is assigned probability $|c_+|^2$ (respectively, $|c_-|^2$). Thus in
the course of a measurement the objective probabilities built into the state v
of S evolve into the subjective probabilities associated with a classical mixture
of states $\Psi_+$ and $\Psi_-$ of S + M in accordance with (9.11c). Each component
of this mixture reduces to pure states of S and of M ($\Psi_+$ to $v_+$ and $u_+$,
and $\Psi_-$ to $v_-$ and $u_-$) so that the projection postulate is also satisfied.
This offers a neat resolution of the measurement problem, which evades
the snag on which Jauch's proposal was shipwrecked. But it does so at
considerable cost. As van Fraassen has pointed out to me (pers. com., November
1986), it faces a difficulty exactly analogous to that raised earlier in
this section for all internal accounts of measurement.
Consider the operator U that maps $\Psi_0$ into $\Psi$. Since $\Psi_0 \in \mathcal{H}^S \otimes L^M_0$ but
$\Psi \notin \mathcal{H}^S \otimes L^M_0$, the superselection subspaces of $\mathcal{H}^{S+M}$ do not reduce U. Hence U
can be neither an observable for S + M, nor a function of one. But, on the
Schrödinger picture of the evolution of states, $U = e^{-iHt}$, where H is the
infinitesimal generator of the group $\{U_t\}$ (see Section 2.7), and is also the
operator corresponding to the total energy of the system. For all purely
quantum systems, H is taken to represent an observable quantity; when
superselection rules apply to quantum systems, H, like every other operator
representing an observable, is reduced by the superselection subspaces of
the system (Jordan, 1969, sec. 32). In other words, in order for Beltrametti
and Cassinelli's account of the measurement interaction to succeed, we
must postulate that, when we deal with a macroscopic system, either the
infinitesimal generator of the evolution group does not represent the energy
of the system, or the energy is not an observable for the system.
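The connection between the generator H and the superselection subspaces can be seen in a small example. The sketch below is an illustration under stated assumptions rather than a general proof: the space is four-dimensional with two two-dimensional superselection subspaces, the Hermitian matrices standing in for H are arbitrary, ħ is set to 1, and the helper names `evolve` and `leak` are hypothetical.

```python
import numpy as np

def evolve(H, t):
    """Return U_t = exp(-iHt) for Hermitian H, via its spectral decomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# Superselection subspaces (illustrative): L_0 = span(e0, e1), L_1 = span(e2, e3).
# This generator is block diagonal, so it is reduced by both subspaces.
H_reduced = np.array([[1.0, 2.0, 0.0, 0.0],
                      [2.0, -1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.5, 1.0j],
                      [0.0, 0.0, -1.0j, 0.3]])

# Adding a term that links e0 and e2 gives a generator that is not reduced.
H_linking = H_reduced.copy()
H_linking[0, 2] = H_linking[2, 0] = 0.7

psi0 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)   # initial state in L_0

def leak(H, t=0.1):
    """Squared norm of the part of U_t psi0 lying outside L_0."""
    psi_t = evolve(H, t) @ psi0
    return float(np.sum(np.abs(psi_t[2:]) ** 2))

leak_reduced = leak(H_reduced)   # stays inside L_0
leak_linking = leak(H_linking)   # leaves L_0
```

With the reduced generator the state never leaves its superselection subspace; a transition of the kind the measurement interaction requires appears only when the generator is not reduced, which is the difficulty raised in the text.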
Wan (1980, p. 980) acknowledges this problem, and points in the direction
of a couple of responses, but neither of these seems promising. On the
one hand he instances other theories (Dirac's Hamiltonian formulation of
general relativity, the Gupta-Bleuler formulation of quantum electrody-
namics) in which it seems inappropriate to regard H as the energy observ-
able; on the other he points out that, if the measurement apparatus is treated
as an infinite system, as it is in some other accounts of the measurement
process, then "the total energy of an infinitely large system is not something
having an obvious meaning which can be taken for granted."
But notice, first of all, that these examples only address the issue of
whether H should represent the energy of the system; they do not touch the
basic question of whether H should represent an observable. Second, both
responses lose sight of the original project, which was not to establish where
quantum theory is inadequate, but to show that a consistent, if schematic,
account of measurement can be given within the formalism of the theory.
Within orthodox quantum theory, the proposal that H does not represent
the energy observable is entirely ad hoc; it has no independent motivation,
examples from sundry field theories notwithstanding. And the second suggestion,
that the measurement system be idealized as an infinite system,
raises as many problems as it solves. To take one example, due to van
Fraassen (see Hughes and van Fraassen, forthcoming), why should it be a
permissible idealization to regard Schrödinger's cat (which contains about
$10^{24}$ particles) as an infinite system, if we may not regard a pot of liquid
helium, which is equally macroscopic but yet exhibits quantum behavior, in
the same way?

9.8 Three Accounts of Measurement


In Chapter 10 I will suggest that, if it is seen as a problem within quantum
mechanics, the measurement problem is insoluble. In the remainder of this
chapter I will give thumbnail sketches of three different accounts of mea-
surement; each of them, I will argue, while philosophically interesting, is
finally unacceptable.
I have contrasted internal accounts of the measurement process with the
dualist position presented by von Neumann. From what was said in Section
9.7, it seems that in order to give an internal account, we must drop the
requirement that all the permissible pure states of M are mutually orthogo-
nal. But having done so, we then need to explain why the behavior of this
system seems to be classical. Two of the accounts of measurement I will look
at reject the requirement but give different explanations of the apparently
classical behavior of M; the third is a dualist account of a rather remarkable
kind.

THE DANERI-LOINGER-PROSPERI THEORY

The most sober of the three accounts is offered by Daneri, Loinger, and
Prosperi (1962). It may seem odd that I portray them as showing why a
macroscopic system merely seems to behave classically, since they write
that
In order that objective meaning may be attributed to the macro-states of large
bodies, it is of course necessary that .. . states incompatible with the macroscopic
observables be actually impossible. (P. 298; Wheeler and Zurek, 1983, p. 658)
It sounds as though, like Beltrametti and Cassinelli, they are going to rule
out superpositions of indicator states as possible pure states of S + M. (Here I
am stretching previous usage by using "indicator state" to refer not merely
to states of M but to those states of S + M which would be admissible on a
wholly classical picture.) However, this is not what they do. Rather, they
show that, because S + M is a very large system, the pure states into which it
evolves behave like mixtures. Starting from the fact that a measuring instrument
is a system of many particles and with correspondingly many degrees
of freedom, they argue from thermodynamical considerations that, when
such a system is in a superposition of indicator states, the interference terms
characteristic of superpositions effectively cancel out (pp. 301/661 and
305/665). As a result, a superposition will be statistically indistinguishable
from a mixture with respect to all relevant observables. If we measure the
macroscopic system S + M (call it "I") by using another macroscopic system
("II"), then
A statistical operator . . . for the system I which corresponds to a pure state described
by a superposition of vectors belonging to different [macroscopic states] is
equivalent, so far as the macroscopic observables on II are concerned, to a statistical
operator which is a mixture of the above macroscopic states. (Pp. 314/674)
This resembles the move made by Beltrametti and Cassi n IIi (s ('ction
9.7). On both approaches, the state to which 5 + M evolves, and whi 'h is
given mathematically by a linear superposition, is shown to be indistin
guishable from one given by the weighted sum of projection operators. Th '
difference is this. Beltrametti and Cassinelli suggest that the sta te in qu stion
is a mixed state, Daneri, Loinger and Prosperi that it is pure; however,
according to the latter this pure state is statistically indistinguishable from a
mixture. But, unless we think that a state-function refers essentially to an
ensemble of systems, statistical indistinguishability is not enough. What
Daneri, Loinger, and Prosperi conclude is that an ensemble of macroscopic
quantum systems will behave like an ensemble of classical systems. As
Cartwright (1983, pp. 169-171) has pointed out, however, what we need is
an account within which individual systems exhibit classical behavior; if a
superposition of indicator states does not represent a classically permissible
pure state, then Daneri, Loinger, and Prosperi have failed to provide us with
one (see also Bub, 1968; Putnam, 1965).
In brief, their account does not produce the final state we want; Beltra-
metti and Cassinelli, on the other hand, show us the desired state, but in
doing so they make it unattainable.
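
The contrast between a superposition and the corresponding mixture can be made concrete in a two-dimensional toy model. The following Python sketch is an illustration only, not part of Daneri, Loinger, and Prosperi's argument: the two basis vectors standing in for indicator states, the weights 0.3 and 0.7, and the names `rho_pure`, `rho_mixed`, `A_macro`, and `A_interf` are all assumptions chosen for the example. It shows that an observable diagonal in the indicator basis cannot tell the two statistical operators apart, whereas an observable with off-diagonal (interference) terms can.

```python
import numpy as np

# Hypothetical toy model: two orthonormal "indicator states" of S + M.
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
c1, c2 = np.sqrt(0.3), np.sqrt(0.7)

# Pure state: the superposition c1*e1 + c2*e2, written as a statistical operator.
psi = c1 * e1 + c2 * e2
rho_pure = np.outer(psi, psi.conj())

# Mixture: weighted sum of the projectors onto e1 and e2.
rho_mixed = abs(c1) ** 2 * np.outer(e1, e1) + abs(c2) ** 2 * np.outer(e2, e2)


def expectation(rho, observable):
    """Expectation value tr(rho A) of an observable A in the state rho."""
    return float(np.real(np.trace(rho @ observable)))


# An observable diagonal in the indicator basis (a stand-in for a
# "macroscopic" observable) and one that couples the two indicator states.
A_macro = np.diag([1.0, -1.0])
A_interf = np.array([[0.0, 1.0], [1.0, 0.0]])

same_on_macro = np.isclose(expectation(rho_pure, A_macro),
                           expectation(rho_mixed, A_macro))
differ_on_interf = not np.isclose(expectation(rho_pure, A_interf),
                                  expectation(rho_mixed, A_interf))

# Purity tr(rho^2) equals 1 only for a pure state.
purity_pure = float(np.real(np.trace(rho_pure @ rho_pure)))
purity_mixed = float(np.real(np.trace(rho_mixed @ rho_mixed)))
```

In this toy model the two states remain mathematically distinct (their purities differ), which is the point pressed in the text: statistical indistinguishability with respect to a restricted class of observables does not turn the pure state into a mixture.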
THE MANY-WORLDS INTERPRETATION

Arguably the most fantastic of interpretations of quantum theory, certainly
the one most beloved by writers of fantasy, is the many-worlds interpretation
(MWI). Here I will follow de Witt (1970) in presenting the interpretation as a
resolution of the problem of measurement; however, I should mention that
for other advocates of the interpretation, Everett (1957) and Wheeler (1957),
its main attraction was that it offered "a reformulation of quantum theory in
a form believed suitable for application to general relativity" (Everett, 1957,
p. 141).*
On this interpretation a measurement interaction occasions a splitting of
this world into a large number of copies of itself. When the measurement
leaves S + M in a superposition, each of the indicator states represented in
the superposition is the state of S + M in at least one of the worlds. S + M
seems to behave classically because the observer is multiply cloned, to-
gether with the system; no clone has access to any world other than her own;
hence only one of the indicator states presents itself to any one clone, while
the others present themselves to counterpart observers in other worlds.
Schrödinger's cat, predicted by the theory to be in a superposition of live
and dead states, is alive in some worlds, dead in others.
Since quantum systems are continually interacting with one another,
every world continually divides into different branches; each of these
branches is a fully realized world, which in turn divides into other possible
worlds, and so on. To quote de Witt,
This universe is constantly splitting into a stupendous number of branches, all
resulting from the measurementlike interactions between its myriads of compo-
nents. Moreover, every quantum transition taking place on every star, in every
galaxy, in every corner of the universe is splitting our local world into myriads of
copies of itself. (De Witt, 1970, p. 161; page references are to de Witt and Graham,
1973)

Without irony - well, perhaps not wholly without irony - this can be
described as a wonderfully extravagant and poetic vision of the cosmos;
here imagination is bodying forth the forms of things not only unknown but
unknowable. But bold metaphysical speculation of this kind can be sub-
jected to various types of criticism. To impose a distinctly procrustean tax-
onomy, (1) the internal consistency of MWI can be challenged; (2) its philo-
sophical coherence can be doubted; (3) one can object to the lack of fit
between MWI and other physical theories; or (4) one can criticize it on
general methodological grounds. Criticisms of all these kinds have been
leveled at the many-worlds interpretation.
To consider an internal criticism first: after any interaction, so the account
runs, the world "branches" so that the interaction yields a number of possible
worlds, or rather, as Everett emphasizes, a number of worlds all equally
"actual." This branching is determined by the states of the systems involved.
Now a feature of Everett's presentation is that, in an interaction, the
state of one system is specified with respect to the other; indeed, Everett
(1957) called the interpretation the "'Relative State' Formulation of Quantum
Mechanics." However, this specification of states is not symmetrical.
(This follows from an argument due to Cartwright, 1974.) In other words,
the set of possible worlds reachable from the perspective of one participant
in an interaction will not mesh with the set reachable from the perspective of
the other. There is thus no specifiable set of worlds into which the
preinteraction world divides.

* As J. P. Jarrett has pointed out to me, not all proponents of the "relative state" approach
(Everett's term) accept the many-worlds interpretation of it; see, for example, Geroch (1984). I
discuss MWI from a slightly different perspective (and with greater charity) in Section 10.4.

Nice examples of criticisms of the second type are given by Healey (1984,
pp. 591-593), who spends several pages outlining the "antinomies" to
which MWI has been thought to give rise; with one exception, which I
discuss below, I will not rehearse them here. (Healey also discusses the
problem of space-time structure and the modal realist version of MWI; see
below.)
A criticism of the third kind has been voiced by Earman (1986, p. 224):
What has rarely been explored is the implication for space-time structure of taking
[MWI] seriously. To make sure that the different branches cannot interact even in
principle they must be made to lie on sheets of space-time that are topologically
disconnected after measurement, implying a splitting of space-time something like
that illustrated [in Figure 9.3]. I do not balk at giving up the notion, held sacred until
now, that space-time is a Hausdorff manifold. But I do balk at trying to invent a
causal mechanism by which a measurement of the spin of an electron causes a global
bifurcation of space-time.

Figure 9.3 Splitting of space-time (from Earman, 1986, p. 225).

No doubt the many-worlds theorist would reject the demand for a causal
explanation, but, if he does, he needs to say what alternative he has up his
sleeve. Lacking one, he is open to the fourth kind of objection.
That is, even if advocates of MWI can respond to criticisms of the first two
kinds, one is led by Earman's objection to doubt on general grounds
whether the speculative metaphysics they offer provides a genuine answer
to a physical problem. In particular, I would question whether what has
been produced is anything more than a semantic model for probability
statements associated with the measurement process. In the last twenty
years philosophers have offered illuminating analyses of a great number of
modal concepts in terms of "possible worlds." (See Loux, 1979, for a careful
introduction to the literature.) To take a couple of trivial examples, a logi-
cally necessary statement is analyzed as a statement that is true in all possible
worlds, whereas a contingent statement is one that is true in some worlds but
not in others. Now probability is itself a modal concept (van Fraassen, 1980,
chap. 6, calls it "The New Modality of Science") and it too has been analyzed
in terms of possible worlds (Bigelow, 1976; Giere, 1976). The suspicion
that this kind of conceptual analysis is all that the many-worlds interpretation
supplies is strengthened by de Witt's claim (1970, p. 161) that "the
mathematical formalism of the quantum theory is capable of yielding its own
interpretation" (emphasis in the original).
But perhaps the many-worlds theorist could accept the description of his
enterprise as one of providing a semantic analysis of the probability state-
ments of quantum theory and claim nonetheless that it was true that each
measurement interaction resulted in a division of the world into multiple
copies of itself. Our possible-world analyses of modal concepts, he might
say, are not merely formal; on our best metaphysical picture of the universe,
this world is one of many equally real worlds. David Lewis (1986, p. 3)
writes,

Why believe in a plurality of worlds? - Because the hypothesis is serviceable, and
that is a reason to think that it is true. The familiar analysis of necessity as truth in all
possible worlds was only the beginning. In the last two decades philosophers have
offered a great many more analyses that make reference to possible worlds, or to
possible individuals that inhabit possible worlds. I find that record most impressive.
I think it is clear that talk of possibilia has clarified questions in many parts of the
philosophy of logic, of mind, of language, and of science-not to mention meta-
physics itself. Even those who officially scoff often cannot resist the temptation to
help themselves unabashedly to this useful way of speaking.

Lewis is not here discussing the many-worlds interpretation of quantum
theory (nor does he elsewhere in the book I have just quoted from). And he
readily acknowledges that many will find the ontological price of his modal
realism too much to pay for the theoretical benefits it brings (p. 5). Let us
assume, however, that we are willing to make the purchase on the
many-worlds theorist's behalf. This still won't give the theorist what he needs.
Consider the fact that, on Lewis's account, although all possible worlds are
equally real, for us only this world is the actual world. In the grand meta-
physical scheme of things, from the perspective of the Almighty, actuality
may only function as an indexical marker on the set of worlds (like "here"
and "present" across the set of points in space and time; pp. 92-94), but for
each observer there is only one actual world, the one which she inhabits.
Compare Everett's insistence that "all elements of a superposition (all
'branches') are 'actual,' none are more 'real' than the rest" (Everett, 1957,
p. 146n). This, it might be said, is a purely verbal difference: Everett uses
"actual" and "real" synonymously, where Lewis would use only "real." But
what, on Everett's account, has become of the world which is actual in
Lewis's? If there is no such privileged world, then something odd happens
to our conception of probability. For if all (relevant) events with nonzero
probability are realized in some world or other, then are not all those events
certain of occurrence? (This was pointed out by Healey, 1984, p. 593.) And if
I wager on what the outcome of a measurement will be, will it not pay me
to place my bet on whatever outcome is quoted at the highest odds, without
regard to the probabilities involved? We cannot just say, for example, that
there are three times as many worlds, and hence three times the total payoff,
corresponding to an event A, which has probability 3/4, as there are
corresponding to event B, which has probability 1/4, since no principle of
individuation distinguishes one A-world from another. (Before an epidemic of
long-odds betting is upon us, however, I should add that even the National
Security Council would be hard put to divert funds from my Swiss bank
account in one world to its counterpart in another.)
These levities aside, we may ask what new understanding of the measure-
ment process MWI gives us. After a measurement each observer will inhabit
a world (for her the actual world) in which a particular result of the measure-
ment has occurred. And the "total lack of effect of one branch on another
also implies that no observer will ever be aware of any 'splitting' process"
(Everett, 1957, p. 147n). What is this observer to say about the physical
process which has just occurred? From where she stands, the wave packet
has collapsed no less mysteriously, albeit no more so, than before.
We are still left with the dualism that the interpretation sought to eradi-
cate. As de Witt (1970, pp. 164-165) himself remarks, the many-worlds
interpretation of quantum mechanics "leads to experimental predictions
identical with the (dualist) Copenhagen view." The difference is that any
transition not governed by Schrödinger's equation is now accompanied by
an ontological cloudburst beside which the original modest dualism of von
Neumann looks unremarkable, if not pusillanimous.

WIGNER'S FRIEND

It is time to make the acquaintance of Wigner's friend. Wigner's account, the
last account of measurement I will discuss here, is a dualist account;
measurement produces a discontinuous change of state of the measured object
(Wigner, 1961). The radical difference between this and a more orthodox
view is that, to qualify as a measurement, an interaction must involve a
conscious observer. Note that this is not just an account of why the system
S + M seems to behave classically; it is not merely that the conscious ob-
server can register information only in a certain (classical) way. The event of
registration is not merely passive; it is this event which brings about the
projection of the system's state into an eigenstate of the observable
measured. Wigner, a dualist with respect to the mind-body problem, sees the
measurement process as an example of mind-body interaction.
He illustrates his view with an example. As the system S he takes a
radiation field whose wave function "will tell us with what probability we
shall see a flash if we put our eyes at certain points, with what probability it
will leave a dark spot on a photographic plate if this is placed at certain
positions" (Wigner, 1961, pp. 173-174; page references are to Wigner,
1967). The system S is represented as having two eigenstates $v_1$ and $v_2$
which give probabilities 1 and 0, respectively, to the occurrence of the flash.
Thus, if the initial state of S is some superposition $v$ of $v_1$ and $v_2$, then, on the
orthodox view, the registration of the flash, whether at the observer's eye or
the photographic plate, will cause the wave packet to collapse. On the other
hand, if the radiation field interacts with a quantum system M, such as an
atom, then the evolution of the joint system S + M is governed by the
Schrödinger equation and the resulting state will be a superposition
$\Psi = c_1(v_1 \otimes u_1) + c_2(v_2 \otimes u_2)$ of the kind we saw in Section 9.5. (Here $u_1$ and $u_2$
are states of the atom.) As we saw there, this is inconsistent with the state of
S + M being either $v_1 \otimes u_1$ or $v_2 \otimes u_2$. Nevertheless, if an observer O now
performs a measurement on the system M, this will project M into an
eigenstate ($u_1$, say). Since the systems are correlated, S will also be projected
into $v_1$, and the information received by the observer O will be equivalent to
the registration of a flash. (The consistency of direct and indirect measure-
ments is discussed further in Section 10.4.)
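
The way a measurement on M carries S along with it can be checked numerically. The sketch below is a minimal illustration under stated assumptions: S and M are each given a two-dimensional state space, the coefficients 0.6 and 0.8 are arbitrary, and the variable names are chosen for the example rather than taken from Wigner. It builds the correlated state of S + M, applies the projector corresponding to "M is in u1", and confirms that the normalized result is the product state of v1 and u1.

```python
import numpy as np

# Assumed two-dimensional state spaces for S (v1, v2) and M (u1, u2).
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
c1, c2 = 0.6, 0.8  # arbitrary real coefficients with c1**2 + c2**2 == 1

# Correlated state of S + M: c1 (v1 ⊗ u1) + c2 (v2 ⊗ u2).
Psi = c1 * np.kron(v1, u1) + c2 * np.kron(v2, u2)

# The state is not a product state: its coefficient matrix has rank 2.
schmidt_rank = np.linalg.matrix_rank(Psi.reshape(2, 2))

# Projector for the outcome "M is in u1": identity on S, |u1><u1| on M.
P_u1 = np.kron(np.eye(2), np.outer(u1, u1))

# Probability of that outcome and the normalized post-measurement state.
prob_u1 = float(Psi @ P_u1 @ Psi)
post_state = P_u1 @ Psi / np.sqrt(prob_u1)

# S has been carried into v1 along with M into u1.
s_projected_into_v1 = np.allclose(post_state, np.kron(v1, u1))
```

The rank check restates the claim that the superposition is inconsistent with S + M being in either product state; the final comparison restates the claim that projecting M into u1 also projects S into v1.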
Wigner now considers the situation when the system S is observed by a
friend. Wigner can find out about the system by asking her whether or not
she has observed the flash. In doing so Wigner puts himself in the position
of the observer O and his friend in the position of the system M. His
"measurement" consists in asking her whether she has seen a flash. But,
Wigner continues (pp. 179-180), after completing the whole experiment he
can ask his friend, "What did you feel about the flash before I asked you?" If
he does so, she will reply, "I told you already, I did (did not) see a flash."
Short of giving himself a solipsistically privileged position, Wigner must
accept that this report is indeed true, and hence that the interaction between
S and his friend has already induced the collapse of the wave packet. The
friend and the atom are therefore radically different kinds of systems, and
(to quote Wigner, p. 180), "It follows that the being with a consciousness
must have a different role in quantum mechanics than the inanimate mea-
suring device, the atom considered above."
Stated in this way the conclusion is true but misleading. It has been shown
that an atom behaves differently from a conscious observer. However, it has
not been shown that the crucial difference is the consciousness of the
observer. His friend is not merely a single atom which (who?) happens to be
equipped with consciousness; though the event labeled as "seeing a flash"
may have been triggered by an interaction involving one specific molecule
in his friend's retina, his friend is a highly complex organism of macroscopic
dimensions. What Wigner has done is to emphasize the fact, familiar to
every dualist, that measurements, even of quantum systems, are to be
described in classical terms, where this means simply that statements
about measurement results are bivalent, either true or false. He has also
claimed that the place where the discontinuity between the quantum and
the classical worlds is located is in the distinction between a conscious
observer and an inanimate measuring device. But this claim his argument by
no means proves.
I do not wish to underestimate the difficulty for the dualist in specifying
where the quantum/classical cut is to be made, and I will return to the topic
in section 10.4. The strength of Wigner's proposal is that it points to a
difference in kind, the distinction between the mental and the physical, to
explain the discontinuity. But is this difference as clear-cut as Wigner as-
sumes? The weakness of his account is that it relies on a dubious theory of
mind and body, ironically the very theory which Wigner hoped to bolster by
his argument. If, contra Wigner, we accept a materialist theory of mind (of
whatever stripe), then Wigner's location of the cut between the quantum
and the classical worlds no longer looks so precise; it becomes another
distinction based on the size and complexity of a measurement system.
10
An Interpretation of Quantum Theory

Part One of this book gave an abstract summary of a physical theory; Part
Two has asked, what must the world be like if this theory accurately de-
scribes it? In this final chapter I offer a tentative answer to this question. In
Section 10.2 I present an interpretation of the theory which I call the "quantum
event interpretation"; in Section 10.3 I compare it with a version of the
Copenhagen interpretation; and lastly, in Sections 10.4 and 10.5, I discuss
the relation between the quantum world and the classical, macroscopic
world.
Prior to this, however, I consider the implications, some might say the
hazards, of working with an account of the theory that is as abstract as the
one presented in the first half of the book.

10.1 Abstraction and Interpretation


To a physicist the "theory" outlined in Part One would be very meager fare.
I have already quoted Cartwright's comment (1983, p. 135) on such
accounts: "One may know all of this and not know any quantum mechanics."
(See Section 2.8.) For example, the treatment of spin in Chapter 4 never
alluded to its physical significance; I never mentioned the fact that spin
contributes to the magnetic moment of a system and that, in consequence,
the Hamiltonian for the system will contain a term depending on its spin
and the magnetic field in which it is placed.
Indeed, in at least one regard, my discussion of spin has flagrantly over-
simplified matters. I have written as though the measurement of any com-
ponent of spin of a free electron presented no difficulties. This is not the
case. In 1929 Bohr showed that no Stern-Gerlach apparatus could perform
such a measurement, owing to the masking effect produced by the interaction
between the electron's charge and the magnetic field used to measure
the spin component (see Mott and Massey, 1965, p. 215). In this respect a
free electron behaves differently from the electrically neutral atoms
experimented on by Stern and Gerlach. Bohr also claimed that measurements of
the electron's spin components were, for conceptual reasons, impossible
(Rosenfeld, 1971, in Cohen and Stachel, 1979, p. 694). However, in the
1950s Crane devised a technique for performing such measurements which
evaded the problems of the Stern-Gerlach approach, and since then
proton-proton pairs have been used by Lamehi-Rachti and Mittig in experiments
to show that Bell's inequality is violated. (See Clauser and Shimony,
1978, pp. 1917-1918; d'Espagnat, 1979.) There is also no masking problem
when the spin of a neutron is measured (see Leggett, 1986, p. 39). Thus spin
components are indeed measurable, though not as easily as I have sug-
gested.
To return to the threatened criticism, that my account has been too ab-
stract, the obvious response is to say that the aim of Part One was precisely
that of showing the abstract conceptual structure of the theory. Philoso-
phers of science may rashly tend to equate such abstract structures with the
whole of a theory, and thereby be led to mistakes of assessment, but that is
another matter. For example, the rejection of the wave picture urged in
Section 8.3 may possibly be a mistake of this kind; although at the abstract
level the picture is unhelpful, perhaps it is indispensable for pragmatic
reasons when physical applications of the theory are at issue. It may be so.
Nonetheless, although discussions of these applications would be needed to
flesh out an abstract, skeletal account of the theory and give it breath, all
these applications will involve a common set of mathematical models, and
these abstract structures repay investigation.
A separate question is whether or not the significant features of these
structures are being correctly identified. Here I am thinking in particular of
the importance attributed, both in Chapter 8 and in the remainder of this
chapter, to quantum conditionalization. In contrast, Teller (1983, p. 428)
suggests that the Lüders rule is simply a "fortuitous approximation," an
approximation because actual processes do not localize the state in precisely
the sharp way that the rule suggests, and fortuitous because "there can be
no uniform way, no formula which even in principle could be fixed in
advance for turning the approximation into exact statements."
I agree on both counts; how then can I resist Teller's conclusion that

If the projection postulate is a fortuitous approximation, we have no reason to think
that it gives even an approximate description of some one specific process which
might then stand in need of interpretation. (P. 428)

Teller's paradigm example of a fortuitous approximation is Hooke's law.
This law, that strain is proportional to stress (less esoterically, that the
deformation of a material object is proportional to the load applied to it) is
approximately true of all kinds of materials in all sorts of configurations,
from steel wires to foam rubber mattresses. For Teller it is a fortuitous
approximation because there is no uniform way to correct it to allow for the
individual idiosyncrasies of the materials involved. In this way, he suggests,
it differs from the uniform approximation afforded by, for example, the
pendulum law: $T = 2\pi\sqrt{l/g}$. We know wherein the approximation of the
pendulum law exists; its derivation involves the approximation that, for
small $\theta$, $\theta = \sin\theta$. Hence there is a clear-cut way in which it is correctable.
Hooke's law, on the other hand, is just a pragmatically useful approximation,
roughly true for many materials below their elastic limit (Noakes, 1957,
pp. 141-142). Of the latter we can agree with Teller that (1) it is an
approximately true law for which no uniform method of approximation exists, and
(2) it describes no single and theoretically significant process.
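
What it means for the pendulum law to be uniformly correctable can be shown with a short calculation. The Python sketch below is offered as an illustration, not as anything drawn from Teller or Noakes: it compares the small-angle period with the exact period of an ideal pendulum released from rest at amplitude `theta0`, using the standard result that the exact period equals the small-angle period divided by the arithmetic-geometric mean of 1 and cos(theta0/2). The function names, the default value of `g`, and the sample amplitudes are assumptions made for the example.

```python
import math


def period_small_angle(length, g=9.81):
    """Pendulum law T = 2*pi*sqrt(l/g), valid when sin(theta) ~ theta."""
    return 2.0 * math.pi * math.sqrt(length / g)


def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of a and b (converges quadratically)."""
    for _ in range(iterations):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a


def period_exact(length, theta0, g=9.81):
    """Exact period of an ideal pendulum with amplitude theta0 (radians)."""
    return period_small_angle(length, g) / agm(1.0, math.cos(theta0 / 2.0))


# Ratio of exact to approximate period for a small and a large amplitude.
ratio_small = period_exact(1.0, 0.1) / period_small_angle(1.0)
ratio_large = period_exact(1.0, math.pi / 2) / period_small_angle(1.0)
```

For an amplitude of 0.1 radian the correction is well under a tenth of a percent, while at 90 degrees the exact period is about 18 percent longer. The same formula supplies the correction at every amplitude, which is the sense in which the approximation is uniform; nothing comparable is available for Hooke's law.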
However, there is little reason to think that, in general, (1) entails (2).
Consider the ideal gas law, $pV = nRT$. This is only approximately true for
real gases. The models supplied by the kinetic theory of gases, within which
this law can be derived, represent molecules as point systems, undergoing
perfectly elastic collisions, and exerting forces on each other only during
these collisions. Molecules of real gases are not like that. In fact, so many
different idealizations are involved in the theory that there is no uniform
way to correct the ideal gas law. True, judicious choices of a and b make van
der Waals' law, $[p + (a/V^2)](V - b) = nRT$, more nearly true for many
gases, but near the critical point it too goes astray, and whence are a and b to
be derived? (See Noakes, 1957, p. 375.) Nevertheless, the absence of a
systematic mode of correction would hardly justify our dubbing the ideal
gas law a fortuitous approximation, unless that term were shorn of its pejor-
ative implications and became merely a term of art.
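
The size of the deviation between the two gas laws can be illustrated numerically. The sketch below is a rough illustration under stated assumptions: it treats one mole of gas (so that V is the molar volume), uses approximate textbook van der Waals constants for carbon dioxide (`A_CO2`, `B_CO2`), and picks two sample volumes at 300 K. None of these numbers come from the text, and the constants themselves are empirical fits, which is exactly the difficulty raised above about where a and b are to come from.

```python
R = 0.083145  # gas constant in L·bar/(mol·K)

# Approximate textbook van der Waals constants for CO2 (assumed values).
A_CO2 = 3.640    # L^2·bar/mol^2
B_CO2 = 0.04267  # L/mol


def pressure_ideal(molar_volume, temperature):
    """Pressure of one mole of ideal gas, from pV = RT."""
    return R * temperature / molar_volume


def pressure_vdw(molar_volume, temperature, a, b):
    """Pressure of one mole from [p + a/V^2](V - b) = RT, solved for p."""
    return R * temperature / (molar_volume - b) - a / molar_volume ** 2


def relative_deviation(molar_volume, temperature, a=A_CO2, b=B_CO2):
    """Fractional difference between the ideal and van der Waals pressures."""
    p_ideal = pressure_ideal(molar_volume, temperature)
    p_vdw = pressure_vdw(molar_volume, temperature, a, b)
    return abs(p_ideal - p_vdw) / p_ideal


dilute = relative_deviation(25.0, 300.0)  # roughly atmospheric conditions
dense = relative_deviation(0.5, 300.0)    # strongly compressed gas
```

On these assumed values the two laws agree to within a fraction of a percent for the dilute gas and differ by roughly a fifth for the compressed one.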
The question of a uniform mode of correction is essentially subordinate to
another: are we dealing with an idealization within the models that the
theory supplies, or with an empirical approximation with no theoretical
support? Teller (1983, p. 428) writes that "virtually all descriptions of actual
processes idealize or approximate," but he does not thereafter distinguish
between the two. The Lüders rule, I suggest, like the ideal gas law, appears
as an idealization, Hooke's law as an empirical approximation.
Still, Teller could disagree; he could accept the distinction I have just
made and nevertheless claim that his arguments justify his placing the
Lüders rule and Hooke's law in the same basket. He points out that the final
projected state that the Lüders rule predicts for the system depends on its
initial state. Now, during the measurement process the system's state will be
continuously evolving in accordance with Schrödinger's equation. If the
system also suffers a discontinuous change of state given by the Lüders rule,
then, because the "initial state" is continuously changing, the result of this
projection will depend on the time at which it occurs. But there is no theoretical
reason to locate the projection at any one time in the measurement
process rather than another: "No formulation of the projection postulate
tells us exactly at which point to apply it" (Teller, 1983, p. 425). Hence,
Teller could continue, there is no warrant for thinking of the postulate as
giving an idealization of a physical process.
This is a powerful argument, but it draws its strength, I think, from the
fact that Teller looks at the projection postulate solely in terms of its relation
to the measurement process. As I pointed out in Sections 9.4 and 9.5, the
question of the projection postulate is conceptually separable from the main
problem of measurement. Considered just as a constraint on accounts of
measurement, the postulate is a seemingly arbitrary stipulation which lacks
obvious links with the rest of quantum theory. On the other hand, if we
view the Lüders rule as Bub suggests, as the rule of conditionalization
appropriate to quantum event structures, we see it in a different light. As the
quantum analogue of the classical conditionalization rule, it is built into the
non-Boolean event structures around which quantum theory is constructed.
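
The sense in which the Lüders rule generalizes classical conditionalization can be seen in a small numerical sketch. The code below is an illustration under stated assumptions, not a reconstruction of Bub's treatment: it takes the rule in its usual form, in which a state ρ conditioned on a projector P becomes PρP / tr(ρP), and the function name `lueders` and the sample states are chosen for the example. When ρ and P commute (here both are diagonal), the rule reproduces ordinary conditional probabilities; when they do not, it yields a projection with no classical counterpart.

```python
import numpy as np


def lueders(rho, P):
    """Return P rho P / tr(rho P), the state conditioned on the projector P."""
    prob = float(np.real(np.trace(rho @ P)))
    if prob <= 0.0:
        raise ValueError("cannot conditionalize on an event of probability zero")
    return (P @ rho @ P) / prob


# Commuting case: a diagonal state behaves like a classical distribution
# (0.2, 0.3, 0.5), and the event is "outcome 1 or outcome 2".
rho_classical = np.diag([0.2, 0.3, 0.5])
P_event = np.diag([1.0, 1.0, 0.0])
conditioned = lueders(rho_classical, P_event)

# Non-commuting case: an equal superposition of two basis states,
# conditioned on the projector onto the first basis state.
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho_plus = np.outer(plus, plus)
P_zero = np.diag([1.0, 0.0])
projected = lueders(rho_plus, P_zero)
```

In the commuting case the diagonal of `conditioned` is (0.4, 0.6, 0), exactly what classical conditionalization on the event would give; in the second case the superposition is carried into the first basis state.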
I acknowledge that the Lüders rule differs from the ideal gas law in an
important respect. The deviations of real gases from the ideal gas law have
explanations (the finite size of actual molecules, their mutual attraction, and
so on); further, these explanations also tell us, in general terms, why van der
Waals' equation is an improvement. In contrast, we have no decent account
of when and why the Lüders rule is a less than adequate idealization. But my
reaction to this is not to revise my view of the Lüders rule, but to say that
quantum mechanics still faces a major empirical and conceptual task, that of
sorting out the relation of quantum systems to the classical, macroscopic
world. Take, for example, the simple case of an electron striking a
diaphragm with a hole in it (as in Section 8.3). We need to know what it is about
the physical structure of a real diaphragm that makes the wave function of
an electron passing through it differ from the ideal localized wave function
predicted by the Lüders rule. But these gaps in our knowledge do not make
the rule a fortuitous approximation; the idealizations it relies on are those
assumed by quantum theory itself.
This fact, however, that there is no systematic way to explain deviations
from the Lüders rule, prompts a return to the question of the value of
abstract theory, since it hints at a deeper issue than the particular problems I
have looked at so far.
Duhem (1906) thought that what are often called "fundamental" physi-
cal theories (Maxwell's theory of the electromagnetic field, for instance) did
no more than provide a formal unification of a wide range of phenomena.
He endorses the view that "a physical theory . . . is an abstract system
whose aim is to summarize and classify logically a group of experimental
laws" (p. 7). And he quotes approvingly Hertz's dictum that "Maxwell's
theory is the system of Maxwell's equations" (p. 80). Whether or not he is
right about Maxwell's theory, Duhem's description accurately fits the ver-
sion of quantum mechanics given here. This yokes together a disparate
group of phenomena in a purely formal way. The analogies between these
phenomena, one might think, do no more than allow a unified mathemati-
cal treatment of diverse aspects of nature; no further significance attaches to
them.
Certainly, the deployment of abstract analogies is part of the physicist's
repertoire. For example, in his Lectures on Physics Feynman introduces the
idea at an early stage, in his discussion of damped harmonic oscillations, by
displaying the pair of equations (Feynman, Leighton, and Sands, 1965, vol.
I, p. 25-8):

$$m\,\frac{d^2x}{dt^2} + \gamma m\,\frac{dx}{dt} + kx = F \tag{10.1}$$

$$L\,\frac{d^2q}{dt^2} + R\,\frac{dq}{dt} + \frac{q}{C} = V \tag{10.2}$$

Equation (10.1) describes the mechanical motion of a mass m on the end of a
spring, under the influence of a varying force F; Equation (10.2), the electrical
oscillation set up in a circuit by a varying voltage V. The formal correspondence
between the two is evident. Any solution for one becomes a
solution for the other when corresponding terms are substituted for each
other (V for F, 1/C for k, and so on). For Duhem, however, this is where the
correspondence ends; we simply have two disparate sets of phenomena
each of which can be modeled (mathematically) in the same way. There is
no more to be said.
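
The formal correspondence between Equations (10.1) and (10.2) can be exhibited by handing both to the same numerical routine. The sketch below is an illustration only: the integrator (a simple semi-implicit Euler scheme), the function name `simulate`, and all parameter values are assumptions made for the example. A single function integrates any equation of the shared form, and the mechanical and electrical cases differ only in how its arguments are labeled.

```python
import math


def simulate(inertia, damping, stiffness, drive, dt=1e-3, steps=5000):
    """Integrate inertia*y'' + damping*y' + stiffness*y = drive(t) from rest.

    Uses semi-implicit Euler steps; returns the list of y values.
    """
    y, velocity, history = 0.0, 0.0, []
    for n in range(steps):
        accel = (drive(n * dt) - damping * velocity - stiffness * y) / inertia
        velocity += accel * dt
        y += velocity * dt
        history.append(y)
    return history


def forcing(t):
    """A varying drive, read as a force F in one case and a voltage V in the other."""
    return math.cos(t)


# Mechanical reading of (10.1): mass m, damping gamma*m, spring constant k.
m, gamma, k = 2.0, 0.5, 8.0
x = simulate(m, gamma * m, k, forcing)

# Electrical reading of (10.2): inductance L, resistance R, capacitance C,
# chosen so that L = m, R = gamma*m, and 1/C = k.
L, R, C = 2.0, 1.0, 0.125
q = simulate(L, R, 1.0 / C, forcing)

identical = (x == q)
```

Because corresponding coefficients have been given equal values, the displacement and the charge trace out the same sequence of numbers; the routine has no way of knowing which physical system it is describing.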
Here our views differ. And the greater the number of fields that could be
modeled in the same way, and the more heterogeneous they were, the more
significant I would find it. But significant in what sense? To go back to the
example, it is clearly not the case that the same physical processes are at
work in the two types of oscillation.
The formal nature of the correspondence tells us that any underlying
commonality between the two exists at the most abstract, conceptual level,
and is to be found by examining the form of the mathematical equations in
which the analogy between the two is expressed. These equations are
second-order differential equations in x and q, respectively. Implicit in the use
of these equations is the assumption that both position (x) and charge (q) are
continuous, and continuously differentiable, quantities. If similar differential
equations appeared throughout our fundamental physical theories, then
the implication would be that all physical quantities were continuous in
nature. This would then be a significant element within our metaphysical
picture of the world. In fact we no longer believe in the continuity of electric
charge, and so Equation (10.2) is in this respect misleading: charge is a
discrete, not a continuous quantity. The equation is a pragmatically useful
approximation, not a part of our foundational theory. (Recall the discussion
of Hooke's law.) That, I suggest, is the salient difference in significance
between the modeling of oscillations given by the two equations (particu-
larly the latter) and the models furnished by our abstract account of quan-
tum theory.
My point is this. Even if - or especially if - we accept Duhem's account
of physical theories, it is nevertheless worthwhile to examine the models a
theory employs, to see what metaphysical picture is implicit in them. This is
precisely what goes on when we look to Hilbert spaces in order to find a
categorial framework within which to interpret quantum theory. To this
end, the more abstract the presentation of the theory the better.
To seek such a categorial framework is the reverse of a process Duhem
elsewhere condemns, whereby physical theories are assessed in the light of
prior metaphysical commitments; instead, we are asking the theory to pro-
vide our metaphysics. Nonetheless, a resolute Duhemian skeptic might
insist that the search for a categorial framework was not a useful philosophi-
cal occupation. This itself, however, would betray a certain metaphysical
commitment, albeit one expressed in antimetaphysical terms. There seems
no a priori reason to think that the search should be either fruitless or
uninteresting. And should the skeptic persist-so weak is the power of
rational argument to persuade-one could only say, "It was not you for
whom Part Two of this book was written."

10.2 Properties and Latencies: The Quantum Event Interpretation


In this section I will outline a possible interpretation of quantum mechanics,
which I will call the quantum event interpretation. That is, I will propose a
categorial framework whose elements find representation in the Hilbert-
space models the theory displays.
The categorial framework I will outline can be compared to that found
within classical mechanics, where we see, first of all, a distinction between a
system and its attributes (or properties), and, second, a fully causal account of
processes. Analogues of all these elements appear within the quantum
event interpretation of quantum theory.
The concept of a system I have taken for granted; presupposed by the
representation supplied by the theory is the assumption that parts of the
world are, for the purposes of theory, isolable. However, this presupposition
is very much an idealization; indeed it is challenged by the theory itself.
In quantum theory, coupled systems are more than the sum of their parts,
witness the behavior of the coupled systems used in EPR experiments. To
isolate a section of this world is to say that its couplings to other systems
have become so attenuated that we may disregard them. This said, the
notion of a system will not be further examined.*
The classical notion of a property is inappropriate to quantum theory, as
was seen in Chapter 6. Nor is much interpretive work done by retaining this
notion within a quantum-logical framework, for the reasons given in Sec-
tion 7.9. Heisenberg and Margenau have both suggested that quantum
theory requires instead a concept that recognizes the inherently probabilistic
nature of the quantum world. Heisenberg (1958, p. 53) talked of "tendencies"
and suggested that these resembled the "potentia" of Aristotelian
science. This particular analogy is very remote; instead, I am adopting the
term "latency" suggested by Margenau (1954).
The properties of a classical system were summarized by its state. Given
the state we could predict the values which ideal measurements of observables
would reveal. The latencies of quantum physics are also represented
by the state-here the state vector. These latencies assign probabilities to
measurement outcomes. We term these measurement outcomes quantum
events and no longer treat them as corresponding to possible properties of
the quantum system. A quantum measurement should be regarded neither
as revealing a property of the system nor as creating that property, for the
simple reason that quantum systems do not have properties. Rather, the
measurement involves the realization of a particular quantum event from
the Boolean algebra of such events associated with the measurement apparatus.
And, although only one Boolean algebra of events can be selected at a
time, the latency represented by the state vector determines probabilities for
a whole orthoalgebra of events.
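
How a single state vector fixes probabilities for several incompatible Boolean algebras at once can be illustrated for a spin-1/2 system. The sketch below is an illustration only: it uses the Pauli matrices for the three spin components (in units of hbar/2) and the standard Born rule, and the names `SIGMA`, `event_probabilities`, and `latency` are introduced here for convenience.

```python
import numpy as np

# Pauli matrices: the spin components S_x, S_y, S_z in units of hbar/2.
SIGMA = {
    "Sx": np.array([[0, 1], [1, 0]], dtype=complex),
    "Sy": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Sz": np.array([[1, 0], [0, -1]], dtype=complex),
}


def event_probabilities(state, observable):
    """Return {eigenvalue: probability} for one observable.

    Each observable contributes one Boolean algebra of events; the Born rule
    gives the probability |<eigenvector|state>|^2 of each atomic event.
    """
    state = state / np.linalg.norm(state)
    eigenvalues, eigenvectors = np.linalg.eigh(observable)
    probabilities = {}
    for value, vector in zip(eigenvalues, eigenvectors.T):
        probabilities[int(round(value))] = float(abs(np.vdot(vector, state)) ** 2)
    return probabilities


# One state vector (spin "up" along z) ...
z_plus = np.array([1, 0], dtype=complex)

# ... assigns probabilities within every Boolean algebra, not just one.
latency = {name: event_probabilities(z_plus, op) for name, op in SIGMA.items()}
for name, probabilities in latency.items():
    print(name, probabilities)
```

Only one of the three observables can be measured on a given occasion, but the probabilities for all three are already determined by the state.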
The so-called wave-particle duality of quantum systems, shorn of its
mechanistic associations, fits naturally within this interpretation. For, as
Born pointed out, the "waves" which the wave-particle account portrays as
spreading through space are "probability waves"; the square of the amplitude
of the wave at any point in space gives the probability of finding the
"particle" there. Similarly, to ascribe a latency to a system with respect to its

* But see Teller (1989). And, in addition, Ned Hall has pointed out to me the problems raised
by the Pauli exclusion principle.

position is just to say that there is an extended region of space within which
there is a nonzero probability of finding it. The wave formalism offers a
convenient mathematical representation of this latency, for not only can the
mathematics of wave effects, like interference and diffraction, be expressed
in terms of the addition of vectors (that is, their linear superposition; see
Feynman, Leighton, and Sands, 1965, vol. 1, chap. 29-5), but the converse
also holds. Clearly, this mathematical equivalence is independent of the fact
that vectors can represent probability assignments; hence the propriety of
talking of the "interference effects" obtained in, for example, the two-slit
experiment. In contrast, "particle" effects typically occur when position is
localized; in other words, when a quantum event occurs, latency is actual-
ized and the "wave packet" collapses.
Thus the quantum event interpretation offers both an abstraction and a
generalization of the thesis of wave-particle duality; on the one hand, it
severs the thesis from its classical nineteenth-century antecedents, and, on
the other, it accommodates all quantum observables, not merely position
and momentum.
The sense in which a latency is a natural probabilistic generalization of a
property can be made more precise. Although the exact ontological status of
a property (greenness, for example) may be questioned, one thing is not in
dispute (Staniland, 1972). If we say of a billiard ball that it is green, then our
statement entails that, if viewed under normal conditions, it will have a
certain appearance; simply put, that it will appear green. In classical physics,
the ascription of a property to an object entails the truth of various
conditionals of the form, "If an (ideal) measurement of A is made, then the result
will lie within $\Delta$." I will call such a conditional a "measurement conditional"
and write it as $M_A \rightarrow (A,\Delta)$. $(A,\Delta)$ is, as usual, the event that an
A-measuring device gives a result within $\Delta$.
A complete description of a classical system would give us all its proper-
ties, so that every measurement conditional would be assigned "True" or
"False."* (This description is familiar from Chapter 2.) In contrast, the
ascription of a latency to a quantum system entails the truth or falsity of a
host of conditionals of the form:

$$M_A \rightarrow [p(A,\Delta) = x]$$

Such quantum measurement conditionals also carry reference to a set of
events $(A,\Delta)$, but this set, as we have seen, has a radically non-Boolean
structure; it has the structure of the set of subspaces of a Hilbert space. The

* I am relying here on an intuitive account of the truth-conditions for conditionals.



result is that we can never find a probability function p such that, for every
event $(A,\Delta)$, either $p(A,\Delta) = 1$ or $p(A,\Delta) = 0$. That is to say, these latencies
can never be reduced to properties.
The pure states of quantum mechanics give complete descriptions in the
following sense. (1) A pure state assigns to each quantum measurement
conditional a value "True" or "False," and (2) the probabilities occurring in
these conditionals are not just epistemic probabilities, but objective propen-
sities. We can regard the description as complete with respect to latencies,
rather than to properties. In the case of a mixed state, the answer is not so
clear-cut. If the mixture can be given an ignorance interpretation, then it
does not give a complete description since (2) fails; if an ignorance interpre-
tation is ruled out, then, on this interpretation, there is no reason to think
that the description that a mixture provides is less than complete. Of course,
quantum theory is not complete in Jarrett's sense of the word (see Section
8.6); one could argue, however, that it is complete in Einstein's sense after all
(see Section 6.2): whenever the probability of an event $(A,\Delta)$ is one, then
there is a state specified by the theory whose support with respect to A is a
subset of $\Delta$, and only when the system is in that, or one of those, states can
one predict the value of A with certainty without disturbing the system.
How does the latency of a system change? In two ways, which exactly
match the two modes of evolution of the state function, as von Neumann
depicted them. Latencies, like state vectors, can change continuously or
discontinuously. The first type of change is not causally problematic. The
second is. In the first place, it is stochastic; since the event which induces it is
not in general determined by the state, but has just a nonzero probability of
occurrence, it differs in at least one respect from a classical cause. Second, it
may be nonlocal; the kind of transition which (on the account given in
Section 8.8) characterizes EPR experiments is an example.
I don't want to underestimate this last problem; a discussion by Shimony
(1986, pp. 193-196) shows just how severe it is. Assume that, in an EPR-type
experiment, the measurements performed on two systems a and b are
simultaneous in the laboratory frame of reference; call the events associated
with these measurements $E_a$ and $E_b$, respectively. Then the special theory of
relativity (STR) tells us that there is a frame of reference $\mathcal{F}_a$ in which $E_a$
precedes $E_b$ and another frame $\mathcal{F}_b$ in which $E_b$ precedes $E_a$. But, on the
interpretation I am offering, this means that within $\mathcal{F}_a$ the event $E_a$ occasions
a projection of the state of b to a new state (with support $E_b$) prior to the event
$E_b$. Within $\mathcal{F}_b$, on the other hand, this projection is produced, if at all, by the
event $E_b$ itself, and $E_b$ also occasions a change of state of a.
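
The frame-dependence of the temporal order can be made concrete with the Lorentz transformation of the time coordinate, t' = gamma * (t - v*x/c^2). The sketch below is illustrative only: the coordinates given to the two measurement events, the choice of units with c = 1, and the names `time_in_frame` and `order` are assumptions introduced here.

```python
import math

C = 1.0  # speed of light in natural units (an assumption of this sketch)


def time_in_frame(t, x, v):
    """Time coordinate of the event (t, x) in a frame moving with velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (t - v * x / C ** 2)


# Two measurement events, simultaneous in the laboratory frame but spatially separated.
E_a = (0.0, -1.0)  # (t, x) of the measurement on system a
E_b = (0.0, +1.0)  # (t, x) of the measurement on system b


def order(v):
    """Report which event comes first in the frame moving with velocity v."""
    t_a = time_in_frame(E_a[0], E_a[1], v)
    t_b = time_in_frame(E_b[0], E_b[1], v)
    if t_a < t_b:
        return "E_a first"
    if t_b < t_a:
        return "E_b first"
    return "simultaneous"


for v in (-0.5, 0.0, 0.5):
    print(v, order(v))
```

With these coordinates a frame moving toward negative x plays the role of the frame in which the measurement on a comes first, and a frame moving toward positive x the role of the frame in which the measurement on b comes first.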
There is no outright inconsistency here; however, the occurrence or
nonoccurrence of a change of latency becomes frame-relative, and this certainly
offends the spirit, if not the letter, of STR.
Indeed, at this point I can hear the objection that the interpretation offered
has just too many unpalatable features. On the one hand, so the
criticism runs, nonlocal conditionalization might be acceptable as a convenient
mathematical way to summarize the correlations associated with coupled
systems; on the other, the suggestion that there is a new ontological
category called "latency" seems fairly inoffensive. But when it transpires
that (1) these physically significant latencies can be changed by nonlocal
actions, and that (2) these alleged changes are not relativistically invariant,
that is just too much to swallow.
Not much can be said, I fear, to sweeten this particular pill, but perhaps
we can say more on behalf of the individual ingredients which together
prove so distasteful. To reiterate what was said in Section 10.1, in seeking an
interpretation of a theory we start from Duhem's thesis that a theory provides
an abstract summary and logical classification of a group of experimental
laws. However, that is only where we start. Though our final convictions
may be instrumentalist, we are setting these attitudes aside for the
time being and asking, what sort of world could be represented by the
mathematical models the theory provides? Further, if we are not instrumentalists,
we may hope that this way of proceeding sidesteps Duhem's
argument that, since "explanations" are formulated only with respect to a
set of prior metaphysical assumptions, to think that theories provide expla-
nations is misguided. We perform this sidestep by looking within the theory
for the categorial framework it suggests, and which is to be appealed to in
explanations.
Within quantum mechanics we find, in a word, probabilities. However,
the probability functions the theory uses cannot be regarded as weighted
sums of dispersion-free probability functions - that is, as weighted sums of
property ascriptions; quantum theory is irreducibly probabilistic. Rejecting
properties from our categorial framework, we replace them with their prob-
abilistic analogues, latencies. But why replace them with anything? Why
grant ontological status to these remote and shadowy quasi-attributes? A
specific argument for doing so will be offered in the next section; mean-
while, here are some general considerations.
We invoke latencies for much the same reasons that, in the macroscopic
world, we invoke properties. Attempts to give a purely phenomenalistic
account of properties notoriously failed (see, for example, Hirst, 1967); a
property ascription is more than the logical product of a set of conditionals
of the kind, "If I were looking at object X now, under normal conditions of
JOb TIll' 11I11·tll rl ' ll/liO!1 III (...J III/IIIIIIII 'I 'III 'II IY

illumination, then J would b ' hav ing 14cl1 sa tions of greenn '514." 'imil nrl y for
latencies; these too license infinitely ma ny subjuncti ve conditi ona ls (of
which a proper subset are quantum measurement conditionals), but, for
much the same reasons, are not reducible to them .
What then of the projection postulate? This too emerges from the nonclassical
nature of the probability spaces we deal with. Regarded not just as a
postulate applying (occasionally) to the measurement process, but as the
quantum version of conditionalization, it provides explanations of the
otherwise inexplicable. In Section 8.9 I called these explanations "struc-
tural" but, if conditionalization is seen as a change in the latencies of a
system, they also acquire an ontological foundation.
It turns out that there is a price to be paid. Some of the conditionalizations
which figure in these explanations are nonlocal: latencies may be affected
by action at a distance. Even though stochastic Einstein-locality is respected,
the price may seem too high. The interpretation may still violate too many
intuitions. But so may quantum theory. And, like Isabella on a different
occasion, the fierce defender of intuitions may have got his priorities wrong.
After all, what's so hot about intuitions? Aren't these the folks who gave us
Bell's inequality? Duhem would have had little truck with them.

10.3 The Copenhagen Interpretation


The quantum event interpretation should be distanced from the "Copenhagen
agnosticism" van Fraassen (1985) advocates "with respect to what
happens to measurable physical magnitudes when they are not being
measured." Bohr's expression of this agnosticism was discussed in Section 7.9.
There we also saw how it could be set out in algebraic terms. In the same
vein, van Fraassen and others offer a "Copenhagen approach" to
quantum-mechanical probabilities. On this approach, the fact that there is no simple
Kolmogorov model of probabilities involving incompatible observables A,
B, and C does not mean that we must jettison classical probability theory; it is
a result of the fact that such observables are not jointly measurable. The
apparent departures from Kolmogorov probability theory that we find in
quantum mechanics occur because quantum probabilities are all conditional
probabilities; $p(A,\Delta)$ should be read as, "the probability that a result within
$\Delta$ will be found, conditional on an A-measurement being made." We obtain
a perfectly good Kolmogorov probability space for incompatible observables
A, B, and C, so the story runs, by partitioning a classical probability
space $\Omega$ into three mutually exclusive sets of events, corresponding to
measurements of A, B, and C, each of which forms a Boolean algebra.
Assume for the sake of argument that each of A, B, and C has two values,
and call them $(a_1,a_2)$, $(b_1,b_2)$, $(c_1,c_2)$, respectively. Then a finer six-way
partition $\{a_1,a_2,b_1,b_2,c_1,c_2\}$ of $\Omega$ is available, and quantum probabilities
appear according to the recipe (for observable A, in this example):

$$p(A,a_1) = p_K(a_1 \mid a_1 \cup a_2) = \frac{p_K(a_1)}{p_K(a_1 \cup a_2)}$$

where $p_K$ is a Kolmogorov probability function defined on $\Omega$.


In this formula the term $a_1 \cup a_2$ represents the event that a measurement
of A takes place. With a slight abuse of notation we can write

$$p(A,a_1) = p_K(a_1 \mid M_A)$$

Note that the event $a_1 \cup a_2$ is not identified with the events $b_1 \cup b_2$ and
$c_1 \cup c_2$, as it would be in the construction of an orthoalgebra of quantum
events (see Section 8.1). On the contrary, these three events are mutually
exclusive.
The example may be generalized. That is, given any generalized probability
function p defined on an orthoalgebra $\mathcal{A}$, the probabilities p assigns to
members of $\mathcal{A}$ may be reproduced as classical conditional probabilities on a
Kolmogorov probability space as follows. Consider the family $\{\mathcal{B}_i\}$ of maximal
Boolean subalgebras of $\mathcal{A}$. We embed these algebras individually in a
Kolmogorov probability space in such a way that their maxima are mutually
exclusive and jointly exhaustive: $I_i \cap I_j = \emptyset$ when $i \neq j$, and $\bigcup_i I_i = \Omega$. $\{I_i\}$ is
thus a partition of $\Omega$, and if $p_K$ is any classical probability function on $\Omega$, then
$\sum_i p_K(I_i) = 1$. To reproduce the probabilities assigned by p to members of $\mathcal{A}$,
we stipulate that, for any event e in $\mathcal{B}_i$,

$$p(e) = p_K(e \mid I_i) = \frac{p_K(e)}{p_K(I_i)}$$

Such a probability function $p_K$ always exists, but since the assignments $p_K(I_i)$
are arbitrary (though they must all be nonzero), $p_K$ is not uniquely defined
by p.
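
The construction, and the non-uniqueness just noted, can be sketched for the three-observable example. This is an illustration under assumptions: the quantum probabilities are those a spin-1/2 state "up" along one axis would assign to three orthogonal spin components, the two sets of weights are arbitrary, and the names `copenhagen_space` and `conditional` are introduced here.

```python
from fractions import Fraction


def copenhagen_space(quantum_probs, weights):
    """Build a classical probability function on the disjoint union of contexts.

    quantum_probs: {context: {outcome: probability}}, one entry per maximal
                   Boolean subalgebra (here, per observable).
    weights:       {context: probability that this measurement is made};
                   arbitrary nonzero values summing to 1.
    """
    assert sum(weights.values()) == 1
    assert all(w > 0 for w in weights.values())
    return {
        (context, outcome): weights[context] * p
        for context, outcomes in quantum_probs.items()
        for outcome, p in outcomes.items()
    }


def conditional(p_K, context, outcome):
    """Classical conditional probability of an outcome, given its measurement context."""
    p_context = sum(p for (c, _), p in p_K.items() if c == context)
    return p_K[(context, outcome)] / p_context


half = Fraction(1, 2)
quantum_probs = {
    "A": {"a1": Fraction(1), "a2": Fraction(0)},
    "B": {"b1": half, "b2": half},
    "C": {"c1": half, "c2": half},
}

# Two different, equally admissible choices of the context weights.
p_K_1 = copenhagen_space(quantum_probs, {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)})
p_K_2 = copenhagen_space(quantum_probs, {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)})

print(conditional(p_K_1, "B", "b1"), conditional(p_K_2, "B", "b1"))
```

Both classical functions return the same conditional probabilities, namely the quantum ones, although they differ as functions on the six-way partition.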
To summarize. The Copenhagen view of quantum theory and the quan-
tum event view differ significantly in their treatment of probabilities.
Whereas on the quantum event view probabilities in quantum mechanics
are assigned by generalized probability functions to members of an orthoalgebra
$\mathcal{A}$ of events, on the Copenhagen view the underlying probability
space is classical. This classical space is coarsely partitioned, each member of
the partition being the event that a particular measurement occurs, and each
corresponding to a maximal Boolean subalgebra of $\mathcal{A}$. Probabilities are
assigned to events in this classical space by a Kolmogorov probability function,
and quantum-mechanical probabilities now appear as conditional
probabilities, each conditional on some event in the coarse partition.
From the point of view of the quantum event interpretation, this construction
is not only formally respectable, but in some circumstances physically
significant. Assume, for example, that we are dealing with an experiment,
like the Aspect experiment described in Section 8.6, in which there is a
probability $p_e(M_B)$ that a B-measurement will be performed, and so on; $p_e$ is
of course a Kolmogorov probability function. Assume further that the state
of the system assigns probability q to some event $(A,a_i)$ according to the
usual algorithm, by specifying a generalized probability function on the set
of quantum events. We can now construct a Kolmogorov space in the way
prescribed, on which the function $p_e$ yields an "absolute probability" for the
result $a_i$, for example, according to the formula

$$p_e(a_i) = p_e(M_A) \cdot q$$

On the quantum event interpretation the equation holds because (1) the
state makes the conditional $M_A \rightarrow [p(A,a_i) = q]$ true, and (2) $M_A$ has probability
$p_e(M_A)$. On the Copenhagen interpretation we obtain the same equation,
since

$$q = p_e(a_i \mid M_A) = \frac{p_e(a_i)}{p_e(M_A)}$$

In the light of this one may ask, what does the quantum event interpreta-
tion achieve that a Copenhagen interpretation does not? What is gained by
the appeal to arcane nonclassical algebraic structures, let alone by the invo-
cation of dubiously metaphysical "latencies"?
The same question was raised at the end of Section 10.2, and I can now
amplify the answer given there. One specific achievement is the ability to
talk of the probability of one quantum event conditional on another. On the
quantum event interpretation, to ask what the probability is that a measurement
of A will yield result $a_i$, given that an event $(B,b_j)$ has occurred, is to ask,
for what value of x is the statement $M_A \rightarrow (p[(A,a_i) \mid (B,b_j)] = x)$ true? Since p
is a generalized probability function (GPF) defined on the set of subspaces
of a Hilbert space, the conditional probability $p[(A,a_i) \mid (B,b_j)]$ is given
straightforwardly by the Lüders rule. Chapter 8 demonstrated just how
fruitful the application of this rule can be. In contrast, on the Copenhagen
approach we have no ready means of dealing with sequences of events;
$p_K(a_i \mid b_j)$ will always be zero if A and B are incompatible.
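
A small numerical illustration of the contrast may help. The sketch assumes the standard projector form of the Lüders rule, p(a | b) = Tr(P_b rho P_b P_a) / Tr(rho P_b), applied to a spin-1/2 system with B taken as the x-component and A as the z-component of spin; the function names and the choice of initial state are assumptions made for the example.

```python
import numpy as np


def projector(vector):
    """Projection operator onto the ray spanned by the given vector."""
    v = np.asarray(vector, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())


def luders_conditional(rho, P_given, P_event):
    """Probability of P_event conditional on P_given, by the Lüders rule."""
    norm = np.trace(rho @ P_given).real
    if norm == 0:
        raise ValueError("conditioning event has probability zero")
    conditioned = P_given @ rho @ P_given / norm
    return float(np.trace(conditioned @ P_event).real)


P_z_up = projector([1, 0])     # event (S_z, +)
P_x_up = projector([1, 1])     # event (S_x, +)
P_x_down = projector([1, -1])  # event (S_x, -)

rho = P_z_up  # initial state: spin "up" along z (an arbitrary choice)

# Incompatible observables: a well-defined, nonzero conditional probability.
p_z_given_x = luders_conditional(rho, P_x_up, P_z_up)

# Same observable, different outcomes: the events exclude one another.
p_xdown_given_xup = luders_conditional(rho, P_x_up, P_x_down)

print(p_z_given_x, p_xdown_given_xup)
```

On the partitioned classical space, by contrast, the events belonging to different measurements are disjoint by construction, so the corresponding classical conditional probability vanishes whatever the state.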
More generally and fundamentally, the Copenhagen approach offers no
account at all of the relations between incompatible observables. There are
probability functions $p_K$, definable on the Kolmogorov space $\Omega$ constructed
according to the Copenhagen prescription, which do not generate
quantum-mechanical probabilities. To return to our earlier example involving
observables A, B, and C, a perfectly respectable classical probability measure
on the partition $\{a_1,a_2,b_1,b_2,c_1,c_2\}$ assigns to each of $a_1$, $b_1$, and $c_1$ the value
;\, and to each of $a_2$, $b_2$, and $c_2$ the value rs. This would yield the quantum
probabilities

Yet if A, B, and C are the familiar components of spin, $S_x$, $S_y$, and $S_z$,
respectively, no quantum-mechanical state assigns probability t to the positive
value of each observable. (To be precise, no quantum state simultaneously
assigns to all three events probabilities greater than
$1/2 + \sqrt{3}/6 \approx 0.789$.)
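
The bound can be checked numerically. The sketch below relies on a standard fact that is not derived in the text: a spin-1/2 state with Bloch vector n (of length at most 1) assigns probability (1 + n_i)/2 to the positive value of the i-th spin component. The grid search and the names used are illustrative.

```python
import math


def spin_up_probabilities(n):
    """Probabilities of the positive values of S_x, S_y, S_z for Bloch vector n."""
    return tuple((1.0 + component) / 2.0 for component in n)


bound = 0.5 + math.sqrt(3) / 6

# Search the unit sphere (pure states) for the largest probability that can be
# assigned to all three positive values at once. Mixed states have shorter
# Bloch vectors and so cannot do better.
best = 0.0
steps = 200
for i in range(steps + 1):
    theta = math.pi * i / steps
    for j in range(2 * steps):
        phi = math.pi * j / steps
        n = (
            math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta),
        )
        best = max(best, min(spin_up_probabilities(n)))

print(round(bound, 3), best <= bound)
```

The maximum is approached when the Bloch vector points along the diagonal (1, 1, 1)/sqrt(3), which is where the value 1/2 + sqrt(3)/6 comes from.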
The Copenhagen interpretation offers no reason why such assignments
are ruled out. In rewriting the probabilities assigned by any GPF to elements
of an orthoalgebra as conditional probabilities defined on a classical probability
space, it takes no account of the fact that quantum mechanics uses
orthoalgebras which have a very rich structure; each is isomorphic to the set
of subspaces of some Hilbert space.
Not only does the quantum event interpretation regard that fact as cen-
tral, a partial explanation of it has already been offered which leads natu-
rally to the concept of latency.
The ascription of a particular latency to a system assigns probabilities to
the values of a family of observables. With this in mind, consider the analy-
sis of spin in Chapter 4. The question that chapter asks is, what are the
results of assuming that the probabilities associated with a particular family
of observables are constrained in ways suggested by "natural" symmetries
- the isotropy of space, for example? The answer is that only if all the
observables in the family can in some sense be regarded as components of a
vector is a model of the set of events available which uses the full represen-
tational capacity of a Hilbert space; a condition must be put on the probabili-
ties associated with the component observables, analogous to those obtain-
ing when we deal with classical vector quantities. Equations (4.10) and
(4.11) give equivalent statements of the required condition.
My suggestion is that we think of this intricately related set of probabilities
as determined by some one feature of the system, and give the name
"latency" to this feature. Again, latencies appear as the probabilistic analogues
of properties. In classical mechanics a vector property, that is, a
particular value of a vector quantity like momentum, determines the values
of all components of that quantity. Analogously, in quantum theory, the
latency associated with, say, spin determines the probabilities assigned to
the values of all its component observables.

10.4 The Priority of the Classical World


The quantum event interpretation has this in common with the Copenha-
gen interpretation: both assume a classical world which is in some sense
prior to the quantum world. But the kind of conceptual priority granted to
the classical world needs to be spelled out with some care.
First, there is the question of specificity. Bohr, in particular, thought that
any statement about quantum systems acquired meaning only in the context
of a particular experimental procedure (see, for example, Bohr, 1949, pp.
218,222). The quantum event interpretation, on the other hand, suggests
that quantum theory carries with it an implicit reference, not to particular
procedures, but to a set of events which are associated with classical devices
of some kind or other, and with respect to which latencies are defined.
The second question is the content of the term "classical." I have already
(Section 8.3) criticized Bohr's insistence that physicists must restrict them-
selves to the concepts bequeathed to us by late-nineteenth-century physics.
The kind of conceptual priority assumed by the quantum event interpreta-
tion is of a more abstract, structural kind. It allows for the possibility of
quantum concepts which lack direct analogues in classical physics. What it
takes from classical physics is the bare concept of an observable which can
take different values.
At the risk of repetition, I will spell out what this entails. The quantum
events to which the theory assigns probabilities are all of the form $(A,\Delta)$; all
events involve reference to some observable A and, if A is an observable,
then every pair $(A,\Delta)$ represents an event - though this may be the null
event even when $\Delta$ is not the empty set, as in the case of the electron-event
$(S_x,[1,2])$. As we saw in Section 7.5, the classical concept of an observable
thus imposes considerable structure on the set of events; this set is divided
into Boolean algebras, each associated with an observable. Any measurement
apparatus is associated with some observable A. When A, the operator
that represents A, has a discrete spectrum, an ideal measurement apparatus
would discriminate between all the distinct quantum events associated with
A, but when A has a continuous spectrum this is, of course, impossible.
Whether our apparatus is sensitive or insensitive, however, it is a classical
device in two linked senses. First, the events it registers will form a Boolean
algebra, and so the statements describing these events will allow bivalent
truth assignments. Second, its indicator states are classical states, and can
be thought of as a partial list of its properties. This second feature is in fact
entailed by the first.
We see that, although the latencies of quantum theory are latencies with
respect to a set of events with a thoroughly non-Boolean structure, nevertheless
the set of events realizable at any given juncture - namely, the set of
events associated with any experimental situation - will form a Boolean
algebra. While the contribution of quantum theory is to show that the set of
all events, together with the states that assign them probabilities, can be
represented in a Hilbert space, the first requirement of this representation is
that it respects the classical structure of the set of events associated with a
given observable. This is the sense in which, on the quantum event
interpretation, the classical world is conceptually prior to the quantum world.
Implicit in quantum theory is a reference to a classical world. But where is
the boundary to be drawn? And what is the relation between the worlds?
Wigner, as we saw in Section 9.8, located the boundary at the level of
consciousness; the only classical device was a conscious observer. This is
undesirable on two grounds, (1) that it is too subjective for our tastes, and (2)
that it relies on a dubious distinction between mind and body. In contrast,
the original Copenhagen interpretation assumed a self-evident distinction
between the quantum and the classical worlds. This, however, is unhelpful
to those to whom the grounds for such a distinction do not immediately
reveal themselves. Quantum systems, we may say, are smaller than
macrosystems: an electron is paradigmatically the kind of system treated in quantum
theory, a piece of polaroid plus a photographic plate can act as a
classical measuring device. But is there a number N such that all systems of
N particles are microsystems, whereas all systems of N + 1 particles are
macrosystems? That sounds implausible.
One of Everett's aims, in his "relative state" formulation of quantum
mechanics (see Section 9.8), was to present the theory in such a way that this
"cut" between the quantum and the classical worlds disappears. Quantum
theory, on this account, would be a global theory; it would not be conceptually
improper - as it is on the Copenhagen interpretation - to talk of the
"universal wave function" (Everett, 1973). In implementing this program,
the "relative state" formulation ran into a difficulty (see the appendix to
Shimony, 1986): if the "branching" of the universe was to correspond
properly to the (apparent) collapse of the wave packet, then, contrary to
quantum mechanics, there had to be one preferred basis in which the states
of measurement systems were represented. Certain systems, in other words,
could not be properly accommodated within the theory; the "cut" which
Everett sought to eliminate did not disappear after all, and the problem of
the relation of the classical world to the quantum world was still with us.
Everett's interpretation provokes the question whether we can talk
meaningfully, as he thought we could, about the "universal wave func-
tion." If, as I have claimed, a reference to a classical world is implicit in
quantum mechanics, does this mean that this kind of talk is conceptually
confused?
It does not. I have argued that a quantum-mechanical state represents, at
least in part, dispositions to behave in certain ways in interactions with
certain classical systems. These dispositions do not go away if the interac-
tions are not realized; and even if, in our present universe, these dispositions
cannot be realized, we can still speak counterfactually about what would
happen were our universe to be embedded in another. It is thus not incoher-
ent to suggest that the universe has a quantum-mechanical state which is
unfolding as it should, even though there is (by definition) no external
material agency available to scrutinize it.
However, before arriving at a wave function for the universe, we need to
obtain wave functions for its components - including those which, as mea-
surement devices, furnish the classical world within which the latencies of,
say, an electron are realized. This confronts us with the measurement prob-
lem in its abstract form: if a particular set of quantum events is defined by
reference to the classical behavior of a given system, can we give a quan-
tum-mechanical account of that system?

10.5 Quantum Theory and the Classical Horizon


The question is this: if a system could function as a measuring device, need
this rule out the possibility of describing it in quantum-mechanical terms? (I
am here speaking of the kind of description a Laplacean supermind might
furnish; we cannot give a fully quantum-mechanical description of a large
molecule, but that's our problem.) What are the consequences, we may ask,
of allowing the boundary between the quantum and the classical worlds to
float, so that it can be drawn and redrawn wherever we will, above a certain
point? In order to proffer a quantum-mechanical description of a measure-
ment device, may we not just redraw the classical horizon so that the system
now falls below it?
Quantum theory may require that we divide the world into two. Does it
forbid us from making the location of the line a matter of convention, so that
a particular system may lie now on one side of it, now on the other? The
claim that it does not, I will call the thesis of the conventionality of the
classical horizon.
Let us see how this thesis bears on the analysis of measurement. Assume
that a measuring system M interacts with a quantum system S and that we
describe this interaction as a measurement by M of the observable A for S.
Then, according to the quantum event interpretation, the quantum event $E_j$
occurs, where $E_j = (A, a_j)$, for some $a_j$. Thus, when we describe M classically,
$E_j$ occurs.
What happens if we now describe M quantum-mechanically, as, according
to the conventionality thesis, we may? We now portray S + M as evolving
according to the Schrödinger equation; no quantum event occurs unless
S + M interacts with a new measurement system M* which lies above the
new classical horizon and measures the value of some new observable,
either for M or for S + M. Nonetheless, von Neumann (1932, pp. 436-442)
provides an argument to show that there is a sense in which the two ways of
regarding M are equivalent.
Using the notation of Section 9.6, we assume that A has two values, and
that the eigenvectors of A in $\mathcal{H}^S$ are $v_+$ and $v_-$. The states of M are $u_0$ (the
ground state) and the two indicator states $u_+$ and $u_-$. When we represent M
quantum-mechanically, $u_0$, $u_+$, and $u_-$ become orthogonal vectors in $\mathcal{H}^M$.
Prior to the interaction with M, let the initial state of S be $v = c_+ v_+ + c_- v_-$;
we assume M to be in its ground state. Then, if we regard M classically,
quantum theory suggests that the event $E_+ = (A,+)$ will occur with probability
$|c_+|^2$; conditionalization on $E_+$ projects the state of S into $v_+$.
Let us now regard M as a quantum system. Consider the observable $A^M$ for
system M whose eigenvectors (in $\mathcal{H}^M$) are $u_0$, $u_+$, and $u_-$, with eigenvalues
0, +1, and -1, respectively. If we apply the Schrödinger equation to the
interaction between S and M, we obtain

$$v \otimes u_0 \;\longrightarrow\; \Psi_f = c_+ (v_+ \otimes u_+) + c_- (v_- \otimes u_-).$$

Assume that, when S + M is in state $\Psi_f$, a measurement of A is performed on
S, and a measurement of $A^M$ on M. It was shown in Section 9.6 that the
results of these two measurements will be correlated. Hence to measure
$A \otimes A^M$ on S + M it suffices to measure $A^M$ on M.
If we now "observe" M with apparatus M* - that is, if we measure $A^M$ for
M - it turns out that, when S + M is in state $\Psi_f$, the event $E^M_+ = (A^M,+)$ has
probability $|c_+|^2$. Conditionalization on $E^M_+$ projects the state of M into $u_+$
and also, via the correlation of S and M, projects the state of S into $v_+$.
Thus whether we think of the measurement of A as being done directly or
at one remove, the probability of obtaining the value (A,+) is the same, and,
in either case, conditionalization on the associated event projects the state of
S into $v_+$.
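The equivalence just described can be checked numerically. What follows is only a minimal sketch in Python, assuming that the post-interaction state of S + M is the correlated state $c_+(v_+ \otimes u_+) + c_-(v_- \otimes u_-)$ of Section 9.6; the amplitudes 0.6 and 0.8i, the basis ordering, and the function names are hypothetical choices made for illustration. It compares the probability of (A,+) when M is treated classically with the probability of the corresponding indicator event when M is treated as a quantum system, and it computes the state of S after conditionalization.

```python
import math

# Hypothetical amplitudes for the initial state v = c_plus*v_plus + c_minus*v_minus.
c_plus, c_minus = 0.6, 0.8j

# Assumed basis ordering for S (x) M: s in {0: v+, 1: v-}, m in {0: u0, 1: u+, 2: u-}.
def index(s, m):
    return 3 * s + m

# Assumed final state Psi_f = c+ (v+ (x) u+) + c- (v- (x) u-).
psi_f = [0j] * 6
psi_f[index(0, 1)] = c_plus
psi_f[index(1, 2)] = c_minus

def prob_direct(c):
    """Probability of (A,+) when M is treated classically: |c+|^2."""
    return abs(c) ** 2

def measure_on_M(psi, m_outcome):
    """Project psi onto one indicator state of M.

    Returns the probability of that outcome and the normalized
    components of the resulting state of S in the basis (v+, v-).
    """
    comps = [psi[index(s, m_outcome)] for s in (0, 1)]
    prob = sum(abs(z) ** 2 for z in comps)
    norm = math.sqrt(prob)
    return prob, [z / norm for z in comps]

p_direct = prob_direct(c_plus)
p_relay, state_S = measure_on_M(psi_f, 1)  # outcome u+ on M
```

Under these assumptions the two probabilities agree, and the state of S after conditionalization on the indicator event has all its weight on $v_+$.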
Despite this reassurance, we still face a problem. Assume that we use M*
to observe M, and that we obtain the result $(A^M,+)$. We are here regarding M
not as a classical measuring device but as a quantum system. The question is,
in this situation does the event $E_+$ occur or not? It seems that, by deciding to
draw the classical horizon below M rather than above it, we can bring about
the event $E_+$; in other words, it seems that a conventional choice of horizon
has an ontological consequence. Prima facie, this seems to bode ill for the
conventionalist thesis.
In fact, as this analysis shows, it is the conventionalist thesis that creates
the measurement problem. Note that only the least contentious aspect of the
quantum event interpretation - the claim that the registration of a value by
a measurement device can be called an event - is invoked in the generation
of this problem. If, additionally, the projection postulate is accepted, then a
further problem appears: does the state of S change to $v_+$ as a result of the
interaction with M or not?
One strategy open to the conventionalist is this. He may say that when S
interacts with M the quantum event (A,+) occurs, leaving M with the property
corresponding to the positive value of A (call this property A+). To say
this is to describe M in classical terms. This does not rule out the possibility of
describing it quantum-theoretically, he may continue, but if we do so we
forgo two things. We can no longer speak of a quantum event occurring,
since that would involve reference to a classical system, nor can we speak of
M as having a property. However, this means neither that no quantum
event has taken place, nor that M does not in fact have a property; it is rather
that quantum mechanics only allows us to speak of latencies. When M*
"looks at" M, we can describe that interaction classically: M* tells us what
property M has; we may also describe it quantum-mechanically, as the
occurrence of the event $(A^M,+)$. These two modes of description are, again,
two alternative ways to describe M.
There are two things to be said about this suggestion. The first is that it
does not entirely dissolve the problem; it shifts it to a new location. It breaks
the "property-eigenvector link" usually assumed to hold of measuring systems.
We describe M classically as having the property A+, or we describe it
quantum-mechanically as being in the eigenstate $u_+$; the assumption is
usually made that M has the property A+ if and only if it is in the indicator
state $u_+$. On the suggested analysis, the conditional holds in one direction
only: if M is in the state $u_+$ then it has the property A+. However, it may
also have the property A+ even when it is in the mixed state
$D^M = |c_+|^2 P^M_+ + |c_-|^2 P^M_-$ as a result of its interaction with S (see Section 9.6).
Second, although the suggestion allows us to deal with properties, it will
not work for the projection postulate. Whereas we can say without
inconsistency that M has the property A+ when it is in the mixed state $D^M$, we
cannot say that S is both in the pure state $v_+$ and in the mixed state
$D^S = |c_+|^2 P^S_+ + |c_-|^2 P^S_-$.
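The mixed states mentioned here are what one obtains by taking the reduced state of each component of the correlated composite system. The following Python sketch is offered only as an illustration, assuming the correlated final state of Section 9.6 with hypothetical amplitudes 0.6 and 0.8i; the helper names `reduced_M`, `reduced_S`, and `purity` are invented for the example. It computes both reduced density matrices and confirms that neither is a pure state.

```python
# Hypothetical amplitudes; psi[s][m] is the component of the assumed final state
# on (v_s (x) u_m), with s in (v+, v-) and m in (u0, u+, u-).
c_plus, c_minus = 0.6, 0.8j
psi = [[0j, c_plus, 0j],
       [0j, 0j, c_minus]]

def reduced_M(psi):
    """Reduced density matrix of M: trace out S."""
    return [[sum(psi[s][m] * psi[s][n].conjugate() for s in range(2))
             for n in range(3)] for m in range(3)]

def reduced_S(psi):
    """Reduced density matrix of S: trace out M."""
    return [[sum(psi[s][m] * psi[t][m].conjugate() for m in range(3))
             for t in range(2)] for s in range(2)]

def purity(D):
    """Trace of D squared; it equals 1 only for a pure state."""
    n = len(D)
    return sum(D[i][j] * D[j][i] for i in range(n) for j in range(n)).real

D_M = reduced_M(psi)   # diagonal, with weights |c+|^2 and |c-|^2 on u+ and u-
D_S = reduced_S(psi)   # diagonal, with weights |c+|^2 and |c-|^2 on v+ and v-
```

Because the purity of the reduced state of S is strictly less than 1 whenever both amplitudes are nonzero, that state cannot coincide with the pure state $v_+$, which is the difficulty the paragraph above points to.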
The strategy, together with the two corollaries just mentioned, moves us
very close to van Fraassen's "modal interpretation" of quantum theory. Van
Fraassen (1974a, pp. 300-301) presents in the formal mode what I have put
in ontological terms:
We distinguish two kinds of statements-state attributions and value attribu-
tions . . . The state of the system describes what is possibly the case about values
of observables; what is actual is only possible relative to the state and is not deduci-
ble from it.

Van Fraassen is happy to reject the projection postulate (p. 299) and,
although he does not write in quite these terms, the severing of the property-
eigenvector link appears as a small price to pay for allowing a measurement
device to be given both a classical and a quantum-theoretic description.
Ingenious though this interpretation is, I do not think it is right. I say this
with some reluctance, since, as we shall see, it solves a number of intractable
problems. My reservation stems in part from a belief in the explanatory
value of the Lüders rule (which in one guise acts as a projection postulate),
and in part from a belief that van Fraassen's partial rejection of the
property-eigenvector link does not go far enough. I consider even a partial
identification of classical properties of a macroscopic system with quantum
states of that system to be problematic.
For the question one cannot avoid is, are nontrivial superpositions of
these quantum states also admissible pure states of the measurement de-
vice? If so, then they are pure states in a wholly Pickwickian sense, since no
observable distinguishes them from the corresponding mixtures of indicator
states. If not, if the set of admissible pure states is restricted to the indicator
states, then the account runs into the difficulty described in Section 9.7: this
restriction on the set of admissible quantum states is incompatible with the
application to the system of Schrödinger's equation, and hence with treating
it as a quantum system. This is not to say that classical systems admit no
quantum-mechanical description, just that, to the extent that an indicator
state is a classical property, it is implausible to treat it as a quantum state.*
Von Neumann's consistency proof, in my view, has little to do with
measurement or with the question of the classical horizon. If the system
M + S evolves according to the Schrödinger equation, then the states $u_0$, $u_+$,
and $u_-$ of M cannot be regarded as classical indicator states of M, and so M
cannot function as a measurement apparatus in the way the proof suggests.
What the proof shows is the possibility of a quantum amplification device or
relay.

* See also Leggett (1986). This paper came to my attention too late to be discussed here.

The mere rejection of a particular identification of classical states with
quantum states does nothing, however, to resolve the crucial and persistent
question we are left with: what is the conceptual relation between the
quantum world and the classical world? This is the touchstone, pyx, assay,
ordeal, the High Noon, the Big Enchilada for all interpretations of quantum
theory.
Let us approach the question from the classical side, and ask: are there in
the actual world systems which behave like classical systems? To reiterate a
point made earlier, I am not asking whether there are systems whose behav-
ior is governed by the laws of nineteenth-century physics. The question is:
are there systems to which we can consistently ascribe properties, the set of
which forms a Boolean algebra? If we permit ourselves the kind of idealiza-
tion appropriate to any physical theory, the answer is clearly yes. Call these
C-systems. It turns out that there are very small systems whose behavior
with respect to certain C-systems differs markedly from that of other
C-systems. The most complete specification of the state of one of these small
systems that we can obtain assigns probabilities to events associated with
properties of the C-systems with which it interacts. Call these Q-systems.
Are Q-systems and C-systems different in kind? Our best theory tells us
that C-systems are made up of a great number of interacting Q-systems.
Further, our theory of Q-systems includes an account of what happens
when a number of Q-systems together form a larger system, and this ac-
count has received experimental confirmation. We are led to postulate six
theses.

(1) A C-system behaves like a large composite Q-system.
(2) Differences between Q-systems and C-systems are to be attributed to
the complexity of the latter.
(3) With an increase of complexity of a Q-system classical behavior
emerges.
(4) Some systems large enough to be regarded as C-systems behave as
follows: in an interaction between one of these and a Q-system S,
classical properties of the C-system may be associated with pure
states of S.
(5) When this occurs, these properties of the C-system are realized
probabilistically by the interaction.
(6) Together with the realization of a particular property of the C-system,
there comes a localization of the state of S. (Conditionalization
occurs.)

With the exception of (6), I do not take these theses to be particularly
controversial. Nonetheless, they are sometimes challenged, as will appear.
The first thesis effectively restates the conventionality of the classical
horizon. The others suggest that the differences between quantum and
classical behavior which the complexity of a C-system brings about are of
two kinds. (a) A description of a C-system in terms of properties is available
which is not available in the case of a "pure" Q-system (thesis 3). (b) The
mode of interaction between a Q-system and a C-system differs from that
between two Q-systems, at least in the way it is described (theses 4, 5, 6).
From (a) it appears that, with an increase of complexity of a system,
classical properties appear as emergent properties (and this phrase should
perhaps be read as "emergent properties"), supervenient on the quantum
states. (To say that A-states are supervenient on B-states is just to say that a
difference between A-states always involves a difference between B-states.)
However, these classical properties need not be associated with particular
quantum states; formally, the Boolean algebra of classical properties need
not be embeddable in the Hilbert space which provides the quantum
description of the system. The facts summarized in (b) are familiar to all
students of the measurement problem.
I have no explanation of the differences between C-systems and
Q-systems. Theses (1)-(6) constitute a list of the problems such an account would
have to resolve; to use Kant's nice phrase, the account itself remains " set as a
task." And it's not clear what such an account would look like. Whereas one
might look to an analysis like the one provided by Daneri, Loinger, and
Prosperi (see Section 9.8) for an account of (3) consistent with (1) and (2), it is
hard to see how (4)-(6) could be dealt with. In particular, what are we to say
about (6), that is, about the way in which the emergence of properties in a
classical system can "force" probabilistic and discontinuous changes on a
Q-system coupled to it?
Nor is this problem confined to the quantum event interpretation. Al-
though these theses have been formulated against the background of this
interpretation, the problem they pose is the common sticking point for
nearly all interpretations of quantum mechanics. One way to evade it is to
reject (1) and (2) and to treat the quantum/classical distinction as both sharp
and self-evident. This is done, albeit in very different ways, by both Bohr
and Wigner. As we saw earlier, another alternative is to adopt a modal
interpretation like van Fraassen's. He evades (2) by making no distinction
between C-systems and Q-systems, and he denies (3) and (6). Value
attributions to Q-systems are permissible; properties are not regarded as
supervenient on quantum states. Rather they are underdetermined by them: a
given quantum state delimits what is possible, specification of properties
tells us what is actual. Theses (4) and (5) hold, in appropriately amended
versions. Yet another suggestion, recently made by Bub, is that to describe a
system as a C-system is to make the idealization that it consists of an infinite
number of Q-systems. In the case of an infinite system, he argues, superse-
lection rules come into play, giving rise to the classical behavior characteris-
tic of C-systems. (This brief summary does not do justice to Bub's argument;
see Bub, 1987.) Bub's account is compatible with the thesis of the conven-
tionality of the classical horizon; its location depends on the point at which
the idealization is made. Like van Fraassen's, however, this account falls
foul of the difficulty raised in Section 9.7: if classical properties are asso-
ciated with superselection subspaces, then it is not clear that an evolution of
one into another can take place that is consistent with quantum theory.
The rejection of these alternatives does nothing to solve the problem.
However, not only do I have no explanatory account which encompasses
theses (1)-(6), I also think that there is something very odd about requiring
quantum theory to provide one. The oddity can be explained in this way.
According to (the quantum event interpretation of) quantum theory, a
"pure" Q-system is describable in terms of latencies with respect to a classi-
cal horizon. That is to say, it is a system which interacts with certain
C-systems in specific, probabilistic ways. We have reason to think that an increase
in complexity of a system gives us an alternative way to describe it; we can
describe it as having classical properties, that is, as a C-system. We may also
hope to explain why these properties emerge. But note, to say that the
system behaves like a C-system is to say that it is the sort of system in which
probabilistic behavior may be induced by interactions with Q-systems. And
this is where we came in.
What else is there to say? I suggest that quantum theory can no more be
called on to answer the question of why such probabilistic behavior occurs
than the theory of geometrical optics can be required to explain why light
travels in straight lines. Indeed, it is not clear what sort of explanation is
being demanded; in each case the explanandum is the given from which the
theory starts. As Chapters 3 and 4 showed, the Hilbert-space models which
quantum theory uses are ideal for representing the probabilistic behavior of
systems with respect to certain families of events. Given any specific piece of
quantum behavior, quantum mechanics will happily provide us with a
model within which it can be fitted. Is it also required to justify the use of
these models? If "to justify" here means more than "to show that they save
the phenomena," and what is required is some deeper analysis warranting
their use, then it cannot do so.
This argument is not intended to provide "a tranquilizing philosophy,
. . . a gentle pillow for the true believer from which he cannot easily
be aroused" (Einstein, letter to Schrödinger, May 1928, on the Copenhagen
interpretation; quoted in Bub, 1974, p. 46). It is an argument which claims
that the scope of quantum theory is limited by its own structure.
Landau and Lifschitz (1977, p. 3) write,
Thus quantum mechanics occupies a very unusual place among physical theories: it
contains classical mechanics as a limiting case, yet at the same time it requires this
limiting case for its own formulation.

I suggest that the explanation of how this can be so, how we can use the
limiting cases of quantum theory in order to formulate the theory, cannot be
given within the theory itself. It will have to await the arrival of a new
physical theory, a theory which is not formulated against a classical horizon
in the way that quantum mechanics is.
Can there be such a theory?
Probably.
APPENDIX A

Gleason's Theorem

Gleason's theorem is of fundamental importance, not only for the theory of
Hilbert spaces, but for the interpretation of quantum mechanics. Though
the original proof, published in 1957, was mathematically very difficult, in
1984 an "elementary proof" of the theorem was given by Cooke, Keane,
and Moran (whom I shall refer to collectively as "CKM"), and it is
reproduced below. For the amateur mathematician, even this proof is difficult
enough. To ease the reader's task I have added a commentary consisting
partly of explanations of unfamiliar terms, but mostly of answers to the
questions I asked myself as I worked through the proof. These questions
were of two kinds: "Why is this move being made here?" and "How does
this follow?" I assume a familiarity with Section 5.6 of the text, and with the
vocabulary and notation of set theory (see, for example, Monk, 1969). CKM
also use one theorem from topology which I quote but do not explain. The
theorem guarantees the existence of the limit of certain sequences; the
reader will have to take it on trust, but in context the intuitive content of its
conclusion will be clear.
I have not altered the symbols used by CKM to make them conform to
those used in the text and in my commentary, but since the symbols in the
proof are all defined on first use, this should cause no problems; the reader
need only be aware that such differences exist.

An Elementary Proof of Gleason's Theorem


by Roger Cooke, Michael Keane, and William Moran
The following proof is reproduced from the Mathematical Proceedings of the Cambridge
Philosophical Society 98 (1985), pp. 117-128. Copyright © 1985 Cambridge Philosophical Society.
Reprinted with the permission of Cambridge University Press.

Abstract
Gleason's theorem characterizes the totally additive measures on the closed
subspaces of a separable real or complex Hilbert space of dimension greater
than two. This paper presents an elementary proof of Gleason's theorem
which is accessible to undergraduates having completed a first course in real
analysis.

Introduction
Let H be a separable Hilbert space over the real or complex field. A (normal-
ized) state on H is a function assigning to H the value 1, assigning to each
closed subspace of H a number in the unit interval, and satisfying the
following additivity property: If any given subspace is written as an orthog-
onal sum of a finite or countable number of subspaces, then the value of the
state on the given subspace is equal to the sum of the values of the state on
the summands. States should be thought of as 'quantum mechanical proba-
bility measures'; they play an essential role in the quantum mechanical
formalism. For an exposition of these ideas we refer to Mackey (1963).
Examples of normalized states are obtained by considering positive self-
adjoint trace class operators with trace 1 on H. Such operators correspond to
preparation procedures in quantum mechanics. If A is such an operator,
then it is easy to see that we can define a state by associating to each
one-dimensional subspace generated by a unit vector $x \in H$ the inner product
$\langle Ax,x \rangle$ and extending to subspaces of dimension greater than one by
countable additivity. States of this type are called regular states.
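As an illustration of how a regular state is built from an operator, here is a small Python sketch. It is not part of the reproduced proof: the matrix `A` is a hypothetical positive symmetric matrix of trace 1 on a three-dimensional real space, and the helper names are invented for the example. The sketch assigns $\langle Ax,x \rangle$ to each ray, extends the assignment to subspaces by summing over an orthonormal basis, and compares two different orthonormal bases of the same plane.

```python
import math

# Hypothetical density operator on R^3: symmetric, positive, trace 1.
A = [[0.5, 0.1, 0.0],
     [0.1, 0.3, 0.0],
     [0.0, 0.0, 0.2]]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [inner(row, x) for row in A]

def state_on_ray(A, x):
    """Value <Ax, x> assigned to the ray spanned by the unit vector x."""
    return inner(matvec(A, x), x)

def state_on_subspace(A, orthonormal_basis):
    """Extend to a subspace by additivity over an orthonormal basis of it."""
    return sum(state_on_ray(A, x) for x in orthonormal_basis)

r = 1 / math.sqrt(2)
basis = [[r, r, 0.0], [r, -r, 0.0], [0.0, 0.0, 1.0]]

p_whole_space = state_on_subspace(A, basis)       # equals the trace of A
p_plane = state_on_subspace(A, basis[:2])         # plane spanned by first two axes
p_plane_alt = state_on_subspace(A, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

The value on the whole space is the trace of `A`, and the value on the plane does not depend on which orthonormal basis of the plane is used, which is what makes the extension by additivity well defined in this example.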
In his course on the mathematical foundations of quantum mechanics
Mackey (1963) proposed the following problem: determine the set of states
on an arbitrary real or complex Hilbert space. This problem was solved by
Gleason (1957) and the principal result, known as Gleason's theorem, states
that every state on a real or complex Hilbert space of dimension greater than
two is regular. Gleason's proof uses the representation theory of O(3), and
relies on an intricate continuity argument. Because of the role which
Gleason's theorem plays in the foundations of quantum mechanics, there have
been several attempts to simplify its proof. Using elementary methods, Bell
(1966) proved a special case of the theorem, namely, that there exist no
states on the closed subspaces of a Hilbert space of dimension greater than
two taking only the values zero and one. Kochen and Specker (1967) proved
a similar result for states restricted to a finite number of closed subspaces.
Piron (1976) produced an elementary proof of Gleason's theorem for the
special case that the state is extreme (i.e. assigns the value 1 to some one
dimensional subspace).
In this article we give an elementary proof of Gleason's theorem in full
generality. Although this proof is longer than Gleason's proof, we believe
that it contributes to the intuitive understanding of the underlying reasons
for the validity of the theorem. The structure of the argument is as follows.
In § 1 we show that it is enough to handle the case $H = \mathbb{R}^3$. This was part of
Gleason's original argument, and is well understood; the essential difficulty
of the proof is the treatment of the case $H = \mathbb{R}^3$. For this purpose it is
convenient to study a certain class of real-valued functions on the unit
sphere of $\mathbb{R}^3$, called frame functions. §§ 2 and 3 are devoted to an exposition
of the properties of frame functions and the statement of the theorem in the
case of $\mathbb{R}^3$ in terms of frame functions. § 3 also contains two 'warm-up
theorems' whose contents were essentially known to 19th century
mathematicians. Coupled with a basic lemma in § 4 (essentially due to Gleason
and Piron), they yield a new proof for the extreme case, which is given in § 5.
In § 6 we show that a weak form of continuity in the general case follows
from the result of § 5, and in § 7 we treat the general case. The proofs in
§§ 2-7 are accessible to undergraduates who have completed a first course in
real analysis.

1. Reduction to $H = \mathbb{R}^3$
Let H be a real or complex separable Hilbert space, and let L be the set of
closed subspaces of H. If $A \in L$ and $B \in L$, then we write $A \perp B$ if A and B are
orthogonal. For $A_i \in L$, $i \in I$, $\bigvee_{i \in I} A_i$ denotes the smallest closed subspace
containing $A_i$ for all $i \in I$. If x is a vector in H, then $\bar{x}$ denotes the
one-dimensional subspace generated by x.
Definition. A function $p: L \to [0,1]$ is called a state if for all sequences
$\{A_i\}_{i=1}^{\infty}$, $A_i \in L$, $i = 1, 2, \ldots$, with $A_i \perp A_j$ for $i \neq j$:

$$p\Big(\bigvee_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} p(A_i).$$

Definition. p is called regular if there exists a self-adjoint trace class
operator A on H such that for all unit vectors $x \in H$

$$p(\bar{x}) = \langle Ax,x \rangle.$$

LEMMA. The following statements are equivalent:

(i) p is regular.
(ii) There is a symmetric continuous bilinear form B on H such that

$$p(\bar{x}) = B(x,x).$$

Moreover, both A and B are uniquely determined in this way by p.

Proof. See Halmos (1957, §§ 2 and 3). ∎


LEMMA. If the restriction of p to every two-dimensional subspace of H is
regular, then p is regular (the restriction need not be normalized).
Proof. For each two-dimensional subspace E of H we can find a symmetric
continuous bilinear form $B_E$ such that $B_E(x,x) = p(\bar{x})$ ($x \in E$, $\|x\| = 1$). For
$\|x\| = \|y\| = 1$, choose a two-dimensional subspace E(x,y) containing x and y
and define

$$B(x,y) = B_{E(x,y)}(x,y).$$

It is straightforward to check that B can be uniquely extended to a symmetric
continuous bilinear form on H, and that $p(\bar{x}) = B(x,x)$ ($\|x\| = 1$). ∎
We shall call a closed real-linear subspace of H completely real if the inner
product on this real linear subspace takes only real values.
LEMMA. If p is a state on a two-dimensional complex Hilbert space H, and if p is
regular on every completely real subspace, then p is regular.
Proof. We first show that there is a one-dimensional subspace $\bar{x}$ such that
$p(\bar{x})$ is maximal. Put

$$M = \sup_{x \in H} p(\bar{x}).$$

Choose a sequence $x_n \in H$ such that $\lim_{n \to \infty} p(\bar{x}_n) = M$. By passing to a
subsequence, assume $\lim_{n \to \infty} x_n = x$. Clearly there exist $\theta_n$ such that
$\langle e^{i\theta_n} x_n, x \rangle$ is real and nonnegative, and passing again to a subsequence,
we may assume that $\lim_{n \to \infty} \theta_n = \theta$. By continuity of the scalar product, the
limit $\langle e^{i\theta} x, x \rangle = e^{i\theta} \|x\|^2$ is also real, and hence $e^{i\theta} = 1$. Thus
$\lim_{n \to \infty} e^{i\theta_n} x_n = x$, and for each n the vectors x and $e^{i\theta_n} x_n$ are in the
same completely real subspace. By uniform equicontinuity of regular states it
follows that $p(\bar{x}) = M$.
Now for any $y \in H$ there exists $\theta$ such that $\langle x, e^{i\theta} y \rangle$ is real; hence
$p(\overline{e^{i\theta} y})$ ($= p(\bar{y})$) is equal to

$$M \langle x, e^{i\theta} y \rangle^2 + (1 - M)(1 - \langle x, e^{i\theta} y \rangle^2)
= M |\langle x,y \rangle|^2 + (1 - M)(1 - |\langle x,y \rangle|^2),$$

and p is therefore regular. ∎


THEOREM. If every state on $\mathbb{R}^3$ is regular, then every state on a real or complex
separable Hilbert space H of dimension greater than two is regular.

Proof. Every state on H necessarily induces a continuous symmetric bilinear
form on every completely real three-dimensional subspace, and every
completely real two-dimensional subspace can be embedded in a completely
real three-dimensional subspace. It follows that the restriction of a
state on H to any two-dimensional completely real subspace is regular, and
from the above lemmata it follows that every state on H is regular. ∎

2. Frame Functions
In this section, we define frame functions, collect some of their properties,
and give examples. Denote by S the unit sphere of a fixed three-dimensional
real Hilbert space. If s and s' are elements (i.e. vectors) of S, the angle
between s and s' is designated by $\theta(s,s')$. If $\theta(s,s') = \pi/2$, we write $s \perp s'$.
Definition. A frame is an ordered triple (p,q,r) of elements of S such that
$p \perp q$, $p \perp r$ and $q \perp r$.
Given a frame (p,q,r), each point in S (and in the vector space) can be
uniquely expressed as $xp + yq + zr$, with $x,y,z \in \mathbb{R}$. We call (x,y,z) the frame
coordinates of the point with respect to the given frame.
Definition. A frame function is a function $f: S \to \mathbb{R}$ such that the sum

$$f(p) + f(q) + f(r)$$

has the same value for each frame (p,q,r). This value, called the weight of f,
will be denoted by w(f).
The following obvious properties of frame functions will be useful in the
sequel.
(P1) The set of frame functions is a vector space, and

$$w(af) = a\,w(f), \qquad w(f + g) = w(f) + w(g) \qquad (a \in \mathbb{R};\ f, g \text{ frame functions}).$$

(P2) If f is a frame function, $f(-s) = f(s)$ ($s \in S$).

(P3) If f is a frame function, and if $s,t,s',t' \in S$ all lie on the same great circle
and $s \perp t$, $s' \perp t'$, then

$$f(s) + f(t) = f(s') + f(t').$$

To illustrate the use of P3 we prove:
(P4) Let f be a frame function with $\sup f(s) = M < \infty$ and $\inf f(s) = m > -\infty$.
Let $\varepsilon > 0$ and let $s \in S$ with $f(s) > M - \varepsilon$. Then there is $t \in S$ with $s \perp t$ and
$f(t) < m + \varepsilon$.
Proof. Given s with $f(s) > M - \varepsilon$, choose $\delta > 0$ such that $f(s) > M - \varepsilon + \delta$,
and t' such that $f(t') < m + \delta$. Then s and t' determine a great circle on S, and
if t, s' are chosen on this great circle with $s \perp t$, $s' \perp t'$, P3 yields:

$$f(t) = f(s') + f(t') - f(s) < M + m + \delta - (M - \varepsilon + \delta) = m + \varepsilon.$$ ∎


Next we give examples of frame functions. Obviously, constants are
frame functions. If we fix a vector $p_0 \in S$, then for any frame (p,q,r) the
frame coordinates of $p_0$ with respect to (p,q,r) are given by:

$$(\cos\theta(p,p_0),\ \cos\theta(q,p_0),\ \cos\theta(r,p_0))$$

and the sum of the squares of these three numbers is one since $p_0 \in S$. Hence

$$f(s) = \cos^2\theta(s,p_0)$$

is a frame function, with w(f) = 1. Next, fix a frame $(p_0,q_0,r_0)$ and a triple
$(\alpha,\beta,\gamma)$ of real numbers. Let $(x_0,y_0,z_0)$ denote the frame coordinates of a point
$s \in S$ with respect to $(p_0,q_0,r_0)$. By the above and by P1,

$$f(s) = \alpha x_0^2 + \beta y_0^2 + \gamma z_0^2 \qquad (*)$$

is a frame function, with $w(f) = \alpha + \beta + \gamma$. Now recall that if Q is any
quadratic form on our Hilbert space, then there exists a frame $(p_0,q_0,r_0)$ and
a triple $(\alpha,\beta,\gamma)$ of real numbers such that the restriction of Q to S is given by
(*). Hence we have proved the following result:
PROPOSITION. Let A be a linear operator from the given three-dimensional
Hilbert space to itself, and let

$$Q(s) = \langle s, As \rangle$$

be the quadratic form associated with A. Then the restriction of Q to S is a frame
function whose weight is the trace of A.
Note that $p_0, q_0, r_0$ are eigenvectors of $\tfrac{1}{2}(A + A^T)$ with respective
eigenvalues $\alpha, \beta, \gamma$.
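The proposition lends itself to a quick numerical check. The Python sketch below is an illustration only, not part of the reproduced proof: the matrix `A` is an arbitrary hypothetical (non-symmetric) operator on $\mathbb{R}^3$, frames are generated by a Gram-Schmidt step followed by a cross product, and the random seed is fixed so that the run is repeatable. For each frame it sums $Q(s) = \langle s, As \rangle$ over the three frame vectors.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def random_frame(rng):
    """Return an orthonormal triple (p, q, r) in R^3."""
    p = normalize([rng.gauss(0, 1) for _ in range(3)])
    w = [rng.gauss(0, 1) for _ in range(3)]
    q = normalize([wi - dot(w, p) * pi for wi, pi in zip(w, p)])  # Gram-Schmidt step
    r = cross(p, q)
    return p, q, r

def Q(A, s):
    """Quadratic form <s, As>."""
    return dot(s, [dot(row, s) for row in A])

# Hypothetical operator, deliberately not symmetric.
A = [[2.0, 1.0, 0.0],
     [-3.0, 0.5, 4.0],
     [0.0, 1.0, -1.0]]
weight = A[0][0] + A[1][1] + A[2][2]  # trace of A

rng = random.Random(0)
frame_sums = [sum(Q(A, s) for s in random_frame(rng)) for _ in range(5)]
```

Each entry of `frame_sums` should agree with `weight` up to rounding error, whatever frame is drawn; only the symmetric part of `A` contributes to Q, which is why the note above refers to $\tfrac{1}{2}(A + A^T)$.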
Our last example shows that frame functions can be wildly discontinuous.
Let $\psi: \mathbb{R} \to \mathbb{R}$ be any map such that $\psi(x + y) = \psi(x) + \psi(y)$ for all
$x,y \in \mathbb{R}$. Then if f is a frame function, so is $\psi(f)$, and $\psi$ can be chosen to have
arbitrary values on a basis of $\mathbb{R}$ over $\mathbb{Q}$. Of course Cauchy's classical theorem
on functional equations tells us that if $\psi$ is bounded on an interval, then
$\psi(x) = cx$ for some constant c. This example shows that the restriction to
bounded frame functions in the following theorem is essential.

3. Statement of Gleason's Theorem


We now state the result to be proved.
GLEASON'S THEOREM. Let f be a bounded frame function. Define

M = sup f(s)
m = inf f(s)
α = w(f) − M − m.

Then there exists a frame (p,q,r) such that if the frame coordinates with respect to
(p,q,r) of s ∈ S are (x,y,z),

f(s) = Mx² + αy² + mz²

for all s ∈ S.
In particular, the proposition of §2 provides all bounded frame functions.
We remark that the above representation implies that m ≤ α ≤ M; if m <
α < M, then the frame (p,q,r) is unique up to change of sign; if m ≤ α < M
then p is unique up to change of sign, and similarly for m < α ≤ M.
In order to clarify the idea behind our proof of the above result, we now
state and prove a theorem which might be called an 'abelianized' version of
Gleason's theorem. Its content was essentially known to 19th century
mathematicians.
'WARM-UP' THEOREM I. Let f: [0,1] → ℝ be a bounded function such that for all
a,b,c ∈ [0,1] with a + b + c = 1, f(a) + f(b) + f(c) has the same value w̄ =
w(f). Then f(a) = (w̄ − 3f(0))a + f(0) for all a ∈ [0,1].
Proof. By subtracting a constant, we may assume f(0) = 0. Now take c = 0,
b = 1 − a to obtain

f(a) = w̄ − f(1 − a),

and then set c = 1 − (a + b) to obtain

f(a) + f(b) = w̄ − f(1 − (a + b)) = f(a + b)

for all a, b, a + b ∈ [0,1]. This implies immediately that

f(a) = w̄a

for all rational a, and for general a ∈ [0,1] and n ≥ 1 with na ≤ 1 we have

f(na) = nf(a).

Hence as a tends to 0, f(a) must tend to 0 because f is bounded, and thus

lim_{a→0} f(a + b) = f(b)

for all b ∈ [0,1]. Thus f(a) = w̄a for all a ∈ [0,1]. ∎
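As a sanity check on the statement of Warm-up Theorem I (not on its proof), the sketch below confirms in exact rational arithmetic that the affine function named in the conclusion does satisfy the hypothesis. The particular values of the weight and of f(0) are arbitrary choices made for illustration.

```python
from fractions import Fraction


def make_f(w, f0):
    """Return the affine function f(a) = (w - 3*f0)*a + f0 from the theorem."""
    return lambda a: (w - 3 * f0) * a + f0


w, f0 = Fraction(7, 2), Fraction(1, 3)  # illustrative weight and value of f(0)
f = make_f(w, f0)

# All triples (a, b, c) on a grid of tenths with a + b + c = 1 and a, b, c in [0, 1].
triples = [
    (Fraction(i, 10), Fraction(j, 10), 1 - Fraction(i, 10) - Fraction(j, 10))
    for i in range(11)
    for j in range(11 - i)
]

# The set of values taken by f(a) + f(b) + f(c) over the grid.
sums = {f(a) + f(b) + f(c) for a, b, c in triples}
```

The set `sums` contains the single value w, since (w − 3f(0))(a + b + c) + 3f(0) = w whenever a + b + c = 1.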
The above formulation was chosen in order to make the analogy with
Gleason's theorem clear. Actually, we shall use the following modified
version in our proof.

'WARM-UP' THEOREM II. Let C be a finite or countable subset of (0,1). Let
f: [0,1]\C → ℝ be a function such that
(1) f(0) = 0.
(2) If a, b ∈ [0,1]\C and a < b, then f(a) ≤ f(b).
(3) If a, b, c ∈ [0,1]\C and a + b + c = 1, then f(a) + f(b) + f(c) = 1.
Then f(a) = a for all a ∈ [0,1]\C.
Proof. The set

C̃ = {rc: c ∈ C, r rational} ∪ {r(1 − c): c ∈ C, r rational}

is at most countable, so that there exists a point a₀ ∈ (0,1) with a₀ ∉ C̃. Now
if r is a rational number such that ra₀ ∈ [0,1], then neither ra₀ nor 1 − ra₀
belong to C, since a₀ ∉ C̃. As in the proof above, we conclude that

f(ra₀) + f(r'a₀) = f((r + r')a₀)

for rational r, r' with ra₀, r'a₀, (r + r')a₀ ∈ [0,1], and hence

f(ra₀) = ra₀

for rational r with ra₀ ∈ [0,1]. It now follows from (2) that f(a) = a for all
a ∈ [0,1]\C. ∎

4. The Basic Lemma

In this paragraph, we prove a basic lemma to be used in the following two
sections. We fix a vector p ∈ S, to be thought of as the north pole, and use the
following notation.

N = {s ∈ S: θ(p,s) ≤ π/2} = 'northern hemisphere',

E = {s ∈ S: s ⊥ p} = 'equator'.

For each s ∈ N, set

l(s) = cos² θ(p,s) = 'latitude' of s,

and define for 0 ≤ l ≤ 1:

L_l = {s ∈ N: l(s) = l} = 'lth parallel'.

Thus L₁ = {p} and L₀ = E.

For s ∈ N\{p}, there is a unique vector s⊥ ∈ N such that s ⊥ s⊥ and
l(s) + l(s⊥) = 1 (s⊥ is the 'coldest' vector orthogonal to s). The great half circle
D_s defined by

D_s = {t ∈ N: t ⊥ s⊥}    (s ∈ N\{p})

will be called the descent through s; it is the great circle through s which has s
as its northernmost point. (For e ∈ E, D_e = E). We can now state the basic
lemma:
BASIC LEMMA. Let f be a frame function such that
(1) f(p) = sup_{s∈S} f(s), and
(2) f(e) has the same value for all e ∈ E.
Then if s ∈ N\{p} and if s' ∈ D_s,

f(s) ≥ f(s').

Proof. Set f(p) = M. Property P4 implies that

f(e) = m = inf_{s∈S} f(s)    (e ∈ E).

Let s ∈ N\{p} and s' ∈ D_s. Choose t,t' ∈ D_s with s ⊥ t, and s' ⊥ t'. By
property P3,

f(s) + f(t) = f(s') + f(t').

And using the fact that t ∈ E we obtain

f(s) − f(s') = f(t') − f(t) = f(t') − m ≥ 0. ∎


Later on we shall need an
APPROXIMATE VERSION OF THE BASIC LEMMA. Let f be a frame function and ξ > 0
such that
(1) f(p) > sup_{s∈S} f(s) − ξ, and
(2) f(e) has the same value for all e ∈ E.
Then if s ∈ N\{p} and if s' ∈ D_s, f(s) > f(s') − ξ.
Proof. As above, property P4 yields

f(e) < m + ξ    (e ∈ E),

and with exactly the same choices of t and t':

f(s) − f(s') = f(t') − f(t) > f(t') − m − ξ ≥ −ξ. ∎

5. Simple Frame Functions


In this paragraph, we show that Gleason's theorem is true under two
additional hypotheses on frame functions. We begin with a geometric lemma
due to Piron (1976).
GEOMETRIC LEMMA. Let s, t ∈ N\{p} such that l(s) > l(t). Then there exist n ≥ 1
and s₀, . . . , s_n ∈ N\{p} such that s₀ = s, s_n = t, and for each 1 ≤ i ≤ n:

s_i ∈ D_{s_{i−1}}.

Figure A1.1

Figure A1.2

Proof. To facilitate the calculations, we transfer this problem to the plane
tangent to S at p by projecting each point of N onto this plane from the origin
(center of the sphere S). Points of the same latitude on S are projected onto
circles centered at p, and the descent through s becomes the straight line
through s tangent to the latitude circle at s (see Figure A1.1). In the simplest
case, s and t lie on a ray from the origin. In this case we may choose n = 2
and pick s₁ as in Figure A1.2. Now fix s₀ = s = (x,0) (in ℝ² coordinates) and
n ≥ 1. Choose s₁, . . . , s_n successively such that s_i ∈ D_{s_{i−1}} and such that the
angle between s_i and s_{i+1} in the plane is π/n (see Figure A1.3). Then s_n has
coordinates (−y,0) and we wish to show that y − x → 0 as n → ∞. Let d_k be
the distance of s_k from the origin. Then d₀ = x and d_n = y. For each i we have

d_{i+1}/d_i = 1/cos(π/n),

and hence

1 ≤ y/x = d_n/d₀ = ∏_{i=1}^{n} d_i/d_{i−1} = 1/(cos(π/n))^n ≤ 1/(1 − π²/2n²)^n,

which approaches 1 as n tends to infinity. The lemma is proved. ∎
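The limit at the end of this proof is easy to inspect numerically. The sketch below is an illustration under stated assumptions rather than part of the argument: it works in the tangent plane with the pole at the origin, takes n ≥ 3 so that cos(π/n) is positive, and uses the hypothetical helper names `polygon_endpoint` and `ratio`. Each step moves along the tangent to the current latitude circle until the polar angle has advanced by π/n, which multiplies the distance from the pole by 1/cos(π/n).

```python
import math


def polygon_endpoint(x, n):
    """Follow n tangent-line steps from (x, 0), each advancing the polar angle by pi/n.

    Returns the planar coordinates of the final point s_n.
    """
    d, angle = x, 0.0
    for _ in range(n):
        d /= math.cos(math.pi / n)  # d_{i+1} / d_i = 1 / cos(pi/n)
        angle += math.pi / n
    return d * math.cos(angle), d * math.sin(angle)


def ratio(n):
    """The quotient y/x = (1 / cos(pi/n))**n from the proof."""
    return (1.0 / math.cos(math.pi / n)) ** n


def upper_bound(n):
    """The bound 1 / (1 - pi^2 / (2 n^2))**n used in the proof."""
    return 1.0 / (1.0 - math.pi ** 2 / (2 * n ** 2)) ** n


ratios = [ratio(n) for n in (10, 100, 1000)]
bounds = [upper_bound(n) for n in (10, 100, 1000)]
end_x, end_y = polygon_endpoint(1.0, 100)
```

The values in `ratios` decrease toward 1 as n grows and stay below the corresponding entries of `bounds`; the endpoint for n = 100 lies on the negative horizontal axis at distance `ratio(100)` from the pole, as the lemma's picture of s_n = (−y, 0) requires.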


Figure A1.3 [s₀ = (x, 0), s_n = (−y, 0)]

We now come to the main result of this section.
THEOREM. Let f be a frame function such that for some point p ∈ S
(1) f(p) = M := sup_{s∈S} f(s),
(2) f(e) takes the constant value m for all e ∈ E.
Then f(s) = m + (M − m)cos² θ(s,p), for all s ∈ S.
Proof. By property P4, m = inf_{s∈S} f(s), so that if M = m the theorem is true.
If M ≠ m, then we may assume that m = 0 and M = 1 (replace f by
(1/(M − m))(f − m)). Let s,t ∈ N\{p} with l(s) > l(t). Then by the geometric
lemma and the basic lemma of the preceding section, we have

f(s) ≥ f(t).

For each l ∈ [0,1], define:

f̄(l) = sup{f(s): s ∈ N, l(s) = l},

f̲(l) = inf{f(s): s ∈ N, l(s) = l}.

Then f̄(1) = f̲(1) = 1, f̄(0) = f̲(0) = 0, and if l, l' ∈ [0,1] with l < l', it follows
from the above that

f̄(l) ≤ f̲(l').

Hence the set C := {l: f̄(l) > f̲(l)} is at most countable, as

Σ_{l∈C} (f̄(l) − f̲(l)) ≤ 1.

For l ∈ [0,1]\C, define

f(l) = f̄(l) = f̲(l).

If l,l',l'' ∈ [0,1] with l + l' + l'' = 1, then there exists a frame (q,q',q'') with
l(q) = l, l(q') = l', l(q'') = l''. That is, the function f satisfies the hypotheses of
Warm-up Theorem II, and we conclude that

f(l) = l for l ∈ [0,1]\C.

But this implies that C = ∅, so that for each s ∈ N,

f(s) = f(l(s)) = l(s) = cos² θ(s,p).

The theorem now follows from property P2. ∎

6. Extremal Values
In this section we use the results of the preceding section to show that
bounded frame functions attain their extremal values. Let f be a bounded
frame function,

M = sup_{s∈S} f(s),

and choose a sequence p_n ∈ S, n ≥ 1, such that lim_{n→∞} f(p_n) = M. Since S is
compact, we may assume by passing to a subsequence that p_n converges,
and we set

p = lim p_n.

Assume also p_n ∈ N for all n. Our goal is to show that f(p) = M.

STEP 1. Changing coordinates


For each n, we would like to look at p_n as the north pole, instead of p. We
do this as follows. Choose and fix a point e₀ ∈ E and let C₀ denote the great
circle segment from p to e₀. Let ρ_n: S → S be the rigid motion of S which takes
p to p_n and some point, say c_n, on C₀ to p. Obviously

lim c_n = p.

Now define the sequence g_n of frame functions by setting

g_n(s) = f(ρ_n s)    (s ∈ S).

We note the following properties:

(1) lim_{n→∞} g_n(p) = M.
(2) M = sup_{s∈S} g_n(s) and m = inf_{s∈S} f(s) = inf_{s∈S} g_n(s) for each n ≥ 1.
(3) g_n(c_n) = f(p) for each n ≥ 1.

STEP 2. Symmetrization

Denote by p̂: S → S the right-hand rotation by 90° of S around the pole p.
For each n ≥ 1, set

h_n(s) = g_n(s) + g_n(p̂s)    (s ∈ S).

The sequence h_n of frame functions (P1) has the following properties:

(1) sup_{s∈S} h_n ≤ 2M for all n ≥ 1.
(2) inf_{s∈S} h_n ≥ 2m for all n ≥ 1.
(3) lim_{n→∞} h_n(p) = 2M.
(4) Each h_n is constant on E (by P3).
(5) h_n(c_n) ≤ M + f(p) for all n ≥ 1.
STEP 3. Limit
We consider each h_n as a point in the product space

[2m,2M]^S.

Under the product topology, this space is compact, so that the sequence h_n
has an accumulation point, which we denote by h. Then:
(1) h(p) = 2M = sup_{s∈S} h(s).
(2) h is constant on E.
(3) h is a frame function, since the frame functions form a closed subset of
[2m,2M]^S.
By the theorem of §5, h is continuous (and has a special form, which does
not interest us here).
STEP 4. f(p) = M
Choose ε > 0, and choose c ∈ C₀ such that h(c) > 2M − ε. Applying the
approximate version of the basic lemma to h_n, and noting that we can reach c
from c_n in two steps (easiest case of the geometric lemma) for sufficiently
large n, we obtain

h_n(c_n) > h_n(c) − 2δ_n

with δ_n > 2M − h_n(p) → 0 as n → ∞. Now choose a subsequence n_j → ∞
such that

lim_{j→∞} h_{n_j}(c) > 2M − ε.

It then follows that (Step 2, (5))

M + f(p) ≥ lim inf_{j→∞} h_{n_j}(c_{n_j}) ≥ lim_{j→∞} (h_{n_j}(c) − 2δ_{n_j}) > 2M − ε,

so that f(p) > M − ε. Hence we have proved:

THEOREM. Bounded frame functions attain their extremal values.

7. The General Case


We now prove the theorem stated in §3. Choose p ∈ S such that f(p) = M,
and r ∈ S, r ⊥ p, such that f(r) = m. This is possible because of P4 and the
theorem of §6. Choose q orthogonal to p and r, and set f(q) = α. We may
assume that m < α < M since otherwise the result of §5 applies to f or −f
and the proof is finished. As in §6 we let p̂, q̂, r̂ denote the 90° right-hand
rotations about p, q, and r.
We shall now use the theorem of §5 to obtain information concerning f. It
is sufficient to know that f belongs to the space of quadratic frame functions.
Taking p as the north pole, the function

f(s) + f(p̂s)

takes the constant value m + α on the equator, and attains its supremum 2M
at p. Letting

g(s) = M cos² θ(s,p) + m cos² θ(s,r) + α cos² θ(s,q),

we have from §5

f(s) + f(p̂s) = 2M cos² θ(s,p) + (m + α)(1 − cos² θ(s,p))
             = g(s) + g(p̂s),
f(s) + f(r̂s) = g(s) + g(r̂s)    (*)

(the second equation follows by analogous reasoning, since −f is a frame
function taking its supremum −m at r).
Now let (x,y,z) denote the (p,q,r)-frame coordinates of s ∈ S.
Claim. (a) If either x = y, x = z, or y = z, then f(s) = g(s);
(b) If either x = −y, x = −z, or y = −z, then f(s) = g(s).

Proof of claim. (a) Note that r̂(x,y,z) = (−y,x,z); p̂(x,y,z) = (x,−z,y).
Applying these operations in succession, one verifies:

p̂p̂r̂(x,x,z) = (−x,−x,−z),
p̂r̂r̂(x,z,z) = (−x,−z,−z),
r̂p̂p̂p̂r̂(x,y,x) = (−x,−y,−x).

Suppose s = (x,x,z). Since f(s) = f(−s), g(s) = g(−s) (by property P2), we
conclude from (*):

f(s) + f(r̂s) = g(s) + g(r̂s),
f(r̂s) + f(p̂r̂s) = g(r̂s) + g(p̂r̂s),
f(p̂r̂s) + f(p̂p̂r̂s) = g(p̂r̂s) + g(p̂p̂r̂s);

subtracting the second equation from the sum of the first and third, we
conclude that f(s) = g(s). The other two cases under (a) are proved similarly.
(b) Suppose s = (x,−x,z); then r̂(x,−x,z) = (x,x,z), which lies on the great
circle x = y. From (a) we know that f(r̂s) = g(r̂s), and from (*) we conclude
that also f(s) = g(s). The other two cases in (b) are proved similarly, and the
claim is proved.
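The three composite rotations used in part (a), and the single rotation used in part (b), can be verified mechanically. The sketch below is only a check of the coordinate bookkeeping: it encodes the 90° rotations about r and about p exactly as given at the start of the proof, and the function names (`r_hat`, `p_hat`, `compose`) are illustrative choices.

```python
def r_hat(v):
    """90-degree right-hand rotation about r: (x, y, z) -> (-y, x, z)."""
    x, y, z = v
    return (-y, x, z)


def p_hat(v):
    """90-degree right-hand rotation about p: (x, y, z) -> (x, -z, y)."""
    x, y, z = v
    return (x, -z, y)


def compose(*ops):
    """Compose operators in written order, so the rightmost is applied first."""
    def run(v):
        for op in reversed(ops):
            v = op(v)
        return v
    return run


def neg(v):
    """The antipodal point -v."""
    return tuple(-c for c in v)


x, y, z = 2, 3, 5  # arbitrary distinct test coordinates

checks = [
    compose(p_hat, p_hat, r_hat)((x, x, z)) == neg((x, x, z)),
    compose(p_hat, r_hat, r_hat)((x, z, z)) == neg((x, z, z)),
    compose(r_hat, p_hat, p_hat, p_hat, r_hat)((x, y, x)) == neg((x, y, x)),
    r_hat((x, -x, z)) == (x, x, z),  # the step used in part (b)
]
```

All four entries of `checks` are true. The test coordinates are not normalized; that is harmless here because the rotations act linearly on coordinates.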
Now define h := g − f. h is clearly a frame function, and the claim implies
that h(p) = h(q) = h(r) = 0, so that the weight of h is zero. We also know that
h is zero on the six great circles x = ±y, x = ±z, y = ±z. The proof is
completed by showing that h is identically zero. Assume that h is not identically
zero; then by §6 we may put

M' := sup h = h(p'),

m' := inf h = h(r'),

α' := h(q'); q' ⊥ r', q' ⊥ p'.

The argument is broken into four steps.

(i) M' = −m': Assume that m' > −M'. Then α' < 0, and by P3, α' is the
maximal value of h on the great circle orthogonal to p'. However, the great
circle x = y must intersect the former great circle in at least two points, and at
these two points h must take the value zero. Considering −h, we derive a
contradiction from the assumption m' < −M' by the same argument.
(ii) α' = 0: This follows immediately from (i) and the fact that h has
weight zero.
(iii) h(x',x',z') = M'(x'² − z'²), where (x',y',z') denote the (p',q',r')-frame
coordinates. Using the previous two steps, this follows from the claim, upon
substituting h for f and M'(x'² − z'²) for g.
(iv) On the great circle x' = y', h takes the value zero at exactly the
following four points: (x',x',x'), (x',x',−x'), (−x',−x',x') and (−x',−x',−x').
The great circles x = y, x = z and y = z intersect in the two points: (x,x,x)
and (−x,−x,−x). As h is zero on these great circles, we see that the great
circle x' = y' must pass through the points (x,x,x) and (−x,−x,−x), since
otherwise there would be six points on x' = y' at which h takes the value
zero. The great circles x = −y and x = −z intersect at (x,−x,−x) and
(−x,x,x). x' = y' must also intersect these points, since otherwise it would
intersect x = −y and x = −z at four points, making six points at which h
would take the value zero on x' = y'. However, there is only one great circle
passing through the four points (x,x,x), (−x,−x,−x), (x,−x,−x) and (−x,x,x),
namely y = z. It follows that y = z and x' = y' describe the same great circle,
and therefore h must take the value zero at all points of x' = y'. This
contradicts step (iv) and the theorem is proved. ∎

Commentary on the CKM Proof


Commentary on §1
The goal of this section is the theorem appearing at the end of it. The strategy
of the proof is this: Assume that every state on ℝ³ is regular. Consider a
Hilbert space ℋ on which a state p is defined. Then the restriction of p to any
completely real three-dimensional subspace L(3) of ℋ is a state p_L on L(3). By
our assumption p_L is regular. The restriction of p_L to any two-dimensional
subspace L(2) of L(3) is therefore a regular state on L(2). L(2) is, of course, also
completely real.
Thus, under the assumption, the restriction of p to any completely real
two-dimensional subspace of ℋ is regular. But we can also show:
(1) If p_C is a state on a two-dimensional complex space ℂ² and p_C is regular
on every completely real subspace of ℂ², then p_C is regular (lemma 2).
(2) If the restriction p_C of p to every two-dimensional (complex) subspace
of ℋ is regular, then p is regular (lemma 3).
The theorem follows.
Further notes. A bilinear form on ℋ is a function B mapping pairs (x,y) of
vectors into (complex) numbers, such that, for all x, y, z ∈ ℋ, c ∈ ℂ,

B(x,y + z) = B(x,y) + B(x,z)
B(x + z,y) = B(x,y) + B(z,y)
B(cx,y) = cB(x,y)
B(x,cy) = c*B(x,y)

B is symmetric if, for all x,y ∈ ℋ,

B(x,y) = [B(y,x)]*

The restriction of B to pairs of vectors of the form (x,x) yields a quadratic form
Q (see §2), such that

Q(x) = B(x,x)

It turns out that a bilinear form B is completely determined by its quadratic
form Q (Halmos, 1957, p. 13). Note also that B is symmetric if and only if Q is
real.
The proof of the first lemma is given in Halmos (1957) §§22-24. The
move from regular states to bilinear forms achieves three things. First, we
need bilinear (rather than quadratic) forms in the proof of Lemma 2.
Secondly, regular states are already seen to be continuous functions (Lemma 3).
Third, the constraint on states is put in terms of a function on the vectors
within the space, rather than an operator on that space. Thus when we claim
that the restriction of p_L to L(2) inherits the regularity of p_L on L(3) (see
above), there is no problem of the kind that might arise were the regularity
of the states just defined in terms of an operator on L(3).
A closed real-linear subspace of ℋ forms a vector space over the field of
the reals (see Section 1.9).
Lemma 3 is the hardest of the three. It contains two arguments, the first of
which is a continuity argument to show that, under the assumptions of the
lemma, there is a vector x such that p(x̄) is the supremum of p. (For the
definition of supremum, see Section 7.3.) The argument uses a topological
fact about compact sets (see Kelley, 1955, p. 135): any infinite sequence {x_n}
of elements of a compact set X contains a converging subsequence {x_{n_j}}
which converges to a point x in X (Kelley, 1955, p. 136). This property of
compact sets is appealed to again (twice) in §6; I will refer to it as the
accumulation point property, since x is called an accumulation point in X.
CKM consider a sequence {x_n} of normalized vectors such that p(x̄_n) → M.
Since the unit sphere S of ℋ is compact, there is an accumulation point x in S
such that some subsequence {x_{n_j}} of {x_n} converges to x, and, of course,
p(x̄_{n_j}) → M. (From now on it is this subsequence which is referred to as
{x_n}.) A further move takes us to a sequence {e^{iθ_n}x_n} of vectors in S such that,
for each n, the vectors x and e^{iθ_n}x_n are in the same completely real subspace.
(This uses the fact that, for any complex number c, there exists an angle θ
such that ce^{iθ} = a, where a is real and a² = |c|²; see Section 1.5.) The
assumption of the lemma can then come into play, together with the fact that
every regular state p is continuous (see Lemma 1). This continuity ensures
that p(x̄) = lim p(e^{iθ_n}x_n). Since, for each n, the vectors x_n and e^{iθ_n}x_n lie in
the same ray x̄_n, we obtain

p(lim x_n) = p(x̄) = lim p(x̄_n) = M

In the second argument CKM show that, for an arbitrary vector y, p(ȳ) is
given by an expression involving just y, x, and M. The vectors x and y are
assumed to be normalized, and, although the main result does not depend
on it, so is p: that is, CKM assume that p(ℋ) = 1; hence, for any x⊥
orthogonal to x, p(x̄⊥) = 1 − M. For any angle θ, since e^{iθ} is a scalar, the ray ȳ
contains e^{iθ}y. We choose θ so that ⟨x|e^{iθ}y⟩ is real; then within the
two-dimensional complex space ℋ, there is a completely real two-dimensional
space ℋ_R containing both x and e^{iθ}y. Let x⊥ be a normalized vector in ℋ_R
orthogonal to x; then there exist real numbers b₁ and b₂ such that b₁² + b₂² = 1 and
e^{iθ}y = b₁x + b₂x⊥. Note that b₁ = ⟨x|e^{iθ}y⟩ and b₂ = ⟨x⊥|e^{iθ}y⟩. The restriction
of p to ℋ_R is again a normalized state, and by the assumption of the lemma
there is a self-adjoint trace-class operator A on ℋ_R such that, for all
normalized v in ℋ_R, p(v̄) = ⟨Av|v⟩. Furthermore, we can show that, since A is a
trace-class operator on ℋ_R and ⟨Av|v⟩ is at a maximum when v = x, x is an
eigenvector of A. Since ℋ_R is two-dimensional and A is self-adjoint, x⊥ is
also an eigenvector of A. For any normalized eigenvector v of A with
corresponding eigenvalue a, we have ⟨Av|v⟩ = a; hence the eigenvalues of
A corresponding to x and x⊥ are, respectively, M and 1 − M.
As Cooke has pointed out to me (pers. com., May 1988), the neatest way
to obtain the result of the lemma is now to use the remark following
equation (*) in §2. To see this, consult the commentary on §2 and consider the
frame {x,x⊥} in the two-dimensional space ℋ_R. The coordinates of e^{iθ}y with
respect to this frame are ⟨x|e^{iθ}y⟩ and ⟨x⊥|e^{iθ}y⟩, respectively, and, as we have
noted, ⟨x|e^{iθ}y⟩² + ⟨x⊥|e^{iθ}y⟩² = 1. By plugging in these coordinates and the
eigenvalues of A into the (two-dimensional version of) equation (‡) of the
commentary on §2, we get:

p(ȳ) = p(e^{iθ}y) = ⟨Ae^{iθ}y|e^{iθ}y⟩
     = M⟨x|e^{iθ}y⟩² + (1 − M)⟨x⊥|e^{iθ}y⟩²
     = M⟨x|e^{iθ}y⟩² + (1 − M)(1 − ⟨x|e^{iθ}y⟩²)
     = M|⟨x|y⟩|² + (1 − M)(1 − |⟨x|y⟩|²)

This expression defines a continuous real-valued quadratic form Q(y) on ℋ,
and hence a symmetric continuous bilinear form B(y,z), and so the lemma is
proved.

Commentary on §2
The injunction after the equation marked (*) may well tax the resources of
memory. The result can be quickly shown for quadratic forms on ℝ³, as
follows.
To each quadratic form Q on ℝ³ there corresponds a unique symmetric
bilinear form B on ℝ³, such that B(x,y) = B(y,x), and a symmetric operator A
on ℝ³ such that, for all s ∈ ℝ³,

Q(s) = B(s,s) = ⟨s|As⟩

(Halmos, 1957, §24; the properties of symmetric operators on a real space
resemble those of Hermitian operators on a complex space: see Section 1.2).
Now let (p,q,r) be an arbitrary basis for ℝ³. Then for any s ∈ S there are x,y,z
such that s = xp + yq + zr. Hence

Q(s) = ⟨s|As⟩ = ⟨xp + yq + zr|A(xp + yq + zr)⟩
     = x²⟨p|Ap⟩ + y²⟨q|Aq⟩ + z²⟨r|Ar⟩
       + xy(⟨p|Aq⟩ + ⟨q|Ap⟩) + yz(⟨q|Ar⟩ + ⟨r|Aq⟩)
       + zx(⟨r|Ap⟩ + ⟨p|Ar⟩)

Now choose p₀, q₀, r₀ to be a set of orthonormal eigenvectors of A; we
know that such a basis exists because A is symmetric (see Sections 1.2 and
1.14). The cross terms in the expression above now vanish and we obtain:

Q(s) = ⟨s|As⟩ = αx² + βy² + γz²    (‡)

where α, β, γ are the eigenvalues of A. Further,

w(Q) = α + β + γ = Tr(A)

In the proposition at the end of the section, CKM treat a more general
case, since A is not necessarily symmetric. To extend the above proof to the
general case, we need to consider the symmetric operator ½(A + Aᵀ) (see
Fano, 1971, p. 68).
Further notes. (1) Property P4 plays a large part in what follows; note that it
yields an inequality: f(t) < m + ξ.
(2) Compare the frame function f(s) = cos² θ(p₀,s) with equation (4.4).
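The passage from a general operator A to the symmetric operator ½(A + Aᵀ) rests on the fact that the two define the same quadratic form on a real space, because the antisymmetric part of A contributes nothing to ⟨s|As⟩. The following sketch, which assumes ℝ³ with the ordinary dot product and uses illustrative helper names, checks this on random data.

```python
import random


def quad(A, s):
    """Quadratic form <s|As> on R^3."""
    return sum(s[i] * A[i][j] * s[j] for i in range(3) for j in range(3))


def symmetric_part(A):
    """Return (A + A^T) / 2."""
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(3)] for i in range(3)]


def trace(A):
    return A[0][0] + A[1][1] + A[2][2]


rng = random.Random(1)
A = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # not symmetric in general
S = symmetric_part(A)
vectors = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]

# Largest discrepancy between the two quadratic forms over the sample vectors.
max_gap = max(abs(quad(A, s) - quad(S, s)) for s in vectors)
```

`max_gap` is zero up to rounding, and A and its symmetric part have the same trace, so the diagonalization argument given above for symmetric operators applies to ½(A + Aᵀ) without changing the weight.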

Commentary on §3
To recognize the theorem given by CKM at the beginning of this section as
Gleason's theorem, note that (1) every state p on ℝ³ is a bounded frame
function; we must also show (2) that from the conclusion of the CKM
theorem it follows that there is a symmetric operator A on ℝ³ such that, for
every s ∈ S, p(s̄) = f(s) = ⟨As|s⟩. (Recall that s̄ is the ray containing s.) (2) is
the converse of the proposition of §2. Proof by construction: Let
A = MP_p + αP_q + mP_r, where P_p, P_q, P_r project onto p̄, q̄, r̄, respectively. If
the coordinates of s with respect to (p,q,r) are (x,y,z), then, since
⟨s|P_p s⟩ = |P_p s|² = x² (and similarly for q and r), we obtain

⟨s|As⟩ = Mx² + αy² + mz² = f(s)

as required.
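The construction of A can be carried out explicitly for a sample frame. The sketch below is an illustration only: the frame is obtained by rotating the standard basis about the third axis, and the numerical values of M, α and m are arbitrary choices rather than anything taken from the text.

```python
import math


def outer(u):
    """Projector onto the ray through the unit vector u."""
    return [[a * b for b in u] for a in u]


def build_operator(M, alpha, m, p, q, r):
    """A = M*P_p + alpha*P_q + m*P_r for an orthonormal frame (p, q, r)."""
    Pp, Pq, Pr = outer(p), outer(q), outer(r)
    return [[M * Pp[i][j] + alpha * Pq[i][j] + m * Pr[i][j] for j in range(3)]
            for i in range(3)]


def expectation(A, s):
    """The quadratic form <s|As>."""
    return sum(s[i] * A[i][j] * s[j] for i in range(3) for j in range(3))


# Illustrative frame: the standard basis rotated by angle t about the z-axis.
t = 0.7
p = (math.cos(t), math.sin(t), 0.0)
q = (-math.sin(t), math.cos(t), 0.0)
r = (0.0, 0.0, 1.0)

M, alpha, m = 0.9, 0.4, 0.1  # illustrative values with m <= alpha <= M
A = build_operator(M, alpha, m, p, q, r)

# A unit vector with frame coordinates (x, y, z); 0.36 + 0.2304 + 0.4096 = 1.
x, y, z = 0.6, 0.48, 0.64
s = tuple(x * p[i] + y * q[i] + z * r[i] for i in range(3))

lhs = expectation(A, s)
rhs = M * x * x + alpha * y * y + m * z * z
```

Here `lhs` and `rhs` agree up to rounding, and the trace of A is M + α + m, the weight of the frame function.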
The implicit quantifications in the proof of "warm-up" theorem I may
give trouble. Throughout this theorem we are considering a fixed (although
arbitrary) f with the property f(a) + f(b) + f(c) = w̄ for all triples (a,b,c) such
that a + b + c = 1. We take an arbitrary a and obtain f(a) = w̄ − f(1 − a), by
considering a as part of the triple (a,1 − a,0). This holds for all a ∈ [0,1], and
is applied to (a + b) to obtain

(A1.1) f(a) + f(b) = f(a + b)

for all a, b, a + b ∈ [0,1]. By extending (A1.1) we obtain, for any integer n
(na ≤ 1), nf(a) = f(na). Whence, for a = 1/n, f(a) = (1/n)w̄. Applying (A1.1)
again, for any integer m (m < n), we obtain f(m/n) = (m/n)w̄; in other words,

f(a) = w̄a for all rational a.


We obtain the final conclusion of "warm-up" theorem II as follows. From
(1) and (2) f is bounded from below, and, for a₀ ∈ [0,1] − C (in CKM
notation, a₀ ∈ [0,1]\C),

lim_{a₀→0} f(a₀) = 0 = f(0)

From (1) and (3) f is bounded from above, and, using (2), we obtain

(A1.2) lim_{a₀→1} f(a₀) = sup f(a₀) = 1

But now assume that, for some a₀, f(a₀) ≠ a₀. Then for a sequence r₀, r₁,
r₂, . . . such that r_i a₀ → 1, we have

lim_{i→∞} f(r_i a₀) = lim_{i→∞} r_i f(a₀) ≠ 1

This violates (A1.2) above; hence, for all a₀ ∈ [0,1] − C, f(a₀) = a₀.

Commentary on §4
Figure Al.4 illustrates the two lemmata of this section.

Commentary on §5
The geometric lemma, together with the basic lemma of §4, shows that,
given premises (1) and (2) of the basic lemma,

if l(s) > l(s') then f(s) ≥ f(s')

The geometric lemma itself shows that from any point s in N one can reach
another of lower latitude via a sequence of descents, starting with the
descent through s. The method of proof employs projective geometry: the
strategy is to map each point s in N onto the tangent plane P to S at p, so that
the image of s [Im(s)] is the point on the plane P where the extended radius
through s meets the plane (see Figure A1.5). The result is as though the
hemisphere N had been stretched out to an infinite plane. Although
distances on N are not preserved under the transformation, various relations
carry over; thus, if s₁ and s₂ are equidistant from p, then their images are
equidistant from Im(p), and, if l(s₁) > l(s₂), then Im(s₂) is further from Im(p)
than is Im(s₁). Also, if s₁ and s₂ are on the same line of longitude on N, then
Im(s₁), Im(s₂), and Im(p) are collinear.

Figure A1.4

Figure A1.5 Projection of N onto the tangent plane P.

Any great circle on S lies in a plane through the center of S, and this plane
will cut P in a straight line. It follows that the descent D_s through s is mapped
onto a straight line Im(D_s) in P, and Im(D_s) is tangent to the image of the
circle of latitude through s. The problem of finding a sequence of descents
on N becomes a problem of finding a sequence of straight lines on P.
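The claims about the projection can be tested in coordinates. The sketch below assumes the unit sphere in ℝ³ with the pole at (0, 0, 1), so that the tangent plane is z = 1 and the image of a point is obtained by dividing by its z-coordinate; the parametrization of the descent and all function names are illustrative assumptions. It checks that the image of a point at polar angle θ lies at distance tan θ from the image of the pole, and that images of points on the descent through that point are displaced from its image only in the direction perpendicular to the radius, which is what tangency to the latitude circle means.

```python
import math


def sphere_point(theta, phi):
    """Unit vector at polar angle theta from the pole and longitude phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def image(s):
    """Central projection of s (with positive z) onto the plane z = 1."""
    return (s[0] / s[2], s[1] / s[2])


def descent_point(theta, phi, psi):
    """A point on the descent through s: cos(psi)*s + sin(psi)*e.

    Here e is the equatorial unit vector orthogonal to s, so the great circle
    spanned by s and e has s as its northernmost point.
    """
    s = sphere_point(theta, phi)
    e = (-math.sin(phi), math.cos(phi), 0.0)
    return tuple(math.cos(psi) * a + math.sin(psi) * b for a, b in zip(s, e))


theta, phi = 0.5, 1.1
s_img = image(sphere_point(theta, phi))
radius = math.hypot(*s_img)  # distance of Im(s) from Im(p) at the origin

# Radial component of the displacement Im(t) - Im(s) for several t on the descent.
offsets = []
for psi in (-1.0, -0.3, 0.4, 1.2):
    tx, ty = image(descent_point(theta, phi, psi))
    dx, dy = tx - s_img[0], ty - s_img[1]
    offsets.append(dx * s_img[0] + dy * s_img[1])
```

`radius` equals tan θ, so lower latitude means a larger distance from the image of the pole, and every entry of `offsets` vanishes up to rounding, so the projected descent is the straight line through Im(s) perpendicular to the radius.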
From now on, like CKM, I will talk of points, such as p and s₀, on P, rather
than of their images Im(p) and Im(s₀).
We take s, t such that l(s) > l(t). Stage (1) of the proof deals with the case
when s and t have the same longitude. For the general case CKM then
construct what I will call an n-polygon (which actually has n + 1 sides) to
show a sequence of descents starting from s₀ and ending on s_n, where s₀ and
s_n are 180° of longitude apart. Now, given arbitrary s and t [l(s) > l(t)], any
n-polygon starting from s and moving in the direction of t will intersect the
line pt: call the point of intersection t(n). The next step of the CKM proof
shows that, by making n big enough, we can obtain l(t(n)) > l(t). The required
sequence of descents takes us round this n-polygon from s to t(n) and then,
using the maneuver of Stage (1), from t(n) to t.
Three points in the theorem deserve comment.

(1) To show that Σ_{l∈C} [f̄(l) − f̲(l)] ≤ 1, note that, for l < l', f̄(l) ≤ f̲(l'), so
that, as we move up to p from the equator, the differences f̄(l) − f̲(l) will add
up. But, by hypothesis, f(p) = 1, and so the sum of these differences is no
greater than 1.
(2) The existence of the frame (q,q',q'') may be seen as follows. Let
(q₀,q₀',q₀'') be an arbitrary frame. For any θ, θ', θ'' such that cos² θ +
cos² θ' + cos² θ'' = 1, we can construct a normalized vector p₀ with
coordinates (cos θ, cos θ', cos θ'') with respect to (q₀,q₀',q₀''). We now choose θ, θ',
θ'' so that cos² θ = l, cos² θ' = l', cos² θ'' = l'', construct p₀, and rotate
(q₀,q₀',q₀'') within S to make p₀ coincide with p (the N-pole of S). This rotation
transforms (q₀,q₀',q₀'') to the required frame (q,q',q'').
(3) To show that C = ∅, assume that l' ∈ C; then, for l ∈ [0,l') − C, we
have

lim_{l→l'} f(l) = l'

and, for l ∈ (l',1] − C,

lim_{l→l'} f(l) = l'

Whence l' ≤ f̲(l') ≤ f̄(l') ≤ l', and so f̄(l') − f̲(l') = 0, contra hypothesis.

Commentary on §6
At this stage it is useful to compare the theorem of §5 with the statement of
Gleason's theorem in §3. The premises of the §5 theorem are stronger than
the premises of Gleason's theorem: they impose both (1) an extreme-value
requirement and (2) a symmetry condition. The extreme-value requirement
not only requires that f be bounded (sup f(s) = M, inf f(s) = m), but also that
there exist a point p on S such that f(p) = M and a (set of) points for which
f(s) = m. Symmetry requires that for all s ⊥ p (that is, for s ∈ E), f(s) = m.
§6 shows that the extreme-value requirement holds for any bounded
frame function f; it also introduces a technique for symmetrization which is
used again in §7. Note, however, that in general f does not satisfy the
symmetry condition.
The conclusion of the §5 theorem is a special case of the conclusion of the
§3 version of Gleason's theorem; if the function f is expressed in terms of
coordinates (x,y,z) with respect to the frame (p,q,r) (q, r ∈ E), it appears as:

f(s) = m + (M − m)cos² θ(s,p)
     = M cos² θ(s,p) + m[1 − cos² θ(s,p)]
     = Mx² + m(y² + z²)

Note that this is symmetrical about p.


§6 has a preamble and four steps. To prove that a bounded frame function
f attains its extremal values we use the same strategy as in lemma 3 of §1; we
consider a sequence of points p_n ∈ S such that lim_{n→∞} f(p_n) = M. Since S is
compact, the accumulation point property tells us that (if we pass to a
subsequence {p_n}) there is a point p such that p = lim_{n→∞} p_n. These limits
must be shown to match, as it were, so that f(lim_{n→∞} p_n) = f(p) =
lim_{n→∞} f(p_n). This is a continuity requirement. It would be violated if, for
example, f(p) dropped discontinuously to the value M − ε at p.
The trick is to use f and the original sequence p_n of points to define a frame
function h that satisfies the premises of the §5 theorem (steps 1, 2, and 3);
any function which satisfies these premises is known to be continuous. At p
the function h has value 2M; we then show (step 4) that, for any ε > 0,
M + f(p) > 2M − ε. It follows that f(p) = M.
Step 1. This takes the original function f and, so to speak, drags it around
on the surface of the sphere; we obtain a sequence g_n of functions, each just
like f but dislocated over the sphere's surface. Each dislocation is equivalent
to a rigid motion of the sphere that carries the function f with it, and that
takes the N-pole p to the point p_n; thus the sequence g_n is in one-to-one
correspondence with the sequence p_n. A rigid motion of S is a rotation of S.
Obviously the angular separation of points of S is invariant under rigid
motions. Any rotation can be specified by its effect on two points in N − E.
We define ρ_n by selecting c_n on C₀ such that θ(p,c_n) = θ(p,p_n); ρ_n is then the
rotation that takes p to p_n and c_n to p. We define

g_n(s) = f(ρ_n s)

Step 2. Symmetrization of the functions g_n yields a sequence h_n of
functions. This symmetrization (1) allows h_n to satisfy the premises of the
approximate version of the basic lemma (see step 4) and (2) allows h to satisfy
the premises of the §5 theorem (see step 3).
Step 3. The existence of the function h is now proved by the accumulation
point property. The space [2m,2M]^S is the set of functions from S into the
closed subset [2m,2M] of the reals; this set is compact, and so, by passing to a
subsequence of h_n, we obtain the limit h. CKM's emphasis here on the
product topology is important, since there are other topologies under which
the space [2m,2M]^S is not compact (see Kelley, 1955, pp. 217-218).
Step 4. The existence of this function h allows the final move to be made in
the chain of inequalities that yields the desired result.

Figure A1.6 Great circles on S.

Commentary on §7
We now know that any bounded frame function f satisfies the
extreme-value requirement, and we have a technique for using f to define a function
f_sym which fulfills the symmetry condition of the §5 theorem: we write
f_sym(s) = f(s) + f(p̂s) (§6, step 2). By the §5 theorem we also know the form of
f_sym [§7, Equation (*)].
In §7 f is compared with a quadratic frame function g which has the same
extreme values, M and m, and the same weight, M + m + α, as does f. We
see first that g_sym = f_sym, and then that g(s) = f(s) for points on selected great
circles [claims (a) and (b): see Figure A1.6]. Lastly, the function h = g − f is
shown to be zero, not merely for points on these great circles, but over all S.
Hence any bounded frame function is a quadratic frame function, and
Gleason's theorem is proved.
Further notes. Step (iii) in showing that g − f = 0 is an elegant move
whereby claim (a) is made on behalf of h and the quadratic frame function
M'(x'² − z'²); this quadratic frame function is constructed from the extreme
values of h as was g from the extreme values of f.
APPENDIX B

The Lüders Rule

In Section 8.2 it was shown that, if subspaces P and Q are compatible, then
the Lüders rule yields classical conditionalization; in other words, if we
conditionalize according to the rule, then

\[ \mathbb{P}(P \mid Q) = \frac{p(P \cap Q)}{p(Q)} \]

for any generalized probability function (GPF) p on S(ℋ). It follows that,
when P ⊆ Q (and hence P ∩ Q = P), according to the Lüders rule:

\[ \mathbb{P}(P \mid Q) = \frac{p(P)}{p(Q)} = q(P) \]

The Lüders rule thus renormalizes the probabilities assigned to all P ⊆ Q, so
that q(Q) = 1. We now show that the rule specifies the only GPF on S(ℋ)
which does this.
In this proof, Q denotes both a subspace and the projector onto it, and Q⊥
denotes both the orthocomplement of Q and the corresponding projector.
We write v̄ for the ray containing a normalized vector v.
Note first that, for any density operator D, the operator QDQ is a trace-class
operator [see (5.6)], and hence the operator

\[ \frac{QDQ}{\mathrm{Tr}(QDQ)} \]

appearing in the Lüders rule is a density operator [see (5.7)].


Now let q be any GPF on S(ℋ) which yields the renormalization described
above. We know that (1) since any GPF is additive over orthogonal subspaces,
q is completely defined by the probabilities it assigns to the rays of
ℋ; (2) by Gleason's theorem, q can be represented by a density operator D_q
on ℋ, so that, for any ray v̄, q(v̄) = ⟨v|D_q v⟩; (3) if, for some GPF q, represented
by D_q, q(Q) = 1, then, for all v ∈ Q⊥, q(v̄) = 0 = ⟨v|D_q v⟩; since D_q is a
(weighted) sum of projection operators, it follows that D_q v = 0 for all v ∈ Q⊥.
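Now let v̄ be an arbitrary ray in ℋ. Then

\[
\begin{aligned}
q(\bar{v}) &= \langle v \mid D_q v \rangle \\
&= \langle Qv + Q^{\perp}v \mid D_q(Qv + Q^{\perp}v) \rangle \\
&= \langle Qv + Q^{\perp}v \mid D_q Qv \rangle && \text{[(3), above]} \\
&= \langle D_q(Qv + Q^{\perp}v) \mid Qv \rangle && \text{[Hermiticity]} \\
&= \langle D_q Qv \mid Qv \rangle && \text{[(3), above]} \\
&= \langle Qv \mid D_q Qv \rangle && \text{[Hermiticity]}
\end{aligned}
\]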

Now Qv lies within Q, and so Qv = cx for some scalar c and normalized
vector x in Q. Thus we obtain

\[ q(\bar{v}) = |c|^2 \langle x \mid D_q x \rangle = |c|^2 \, q(\bar{x}) \]

We see that any GPF q on S(ℋ) such that q(Q) = 1 is completely specified by
the values which it assigns to the rays within Q. Hence, given any GPF p,
there is a unique GPF q on S(ℋ) such that, for all P ⊆ Q,

\[ q(P) = \frac{p(P)}{p(Q)} \]

In turn, q is uniquely represented by the density operator D_q.


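But the operator

\[ \frac{QDQ}{\mathrm{Tr}(QDQ)} \]

appearing in the Lüders rule is a density operator, and the GPF it represents
assigns p(P)/p(Q) to every P ⊆ Q. It follows, by the uniqueness just
established, that

\[ D_q = \frac{QDQ}{\mathrm{Tr}(QDQ)} \]

Thus the Lüders rule gives the unique GPF q with the property that, for all
P ⊆ Q,

\[ q(P) = \frac{p(P)}{p(Q)} \]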
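
The renormalization property can be checked numerically on a small example. The sketch below is only an illustration, not part of the proof: it assumes a three-dimensional real space, a density operator D built as an equal mixture of two ray projectors, Q as the projector onto the plane spanned by the first two basis vectors, and P as a ray inside Q. All of these choices, and the helper names, are hypothetical. Exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def ray_projector(v):
    """Projector onto the ray containing the real integer vector v."""
    norm_sq = sum(x * x for x in v)
    return [[F(a * b, norm_sq) for b in v] for a in v]

def mix(weighted):
    """Convex combination of matrices, given as (weight, matrix) pairs."""
    n = len(weighted[0][1])
    return [[sum(w * M[i][j] for w, M in weighted) for j in range(n)]
            for i in range(n)]

def luders(D, Q):
    """The operator QDQ / Tr(QDQ) appearing in the Lüders rule."""
    QDQ = matmul(matmul(Q, D), Q)
    t = trace(QDQ)
    return [[x / t for x in row] for row in QDQ]

# Assumed example: D is an equal mixture of two (non-orthogonal) rays.
D = mix([(F(1, 2), ray_projector([1, 1, 1])),
         (F(1, 2), ray_projector([1, 0, 0]))])
# Q projects onto the plane spanned by the first two basis vectors.
Q = [[F(1), F(0), F(0)], [F(0), F(1), F(0)], [F(0), F(0), F(0)]]
# P is a ray lying within Q.
P = ray_projector([1, 1, 0])

p_P = trace(matmul(D, P))    # p(P)
p_Q = trace(matmul(D, Q))    # p(Q)
Dq = luders(D, Q)            # density operator representing q
q_P = trace(matmul(Dq, P))   # q(P)
q_Q = trace(matmul(Dq, Q))   # q(Q)

print(p_P, p_Q, q_P, q_Q)
```

In this example q(P) coincides with p(P)/p(Q) and q(Q) = 1, and the third column of Dq vanishes, in line with point (3) of the proof: D_q annihilates vectors in Q⊥.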
APPENDIX C

Coupled Systems and Conditionalization

At the end of Section 8.8, it was shown that, when an electron-positron pair
is prepared in the singlet spin state, Lüders-rule conditionalization on the
event (S_α^p, +), associated with the positron, projects the state of the electron
into the pure state P_{α-}^e; further, that this state indeed yields the quantum
theoretic probabilities for measurements of spin on the electron, given that a
measurement of S_α on the positron has yielded the result +.
Here I generalize this result, by taking a coupled system in an arbitrary
initial state D and looking at the effect of conditionalizing on an event
associated with one of its components. I use the notation of the last part of
Section 8.2, and the proof is an extension of the one which appears there.
Consider a coupled system with components a and b, whose states are
representable in a Hilbert space ℋ^a ⊗ ℋ^b. Assume that a measurement of A^b
is conducted on system b, and let A^a be an observable associated with system
a. We can then form a classical probability space partitioned by the conjunctions
(P^a · P^b) of A^a-events and A^b-events. Since this space is classical,
conditionalization on the event P^b (the result of the measurement of A^b) will yield
conditional probabilities for the A^a-events given by the classical rule:

\[ \mathbb{P}(P^a \mid P^b) = \frac{p(P^a \cdot P^b)}{p(P^b)} \]

But A^a was an arbitrarily chosen a-observable, and so this rule holds for all
a-events P^a.
Note that the probabilities appearing in the expression on the right of this
equation are given by the statistical algorithm of quantum mechanics, if we
know the initial state of the composite system. For if this system has been
prepared in a quantum state D_1, which reduces to states D_1^a and D_1^b of the
components, we then have

\[
\begin{aligned}
p(P^a \cdot P^b) &= \mathrm{Tr}[D_1(P^a \otimes P^b)] \\
p(P^b) &= \mathrm{Tr}[D_1(I^a \otimes P^b)] = \mathrm{Tr}(D_1^b P^b) && \text{[(5.27)]}
\end{aligned}
\]

According to the Lüders rule, conditionalization on the event P^b yields the
state D_2 of the composite system, where

\[ D_2 = \frac{(I^a \otimes P^b) D_1 (I^a \otimes P^b)}{\mathrm{Tr}[D_1(I^a \otimes P^b)]} \]

We write D_2^a and D_2^b for the reduced states of the components.


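We now show that

\[ \mathbb{P}(P^a \mid P^b) = \mathrm{Tr}(D_2^a P^a) \]

for an arbitrary a-event P^a; in other words, that the conditional probabilities
for all a-events are as though P^b projects the state of system a to D_2^a.

\[
\begin{aligned}
\mathrm{Tr}(D_2^a P^a) &= \mathrm{Tr}[D_2(P^a \otimes I^b)] && \text{[(5.27)]} \\
&= \frac{\mathrm{Tr}[(I^a \otimes P^b) D_1 (I^a \otimes P^b)(P^a \otimes I^b)]}{\mathrm{Tr}[D_1(I^a \otimes P^b)]} \\
&= \frac{\mathrm{Tr}[D_1(P^a \otimes P^b)]}{\mathrm{Tr}[D_1(I^a \otimes P^b)]} && \text{[by (5.6), idempotence, and operator multiplication on } \mathcal{H}^a \otimes \mathcal{H}^b \text{]}
\end{aligned}
\]

The last expression is p(P^a · P^b)/p(P^b), the classical conditional probability
of P^a given P^b.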
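
As an illustration of this result, the identity can be checked for the singlet state discussed in Section 8.8. The following is a minimal sketch under stated assumptions, not a general proof: both components are taken to be two-dimensional, basis index 0 stands for the result + and index 1 for the result -, system a is the electron and system b the positron, and P^a is an arbitrarily chosen projector onto the ray through (1, 1). The helper names are hypothetical.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def kron(A, B):
    """Tensor (Kronecker) product A ⊗ B of two matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def reduce_to_a(D, dim_a, dim_b):
    """Reduced state of component a: the partial trace of D over component b."""
    return [[sum(D[i * dim_b + k][j * dim_b + k] for k in range(dim_b))
             for j in range(dim_a)] for i in range(dim_a)]

h = F(1, 2)
# D1: projector onto the singlet vector (|+-> - |-+>)/sqrt(2),
# written in the product basis ordered ++, +-, -+, --.
D1 = [[0, 0, 0, 0],
      [0, h, -h, 0],
      [0, -h, h, 0],
      [0, 0, 0, 0]]

I2 = [[1, 0], [0, 1]]
Pb = [[1, 0], [0, 0]]    # b-event: result + on system b
Pa = [[h, h], [h, h]]    # assumed a-event: projector onto the ray through (1, 1)

E = kron(I2, Pb)                 # I^a ⊗ P^b
p_b = trace(matmul(D1, E))       # p(P^b)
D2 = [[x / p_b for x in row] for row in matmul(matmul(E, D1), E)]  # Lüders rule
D2a = reduce_to_a(D2, 2, 2)      # reduced state of system a after conditionalizing

lhs = trace(matmul(D2a, Pa))                   # Tr(D_2^a P^a)
rhs = trace(matmul(D1, kron(Pa, Pb))) / p_b    # p(P^a · P^b) / p(P^b)

print(D2a, lhs, rhs)
```

Here the reduced state D2a is the projector onto the - state of system a, as Section 8.8 leads one to expect, and the two sides of the identity agree.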
References

Accardi, L., and A. Fedullo. 1982. "On the Statistical Meaning of Complex Numbers
in Quantum Mechanics." Lettere al Nuovo Cimento 34:161-172.
Aristotle. 1984. The Complete Works of Aristotle, 2 vols. Ed. J. Barnes. Princeton, N.J.:
Princeton University Press.
Aspect, A. 1976. "Proposed Experiment to Test the Non-Separability of Quantum
Mechanics." Physical Review D 14:1944-1951. Reprinted in Wheeler and
Zurek (1983), pp. 435-442.
Asquith, P. D., and R. N. Giere, eds. 1980. PSA 1980, vol. 1. East Lansing, Mich.:
Philosophy of Science Association.
- - - 1981. PSA 1980, vol. 2. East Lansing, Mich.: Philosophy of Science
Association.
Asquith, P. D., and T. Nickles, eds. 1982. PSA 1982, vol. 1. East Lansing, Mich.:
Philosophy of Science Association.
Ballentine, L. E. 1970. "The Statistical Interpretation of Quantum Mechanics." Re-
views of Modern Physics 42:358-381.
- - - 1972. "Einstein's Interpretation of Quantum Mechanics." American Journal
of Physics 40:1763-1771.
Belinfante, F. J. 1973. A Survey of Hidden Variable Theories. Oxford: Pergamon Press.
Bell, J. L., and A. D. Slomson. 1969. Models and Ultraproducts: An Introduction.
Amsterdam: North Holland.
Bell, J. S. 1964. "On the Einstein-Podolsky-Rosen Paradox." Physics 1:195-200.
Reprinted in Wheeler and Zurek (1983), pp. 403-408.
- - - 1966. "On the Problem of Hidden Variables in Quantum Mechanics."
Reviews of Modern Physics 38:447-452.
Beltrametti, E. G., and G. Cassinelli. 1981. The Logic of Quantum Mechanics. Reading,
Mass.: Addison Wesley.
Beltrametti, E. G., and B. C. van Fraassen, eds. 1981. Current Issues in Quantum Logic.
New York: Plenum Press.
Bigelow, J. C. 1976. "Possible Worlds Foundations for Probability." Journal of
Philosophical Logic 5:299-320.
Birkhoff, G., and J. von Neumann. 1936. "The Logic of Quantum Mechanics."
Annals of Mathematics 37:823-843. Reprinted in Hooker (1975), pp. 1-26.
Blanche, R. 1962. Axiomatics. Trans. G. B. Keene. London: Routledge and Kegan
Paul.
Bohm, D. 1951. Quantum Theory. Englewood Cliffs, N.J.: Prentice Hall.
- - - 1957. Causality and Chance in Modern Physics. London: Routledge and Kegan
Paul.
Bohr, N. 1934. Atomic Theory and the Description of Nature. Cambridge: Cambridge
University Press.
- - - 1935a. "Can Quantum-Mechanical Description of Reality Be Considered
Complete?" Physical Review 48:696 -702. Reprinted in Wheeler and Zurek
(1983), pp. 145-151.
- - - 1935b. "Quantum Mechanics and Physical Reality." Nature 12:65. Re-
printed in Wheeler and Zurek (1983), p. 144.
- - - 1949. "Discussion with Einstein on Epistemological Problems in Atomic
Physics." In Schilpp (1949), pp. 200-241. Reprinted in Wheeler and Zurek
(1983), pp. 9-49.
Bohr, N., H. A. Kramers, and J. C. Slater. 1924. "Über die Quantentheorie der
Strahlung." Zeitschrift für Physik 24:69-87.
Born, M. 1926a. "Zur Quantenmechanik der Stossvorgänge." Zeitschrift für Physik
37:863-867. Trans. in Wheeler and Zurek (1983), pp. 52-55.
- - - 1926b. "Quantenmechanik der Stossvorgänge." Zeitschrift für Physik
38:803-827.
Bub, J. 1968. "The Daneri-Loinger-Prosperi Quantum Theory of Measurement." Il
Nuovo Cimento 57B:503-520.
- - - 1974. The Interpretation of Quantum Mechanics. Dordrecht, Holland: Reidel.
- - - 1975. "Popper's Propensity Interpretation of Probability and Quantum
Mechanics." In Maxwell and Anderson (1975), pp. 416-429.
- - - 1977. "Von Neumann's Projection Postulate as a Probability Conditionalization
Rule in Quantum Mechanics." Journal of Philosophical Logic 6:381-390.
- - - 1979. "The Measurement Problem of Quantum Mechanics." Problems in the
Philosophy of Physics (72d Corso). Bologna: Societa Italiana di Fisica.
- - - 1987. "How to Solve the Measurement Problem of Quantum Mechanics."
Paper delivered at the VIIIth International Congress of Logic, Methodology and
Philosophy of Science, in Moscow, 1987. College Park, Md.: University of
Maryland, mimeo.
Busch, P., and P. Lahti. 1984. "On Various Joint Measurements of Position and
Momentum Observables." Physical Review D 29:1634-1646.
- - - 1985. "A Note on Quantum Theory, Complementarity and Uncertainty."
Philosophy of Science 52:64-77.
Carnap, R. 1974. An Introduction to the Philosophy of Science. Ed. M. Gardner. New
York: Basic Books.
Cartwright, N. 1974. "Van Fraassen's Modal Model of Quantum Mechanics."
Philosophy of Science 41:199-202.
- - - 1983. How the Laws of Physics Lie. Oxford: Clarendon Press.
Clauser, J. F., M. A. Horne, A. Shimony, and R. A. Holt. 1969. "Proposed Experiment
to Test Hidden Variable Theories." Physical Review Letters 23:880-883.
Clauser, J. F., and A. Shimony. 1978. "Bell's Theorem: Experimental Tests and
Implications." Reports on Progress in Physics 41:1881-1927.
Cohen, R. S., C. A. Hooker, A. C. Michalos, and J. W. van Evra, eds. 1976. PSA 1974.
Boston Studies in the Philosophy of Science, vol. 32. Dordrecht, Holland: Reidel.
Cohen, R. S., and J. J. Stachel, eds. 1979. Selected Papers of Leon Rosenfeld. Dordrecht,
Holland: Reidel.
Cohen, R. S., and M. W. Wartofsky, eds. 1969. Boston Studies in the Philosophy of
Science, vol. 5. Dordrecht, Holland: Reidel.
- - - 1974. Logical and Epistemological Studies in Contemporary Physics. Boston
Studies in the Philosophy of Science, vol. 13. Dordrecht, Holland: Reidel.
Colodny, R. A., ed. 1965. Beyond the Edge of Certainty. Englewood Cliffs, N.J.:
Prentice Hall.
- - - 1972. Paradigms and Paradoxes: The Philosophical Challenge of the Quantum
Domain. Pittsburgh: University of Pittsburgh Press.
Cooke, R. M., and J. Hilgevoord. 1981. "A New Approach to Equivalence in Quantum
Logic." In Beltrametti and van Fraassen (1981), pp. 101-113.
Cooke, R., M. Keane, and W. Moran. 1985. "An Elementary Proof of Gleason's
Theorem." Mathematical Proceedings of the Cambridge Philosophical Society
98:117-128.
Cushing, J. T., C. F. Delaney, and G. Gutting, eds. 1984. Science and Reality: Recent
Work in the Philosophy of Science. Notre Dame, Ind.: University of Notre Dame
Press.
Cushing, J. T., and E. McMullin, eds. 1989. Philosophical Consequences of Quantum
Theory. Notre Dame, Ind.: University of Notre Dame Press.
Dalla Chiara, M. L. 1977. "Quantum Logic and Physical Modalities." Journal of
Philosophical Logic 6:391-404.
- - - 1986. "Quantum Logic." In Gabbay and Guenther (1986), vol. 3, pp. 427-469.
Daneri, A., A. Loinger, and G. M. Prosperi. 1962. "Quantum Theory of Measurement
and Ergodicity Conditions." Nuclear Physics 33:297-319. Reprinted in
Wheeler and Zurek (1983), pp. 657-679.
Davies, E. B. 1976. Quantum Theory and Open Systems. London: Academic Press.
Davies, P. C. W. 1984. Quantum Mechanics. London: Routledge and Kegan Paul.
de Boer, J., E. Dal, and O. Ulfbeck, eds. 1986. The Lesson of Quantum Theory: Niels
Bohr Centennial Symposium, 1985. Amsterdam: North Holland.
Demopoulos, W. 1976. "What Is the Logical Interpretation of Quantum Mechanics?"
In Cohen et al. (1976), pp. 721-728.
D'Espagnat, B. 1979. "The Quantum Theory and Reality." Scientific American
241:158-180.
de Witt, B. S. 1970. "Quantum Mechanics and Reality." Physics Today 23:30-35.
Reprinted in de Witt and Graham (1973), pp. 155-165.
de Witt, B. S., and N. Graham, eds. 1973. The Many Worlds Interpretation of Quantum
Mechanics. Princeton, N.J.: Princeton University Press.
Dirac, P. A. M. [1930] 1967. The Principles of Quantum Mechanics, 4th ed., rev.
Oxford: Clarendon Press.
Duhem, P. [1906] 1962. The Aim and Structure of Physical Theory. Trans. P. P. Wiener.
New York: Atheneum.
Earman, J. 1986. A Primer on Determinism. Dordrecht, Holland: Reidel.
Eberhard, P. H. 1977. "Bell's Theorem without Hidden Variables." Il Nuovo Cimento
38B (1):75-79.
Eco, U. 1979. The Role of the Reader. Bloomington, Ind.: Indiana University Press.
Eddington, A. S. 1935a. "The Theory of Groups." In Eddington (1935b). Reprinted
in Newman (1956), vol. 3, pp. 1558-1573.
- - - 1935b. New Pathways in Science. Cambridge: Cambridge University Press.
Edwards, P., ed. 1967. The Encyclopedia of Philosophy, 8 vols. New York: Macmillan.
Ehrenfest, P. 1959. Collected Scientific Papers. Ed. M. Klein. Amsterdam: North
Holland.
Einstein, A. 1948. "Quantenmechanik und Wirklichkeit." Dialectica 2:320-324.
Einstein, A., and P. Ehrenfest. 1922. "Quantentheoretische Bemerkungen zum
Experiment von Stern und Gerlach." Zeitschrift für Physik 11:31-34. Reprinted in
Ehrenfest (1959), pp. 452-455.
Einstein, A., B. Podolsky, and N. Rosen. 1935. "Can Quantum Mechanical Description
of Physical Reality Be Considered Complete?" Physical Review 47:777-780.
Reprinted in Wheeler and Zurek (1983), pp. 138-141.
Everett, H., III. 1957. "'Relative State' Formulation of Quantum Mechanics."
Reviews of Modern Physics 29:454-462. Reprinted in de Witt and Graham (1973),
pp. 141-149, and in Wheeler and Zurek (1983), pp. 315-323.
- - - 1973. "The Theory of the Universal Wave Function." In de Witt and Graham
(1973), pp. 3-140.
Fano, G. 1971. Mathematical Methods of Quantum Mechanics. New York: McGraw
Hill.
Fano, U. 1957. "Description of States in Quantum Mechanics by Density Matrix and
Operator Techniques." Reviews of Modern Physics 29:74-93.
Feyerabend, P. K. 1962. "On the Quantum Theory of Measurement." In Körner
(1962), pp. 121-130.
- - - 1975. Against Method. London: New Left Books.
Feynman, R. P. 1965. The Character of Physical Law. Cambridge, Mass.: M.I.T.
Press.
Feynman, R. P., R. B. Leighton, and M. Sands. 1965. The Feynman Lectures on
Physics, 3 vols. Reading, Mass.: Addison Wesley.
Finch, P. D. 1969. "On the Structure of Quantum Logic." Journal of Symbolic Logic
34:275-282. Reprinted in Hooker (1975), pp. 415-425.
Fine, A. 1970. "Insolubility of the Quantum Measurement Problem." Physical Review
2D:2783-2787.
- - - 1972. "Some Conceptual Problems of Quantum Theory." In Colodny
(1972), pp. 3-31.
- - - 1979. "How to Count Frequencies: A Primer for Quantum Realists." Synthese
42:145-154.
- - - 1984. "Einstein's Realism." In Cushing, Delaney, and Gutting (1984), pp.
106-133.
Finkelstein, D. 1969. "Matter, Space and Logic." In Cohen and Wartofsky (1969),
pp. 199-215.
French, A. P., ed. 1979. Einstein, A Centenary Volume. Cambridge, Mass.: Harvard
University Press.
Friedman, M., and C. Glymour. 1972. "If Quanta Had Logic." Journal of Philosophical
Logic 1:16-28.
Friedman, M., and H. Putnam. 1978. "Quantum Logic, Conditional Probability and
Interference." Dialectica 32:305-315.
Gabbay, D., and F. Guenther. 1986. Handbook of Philosophical Logic, 4 vols. Dor-
drecht, Holland: Reidel.
Geroch, R. 1984. "The Everett Interpretation." Nous 18:617-633.
Gibbins, P. 1981a. "A Note on Quantum Logic and the Uncertainty Principle."
Philosophy of Science 48:122-126.
- - - 1981b. "Putnam on the Two-Slit Experiment." Erkenntnis 16:235-241.
- - - 1987. Particles and Paradoxes. Cambridge: Cambridge University Press.
Giere, R. N. 1973. "Objective Single-Case Probabilities and the Foundations of
Statistics." In Suppes et al. (1973), pp. 467-483.
- - - 1976. "A Laplacean Formal Semantics for Single-Case Propensities." Journal
of Philosophical Logic 5:321-353.
- - - 1979. Understanding Scientific Reasoning. New York: Holt, Rinehart, Win-
ston.
Gillespie, D. T. 1970. A Quantum Mechanics Primer. Leighton Buzzard, Beds.: Inter-
national Textbook Company.
Gleason, A. M. 1957. "Measures on the Closed Subspaces of a Hilbert Space."
Journal of Mathematics and Mechanics 6:885-893.
Gödel, K. 1933. "An Interpretation of the Intuitionistic Sentential Logic." Trans. J.
Hintikka and L. Rossi. In Hintikka (1969), pp. 128-129.
Goldstein, H. 1950. Classical Mechanics. Reading, Mass.: Addison Wesley.
Good, I. J., ed. 1961. The Scientist Speculates. London: Heinemann.
Gudder, S. P. 1970. "On Hidden Variable Theories." Journal of Mathematical Physics
11:431-436.
- - - 1972. "Partial Algebraic Structures Associated with Orthomodular Posets."
Pacific Journal of Mathematics 41:712-730.
- - - 1973. "Quantum Logics, Physical Space, Position Observables and Sym-
metry." Reports on Mathematical Physics 4:193-202.
- - - 1976. "A Generalised Measure and Probability Theory for the Physical
Sciences." In Harper and Hooker (1976), pp. 121-141.
Haag, R. 1973. Boulder Lectures in Theoretical Physics, vol. 14B. Ed. W. E. Britten.
New York: Gordon and Breach.
Hacking, I. 1983. Representing and Intervening: Introductory Topics in the Philosophy
of Science. Cambridge: Cambridge University Press.
Halmos, P. R. 1957. Introduction to Hilbert Space and the Theory of Spectral
Multiplicity, 2d ed. New York: Chelsea.
Hanson, N. R. 1967. "Quantum Mechanics, Philosophical Implications of." In Ed-
wards (1967), vol. 7, pp. 41-49.
Hardegree, G. M. 1980. "Micro-States in the Interpretation of Quantum Theory." In
Asquith and Giere (1980), pp. 43-54.
Hardegree, G. M., and P. J. Frazer. 1981. "Charting the Labyrinth of Quantum
Logics: A Progress Report." In Beltrametti and van Fraassen (1981), pp. 53-76.
Harper, W. L., and C. A. Hooker, eds. 1976. Foundations and Philosophy of Statistical
Theories in the Physical Sciences. Dordrecht, Holland: Reidel.
Harrison, J. 1983. "Against Quantum Logic." Analysis 43:82-85.
Healey, R. 1979. "Quantum Realism: Naïveté Is No Excuse." Synthese 42:121-144.
- - - 1984. "How Many Worlds?" Nous 18:591-616.
Heisenberg, W. 1927. "Über den anschaulichen Inhalt der quantentheoretischen
Kinematik und Mechanik." Zeitschrift für Physik 43:172-198. Trans. as "The
Physical Content of Quantum Kinematics and Mechanics," in Wheeler and
Zurek (1983), pp. 62-84.
- - - 1958. Physics and Philosophy: The Revolution in Modern Science. New York:
Harper and Row.
Heitler, W. 1949. "The Departure from Classical Thought in Modern Physics." In
Schilpp (1949), pp. 181-198.
Hellman, G. 1982a. "Einstein and Bell: Tightening the Case for Microphysical
Randomness." Synthese 53:445-460.
- - - 1982b. "Stochastic Einstein Locality and the Bell Theorems." Synthese
53:461-504.
- - - 1984. "Introduction." Nous 18:557-567.
Hempel, C. G. 1954. "A Logical Appraisal of Operationism." Scientific Monthly
79:215-220. Reprinted in Hempel (1965), pp. 123-133.
- - - 1965. Aspects of Scientific Explanation and Other Essays in the Philosophy of
Science. New York: Free Press.
Hintikka, J., ed. 1969. The Philosophy of Mathematics. Oxford: Oxford University
Press.
Hirst, R. J. 1967. "Phenomenalism." In Edwards (1967), vol. 6, pp. 130-135.
Holdsworth, D. G., and C. A. Hooker. 1983. "A Critical Survey of Quantum Logic."
In Logic in the 20th Century. Scientia 1983:127-246.
Holland, S. S., Jr. 1970. "The Current Interest in Orthomodular Lattices." In Trends in
Lattice Theory. New York: van Nostrand. Reprinted in Hooker (1975), pp. 437-496.
Hooker, C. A. 1972. "The Nature of Quantum Mechanical Reality: Einstein versus
Bohr." In Colodny (1972), pp. 67-302.
Hooker, C. A., ed. 1973. Contemporary Research in the Foundations and Philosophy of
Quantum Theory. Dordrecht, Holland: Reidel.
- - - 1975. The Logico-Algebraic Approach to Quantum Mechanics, vol. 1: Historical
Evolution. Dordrecht, Holland: Reidel.
Hughes, G. E., and M. J. Cresswell. 1968. An Introduction to Modal Logic. London:
Methuen.
Hughes, R. I. G. 1979. Systems of Quantum Logic. Ph.D. diss. Vancouver: University
of British Columbia.
- - - 1981. "Quantum Logic." Scientific American 243:202-213.
- - - 1982. "The Logic of Experimental Questions." In Asquith and Nickles
(1982), pp. 243-256.
- - - 1985a. "Logics Based on Partial Boolean Algebras" [Review Article]. Journal
of Symbolic Logic 50:558-566.
- - - 1985b. "Semantic Alternatives in Partial Boolean Quantum Logic." Journal
of Philosophical Logic 14:411-446.
Hughes, R. I. G., and B. C. van Fraassen. 1988. "Can the Measurement Problem Be
Solved by Superselection Rules?" Forthcoming.
Jammer, M. 1966. The Conceptual Development of Quantum Mechanics. New York:
McGraw Hill.
- - - 1974. The Philosophy of Quantum Mechanics. The Interpretation of Quantum
Mechanics in Historical Perspective. New York: John Wiley.
Jarrett, J. P. 1984. "On the Physical Significance of the Locality Conditions in the Bell
Arguments." Nous 18:569-589.
- - - 1989. "Bell's Theorem: A Guide to the Implications." In Cushing and
McMullin (1989).
Jauch, J. M. 1968. Foundations of Quantum Mechanics. Reading, Mass.: Addison
Wesley.
Jauch, J. M., and C. Piron. 1963. "Can Hidden Variables Be Excluded in Quantum
Mechanics?" Helvetica Physica Acta 38:827-837.
Jeffrey, R. C., ed. 1980. Studies in Inductive Logic and Probability, vol. 2. Berkeley,
Calif.: University of California Press.
Jordan, T. F. 1969. Linear Operators for Quantum Mechanics. New York: John Wiley.
Kadison, R. 1951. "Isometries of Operator Algebras." Annals of Mathematics
54:325-338.
Kant, I. [1787] 1929. Critique of Pure Reason. Trans. N. Kemp Smith. New York: St.
Martin's Press.
- - - [1786] 1970. Metaphysical Foundations of Natural Science. Trans. J. Ellington.
Indianapolis: Bobbs Merrill.
Kelley, J. L. 1955. General Topology. New York: van Nostrand.
Kleene, S. C. 1967. Mathematical Logic. New York: John Wiley.
Kochen, S. 1978. "The Interpretation of Quantum Mechanics." Address to the
Biennial Conference of the Philosophy of Science Association, 1978. Princeton,
N.J.: Princeton University, mimeo.
Kochen, S., and E. P. Specker. 1965. "Logical Structures Arising in Quantum
Theory." Symposium on the Theory of Models. Amsterdam: North Holland.
Reprinted in Hooker (1975), pp. 263-276.
- - - 1967. "The Problem of Hidden Variables in Quantum Mechanics." Journal
of Mathematics and Mechanics 17:59-87.
Kolmogorov, A. N. [1933] 1950. Foundations of the Theory of Probability. Trans. N.
Morrison. New York: Chelsea.
Körner, S., ed. 1962. Observation and Interpretation in the Philosophy of Physics. New
York: Dover.
Körner, S. 1969. Fundamental Questions of Philosophy. Harmondsworth, Middx.:
Penguin Books.
Lakatos, I. 1970. "Falsification and the Methodology of Scientific Research Pro-
grammes." In Lakatos and Musgrave (1970), pp. 91-196.
Lakatos, I., and A. Musgrave, eds. 1970. Criticism and the Growth of Knowledge.
Cambridge: Cambridge University Press.
Landau, L. D., and E. M. Lifschitz. 1977. Quantum Mechanics (Non-Relativistic
Theory), 3d ed. Oxford: Pergamon Press.
Landau, L. D., and R. Peierls. 1931. "Erweiterung des Unbestimmtheitsprinzips für
die relativistische Quantentheorie." Zeitschrift für Physik 69:56-69.
Lang, S. 1972. Linear Algebra, 2d ed. Reading, Mass.: Addison Wesley.
Laplace, P. [1814] 1951. A Philosophical Essay on Probabilities. Trans. E. W. Truscott
and F. L. Emory. New York: Dover.
Leggett, A. J. 1986. "Quantum Mechanics at the Macroscopic Level." In de Boer,
Dal, and Ulfbeck (1986), pp. 35-57.
Lewis, D. 1980. "A Subjectivist's Guide to Objective Chance." In Jeffrey (1980), pp.
263-293.
- - - 1986. On the Plurality of Worlds. Oxford: Blackwell.
Lieb, E. H., B. Simon, and A. S. Wightman, eds. 1976. Studies in Mathematical
Physics: Essays in Honour of Valentine Bargmann. Princeton, N.J.: Princeton
University Press.
Loux, M. J. 1979. The Possible and the Actual: Readings in the Metaphysics of Modality.
Ithaca: Cornell University Press.
Lüders, G. 1951. "Über die Zustandsänderung durch den Messprozess." Annalen
der Physik 8:323-328.
Mackey, G. W. 1963. Mathematical Foundations of Quantum Mechanics. New York:
Benjamin.
MacKinnon, E. 1984. "Semantics and Quantum Logic." In Cushing, Delaney, and
Gutting (1984), pp. 173-195.
MacLane, S., and G. Birkhoff. 1979. Algebra, 2d ed. New York: Macmillan.
Maczynski, M. J. 1967. "A Remark on Mackey's Axiom System for Quantum
Mechanics." Bulletin de L'Academie Polonaise des Sciences, Serie des Sciences
Mathematiques, Astronomiques et Physiques 15:583-587.
Margenau, H. 1936. "Quantum Mechanical Descriptions." Physical Review
49:240-242.
- - - 1950. The Nature of Physical Reality. New York: McGraw Hill.
- - - 1954. "Advantages and Disadvantages of Various Interpretations of the
Quantum Theory." Physics Today 7:6-13.
- - - 1958. "Philosophical Problems Concerning the Meaning of Measurement in
Physics." Philosophy of Science 25:23-33.
- - - 1963. "Measurements in Quantum Mechanics." Annals of Physics
23:469-485.
Maxwell, G., and R. M. Anderson, Jr., eds. 1975. Induction, Probability and
Confirmation. Minnesota Studies in Philosophy of Science, vol. 6. Minneapolis: University
of Minnesota Press.
McMullin, E. 1978. "Structural Explanation." American Philosophical Quarterly
15:139-147.
Messiah, A. 1958. Quantum Mechanics, 2 vols. Vol. 1 trans. G. M. Tenner; vol. 2
trans. J. Potter. New York: John Wiley.
Mielnik, B. 1968. "Geometry of Quantum States." Communications in Mathematical
Physics 9:55-80.
Mittelstaedt, P. 1981. "Classification of Different Areas of Work Afferent to Quantum
Logic." In Beltrametti and van Fraassen (1981), pp. 3-16.
Monk, J. D. 1969. Introduction to Set Theory. New York: McGraw Hill.
Morgenbesser, S., ed. 1967. Philosophy of Science Today. New York: Basic Books.
Mott, N. F., and H. S. W. Massey. 1965. The Theory of Atomic Collisions. Oxford:
Clarendon Press. Reprinted in part in Wheeler and Zurek (1983), pp.
701-706.
Nagel, E., P. Suppes, and A. Tarski, eds. 1962. Logic, Methodology and Philosophy of
Science. Stanford: Stanford University Press.
Newman, J. R., ed. 1956. The World of Mathematics, 4 vols. New York: Simon and
Schuster.
Noakes, G. R. 1957. New Intermediate Physics. London: Macmillan.
Pagels, H. R. 1982. The Cosmic Code: Quantum Physics as the Language of Nature. New
York: Simon and Schuster.
Park, J. L., and H. Margenau. 1968. "Simultaneous Measurability in Quantum
Theory." International Journal of Theoretical Physics 1:211-283.
- - - 1971. "The Logic of Noncommutability of Quantum Mechanical Operators
and Its Empirical Consequences." In Yourgrau and van der Merwe (1971), pp.
37-70.
Pauli, W. 1933. "Die allgemeinen Prinzipien der Wellenmechanik." Handbuch der
Physik (ed. H. Geiger and K. Scheel), 2d ed., vol. 24, pp. 83-272. Berlin:
Springer Verlag.
Penrose, R., and C. J. Isham, eds. 1986. Quantum Concepts in Space and Time. Oxford:
Clarendon Press.
Petersen, A. 1963. "The Philosophy of Niels Bohr." Bulletin of the Atomic Scientists,
September 1963, pp. 8-14.
Piron, C. 1972. "Survey of General Quantum Physics." Foundations of Physics
2:287-314.
- - - 1976. Foundations of Quantum Physics. Reading, Mass.: Benjamin.
Popper, K. R. 1959. The Logic of Scientific Discovery. London: Hutchinson.
- - - 1982. Quantum Theory and the Schism in Physics. Totowa, N.J.: Rowman and
Littlefield.
PSSC (Physical Sciences Study Committee). 1960. Physics. New York: Heath.
Putnam, H. 1962. "What Theories Are Not." In Nagel, Suppes, and Tarski (1962),
pp. 240-251.
- - - 1965. "A Philosopher Looks at Quantum Mechanics." In Colodny (1965),
pp. 75-101.
- - - 1969. "Is Logic Empirical?" In Cohen and Wartofsky (1969), pp. 181-241.
Reprinted in Hooker (1975), pp. 181-206.
Reichenbach, H. 1944. Philosophic Foundations of Quantum Mechanics. Berkeley,
Calif.: University of California Press.
- - - 1956. The Direction of Time. Berkeley, Calif.: University of California Press.
Robertson, H. P. 1929. "The Uncertainty Principle." Physical Review 34:163-164.
Reprinted in Wheeler and Zurek (1983), pp. 127-128.
Rosenfeld, L. 1971. "Quantum Theory in 1929." In Cohen and Stachel (1979); see
also Wheeler and Zurek (1983), pp. 699-700.
Russell, B. 1917. Mysticism and Logic. London: Allen and Unwin.
Salmon, W. C. 1984. Scientific Explanation and the Causal Structure of the World.
Princeton, N.J.: Princeton University Press.
Schilpp, P. A., ed. 1949. Albert Einstein: Philosopher-Scientist. La Salle, Ill.: Open
Court.
Schrödinger, E. 1935. "Die gegenwärtige Situation in der Quantenmechanik."
Naturwissenschaften 22:807-812, 823-828, 844-849. Trans. as "The Present
Situation in Quantum Mechanics" by J. D. Trimmer, in Wheeler and Zurek (1983),
pp. 152-167.
- - - 1953. "What Is Matter?" Scientific American, September 1953, pp. 52-56.
Shimony, A. 1980. "The Point We Have Reached." Epistemological Letters, June
1980.
- - - 1981. "Critique of the Papers of Fine and Suppes." In Asquith and Giere
(1981), pp. 572-580.
- - - 1986. "Events and Processes in the Quantum World." In Penrose and Isham
(1986), pp. 182-203.
Sikorsky, R. 1964. Boolean Algebras, 2d ed. Berlin: Springer Verlag.
Simon, B. 1976. "Quantum Dynamics: From Automorphism to Hamiltonian." In
Lieb, Simon, and Wightman (1976), pp. 327-349.
Skyrms, B. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws.
New Haven, Conn.: Yale University Press.
Stairs, A. 1982. "Quantum Logic and the Lüders Rule." Philosophy of Science
49:422-436.
- - - 1983a. "On the Logic of Pairs of Quantum Systems." Synthese 56:47-60.
- - - 1983b. "Quantum Logic, Realism and Value Definiteness." Philosophy of
Science 50:578-602.
- - - 1984. "Sailing into the Charybdis: van Fraassen on Bell's Theorem."
Synthese 61:351-359.
Staniland, H. 1972. Universals. New York: Anchor Books.
Stein, H. 1972. "On the Conceptual Structure of Quantum Mechanics." In Colodny
(1972), pp. 367-438.
Suppe, F. 1977. "The Search for Philosophic Understanding of Scientific Theories."
In The Structure of Scientific Theories, ed. F. Suppe, pp. 3-232. Urbana:
University of Illinois Press.
Suppes, P. 1966. "The Probabilistic Argument for a Nonclassical Logic in Quantum
Mechanics." Philosophy of Science 33:14-21.
- - - 1967. "What Is a Scientific Theory?" In Morgenbesser (1967), pp. 55-67.
Suppes, P., L. Henkin, A. Joja, and G. C. Moisil, eds. 1973. Logic, Methodology and
Philosophy of Science, vol. 4. Amsterdam: North-Holland.
Swift, A. R., and R. Wright. 1980. "Generalized Stern-Gerlach Experiments and the
Observability of Arbitrary Spin Operators." Journal of Mathematical Physics 21
(1):77-82.
Taylor, E. F., and J. A. Wheeler. 1963. Space-Time Physics. San Francisco: Freeman.
Teller, P. 1979. "Quantum Mechanics and the Nature of Continuous Physical
Quantities." Journal of Philosophy 76:345-360.
- - - 1983. "The Projection Postulate as a Fortuitous Approximation." Philosophy
of Science 50:413-431.
- - - 1989. "Relativity, Relational Holism, and the Bell Inequalities." In Cushing
and McMullin (1989).
Tierliebhaber, X. 1939. "Katzen und Affen, Affen und Katzen: die Tiere der Philoso-
phen." Zeitschrift fur Philosophische Zoologie 1:1-26.
Toulrnin, S. 1953. Philosophy of Science: An Introduction. London: Hutchinson.
van Fraassen, B. C. 1972. "A Formal Approach to the Philosophy of Science." In
Colodny (1972), pp. 303-366.
- - - 1974a. "The Einstein-Podolsky-Rosen Paradox." Synthese 29:291-309.
- - - 1974b. "The Labyrinth of Quantum Logic." In Cohen and Wartofsky
(1974), pp. 72-102. Reprinted in Hooker (1975), pp. 577-607.
- - - 1980. The Scientific Image. Oxford: Clarendon Press.
- - - 1981a. "Assumptions and Interpretations of Quantum Logic." In Beltra-
metti and van Fraassen (1981), pp. 17-31.
- - - 1981b. "A Modal Interpretation of Quantum Mechanics." In Beltrametti
and van Fraassen (1981), pp. 229-258.
- - - 1982. "The Charybdis of Realism: Epistemological Implications of Bell's
Inequality." Synthese 52:25-38.
- - - 1985. "Salmon on Explanation." Contribution to a symposium at the East-
ern Division Meeting of the American Philosophical Association, 1985. Prince-
ton, N.J.: Princeton University, mimeo.
von Neumann, J. [1932] 1955. Mathematical Foundations of Quantum Mechanics.
Trans. R. T. Beyer. Princeton, N.J.: Princeton University Press.
Wan, K.-K. 1980. "Superselection Rules, Quantum Measurement and the Schrödinger's
Cat." Canadian Journal of Physics 58:976-982.
Weyl, H. 1952. Symmetry. Princeton, N.J.: Princeton University Press.
Wheeler, J. A. 1957. "Assessment of Everett's 'Relative State' Formulation of Quan-
tum Theory." Reviews of Modern Physics 29:463-465. Reprinted in de Witt and
Graham (1973), pp. 151-153.
Wheeler, J. A., and W. H. Zurek, eds. 1983. Quantum Theory and Measurement.
Princeton, N.J.: Princeton University Press.
Wigner, E. P. 1961. "Remarks on the Mind-Body Question." In Good (1961). Reprinted
in Wigner (1967), pp. 171-184.
- - - 1963. "The Problem of Measurement." American Journal of Physics 31:6-15.
Reprinted in Wigner (1967), pp. 153-170, and Wheeler and Zurek (1983), pp.
324-341.
- - - 1967. Symmetries and Reflections. Bloomington, Ind.: Indiana University
Press.
- - - 1970. "On Hidden Variables and Quantum Mechanical Probabilities."
American Journal of Physics 38:1005-1009.
- - - 1973. "Epistemological Perspectives in Quantum Theory." In Hooker
(1973), pp. 369-385.
Yourgrau, W., and A. van der Merwe, eds. 1971. Perspectives in Quantum Theory:
Essays in Honor of Alfred Lande. Cambridge, Mass.: M.I.T. Press.
Index

Absorption, 182, 189 Bohr, N., 214, 216, 217, 228-231, 241, 266,
Accardi, L., and A. Fedullo, 238 296,297,306,310,317
Accumulation point, 338 Bohr, N., H. Kramers, and J. Slater, 277
Algebra: of events, 194-201, 302; of Bohr-Heisenberg interpretation, 214, 216
properties, 178 -182; of propositions, 203 Bohr's theory of atom, 175
Approximation: empirical, 298; fortuitous, Boolean algebras, 178, 182-186, 189;
297, 298; uniform, 298 atomic, 185; finite, 185; σ-algebra, 220
Aristotle, 156 Boolean manifold, 192
Aspect, A., 241, 245, 247, 308 Borel set, 197
Aspect, A., J. Dalibar, and G. Roget, 241 Born, M., 158, 173,232,302
Associativity, 182, 189 Born's rule, 162
Automorphism, 128 Bub, J., 170, 173, 214,216,217,220,22:1,
Available properties, 215 224, 234, 238, 258, 273, 289, 299, 3 1H, :11 '/
Axiomatic view of theories, 80, 256 Buridan's ass, 279
Busch, P., and P. Lahti, 107, 157,260, 2(,\
270
Ballentine, L., 163
Basis, 13; orthonormal, 48 ℂ², 11, 31-36
Belinfante, F., 173 Cartwright, N., 81, 119, 256, 257, 2H'I, '/'/1
Bell, J. L., and A. Slomson, 183, 186 296
Bell, J. S., 170, 172, 173, 174, 175, 238, 240, Cassinelli, G. See Beltrametti and Cassinelli
322 Categorial framework, 175,217, 2JO, .lUl
Bell's inequality, 238, 241, 242, 243, 244, 305
245, 297, 306 Causal anomalies, 219, 228
Bell's theorem, 237, 238 CHSH inequality, 242
Bell-Wigner inequality, 170-172, 237 CKM. See Cooke, Keane, and Moran
Beltrametti, E., and G. Cassinelli, 132, 139, Classical: horizon, 312-313, 319; logic,
146, 147, 148, 162, 178, 191, 196, 198, 184, 186,202,203
200, 235, 265, 270, 285, 286, 287, 288, 289 Classical mechanics, 175, 194; Hamilton-Jacobi
Bigelow, J., 292 formulation of, 57, 72, 75, 17/,
Bilinear form, 337 Clauser, J., 242
Binomial theorem, 94 Clauser, J., and A. Shimony, 17'1, 1~1I , JoI .
Birkhoff, G., and J. von Neumann, 214. See Cohen, R., and J. Stachel, 297
also MacLane and Birkhoff Coherence, 193, 194
Blanché, R., 80 Collapse of wave packet, 227, 272, 111i1 1111
Bohm, D., 109, 159, 160, 170, 237,238 Common cause, 246
Commutativity, 182, 189 Dirac, P., 11, 92, 108, 287


Compatibility, 193; joint compatibility, 221. Dirac notation, 26, 254
See also Observables, compatible; Disjunction, 202
Subspaces, compatible Disjunctive property, 212, 217
Complementarity, 214, 231; principle of, 228 Dispersion principle, 155 - 157, 260
Complementation, 182, 193, 221 Dispositions, 62, 69. See also Properties, vs.
Completeness: of physical theory, 158, 243; dispositions
of quantum mechanics, 158, 304 Distributive law, 210, 234
Complex: conjugate, 31; numbers, 28-31 Distributivity, 182, 189
Compton scattering, 271 DuBois magnet, 2
Compton-Simon experiment, 271, 277 Duhem, P., 245, 299, 300, 301 , 305, 306
Conditional probabilities. See Probability, Dynamical evolution. See Continuity of
conditional evolution; States, dynamical evolution of
Conjunction, 202
Connectives, 181, 202, 203, 212; partial, e,31
216; quantum, 203-204; truth-functional, Earman, J., 76, 291, 292
186, 206, 216 Eberhard, P., 242
Consistency, 201, 202 Eco, U., 36
Constructive theories, 258 Eddington, A., 39
Continuity of evolution, 116, 146 Ehrenfest, P., 8
Conventionalist thesis, 312, 314 Eiffel Tower, 80, 82
Convexity, preservation of, 146 Eigenvalues. See Operators, eigenvalues of
Convex set, 143, 145, 148 Eigenvectors. See Operators, eigenvectors of
Cooke, R., 339 Einstein, A., 157, 158, 162, 163, 174,258,
Cooke, R., and J. Hilgevoord, 196 277,304,319
Cooke, R., M. Keane, and W. Moran, 147, Einstein, A., and P. Ehrenfest, 8
321-346 Einstein, A., B. Podolsky, and N. Rosen,
Coordinate: of momentum, 58; of position, 158, 170-172. See also EPR
58; system, 12,27 Einstein-locality, 240; deterministic, 242;
Copenhagen interpretation, 214, 216, 228, stochastic, 242, 250 - 252, 306
293,296,306-310, 311,319 Einstein-separability, 240
Covering law. See Explanation Ensemble, 144, 163
Crane, W., 297 Entailment, 182, 201, 202, 203, 205
Cresswell, M. See Hughes and Cresswell EPR: argument, 158-162; criterion for
Cushing, J., and E. McMullin, 245, 253 physical reality, 158, 160; experiment,
237, 238, 255, 277, 302, 304. See also Ein-
Dalibar, J. See Aspect, Dalibar, and Roget stein, Podolsky, and Rosen
Dalla Chiara, M.-L., 206, 207 Equivalence class, 198
Daneri, A., A. Loinger, and G. Prosperi, Euclidean method, 80. See also Axiomatic
230,288-289,317 view of theories
Davies, E., 270 Event, 195, 198; structure, 299
Davies, P., 266 Everett, H., 290, 291, 293, 311, 312
de Broglie, L., 231 Expectation value, 71, 134, 261
Deductive logic. See Logic, deductive Experimental question, 60, 68, 69,156,179,
Degeneracy, 50, 70 197, 259
Demopoulos, W., 258 Explanation: covering law model of, 257;
De Morgan's laws, 183, 188 simulacrum account of, 257; structural,
Density matrix, 138 256-258
Designated element, 183 Extremal point, 143
d'Espagnat, B., 172,240,297
Determinism, 58, 74-77, 91; statistical, 116, Faithful measurement principle, 163, 171
146 Fano, G., 44, 51, 54, 98,114, 136, 147, 197
de Witt, B., 289, 290, 292, 293 Fano, U., 139
de Witt, B., and N . Graham, 290 Fedullo, A. See Accardi and Fedullo
Feyerabend, P., I I II, 2/1:1 Hamilton's equations, 73, 76
Feynman, R., 1, 2, 8, 82, 1:>5, 300 Hanson, N., 108, 228
Feynman, R., R. Leighton, and M. Sands, 2, Hardegree, G., 176
5, 227, 234, 300, 303 Hardegree, G., and P. Frazer, 193, 194, 220,
Field, 40; of sets, 87, 178, 195; σ-field of 221
sets, 219 Harrison, J., 211, 212, 213
Finch, P., 193 Hasse diagram, 185
Fine, A., 158, 163, 234 Healey, R., 163, 165, 291, 293
Finkelstein, D., 178, 199 Heisenberg, W., 44, 218, 267, 268, 269,
French, A., 158 281,302
Friedberg, R., 166 Heisenberg's microscope, 268-269
Friedman, M., 206, 234 Heisenberg's principle, 267
Functions, 42; composition of, 115; Heitler, W., 271, 276
continuous, 122; frame, 147, 169, 223, Hellman, G., 241-243, 245, 246
323, 325-327, 330; generalized probability Hempel, C., 230, 257
(GPF), 222, 308, 347; Hamiltonian, Hertz, H., 300
72; regular frame, 148; square-integrable, Hidden variable theory, 109, 172-175,246;
44, 63. See also Probability contextualist, 174; nonlocal, 174
Functional dependence, 199 Hilbert space. See Spaces, Hilbert
Hilgevoord, J. See Cooke and Hilgevoord
G ,2 , 191,203,204 Holdsworth, D., and C. Hooker, 176, 178,
Galileo, 231 194,200
Geometrical optics, 175 Homogeneity of time, 115, 116, 146
Gerlach, W. See Stern-Gerlach experiment Homomorphism, 183, 186
Geroch, R., 290 Hooker, C., 162, 229, 230
Gibbins, P., 206, 234, 260, 266, 269 Hooke's law, 297, 298, 301
Giere, R., 80, 218, 292 Hughes, G., and M. Cresswell, 206
Gillespie, D., 57 Hughes, R., 179, 200, 206, 216, 221
Gleason, A., 147
Gleason's theorem, 146, 164, 169, 223, 224, i,29
260,321-346 Ideal gas law, 298, 299
Glymour, C., 206 Idempotence, 189
Gödel, K., 207 Ignorance interpretation, 96, 97, 144, 145,
Goldstein, H., 128 218, 283. See also States, mixed
Goudsmit, S., 3 Imaginary numbers, 29
GPF. See Functions, generalized probability Incompatibility, 199; principle, 201
Graham, N. See de Witt and Graham Indeterminacy principle, 26. ,21>'> , II ?
Greatest lower bound. See Infimum Infimum, 188
Greechie, R., 191 Inner product, preservation of, I l 'l. See also
Group: Abelian, 39; commutative, 39; Vectors, inner product of
continuous, 114, 115, 146; identity element Interaction algebra, 2 1,)
of, 39; one-parameter, 114, 146; Interference, 227, OJ
representation of, 128; symmetry, 128, 194 Intuitionistic logic. See Logic, intuitionistic
Gudder, S., 174, 193, 194, 222 Isabella, 306
Gupta-Bleuler formulation of quantum Isomorphism, 39, 40, 44 , Il l , Il
electrodynamics, 287
Jammer, M., 2, 8, 162, 1(,1>, 17.1, ) /01" J'/'/
Hacking, I., 230 Jarrett, J., 243 - 245,246, 147, ~" O , 1', I,
Hall, N., 302 290,304
Halmos, P ., 324, 338, 340 Jauch, J., 148, 14 9, 171l, 1</'1, 01" )/1'1, 1111
Hamiltonian. See Functions, Hamiltonian; measurement, 281 2114
Operators, Hamiltonian Join, 182, 190, 191
Hamilton-Jacobi formulation of classical Jordan, T., 47, 51, 54, 114, II '" 1 111, 11,1,
mechanics. See Classical mechanics 285, 287
Kadison, R., 146 McMullin, E., 256, 266, 269. See also Cushing
Kant, I., 176, 237, 317 and McMullin
Keane, M. See Cooke, Keane, and Moran Maczinski, M., 178, 197, 200
Kelley, J., 338, 346 Magnetic core hypothesis, 3
Kinetic energy, 59 Magnetic moment, 2, 3, 296
Kleene, S., 184 Many-worlds interpretation (MWI),
Klein, F., 128 289 - 294
Kochen and Specker's theorem, 164-168, Mapping, 42
170,173,174,177,206, 213,238,260,322 Margenau, H., 266, 270, 271, 272, 273, 276,
Kochen, S., 176,215,216,217 277, 302
Kolmogorov, A., 218, 220, 306 Massey, H. See Mott and Massey
Kolmogorov event space, 255, 307 Matrix, 17; diagonal elements of, 33;
Kolmogorov probability axioms, 88, 142, 219 representation, 18, 32, 128
Kolmogorov probability function, 219, 222, Maximum element, 188, 191
237 Maxwell's equations, 113, 300
Körner, S., 175 Maxwell's theory, 299, 300
Kramers, H. See Bohr, Kramers, and Slater Meaning incommensurability, 230
Measurement, 299, 302; maximal, 273;
ℓ², 44 problem, 79, 212, 217, 280, 312
L², 44 Medicine Hat, 62
Lahti, P. See Busch and Lahti Meet, 182, 190
Lameti-Rachti, M., and W. Mittig, 297 Messiah, A., 134
Landau, L., 272 Metaphysical nostalgia, 217
Landau, L., and E. Lifschitz, 319 Mielnik, B., 132, 135
Lande, A. See Sommerfeld and Lande Mind-body problem, 294
Laplace, P., 74, 75, 76, 312 Minimum element, 188, 191
Latency, 302-304, 308, 309 Minkowski space-time, 242, 258
Lattices, 189-190; atomic, 204; comple- Mittelstaedt, P., 178
mented, 189; distributive, 189; nondistrib- Mittig, W. See Lameti-Rachti and Mittig
utive, 191; orthocomplemented, 189, 191; Mixture. See States, mixed
orthocomplete, 189; orthomodular, 178, Modal: interpretation, 315, 317, 318; logic,
189, 190,192; ultrafilter on, 204 206; operator, 206
Least upper bound. See Supremum Models, 79 - 82
Leggett, A., 226, 241, 297 Monk, J., 321
Leighton, R. See Feynman, Leighton, and Moose Jaw, 62
Sands Mott, N., and H. Massey, 296
Lewis, D., 218, 292, 293 MWI. See Many-worlds interpretation
Locality condition, 161
Local realism, 172,237 National Security Council, 293
Logic: classical, 184, 186, 202, 203; Natural units, 4, 65, 105, 264
deductive, 178; intuitionistic, 207; modal, Negation, 202
206; quantum, 177, 178-217 Newman, J., 39
Loinger, A. See Daneri, Loinger, and Prosperi Newton's laws of motion, 73
Loux, M., 292 Noakes, G., 298
Lüders rule, 224-226, 235, 236, 248, 249, No-go theorems, 173-175
250, 253, 254, 274, 275, 278, 297, 298, Non-Euclidean geometry, 209
299,308,315, 347 -348
Ludwig, G., 265 Observables, 59, 63, 69, 82, 97, 98, 155,
197; compatible, 102, 104, 165; Fourier-connected,
Mackey, G., 116, 146, 147, 178, 197, 200, 322 259; full set of, 168; functionally
Mackey-Maczinski theorem, 198 dependent, 101, 102; incompatible,
MacKinnon, E., 214 104-107, 111, 157, 159,309; indepen-
MacLane, S., and G. Birkhoff, 39, 40 dent, 100; minimal representation of,
IOil ; IIHII1 ... "III,I\, 10 / , I,", "'"lll1l1y 1'111"""'111 111\11111"1 , " , II!" 2M, 267
transformable, 106, 135, 259; position, Popper, K., 163, 220, 26 , 266
64, 66, 107, 264 Poset, 186-190; complemented, 188;
Operationalism, 97, 195, 214, 230 orthocomplemented, 188; orthocomplete,
Operations, 38 - 40; on field, 40; on group, 189; orthogonality on, 188; orthomodu-
39; on vector space, 41; partial, 192, 194; lar, 189; separable, 200
set-theoretic, 87, 181 Possible worlds, 292
Operators, 14, 97; addition of, 20, 43; Postulate M, 198, 199
bounded, 137, 138; commuting, 15, 103, Precise value principle, 163, 164, 168, 171,
104; continuous, 137; decomposition of, 177, 217
25, 36; density, 138; differential, 43; Preparation, 62, 85, 88, 196
eigenvalues of, 22, 23, 32, 43, 49, 64; eigenvectors Principle theories, 258
of, 22, 23, 32, 43, 49, 64; Probability: classical space, 219, 306;
Fourier-connected, 107; Fourier-Plancherel, conditional, 223, 232, 306-307;
107; Hamiltonian, 77, 81, 115, epistemic, 144; function, 89, 219; generalized
296; Hermitian, 32, 34, 47, 49, 52, 63; function (GPF), 222, 308, 347;
idempotent, 47; identity, 18, 48; linear, generalized theory of, 219-222;
17, 42; matrix representation of, 18, 32; Kolmogorov function, 219, 222, 237;
momentum, 107, 264; multiplication of, measure, 89, 142, 146; objective, 218;
18, 32, 43; position, 64, 66, 107, 264; propensity interpretation of, 164, 218;
positive, 136, 147; projection, 15, 19, 23, relative frequency interpretation of, 163,
24, 35, 46-47; reflection, 14, 23; rotation, 218; subjective, 218; transition, 117
14, 23, 78; self-adjoint, 147; statistical, Projection postulate, 271-275, 299, 314, 315
138; symmetric, 24, 33; trace class, 136; Properties, 155, 176, 237, 259, 302, 303; vs.
trace of, 136; unitary, 78; vector, 134, dispositions, 62; emergent, 317. See also
310; zero, 15 Available properties
Orthoalgebra, 196, 199, 220-222, 302; Property-eigenvalue link, 314, 315
associative, 221 Propositions, 182, 184, 202; maximal
Orthogonality, 193,221; of subspaces, 46; consistent sets of, 202; quantum, 205,
of vectors, 26, 27, 34, 45 208, 259
Orthogonal sum, 221 Prosperi, G. See Daneri, Loinger, and Prosperi
Orthomodular identity, 189, 191 Ptolemaic astronomy, 158
Outcome, 85, 195 Pure states, preservation of, 117. See also
Outcome independence, 243 States, pure
Putnam, H., 80, 205, 209-212, 216, 217,
Pagels, H., 252 234, 289
Parameter independence, 243 Pythagoras' theorem, 83, 84
Park, J., 270
Partial Boolean algebras, 178, 192; Quadratic form, 338
intransitive, 193; transitive, 193, 222 Quantum: conditionalization, 297;
Partial ordering, 185. See also Poset connectives, 203-204; event, 259, 275,
Particle model, 227 314; event interpretation, 276, 278, 296,
Partition, 238 301-306,314; logic, 177, 178-217;
Pauli, W., 276 measurement conditionals, 303, 306;
Pauli spin matrices, 36-38, 65, 67, 131, propositions, 205, 208, 259
135, 139, 165
PBA. See Partial Boolean algebras ℝ², 11, 12-28
Peano's axioms, 80 Ray, 35
Peierls, R., 272 Reichenbach, H., 214, 216, 217, 227, 245,
Petersen, A., 229 246,248
Phase space, 58, 175, 179 Relation: antisymmetric, 185; reflexive, 185;
Piron, C., 178, 199, 206, 322, 330 transitive, 185
Planck, M., 231 Relative frequency principle, 163, 171
Relative state formulation, 291, 311 micro-, 176; mixed, 63, 93, 110, 111, I 6,
Relativity: general theory of, 209; special 141, 143 (see also Ignorance interpretation);
theory of, 240, 253, 304, 305 position, 63; pure, 63, 69, 91, 111,
Reversibility, 116 121,129,201; reduction of, 149 - 151 ;
Robertson, H., 267 singlet spin, 160, 239, 254; spin, for
Roget, G. See Aspect, Dalibar, and Roget spin-½ particle, 63, 131; statistical, 176,
Rosenfeld, L., 230, 297 215; sufficient set of, 200; value, 215
Ruby laser, 82 Statistical algorithm, 67, 136, 143, 155, 236
Russell, B., 40 Statistical determinism, 116, 146
Statistical interpretation, 162-164, 238
Salmon, W., 240, 245, 246, 255, 256, 258 Stein, H., 45, 55, 116, 195, 221
Sands, M. See Feynman, Leighton, and Sands Stern-Gerlach experiment, 1-8, 37, 113,
Scalars, 40 296, 297
Schopenhauer, A., 80 Stone's theorem: on Boolean algebras, 186,
Schrodinger, E., 44, 230, 232, 279, 280, 319 220; on unitary operators, 114, 117, 146,
Schrodinger equation, 77, 78,81, 113-118, 201
145,146,201,235,278,280,298,313,315 Structural explanation, 256-258
Schrodinger's cat, 279, 290 SU(3), 128
Semantic view of theories, 175,257 Sublattice, 192
Semigroup, 116, 117 Subspaces, 35, 36, 45; closed, 55; compatible,
Shimony, A., 243, 245, 246, 253, 304, 311. 102; dimensionality of, 49; orthogonality
See also Clauser and Shimony of, 46
Sikorsky, R., 182 Summhammer, J., 226
Simon, B., 146 Superposition, 143, 236, 303; preservation
Simulacrum, 81; account of explanation, 257 of, 117; principle of, 92,108,111,200
Skyrms, B., 218 Superselection: rule, 285, 318; subspace, 285
Slater, J. See Bohr, Kramers, and Slater Suppe, F., 80
Slomson, A. See Bell and Slomson Suppes, P., 80, 220
Sommerfeld, A., 7 Support: principle, 260; requirement, 273
Sommerfeld, A., and A. Lande, 3 Supremum, 187
Spaces: basis for, 13, 48; Cartesian product Surface locality, 243, 246, 250-252
of, 149; complete, 55; complex, 11; Swift, A., and R. Wright, 113
dimensionality of, 49; direct sum of, 100; Symmetry, 120-121, 127, 135, 309
Hilbert, 11,55,63,69; phase, 58, 69,175, Systems, 86, 301, 302; classical, 57-59, 69;
179; physical, 124; real, 11; representational, composite, 136, 148-151, 161, 349-350;
124; state, 63, 69; topological Newtonian, 79; quantum, 69, 79; spin-1,
product of, 149; vector, 11, 40-42, 88. 164; spin-½, 3, 11, 119-130, 263
See also Tensor-product spaces
Span, 48, 49,87, 190
Spectral: decomposition, 55, 67; decomposition Taylor, E., and J. Wheeler, 240, 242
theorem, 25, 36, 50, 54, 98; measure, Teller, P., 210, 212, 274, 278, 297, 298, 299,
51-55,235 302
Spectrum: continuous, 51-55, 66, 67, 69, Tensor-product spaces, 136, 160, 253; inner
210,235,260; discrete, 49, 69, 71, 97 product on, 148; linear operators on, 149;
Spin, 3, 11, 63, 296, 309. See also Pauli spin orthonormal basis for, 148; reduction of
matrices; Systems, spin-½ states on, 149-151; zero vector in, 149
Stachel, J. See Cohen and Stachel Three-valued logic, 214
Stairs, A., 165, 174, 209, 212, 217, 250, Tierliebhaber, X., 279
258, 275 Tinkertoy, 81
Staniland, H., 303 Total ordering, 187
States, 82, 88, 90, 196, 197; classical, 58; Trace of operator, 136
complete set of, 198; dispersion-free, 79, Truth: assignments, 203, 205, 206; tables,
159; dynamical, 176; dynamical evolution 184; values, 184, 214, 216
of, 72, 77-78, 113-118, 145-146, 201; Two-slit experiment, 226-238, 255
Uhlenbeck, G., 3 multiplication of, 12, 34, 40; zero, 12, 13,
Ultrafilter, 184, 185, 202, 204, 205 41. See also Spaces, vector
Uncertainty, 262, 267; interpretations of, von Neumann, J., 11, 45, 55, 108, 146, 162,
266; principle, 108, 111, 200, 269 173, 267, 268, 269, 271, 272, 273, 274,
Unicorns, 217 275,276,277,278,288,294,304,313,315
Universal wave function, 312
Unpolarized electron, 144, 215 Wan, K.-K., 285, 287
Wave model, 227, 297
Valuations, 207 Wave-particle duality, 228, 231, 302, 303
van der Waals' law, 298, 299 Wheeler, J., 290. See also Taylor and Wheeler
van Fraassen, B., 79, 80, 149, 176, 178, 218, Wheeler, J., and W. Zurek, 232, 241, 288
246-247,248,249,271, 272,277,287, Wigner, E., 113, 170,238,246,278,294,
288, 292, 306, 315, 318 295,311,317
Variance, 262 Wigner's friend, 294-295
Vectors, 12; addition of, 12, 13,34,40; Wright, R. See Swift and Wright
inner product of, 26, 33, 34, 44-45;
length of, 26, 33, 88; normalized, 26, 34, Z2, 183
45; orthogonality of, 26, 27, 34, 45; scalar Zurek, W. See Wheeler and Zurek
