General properties of entropy

Alfred Wehrl*
Institute for Theoretical Physics, University of Vienna, Vienna, Austria

It is rather paradoxical that, although entropy is one of the most important quantities in physics, its main
properties are rarely listed in the usual textbooks on statistical mechanics. In this paper we try to fill this
gap by discussing these properties (for instance, invariance, additivity, concavity, subadditivity, strong
subadditivity, and continuity) in detail, with reference to their implications in statistical mechanics. In
addition, we consider related concepts such as relative entropy, skew entropy, dynamical entropy, etc.
Taking into account that statistical mechanics deals with large, essentially infinite systems, we finally will
get a glimpse of systems with infinitely many degrees of freedom.

222 Alfred Wehrl: General properties of entropy

of entropy and may frequently lead to rather obscure conceptions and to very speculative or even mystical ideas. (An example is the famous heat death.) However, it has to be stressed that the concept of entropy is not at all unclear but a very well defined one. Of course, a correct definition is only possible in the framework of quantum mechanics, whereas in classical mechanics entropy can only be introduced in a somewhat limited and artificial manner.

Admittedly entropy has an exceptional position among the physical quantities. For instance, it does not show up in the fundamental equations of motion, such as the Schrödinger equation. Its nature is rather, roughly speaking, a statistical or probabilistic one; entropy can be interpreted as a measure of the amount of chaos within a quantum-mechanical mixed state. However, entropy by no means has to be considered as an entirely new quantity going beyond the concepts of classical or quantum mechanics. This idea has been discussed frequently in the past and, from time to time, is even found in the present-day literature. Let me emphasize that for a description of entropy the usual concepts of quantum mechanics such as Hilbert space, wave function, observables, and density matrices are absolutely sufficient (Sec. I.A).

Entropy relates macroscopic and microscopic aspects of nature and determines the behavior of macroscopic systems, i.e., real matter, in equilibrium (or close to equilibrium). Why this is true unfortunately is not yet understood in full detail, in spite of a century's efforts of thousands and thousands of physicists. There are many opinions and proposals for a solution to this problem; however, none of them seems to be completely satisfactory. Since there is an abundant literature on this topic, I will not, in this review, try to take account of all the results obtained so far, but will restrict myself to a few remarks only (Sec. I.B).

What I rather want to do is to give a survey of the general properties of entropy, i.e., those properties that do not depend on certain specific systems but are generally true. This is the main content of Sec. II. Certainly some of these properties are well known whereas others seem to have escaped general attention, as, for instance, strong subadditivity. But all of them are very important and indispensable for, say, a correct treatment of the thermodynamic limit and various other problems. I have tried to indicate in several places what these properties are good for in physics; however, sometimes this will be rather sketchy, and I will outline the main ideas only and will have to refer to the original papers for a detailed treatment.

Besides entropy itself there are many other quantities related to it that are of interest, as, for instance, the relative entropy and several other concepts. They will be treated in Secs. III and IV. One thing should be said in this connection: there is a tremendous variety of entropylike quantities, especially in the classical case, and perhaps every month somebody invents a new one. Among all these "entropies" I have tried to select those that, in my opinion, are of some physical significance. Maybe my choice will be felt to be subjective.

Since statements in statistical mechanics are frequently true in the infinite limit only, one cannot dispense with an ab initio description of infinite systems. This will be done in the last section, but again, I can present only a very sketchy treatment because there are severe mathematical obstacles that require extensive studies and go beyond the scope of this review. But after all one cannot avoid this approach because such important properties as ergodicity, mixing, stability, etc., can (quantum-mechanically) only hold in strictly infinite systems.

As already mentioned, entropy can be considered as a measure of the amount of chaos, or, to what extent a density matrix can be considered as "mixed." In Sec. II.C an elaborate version of this concept of "mixedness" of a density matrix is presented. Since, on the other hand, entropy can also be regarded as a measure of the lack of information about a system (this is just another point of view of the preceding statement), it is also necessary to comment on the relation between (physical) entropy and information theory (Sec. II.G).

Of course, a few words also have to be said about the classical ensembles of statistical mechanics (Sec. I.C) as well as about the history of the subject (Sec. I.D). Again, this will be rather cursory because there exists a rich and excellent literature about all that.

I hope that the physics will not be hidden behind mathematical technicalities. At least I have tried to avoid this.

I. GENERALITIES

A. Definition of entropy

As already discussed in the introduction, entropy is different from most physical quantities. In quantum mechanics one has to distinguish between observables and states. Observables, like position, momentum, angular momentum, etc., are mathematically described by self-adjoint operators in Hilbert space. States (which generally are mixed) are characterized by a density matrix, say, $\rho$, i.e., a Hermitian operator, $\ge 0$, with trace $= 1$. The expectation value of an observable $A$ in the state $\rho$ is $\langle A\rangle = \mathrm{Tr}\,\rho A$.

Now entropy is not an observable; that means that there does not exist an operator with the property that its expectation value in some state would be its entropy. It is rather a function of a state. If the state is described by the density matrix $\rho$, its entropy is defined by

$$S(\rho) = -k_B\,\mathrm{Tr}\,\rho \ln \rho. \qquad (1.1)$$

This formula is due to von Neumann (1927) and generalizes the classical expression of Boltzmann and Gibbs to quantum mechanics. [von Neumann's derivation is based on earlier arguments by Einstein (1914) and Szilard (1925).] $k_B$ is Boltzmann's constant $= 1.38 \times 10^{-16}$ erg/K. In what follows we will put it equal to 1, which corresponds to measuring the temperature in ergs instead of Kelvin; thus entropy becomes dimensionless. (Occasionally we will insert in the formula for $S(\rho)$ an arbitrary, compact, positive operator rather


than a density matrix. The quantity thus obtained has, of course, no direct physical meaning.)¹

¹ Sometimes, mainly in the mathematical literature, one uses the letter $H$ instead of $S$ for entropy. It is claimed that the $H$ should be a capital "eta"; however, this is not so certain. In any case, the letter $H$ was introduced by Burbury only in 1890, whereas Boltzmann himself originally used "E." In physics, $H$ is not a very good notation because of the risk of confusion with the Hamiltonian. The name "entropy" is due to Clausius (1865) and means transformation (τροπή). The prefix "en" was chosen to have a resemblance to the word "energy."

Entropy is a well defined quantity, no matter what the kind or size of the system under consideration is. (This statement, however, has nothing to do with the question to what extent entropy is a useful quantity in physics.) It is always $\ge 0$, and, as we will see immediately, $= 0$ exactly for the pure states, possibly $= +\infty$. (In a certain sense this latter possibility happens to be the usual case. Fortunately, this has no serious consequences in physics, cf. Sec. II.D.) It is another question how well it can be measured (cf. Sec. IV.B). Admittedly in most cases one is not able to perform sufficiently many measurements in order to determine the density matrix $\rho$, and thus $S(\rho)$, completely. But this problem does not concern entropy specifically, only quite generally the quantum-mechanical concepts of density matrices and wave function. However, it is true that even if one knows $\rho$ completely, it may be extremely hard to calculate $S(\rho)$, although, of course, this can be done in principle, because one would have to diagonalize an infinite matrix in order to compute the trace of a function of it, namely, $-\rho \ln \rho$.

1. Various interpretations of the expression for the entropy

Before trying to clarify the relation between the expression $S(\rho)$ and physical reality, I want to mention a few interpretations of von Neumann's formula.

Ludwig Boltzmann's great discovery was the celebrated formula

$$S = k \ln W,$$

which appeared² in a paper in 1877 and established the connection between the variable of state, "entropy," which had been derived from phenomenological considerations, and the "amount of chaos" (or disorder) of a system, which, more precisely, means the number of microstates which have the same prescribed macroscopic properties. (This number has been denoted as "thermodynamical probability," in German "thermodynamische Wahrscheinlichkeit," hence the letter $W$.) Of course, Boltzmann's treatment was a purely classical one. Since the "number of microstates" does not literally make sense in classical mechanics he took it as the available volume in phase space divided by the volume of an (at first arbitrarily chosen) "unit cell." In quantum mechanics, however, there is no ambiguity at all; the "number of microstates" may be interpreted as the number of pure states with some prescribed expectation values. Let us assume that in a certain system there are $W$ different pure states, each of them occurring with the same probability. Then the entropy is $S = \ln W$ (remember that we have put $k_B = 1$). The density matrix of this system is $\rho = (1/W)P$, $P$ being a $W$-dimensional projection. One easily can see that $\ln W = -\mathrm{Tr}\,\rho \ln \rho$.

² Not quite in this form, which is due to Planck (1906).

If $\rho$ is of a more general type, then one has to look for an expression that interpolates between density matrices of the form $1/W$ times a $W$-dimensional projection. Of course, this is done by $S(\rho) = -\mathrm{Tr}\,\rho \ln \rho$, but there are many more expressions which do the same (for instance, $-\ln \mathrm{Tr}\,\rho^2$, cf. Sec. IV.B). However, $S(\rho) = -\mathrm{Tr}\,\rho \ln \rho$ is the only possibility with reasonable properties (such as additivity and subadditivity, cf. Secs. II.E and II.F). Furthermore, the latter expression enjoys nice "mixing properties" that are very desirable from the point of view of physics (cf. Sec. II.B).

It is rather instructive to pay attention to the combinatorial aspects of von Neumann's formula. Each density matrix can be diagonalized: $\rho = \sum_k p_k |k\rangle\langle k|$ [where $|k\rangle$ = normed eigenvector corresponding to the eigenvalue $p_k$, $|k\rangle\langle k|$ = projection onto $|k\rangle$, $p_k \ge 0$, $\sum_k p_k = 1$]. $S(\rho) = -\sum_k p_k \ln p_k$ (we understand that $0 \ln 0 = 0$). $p_k$ is the probability of finding the system in a pure state $|k\rangle$. If one performs $N$ measurements, one will obtain as a result that (at least for large $N$) the system is found $p_1 N$ times in the state $|1\rangle$, $p_2 N$ times in the state $|2\rangle$, etc. (Of course, these quantities need to be integers, but this is only a minor point which easily can be corrected.) Now the density matrix does not contain any information about the order in which one will find the states $|1\rangle, |2\rangle, \ldots$, etc. There are $N!/[(p_1 N)!\,(p_2 N)! \cdots]$ possibilities for this; and for $N \to \infty$ we find (by virtue of Stirling's formula) that $1/N$ times the logarithm of this number of possibilities converges to $S$.

One may likewise interpret this fact in the following manner: consider $N$ copies of the same system (represented by the Hilbert space $H \otimes H \otimes \cdots \otimes H$, $H$ = Hilbert space of the original system). In this new system there are microstates of the form $|1\rangle \otimes |2\rangle \otimes \cdots$, etc., where $|1\rangle$ occurs $p_1 N$ times, $|2\rangle$ $p_2 N$ times, and so on. All these microstates have the same weight. According to Boltzmann one obtains for the entropy $\ln W_N$ (with $W_N = N!/[(p_1 N)!\,(p_2 N)! \cdots]$). The corresponding portion for one system is $(1/N) \ln W_N$, which goes to $S(\rho)$ as $N \to \infty$.

2. Entropy and information theory

As already explained, entropy is a measure of the "amount of chaos" or of the lack of information about a system. If one has complete information, i.e., if one is concerned with a pure state, entropy $= 0$. Otherwise it is $> 0$, and it is bigger the more microstates exist and the smaller their statistical weight is. [One easily checks the inequality $S(\rho) \ge \ln(1/p_1)$, $p_1$ being the biggest eigenvalue (= operator norm) of $\rho$.] This principle, namely, that entropy is a measure of our ignorance about a system, described by a density matrix, or, in the classical case, by a probability distribution, enables one to apply results of mathematical information theory to physics (Sec. II.G). Also the formal correspondence between the expression $-\sum_k p_k \ln p_k$ and Shannon's expression for the information content of a discrete probability distribution suggests such a procedure. We will discuss it in detail in Sec. II.G.

3. The classical approximation

The "classical limit" of the expression for the entropy is obtained by the usual prescription (we first consider the case of one degree of freedom only)

$$\text{density matrix} \to \text{probability distribution in phase space},$$

$$\text{trace} \to \int \frac{dp\,dq}{2\pi\hbar}.$$

This can be justified mathematically by means of coherent states.

Coherent states were introduced by Schrödinger in 1927. [A detailed treatment is presented in the book of Klauder and Sudarshan (1968).] They are functions of the form $U(p,q)|0,0\rangle \equiv |p,q\rangle$, $U(p,q) \equiv e^{i(pQ - qP)/\hbar}$; $p, q$ = numbers, $P, Q$ = momentum or position operator, respectively; $|0,0\rangle$ = the wave function in configuration space $(\pi\hbar)^{-1/4} e^{-x^2/2\hbar}$. We have $\langle p,q|P(Q)|p,q\rangle = p(q)$. The $|p,q\rangle$ are Gaussian wave packets with minimal uncertainty.

One can prove the following important relation:

$$\mathrm{Tr}\,A = \int \frac{dp\,dq}{2\pi\hbar}\,\langle p,q|A|p,q\rangle.$$

One should bear in mind that the $|p,q\rangle$ are normed but not pairwise orthogonal; in fact, using the abbreviation $z = (q + ip)/(2\hbar)^{1/2}$, $|z\rangle \equiv |p,q\rangle$, one finds

$$\langle z|z'\rangle = \exp\left[-\tfrac{1}{2}|z|^2 + z^* z' - \tfrac{1}{2}|z'|^2\right].$$

For $\hbar \to 0$ the wave packets $|p,q\rangle$ become more and more concentrated around $(p,q)$, in the sense that $\langle p,q|(P - p)^2|p,q\rangle \to 0$ and $\langle p,q|(Q - q)^2|p,q\rangle \to 0$. It is possible to incorporate a factor $\omega$ in the definition of coherent states by rescaling the Gaussian $|0,0\rangle$ and the arguments of $U(p,q)$ accordingly. Our following considerations are equally valid for these kinds of coherent states.

If one defines

$$\rho^{\rm cl}(p,q) = \langle p,q|\rho|p,q\rangle \qquad (1.2)$$

as the classical probability distribution in phase space, then

$$\int \frac{dp\,dq}{h}\,\rho^{\rm cl}(p,q) = \mathrm{Tr}\,\rho = 1$$

and

$$S = -\int \frac{dp\,dq}{h}\,\langle p,q|\rho \ln \rho|p,q\rangle.$$

The classical approximation consists in replacing $\langle p,q|\rho \ln \rho|p,q\rangle$ by $\rho^{\rm cl} \ln \rho^{\rm cl}$:

$$S^{\rm cl} = -\int \frac{dp\,dq}{h}\,\rho^{\rm cl} \ln \rho^{\rm cl}. \qquad (1.4)$$

Since $-x \ln x$ is a concave function, $-\langle p,q|\rho \ln \rho|p,q\rangle \le -\rho^{\rm cl}(p,q) \ln \rho^{\rm cl}(p,q)$ and

$$S \le S^{\rm cl}.$$

(In the rest of this article, we simply will write $S$, $\rho$ instead of $S^{\rm cl}$, $\rho^{\rm cl}$, if there is no risk of confusion.)

The above inequality is a consequence of the following inequality for matrix elements: let $f$ be a convex (concave) function, $A$ be a self-adjoint operator, and $\varphi$ be a normed vector. Then $\langle\varphi|f(A)|\varphi\rangle \ge (\le)\ f(\langle\varphi|A|\varphi\rangle)$. For the proof let us (for the sake of simplicity only) assume that $A$ has a pure point spectrum: $A = \sum_k \alpha_k |k\rangle\langle k|$. Then, with $c_k = \langle k|\varphi\rangle$,

$$\langle\varphi|f(A)|\varphi\rangle = \sum_k |c_k|^2 f(\alpha_k) \ge (\le)\ f\Big(\sum_k |c_k|^2 \alpha_k\Big) = f(\langle\varphi|A|\varphi\rangle).$$

For many density matrices, the error due to the replacement of $\langle p,q|\rho \ln \rho|p,q\rangle$ by $\rho^{\rm cl} \ln \rho^{\rm cl}$ will be negligibly small. It turns out that the classical approximation is good as long as $\rho^{\rm cl}(p,q)$ is a smooth function spread over a volume in phase space that is $\gg h$ (Wehrl, 1977). If there are small distance fluctuations or if $\rho^{\rm cl}$ is concentrated on small regions of phase space, then the classical approximation can be very bad. (For an estimate of this error in typical situations, cf. Sec. C.)

There is a striking paradox since quantum-mechanically one always has $S(\rho) \ge 0$, because $S(\rho) = -\sum_k p_k \ln p_k$, and, since $p_k \ge 0$, $\sum_k p_k = 1$, $p_k \le 1$, one has $p_k \ln p_k \le 0$ ($= 0$ if, and only if, $p_k = 0$ or 1; hence if one $p_k = 1$, all the others must be $= 0$: therefore $\rho$ is a one-dimensional projection, i.e., a pure state). Thus $S(\rho) \ge 0$ (cf. also Sec. II.A). The conventional classical entropy, however, may very well be $< 0$, even $-\infty$, in spite of the inequality $S \le S^{\rm cl}$. How can this happen? The reason is that usually the classical entropy is introduced in a less critical manner. Namely, it is defined for every probability distribution $f(p,q)$ (i.e., every function with $f \ge 0$, $\int f = 1$), no matter whether there is a density matrix $\rho$ such that $f(p,q) = \rho^{\rm cl}(p,q)$ or not. Thus one does not suppose that $f \le 1$ and, consequently, the "classical entropy"

$$-\int \frac{dp\,dq}{h}\,f \ln f$$

can become negative (see Fig. 1).

FIG. 1. Graph of $f(x) = -x \ln x$.

Suppose that $S(f) < 0$. Because $\int (dp\,dq/h)\,f = 1$, the extent of the region where $f > 1$ must be $< h$. Hence a negative classical entropy arises if one tries to localize a particle in phase space in a region $< h$, i.e., if the uncertainty relation is violated. Therefore in applying the conventional classical expression one has to keep in

mind that not every classical probability distribution can be observed in nature.

Although the conventional classical entropy has some other inconvenient features (for instance, it is not monotonic as the "true" classical entropy is; cf. Sec. II.F), we will nevertheless, for the rest of this paper, always understand by "classical entropy" the conventional one, if not otherwise stated, in order to avoid any confusion.

At this place it should also be remarked (which, of course, is well known to everybody) that in purely classical reasoning the expression for the entropy can only be derived up to an additive constant. For $dp\,dq$ has the dimension of an action, hence in order to obtain something dimensionless in the normalization condition "$\int \rho^{\rm cl} = 1$" one has to divide $dp\,dq$ by some quantity of the dimension of an action (= volume of a "unit cell"). The right quantity, as we have just seen, is Planck's constant $h$ (not $\hbar$!). If one takes some other quantity, say $h'$ (in a classical theory $h$ cannot occur), we obtain the normalization condition $\int (dp\,dq/h')\,\rho^{\rm cl\,\prime} = 1$ and for the entropy

$$S^{\rm cl\,\prime} = -\int \frac{dp\,dq}{h'}\,\rho^{\rm cl\,\prime} \ln \rho^{\rm cl\,\prime} = \text{correct classical entropy} - \ln\frac{h'}{h}, \qquad \rho^{\rm cl\,\prime} = \frac{h'}{h}\,\rho^{\rm cl}.$$

If, in particular, $\rho$ = const in a certain region ("phase volume") of the $(p,q)$ plane, otherwise $= 0$, then $S^{\rm cl\,\prime}$ = logarithm of the phase volume measured in units of $h'$ = logarithm of the number of "cells." If the size of the cells is changed, then of course the expression for the entropy changes too. (In classical statistical mechanics this problem is partly overcome by the ad hoc postulate of the third law of thermodynamics.)

4. The classical discrete approximation

In approximating the expression for the entropy one can go one step further and discretize the classical probability distribution $\rho^{\rm cl}$. That means that one partitions the phase space into cells of size $h$ (enumerated by some index $i$) and replaces $\rho^{\rm cl}$ in each cell by its average, which we will denote by $p_i$, i.e.,

$$p_i = \int_{\text{cell } i} \frac{dp\,dq}{h}\,\rho^{\rm cl}.$$

Then $\sum_i p_i = 1$. The classical discrete entropy is defined by

$$S^{\rm d} = -\sum_i p_i \ln p_i.$$

Because of the inequality

$$x(\ln x - \ln y) \ge x - y,$$

one obtains

$$S^{\rm cl} \le S^{\rm d}.$$

Like the classical (continuous) approximation, the classical discrete approximation may be sufficiently good for many purposes.

It should be noted that the same formal structure arises if all density matrices under consideration in a certain problem commute. In this case, there exists a common set of eigenvectors $|i\rangle$ such that $\rho^{\alpha}|i\rangle = p_i^{\alpha}|i\rangle$ ($\alpha$ labels the density matrices) and $S(\rho^{\alpha}) = -\sum_i p_i^{\alpha} \ln p_i^{\alpha}$. This shows that every general theorem that is true in quantum mechanics also must be true in the classical discrete case, or, vice versa, if a theorem is not true in the classical discrete case, it also cannot be true in the general quantum-mechanical case.

5. Hilbert spaces for statistical mechanics

Although the expression for the entropy does not refer to any special structure of a system, there are some particular features of many-body systems. Let me begin with a short review of the Hilbert spaces of those systems that are of primary interest in statistical mechanics. For a careful presentation, see Ruelle, 1969.

One-particle systems

The Hilbert space of a particle moving in a subvolume $V$ of $R^d$ ($d$ = space dimension) is $L^2(V)$ = space of square-integrable functions $\psi(x)$ ($x \in V$). Here and throughout the rest of this paper we neglect spin since our treatment will be a nonrelativistic one only.

Many-particle systems (Maxwell-Boltzmann statistics)

Here the Hilbert space is the tensor product of $N$ copies of $L^2(R^d)$; thus the particles are supposed to be distinguishable. Since in nature there are only very few distinguishable particles, Maxwell-Boltzmann statistics is not very well suited for purposes of statistical mechanics.

Bose-Einstein statistics

The Hilbert space of $N$ identical particles obeying B-E statistics is the symmetric tensor product of $N$ copies of $L^2(V)$:

$$H_N^{s}(V) = L^2(V) \otimes_s \cdots \otimes_s L^2(V),$$

which equals the space of square-integrable functions $\psi(x_1, \ldots, x_N)$ ($x_i \in V$) that are symmetric in $x_1, \ldots, x_N$.

Fermi-Dirac statistics

Like the B-E case, but "symmetric" being replaced by "antisymmetric"; the corresponding space is denoted $H_N^{a}(V)$.

Fock space

If the number of particles is not kept fixed but if one rather wants to take into account the possibility of a variable number of particles, one considers Fock space

$$H^{s,a}(V) = \bigoplus_{N=0}^{\infty} H_N^{s,a}(V).$$

[$H_0 = C$ (one-dimensional space = vacuum).] If the measure of the intersection of two volumina $V_1$ and $V_2$ is zero (by abuse of language we will always write $V_1 \cap V_2$


$= \emptyset$ in that case), then

$$H^{s,a}(V_1 \cup V_2) = H^{s,a}(V_1) \otimes H^{s,a}(V_2).$$

Lattice systems

Mainly for the study of models (such as lattice gases, ferromagnets, binary alloys, etc.) one is interested in lattice systems. There, to each point $x$ of a lattice $Z^d$ of dimension $d$, one assigns a Hilbert space $H_x$ of fixed finite dimension. Let $V \subset Z^d$ be a subset. Then

$$H(V) = \bigotimes_{x \in V} H_x.$$

Again, $H(V_1 \cup V_2) = H(V_1) \otimes H(V_2)$ if $V_1 \cap V_2 = \emptyset$.

Several species of particles

We will not consider this case since it can be treated in an obvious manner once the results for identical particles are established.

After having sketched the various types of Hilbert spaces that we are interested in, let us discuss some aspects of Eq. (1.1) for the entropy.

Independent particles

Consider $N$ identical particles (fermions or bosons). Let $\rho$ be the one-particle density matrix. The $N$-particle density matrix might be expected to be $\rho \otimes \rho \otimes \cdots \otimes \rho$. However, the trace of this operator, restricted to the Hilbert space $H_N^{s}$ or $H_N^{a}$, is not $= 1$. Take, for instance, $N = 2$. Then

$$\mathrm{Tr}\,\rho \otimes \rho = \begin{cases} \tfrac{1}{2}(1 - \mathrm{Tr}\,\rho^2) & \text{for fermions},\\ \tfrac{1}{2}(1 + \mathrm{Tr}\,\rho^2) & \text{for bosons}. \end{cases} \qquad (1.10)$$

If $\mathrm{Tr}\,\rho^2 \ll 1$ [and, consequently, $S(\rho) \gg 0$], $\mathrm{Tr}\,\rho \otimes \rho \approx \tfrac{1}{2}$, and, similarly, for $N$ particles, $\mathrm{Tr}\,\rho \otimes \cdots \otimes \rho \approx 1/N!$. The entropy is then

$$\approx -\mathrm{Tr}\,N!\,(\rho \otimes \cdots \otimes \rho) \ln (N!\,\rho \otimes \cdots \otimes \rho) \approx -\ln N! + N S(\rho).$$

(The necessity of the term $-\ln N!$ was first demonstrated by Gibbs' paradox.)

Let us illustrate this by the simple example of a "microcanonical" density matrix $\rho = (1/W)P$ ($P$ = projection of dimension $W$ and $W \gg N$). Here $S$ = logarithm of the number of microstates $= \ln \binom{W}{N}$ for fermions, $= \ln \binom{W+N-1}{N}$ for bosons. In either case $S \approx -\ln N! + N \ln W$.

The term $-\ln N!$ that appears in these calculations can be derived from a rule known as correct Boltzmann counting: microstates of the type, say, $|1\rangle \otimes |2\rangle \otimes \cdots$ and $|2\rangle \otimes |1\rangle \otimes \cdots$ are to be identified (which clearly is a consequence of the identity of the particles), whereas the contribution of the states of the form $|1\rangle \otimes |1\rangle \otimes \cdots$ can be neglected.

Similar statements, of course, hold for the classical approximation. Up to now we have considered one particle and one degree of freedom only. If there are $d$ degrees of freedom, the classical probability distribution $\rho^{\rm cl}(p,q)$ is obtained in a straightforward manner. However, for $N$ identical particles one must not, either in the normalization condition or in the expression for the classical entropy, take the integral (we have put $d = 3$)

$$\int \frac{d^3p_1 \cdots d^3q_N}{h^{3N}}$$

but rather

$$\int \frac{d^3p_1 \cdots d^3q_N}{h^{3N} N!}$$

in order to correct for the fact that, for instance, the points in phase space $(p_1, q_1, p_2, q_2, \ldots)$ and $(p_2, q_2, p_1, q_1, \ldots)$ cannot be distinguished.

6. The generalized Boltzmann-Gibbs-Shannon entropy

Let me conclude this section with a remark on a purely formal level. From the mathematical point of view the expression for the classical entropy can be considered as a special case of the so-called "generalized Boltzmann-Gibbs-Shannon (BGS)" entropy (cf. Ochs, 1976). Its definition is: Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, $\nu$ be a probability measure that is absolutely continuous with respect to $\mu$ (hence its Radon-Nikodym derivative $d\nu/d\mu$ exists). Then the generalized BGS entropy is

$$S = -\int \frac{d\nu}{d\mu} \ln \frac{d\nu}{d\mu}\,d\mu \qquad \Big(\text{if } \frac{d\nu}{d\mu} \ln \frac{d\nu}{d\mu} \text{ is integrable}\Big).$$

Important examples are the following ones:

1. Boltzmann-Gibbs (Classical) Entropy: $d\mu = d^{3N}p\,d^{3N}q/h^{3N}$ (or $/h^{3N} N!$, respectively), $d\nu = \rho\,d\mu$.

2. Shannon Entropy of Information Theory: $\Omega = \{1, 2, \ldots\}$, $\mu(\{1\}) = \mu(\{2\}) = \cdots = 1$, $\nu(\{i\}) = p_i$. $S = -\sum_i p_i \ln p_i$.

3. Relative Entropy. $d\mu = \sigma\,d^{3N}p\,d^{3N}q/h^{3N}(N!)$, $d\nu = \rho\,d^{3N}p\,d^{3N}q/h^{3N}(N!)$, $\sigma$, $\rho$ being probability distributions. In this case the generalized BGS entropy is $-\int \rho(\ln \rho - \ln \sigma)\,d^{3N}p\,d^{3N}q/h^{3N}(N!)$. With the + sign in front, this quantity is called "relative entropy" and plays an important role as we will see very soon. [In quantum mechanics, one defines the relative entropy between two density matrices $\sigma$, $\rho$ as $S(\sigma|\rho) = \mathrm{Tr}\,\rho(\ln \rho - \ln \sigma)$. We will study this concept in detail in Sec. III.B.]

4. Rényi's Information Gain. This is a discrete version of relative entropy. $\Omega = \{1, 2, \ldots\}$ as in example 2, but $\mu(\{i\}) = q_i$ instead of 1, $\sum_i q_i = 1$. Then $S = -\sum_i p_i(\ln p_i - \ln q_i)$. (See Rényi, 1966.)

The most general concept in this direction is the Segal entropy (Segal, 1960). It covers both the classical generalized BGS entropy and quantum-mechanical entropy (cf. Sec. IV.C).
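As a minimal numerical sketch of examples 2 and 4 above (an illustration, not part of the original text), the following Python fragment evaluates the Shannon entropy $-\sum_i p_i \ln p_i$ and Rényi's information gain taken with the + sign, $\sum_i p_i(\ln p_i - \ln q_i)$. The function names and the sample distributions are assumptions chosen for illustration only.

```python
import math


def shannon_entropy(p):
    """Return -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def information_gain(p, q):
    """Return sum_i p_i (ln p_i - ln q_i), the discrete relative entropy.

    Illustrative helper (hypothetical name). The result is +inf when
    some q_i vanishes while the corresponding p_i does not.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * (math.log(p_i) - math.log(q_i))
    return total


# Sample distributions (assumed values, for illustration only).
p = [0.5, 0.25, 0.25]
q = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]

S = shannon_entropy(p)
gain = information_gain(p, q)
print("Shannon entropy S(p):", S)
print("information gain of p relative to q:", gain)
```

For a uniform reference distribution $q_i = 1/n$ the gain reduces to $\ln n - S(p)$, so in this sample it is nonnegative and vanishes only when $p$ itself is uniform.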

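The combinatorial interpretation of von Neumann's formula given in Sec. I.A.1 can be checked numerically. The sketch below is an illustration under stated assumptions: it starts from an already diagonalized density matrix (only the eigenvalues $p_k$ enter), picks values of $N$ for which every $p_k N$ is an integer, and uses `math.lgamma` to evaluate $\ln N!$. The function names and the sample spectrum are hypothetical.

```python
import math


def entropy(eigenvalues):
    """S = -sum_k p_k ln p_k for the eigenvalues of a density matrix."""
    return -sum(p * math.log(p) for p in eigenvalues if p > 0.0)


def log_multiplicity_per_copy(eigenvalues, n_copies):
    """Return (1/N) ln [ N! / prod_k (p_k N)! ].

    Assumes every p_k * N is an integer (checked up to rounding error).
    """
    counts = []
    for p in eigenvalues:
        c = round(p * n_copies)
        if abs(p * n_copies - c) > 1e-9:
            raise ValueError("p_k * N must be an integer")
        counts.append(c)
    # ln n! = lgamma(n + 1)
    log_w = math.lgamma(n_copies + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_w / n_copies


# Sample spectrum (assumed values, for illustration only).
spectrum = [0.5, 0.25, 0.25]
S = entropy(spectrum)

# Difference between S and (1/N) ln W_N for increasing N.
errors = [S - log_multiplicity_per_copy(spectrum, n) for n in (40, 4000, 400000)]

# The bound S >= ln(1/p_1), p_1 being the largest eigenvalue (Sec. I.A.2).
lower_bound = math.log(1.0 / max(spectrum))

for n, err in zip((40, 4000, 400000), errors):
    print(n, err)
print("S =", S, ">= ln(1/p_1) =", lower_bound)
```

The differences shrink roughly like $(\ln N)/N$, which is the size of the correction terms in Stirling's formula.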

B. Entropy and physics

The relation between entropy and physics is established by an empirical principle, namely, the second law of thermodynamics. There are several formulations of this law, of varying degrees of validity, which we will now briefly discuss. However, as mentioned in the introduction, the problem of the second law of thermodynamics does not appear to be fully understood yet.

1. A paradox

A very common formulation of the second law of thermodynamics reads as follows: the entropy of a closed system never decreases; it can only remain constant or increase. A less sharp formulation is the following (maximum entropy principle): the entropy of a closed system in equilibrium always takes the maximal possible value. (Of course, both formulations are a little bit vague and have to be specified in concrete instances.)

These statements are, however, in striking contradiction to the fact that the entropy of a system obeying the Schrödinger equation (with a time-independent Hamiltonian) always remains constant. For the density matrix at time $t$ (let us denote it by $\rho(t)$) is obtained from the density matrix at time 0, $\rho$, by the formula

$$\rho(t) = e^{-iHt}\,\rho\,e^{iHt}. \qquad (1.12)$$

Since $e^{iHt}$ is a unitary operator, the eigenvalues of $\rho(t)$ are the same as the eigenvalues of $\rho$. But the expression for the entropy only involves the eigenvalues of the density matrix, hence $S(\rho(t)) = S(\rho)$. (In the classical case, the analogous statement is a consequence of Liouville's theorem.)

This result seems to be absurd since one knows by experimental experience that the second law is something very sensible and very useful. There is one way out of this dilemma; that is, that the time evolution of a system is not described by the Schrödinger equation but by some other equation. In fact, in statistical mechanics one uses, with great success, equations like the Boltzmann equation, the master equation, and other equations.

2. The Boltzmann equation

To begin with, let us look at the classical Boltzmann equation (Boltzmann, 1872). In historical development, this equation was the first one to describe an irreversible behavior of a system in a rigorous way. Yet this equation is still the best known to most physicists. Many of its features are characteristic of all equations that aim at overcoming the difficulty that microscopic description and irreversibility do not fit together. (See the article by Grad, 1958, or Cohen and Thirring, Eds., The Boltzmann Equation, 1972. Of course, the Boltzmann equation is also discussed in all textbooks on statistical mechanics.) Perhaps the reader should be warned that although usual macroscopic equations, such as the Navier-Stokes equations, can be derived from the Boltzmann equation by means of further approximations, the Boltzmann equation also has to be considered as a macroscopic equation because it provides a description of the system in which the original number of degrees of freedom, $10^{23}$, is reduced to 6.

The Boltzmann equation is by no means an immediate consequence of the laws of classical mechanics, i.e., the Hamiltonian equations. Rather it is based on several assumptions, such as, for instance, the molecular chaos, or the "Stosszahlansatz," and upon the fact that one considers the one-particle correlation function only, instead of taking the whole probability distribution in phase space. It turns out that, although the time evolution of the total system is given by the Hamiltonian dynamics, under certain conditions the time evolution of the first correlation function can be described, in fairly good approximation, by an irreversible equation.

The correlation function of a system of $N$ identical particles (with mass = 1) is defined by

$$F(p,q,t)\,d^3p\,d^3q = \text{number of particles in the volume } d^3p\,d^3q \text{ at time } t$$

[hence $(1/N)F$ = probability of finding one particle in $d^3p\,d^3q$, irrespective of where the others are]. From the definition,

$$\int F\,d^3p\,d^3q = N.$$

$F$ is obtained from $\rho^{\rm cl}(p_1, q_1, \ldots)$ by the formula

$$F(p,q,t) = \int \frac{d^3p_2\,d^3q_2 \cdots d^3p_N\,d^3q_N}{h^{3N}(N-1)!}\,\rho^{\rm cl}(p, q, p_2, q_2, \ldots; t).$$

(Because of the symmetry of $\rho$ the exceptional position of the first particle is only fictitious.)

The assumption of molecular chaos states that the number of pairs of particles in the element $d^3q$ in configuration space, with momenta in $d^3p_1$ or $d^3p_2$, respectively, equals $[F(p_1,q,t)\,d^3p_1\,d^3q]\,[F(p_2,q,t)\,d^3p_2\,d^3q]$. From it one derives (we consider the simplest case: no external forces, no internal degrees of freedom, etc.) the Boltzmann equation

$$\Big(\frac{\partial}{\partial t} + p_1 \cdot \nabla_q\Big) F_1 = \int d\Omega \int d^3p_2\,\sigma(\Omega)\,|p_1 - p_2|\,(F_1' F_2' - F_1 F_2),$$

where $\sigma(\Omega)$ is the differential cross section for a collision $(p_1, p_2) \to (p_1', p_2')$ ($\Omega$ = solid angle), $F_i = F(p_i, q, t)$, $F_i' = F(p_i', q, t)$ ($i = 1, 2$).

The Boltzmann equation implies the $H$ theorem: the function

$$H(t) = -\int d^3p\,d^3q\,F \ln F \qquad (1.17)$$

is nondecreasing in time. The following remarks apply:

(1) $H$, as defined by Eq. (1.17), does not coincide with the classical entropy in general. This is the case only if $\rho^{\rm cl}$ factorizes: $\rho^{\rm cl}(w_1, w_2, \ldots, w_N) = N!\,\rho_1^{\rm cl}(w_1) \cdots \rho_1^{\rm cl}(w_N)$ [$w_i \equiv (p_i, q_i)$]; we then have

$$S^{\rm cl} = N S(\rho_1^{\rm cl}) - \ln N! \qquad (1.18)$$

(cf. Sec. A). Otherwise, $H > S$ (cf. Sec. II.F).

(2) The correlation function $F$ is obtained from the

"true" distribution in phase space by some sort of averaging.

(3) The assumption of molecular chaos cannot be justified from first principles. It may be probable to a more or less high extent, but it certainly is neither necessary nor true for all time.

From our discussion up to now we have learned that the mechanism of nondecrease of entropy is based upon averaging and probability assumptions. (We will recognize this in a somewhat clearer fashion in the example of the "master equation.") However, it should be mentioned that there is a rigorous derivation of the Boltzmann equation (but only for small times) for a gas of hard spheres of diameter $d$ in the limit $d \to 0$, $n d^2$ kept fixed, where $n$ = number of particles/cm$^3$, by Lanford (1975). In this limit, the system consists of infinitely many particles. This is one of the hints that rigorous versions of irreversibility, and quite generally thermodynamical behavior, are to be expected for infinite systems (and possibly for a restricted class of initial states) only (cf. Sec. IV.C).

Let us return to finite systems and proceed by discussing ergodicity and mixing properties of classical systems.

3. Ergodicity and mixing

I want to start with the concept of energy shell. Let $\rho(t)$ be the time evolution of a density matrix $\rho$, i.e., $\rho(t) = e^{-iHt}\rho\, e^{iHt}$, and let $|n\rangle$ be the eigenvectors of the Hamiltonian, thus $H|n\rangle = E_n|n\rangle$. Then the matrix elements of $\rho(t)$ are
$$\rho(t)_{mn} = e^{-i(E_m - E_n)t}\rho_{mn}. \tag{1.19}$$
We may classify them as follows:

(a) Matrix elements that change in a significant way only during macroscopic time intervals (say, 10^[…] sec). They are connected with extremely small (unmeasurable) energy differences $E_m - E_n$ (< 10^[…] erg).

(b) If the difference $E_m - E_n$ is bigger, then $\rho(t)_{mn}$ is a very rapidly oscillating function of $t$. Since macroscopic measurements last rather long compared with the period of these oscillations, they in fact will average over $\rho(t)_{mn}$. The mean value of these matrix elements being of order of magnitude $1/[(E_m - E_n)\Delta t]$ ($\Delta t$ = period of the measurement), one can neglect them, or, expressed in other terms, those fluctuations are too rapid to be observed.

Now let $E = \operatorname{Tr}\rho(t)H$ be the expectation value of the energy. Of course, this is a constant of motion since $\operatorname{Tr}\rho(t)H = \operatorname{Tr}e^{-iHt}\rho\, e^{iHt}H = \operatorname{Tr}e^{-iHt}\rho H e^{iHt} = \operatorname{Tr}\rho H$. On the other hand, $E = \sum_n \rho(t)_{nn}E_n$. Due to our foregoing considerations, for a description of macroscopic changes of the system one only has to take into account the matrix elements of class (a). Hence we can restrict ourselves to that subspace of the Hilbert space that is spanned by those energy eigenvectors $|n\rangle$ for which $|E - E_n| < \epsilon$ with $\epsilon$ being sufficiently small. We will call this Hilbert space the energy shell. (In our considerations we always have assumed that the Hamiltonian has a pure point spectrum. However, it is a simple matter to generalize our arguments to Hamiltonians with a continuous spectrum.)

In the classical case (cf. Sec. A) we formally can put $\hbar = 0$ and therefore we are allowed to choose the energy shell infinitesimally thin. Of course, here "energy shell" no longer means a subspace of Hilbert space $L^2(\mathbb{R}^{3N})$, but rather a subset of $\mathbb{R}^{6N}$. We will denote the classical energy shell by $\Omega_E$ (or simply $\Omega$):
$$\Omega_E = \{(p_1, \ldots, q_N) : H(p_1, \ldots, q_N) = E\} \tag{1.20}$$
[$H(\ldots)$ = classical Hamiltonian]. In the following we want to use the abbreviation $w \equiv (p_1, \ldots, q_N)$.

The restriction of the measure $d^3p_1 \cdots d^3q_N/N!$ (as always we assume the particles to be identical) to the energy shell $\Omega_E$, formally given by
$$\delta(E - H(p_1, \ldots, q_N))\, d^3p_1 \cdots d^3q_N/N!, \tag{1.21}$$
defines a measure $dw$. (For a more precise definition see, for example, Reed and Simon (1972) or Arnold and Avez (1969), and other textbooks on ergodic theory.) By virtue of Liouville's theorem this measure is time invariant, i.e., $dw = dw(t)$. Let me denote by $W(\Omega)$ (or simply $W$) the measure of all of $\Omega$, by $W(A)$ the measure of a subset $A \subset \Omega$.

In classical statistical mechanics the concept of ergodicity has been introduced by Boltzmann in order to justify the microcanonical ensemble. A "microcanonical ensemble" means a uniform probability distribution over the energy shell, i.e.,
$$\rho_m(w) = 1/W(\Omega). \tag{1.22}$$
[We write $\rho_m$ instead of the more precise notation $\rho_m^{cl}$, recalling our remark following Eq. (1.5).] However, ergodicity certainly is too weak a property to establish that every probability distribution tends (at least in a certain sense) to the microcanonical one. Therefore one has to introduce a stronger notion: mixing. (This concept is due to Hopf, 1932.)

A system is called "mixing" if the following is true: let $A_t$ be the time evolution of a subset $A \subset \Omega$ (i.e., $A_t = \{w(t) : w(0) \in A\}$). Then, for any two sets $A, B \subset \Omega$, always
$$\lim_{t\to\infty} W(A_t \cap B) = \frac{W(A)\,W(B)}{W(\Omega)}. \tag{1.23}$$
Ergodicity only would state that
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T W(A_t \cap B)\, dt = \frac{W(A)\,W(B)}{W(\Omega)}.$$
There is no direction of time favored in this definition, which perhaps is not easy to recognize at first, because
$$\lim_{t\to-\infty} W(A_t \cap B) = \lim_{t\to-\infty} W(A \cap B_{-t}) = \lim_{t\to\infty} W(A \cap B_t) = \frac{W(A)\,W(B)}{W(\Omega)}.$$
One can think of such a system as a flow with strongly turbulent aspects. After sufficiently large times every set $A$ is so ragged that its relative portion in every fixed part $B$ of $\Omega$ is just $W(A)/W(\Omega)$ (see Fig. 2). It should be kept in mind, however, that such a behavior has nothing to do with irreversibility.
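The mixing property (1.23) can be illustrated numerically on a standard textbook example of a mixing system. The following is only a sketch under stated assumptions, not part of the original argument: Arnold's cat map on the unit torus stands in for a Hamiltonian flow on the energy shell, $W$ is taken to be ordinary area (so $W(\Omega) = 1$), and the sets $A$, $B$, the sample sizes, and all function names are hypothetical choices made for the illustration. The second function runs the map backwards, which shows in a small way that a mixing dynamics remains reversible.

```python
import numpy as np

def cat_map(x, y):
    """One step of Arnold's cat map on the unit torus (area preserving)."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0

def inverse_cat_map(x, y):
    """Inverse of cat_map: the dynamics can be run backwards."""
    return (x - y) % 1.0, (2.0 * y - x) % 1.0

def sample_A(n_points, seed):
    """Uniform sample of the set A = [0, 0.5) x [0, 0.5), so W(A) = 0.25."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 0.5, n_points), rng.uniform(0.0, 0.5, n_points)

def overlap_measure(t, n_points=200_000, seed=0):
    """Monte Carlo estimate of W(A_t ∩ B) with B = [0, 0.3) x [0, 1), W(B) = 0.3."""
    x, y = sample_A(n_points, seed)
    for _ in range(t):
        x, y = cat_map(x, y)
    return 0.25 * np.mean(x < 0.3)   # W(A) times the fraction of A_t lying in B

def reconstruction_error(t, n_points=1_000, seed=1):
    """Evolve A forward t steps, then backward t steps; return the largest torus distance."""
    x0, y0 = sample_A(n_points, seed)
    x, y = x0, y0
    for _ in range(t):
        x, y = cat_map(x, y)
    for _ in range(t):
        x, y = inverse_cat_map(x, y)
    dx = np.abs(x - x0)
    dy = np.abs(y - y0)
    dx = np.minimum(dx, 1.0 - dx)    # distance measured on the torus
    dy = np.minimum(dy, 1.0 - dy)
    return float(max(dx.max(), dy.max()))

if __name__ == "__main__":
    for t in (0, 1, 2, 4, 8, 12):
        # the estimate should approach W(A) W(B) / W(Omega) = 0.075
        print(t, overlap_measure(t))
    print("reconstruction error:", reconstruction_error(12))
```

In this toy setting the estimate starts at $W(A \cap B) = 0.15$ and settles near $W(A)W(B)/W(\Omega) = 0.075$ after a few steps, while the initial points are still recovered by the inverse map up to rounding errors.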
Given $A_t$ for $t > 0$, one can reconstruct $A$ for $t = 0$. In fact, this sometimes even can be done experimentally, for instance, in the spin echo experiment (cf. Blau, 1959; Mayer, 1961).

FIG. 2. Time evolution of the set $A$.

Of course, for a mixing system the entropy also remains constant. Nevertheless such a system gives the impression that if one does not look at it in a highly accurate manner, every set $A$ after a certain time appears to be distributed uniformly over $\Omega$. To make these feelings precise, let us divide $\Omega$ into "cells" (not to be confused with the cells of Sec. A) of finite size: $\Omega = \Omega_1 \cup \Omega_2 \cup \cdots$ ($\Omega_i \cap \Omega_j = \emptyset$ if $i \neq j$). The idea behind this is that macroscopic measuring apparatuses have only a restricted precision and are not able to distinguish between points inside one cell. Then also $\rho(w)$ cannot be measured by them exactly, but rather only its mean value over the cells. (Of course, there is a certain arbitrariness with this concept because there is no canonical way of defining these cells.) Let us define the coarse-grained density as follows:
$$\rho_{cg}(w) = \frac{1}{W(\Omega_i)}\int_{\Omega_i}\rho(w')\,dw' \tag{1.25}$$
if $w \in \Omega_i$. (That coarse-graining is essential in statistical mechanics was first pointed out by the Ehrenfests, 1911.) This corresponds to replacing $\rho$ by a distribution that is uniform inside the cells. As discussed before, one cannot distinguish $\rho$ and $\rho_{cg}$ by macroscopic measurements. The coarse-grained entropy is
$$S_{cg}(\rho) = S(\rho_{cg}) = -\sum_i P_i \ln\frac{P_i}{W(\Omega_i)}, \qquad P_i = \int_{\Omega_i}\rho(w)\,dw. \tag{1.26}$$
Of course, $S_{cg} \ge S$, since we have lost information. (A proof is easily obtained by means of the inequality below.)

Now mixing implies that
$$\lim_{t\to\infty}\int_{\Omega_i}\rho(w(t))\,dw = \frac{W(\Omega_i)}{W(\Omega)} \tag{1.27}$$
because this is true for distributions of the form $\rho(w) = \chi_A(w)/W(A)$ [$\chi_A(w)$ = characteristic function of $A$], hence for their convex combinations and eventually for the limits of them, i.e., all density distributions. Therefore the coarse-grained density tends towards the microcanonical one:
$$\rho_{cg}(w(t)) \to \frac{1}{W} \tag{1.28}$$
and $S_{cg} \to \ln W(\Omega)$. Note that the convergence need not be monotonic.

Here $\ln W(\Omega)$ is the maximal possible value of the entropy:
$$-\int \rho \ln\rho\, dw \le \ln W(\Omega). \tag{1.29}$$
For the proof we utilize the concavity of $s(x) \equiv -x\ln x$. It implies that $s(y) - s(x) \le s'(x)(y - x)$, hence $y(\ln y - \ln x) \ge y - x$. Putting $y = \rho(w)$, $x = 1/W(\Omega)$ one finds $-\rho(w)\ln\rho(w) \le -\rho(w)\ln[1/W(\Omega)] + [1/W(\Omega) - \rho(w)]$, and, after integration, $S(\rho) \le \ln W(\Omega)$. If one inserts another distribution $\sigma(w)$ instead of $1/W(\Omega)$, one arrives at the inequality for the relative entropy $\int \rho(\ln\rho - \ln\sigma) \ge 0$.

4. The master equation

The numbers $P_i$ as introduced above may be interpreted as the probability of finding the system in the cell $i$. They do not obey simple differential equations; in order to compute $P_i(t > 0)$ it is not sufficient to know all $P_j(t = 0)$ and perhaps their derivatives of low order.

Under some simplifying assumptions, however, it is possible to derive a simple differential equation (Pauli, 1928), which, of course, is of restricted validity but may be suitable for practical purposes.

Take a distribution that is constant in cell $i$ and $= 0$ in all other cells. By Hamilton's equations one obtains from it the density distribution at time $t$: $\rho(t)$, and the probabilities $P_j(t)$. If $\rho(t = 0)$ were concentrated in the cell $i$, but not constant, one would obtain another density distribution $\rho'(t > 0)$ and other probabilities $P_j'(t)$. Now if the cells are not too small one can find arguments that in the overwhelming majority of possible cases $P_j'(t) = P_j(t)$. Starting with arbitrary distributions $\rho(t = 0)$, no longer necessarily concentrated within one cell, one concludes that "almost always" $P_j(t)$ can be calculated from the $P_i(0)$:
$$P_j(t) = \sum_i T_{ji}(t)\,P_i(0). \tag{1.30}$$
On the other hand, $\rho(t + t')$ can be calculated from $\rho(t)$, hence, similarly,
$$P_j(t + t') = \sum_i T_{ji}(t')\,P_i(t).$$
For simplicity and mathematical convenience one may impose the Markov property on the $T$'s:
$$T_{ji}(t + t') = \sum_k T_{jk}(t')\,T_{ki}(t) \tag{1.31}$$
(Chapman-Kolmogorov equation). The differential form of this equation is obtained by inserting $t' = dt$. Then $T_{ji}(dt)$ must be of the form
$$T_{ji}(dt) = \delta_{ji}\Big(1 - dt\sum_k W_{ki}\Big) + dt\,W_{ji}.$$
The invariance properties of the Hamiltonian equations imply that $W_{ji}W(\Omega_i) = W_{ij}W(\Omega_j)$ (microscopic reversibility, detailed balancing). Also $T_{ji} \ge 0$ in order that $P_j(t) \ge 0$ if $P_i(0) \ge 0$, hence $W_{ji} \ge 0$. (Note that the diagonal terms $W_{ii}$ cancel in the above formula.) We thus arrive at the (classical) master equation
$$\dot P_j = \sum_i (W_{ji}P_i - W_{ij}P_j). \tag{1.32}$$
From it one can derive various macroscopic or pheno-

menological equations for which I want to refer to the literature only. (For a good bibliography, see Reif, 1965.)

The considerations presented above are not intended to give any "proof" of the second law of thermodynamics. I rather wanted to draw attention to those assumptions that are necessary in order to "produce" an irreversible behavior, or to derive equations predicting approach to equilibrium. Let me single out once more the main features:

(1) Some averaging procedure is needed. Concerning the Boltzmann equation, it consisted in considering the first correlation function instead of the complete distribution in phase space. For closed systems, this leads to a nonlinear equation. For open systems in free space (which we did not discuss) this leads to a linear transport equation. The treatment of open systems in bounded regions (systems in a heat bath) leads to master equations of the type mentioned above or, more generally, to dynamical semigroups (Kossakowski, 1972; Gorini, Frigerio, Verri, Kossakowski, and Sudarshan; Davies, 197[…]).

In our last discussion, we introduced coarse-graining. This may seem to be somewhat artificial. However, after all, one can regard it in another way. Suppose we are dealing with a system consisting of two subsystems, the phase space of one of them being discrete, $\Omega_1 = \{1, 2, \ldots\}$, the phase space of the second one being continuous, and the phase space of the whole system being $\Omega = \Omega_1 \times \Omega_2$. In addition, assume that the composite system is mixing. Let $\rho = \rho(i, w)$ ($i \in \Omega_1$, $w \in \Omega_2$) be a probability distribution. The corresponding density distribution of the two subsystems obviously is to be taken as
$$\rho_1(i) = \int dw\,\rho(i, w), \qquad \rho_2(w) = \sum_i \rho(i, w) \tag{1.33}$$
(cf. Sec. II.F for this concept); so that the entropy of the first subsystem is
$$-\sum_i \rho_1(i)\ln\rho_1(i),$$
which is just the coarse-grained entropy with respect to the partition $\{\Omega_i\}$ ($\Omega_i = \{i\} \times \Omega_2$), except for the term $\ln W(\Omega_i)$. Thus the entropy of the first system approaches its maximal value. We will see later on (in detail in Sec. IV.C) that exactly this mechanism is responsible for the possible "increase" of entropy in quantum systems.

(2) In order to achieve approach to equilibrium, some ergodicity properties are definitely needed, and one can expect that the better the ergodicity properties of the system are, the more an arbitrary density distribution will tend to a stable one.

(3) The derivations of the Boltzmann equation or the master equation depend on randomness assumptions that are supposed to hold at any time (molecular chaos, or replacing $\rho(t)$ by a distribution that is uniform inside the cells, respectively). These assumptions may be likely to hold, but certainly cannot be proven at all. Nevertheless one can say that these equations are the best one can expect because in a realistic situation there are always many uncontrollable perturbations that will have the effect that the "true" dynamics of the system is spoiled and that the time evolution of a point in phase space no longer obeys the Hamiltonian equations but rather behaves in a stochastic manner.

Properties of the solutions of the master equation were first discussed by von Mises (1931), Fréchet (1938), and Feller (1950), just to mention the earliest treatments. In particular the master equation implies a monotone increase (or nondecrease) of the coarse-grained entropy
$$S_{cg} = -\sum_i P_i\ln\frac{P_i}{W(\Omega_i)}.$$
A more general result in this direction is that for density matrices $\rho(t)$, whose time evolution is given by a dynamical semigroup, the relative entropy $S(\rho_0|\rho(t))$ decreases. [See Eq. (1.41) below. $\rho_0$ = the stationary state.] The entropy production
$$-\frac{d}{dt}S(\rho_0|\rho(t))$$
then is positive and $S(\rho_0|\rho(t))$ is convex (Spohn, 1977). (The latter fact relies on Lieb's theorem; cf. Sec. III.) For the proof of our first statement, let us, for simplicity, assume that all cells have the same size, $W(\Omega_i) = \omega$. Then
$$S_{cg} = \ln\omega - \sum_i P_i\ln P_i.$$
Now we arrange the $P_i$ in decreasing order: $P_1 \ge P_2 \ge \cdots$. For the sum of the biggest $n$ $P_i$'s we find
$$\frac{d}{dt}\sum_{i=1}^n P_i = \sum_{i=1}^n\sum_k (W_{ik}P_k - W_{ki}P_i) \le 0.$$
Therefore, the sum of the biggest $n$ $P_i$'s at time $t_1$ is $\ge$ the sum of the biggest $n$ $P_i$'s at time $t_2 > t_1$. (Note that the indices of the $P_i$ may change with time.)

Now we use the lemma: Consider two decreasing sequences of numbers $\alpha_1 \ge \alpha_2 \ge \cdots$, $\beta_1 \ge \beta_2 \ge \cdots$ such that $\sum\alpha_i = \sum\beta_i = 1$, for which the following relations hold:
$$\alpha_1 \le \beta_1,\quad \alpha_1 + \alpha_2 \le \beta_1 + \beta_2,\ \ldots,\quad \alpha_1 + \alpha_2 + \cdots + \alpha_n \le \beta_1 + \cdots + \beta_n,\ \ldots.$$
Then $-\sum\alpha_i\ln\alpha_i \ge -\sum\beta_i\ln\beta_i$. The proof is based on a discrete version of the inequality for the relative entropy of Sec. A: $\sum\beta_i(\ln\beta_i - \ln\alpha_i) \ge 0$. By assumption, $\sum\alpha_i\ln\alpha_i = \alpha_1(\ln\alpha_1 - \ln\alpha_2) + (\alpha_1 + \alpha_2)(\ln\alpha_2 - \ln\alpha_3) + (\alpha_1 + \alpha_2 + \alpha_3)(\ln\alpha_3 - \ln\alpha_4) + \cdots \le \beta_1(\ln\alpha_1 - \ln\alpha_2) + (\beta_1 + \beta_2)(\ln\alpha_2 - \ln\alpha_3) + \cdots$, i.e., $\sum\alpha_i\ln\alpha_i \le \sum\beta_i\ln\alpha_i \le \sum\beta_i\ln\beta_i$.

Thus, as far as the master equation is concerned, not only does the entropy never decrease, but also the sum of the $n$ biggest eigenvalues never increases. This property is referred to as mixing-enhancing (cf. Sec. II.C. The English translation of Uhlmann's original appellation "mischungsverstärkend" is due to C. Fellbaum). Its meaning is that à la longue the large $P$'s will get smaller, the small $P$'s will get larger, until eventually all $P$'s are equal. It turns out that mixing-enhancement not only implies that entropy does not decrease but also that, for any concave (or convex, respectively) function $f$, $\sum f(P_i)$ is nondecreasing (or nonincreasing, respectively). This result follows from a simple modification of our last proof.
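The statements about the master equation made so far can be checked numerically in a small toy model. The sketch below is an illustration under stated assumptions, not the author's procedure: the cells are taken to have equal size, so that detailed balancing reduces to symmetric rates $W_{ji} = W_{ij}$; the rates are random numbers, the time step `dt` is chosen small enough that $T(dt)$ has nonnegative entries, and all function names are hypothetical. It iterates $P(t + dt) = T(dt)P(t)$ and records the entropy, the partial sums of the biggest $P_i$'s, and $\sum f(P_i)$ for the concave function $f(x) = \sqrt{x}$.

```python
import numpy as np

def step_matrix(W, dt):
    """T(dt)_ji = delta_ji (1 - dt * sum_k W_ki) + dt * W_ji for rates W_ji >= 0."""
    W = np.array(W, dtype=float)          # copy, so the caller's array is untouched
    np.fill_diagonal(W, 0.0)              # the diagonal terms cancel anyway
    T = np.eye(len(W)) + dt * (W - np.diag(W.sum(axis=0)))
    if T.min() < 0.0:
        raise ValueError("dt too large: T(dt) must have nonnegative entries")
    return T

def evolve(P0, W, dt, n_steps):
    """Iterate P(t + dt) = T(dt) P(t); returns an array of shape (n_steps + 1, n_cells)."""
    T = step_matrix(W, dt)
    traj = [np.asarray(P0, dtype=float)]
    for _ in range(n_steps):
        traj.append(T @ traj[-1])
    return np.array(traj)

def entropy(P):
    """-sum_i P_i ln P_i (the constant ln(omega) for equal cell sizes is omitted)."""
    P = P[P > 0.0]
    return float(-np.sum(P * np.log(P)))

def top_sums(P):
    """Partial sums P_1, P_1 + P_2, ... of the entries sorted in decreasing order."""
    return np.cumsum(np.sort(P)[::-1])

def demo(n_cells=6, dt=0.01, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.random((n_cells, n_cells))
    W = 0.5 * (A + A.T)                   # symmetric rates: detailed balance, equal cells
    P0 = rng.dirichlet(np.ones(n_cells))  # random initial probabilities
    traj = evolve(P0, W, dt, n_steps)
    S = np.array([entropy(P) for P in traj])
    tops = np.array([top_sums(P) for P in traj])
    sqrt_sums = np.sqrt(traj).sum(axis=1) # sum_i f(P_i) with concave f = sqrt
    return traj, S, tops, sqrt_sums

if __name__ == "__main__":
    traj, S, tops, sqrt_sums = demo()
    print("entropy never decreases:", bool(np.all(np.diff(S) >= -1e-12)))
    print("top-n sums never increase:", bool(np.all(np.diff(tops, axis=0) <= 1e-12)))
    print("sum sqrt(P_i) never decreases:", bool(np.all(np.diff(sqrt_sums) >= -1e-12)))
    print("final distribution:", traj[-1])
```

With symmetric rates the step matrix is doubly stochastic, which is the discrete-time counterpart of the situation treated in the text; for cells of unequal size one would have to use the full coarse-grained entropy including the terms $\ln W(\Omega_i)$.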
Let $f$ be concave. Then $f(\beta_i) - f(\alpha_i) \le (\beta_i - \alpha_i)f'(\alpha_i)$ and $\sum\beta_i f'(\alpha_i) = \beta_1(f'(\alpha_1) - f'(\alpha_2)) + (\beta_1 + \beta_2)(f'(\alpha_2) - f'(\alpha_3)) + \cdots \le \alpha_1(f'(\alpha_1) - f'(\alpha_2)) + (\alpha_1 + \alpha_2)(f'(\alpha_2) - f'(\alpha_3)) + \cdots = \sum\alpha_i f'(\alpha_i)$ [remember that $f'$ is decreasing, because $f$ is concave, i.e., $(f'(\alpha_1) - f'(\alpha_2)) \le 0$]. Hence $\sum[f(\beta_i) - f(\alpha_i)] \le 0$ (cf. also Hardy, Littlewood, and Pólya, 1934; Pólya, 1950).

There is a close connection between mixing-enhancement and nondecrease of entropy. Suppose that there is a linear connection between $P_i(t)$ and $P_i(0)$ (not necessarily originating from a master equation):
$$P_i(t) = \sum_k M_{ik}P_k(0).$$
If, for all probability distributions, $-\sum P_i(t)\ln P_i(t) \ge -\sum P_i(0)\ln P_i(0)$, then the mapping $(P_i(0)) \to (P_i(t))$ is mixing-enhancing (Uhlmann, 1977). Let me sketch a simplified version of the proof, and thus suppose that the probability distribution is finite: $(P_1, \ldots, P_n)$. The matrix $M_{ik}$ must be stochastic, i.e., $\sum_i M_{ik} = 1$ and $M_{ik} \ge 0$ in order to guarantee that $(P_1(t), \ldots, P_n(t))$ is a probability distribution. On the other hand, for $P_i(0) = 1/n$ (for all $i$), also $P_i(t) = 1/n$, because otherwise $-\sum P_i(t)\ln P_i(t)$ would be strictly $< \ln n$. From this, one finds that $\sum_k M_{ik} = 1$. Therefore $M$ is a doubly stochastic matrix, and, by Birkhoff's theorem (see, for example, Glasman and Ljubich, 1969), a convex combination of permutation matrices. This immediately implies mixing-enhancement.

If we try to adapt our previous considerations to quantum mechanics, we are immediately faced with the problem that a perfect analogy cannot exist. According to the usual "dictionary" one would expect that one had to replace

subset of phase space by projection,
measure of a set by trace (= dimension of the corresponding subspace),
density distribution by density matrix.

Thus "mixing" in quantum mechanics should mean that, for any two projections (in the finite-dimensional Hilbert space describing the "energy shell"), say $P$ and $Q$,
$$\operatorname{Tr}P_tQ \to \operatorname{Tr}P\,\operatorname{Tr}Q/W \tag{1.34}$$
where $P_t = e^{iHt}Pe^{-iHt}$ (time evolution in the Heisenberg picture) and $W$ = dimension of the Hilbert space. With the notation $\omega(P) = \operatorname{Tr}P/W$,
$$\omega(P_tQ) \to \omega(P)\,\omega(Q). \tag{1.35}$$
We may even take $\omega$ to be any arbitrary invariant state, i.e., $\omega(\cdot) = \operatorname{Tr}\rho\,\cdot$ in a Hilbert space of arbitrary, possibly infinite, dimension and $\omega(P_t) = \omega(P)$ for all projections. This implies that $\operatorname{Tr}\rho P_t = \operatorname{Tr}\rho\, e^{iHt}Pe^{-iHt} = \operatorname{Tr}e^{-iHt}\rho\, e^{iHt}P = \operatorname{Tr}\rho P$, and therefore, $e^{-iHt}\rho\, e^{iHt} = \rho$. Writing $\rho$ in the form $\rho = \sum_k p_k|k\rangle\langle k|$ ($\sum p_k = 1$), it would follow that
$$\omega(P_tQ) = \sum_k p_k\langle k|e^{iHt}Pe^{-iHt}Q|k\rangle.$$
The limit as $t \to \infty$ should be equal to
$$\sum_k p_k\langle k|P|k\rangle\sum_l p_l\langle l|Q|l\rangle.$$
In particular, let $P = Q = |k\rangle\langle k|$. It follows that $p_k^2 = p_k$, i.e., $p_k = 0$ or 1. This shows that Eq. (1.35) can only hold if the Hilbert space is one-dimensional, which, of course, is of no interest at all because then a nontrivial time evolution does not exist. (A similar consideration applies to the quantum analog of ergodicity.)

Nevertheless it is not excluded that for certain projections $Q$ (or even for certain partitions $Q_i$, by which is meant a family of pairwise orthogonal projections with $\sum Q_i = 1$), one always has
$$\omega(P_tQ_i) \to \omega(P)\,\omega(Q_i),$$
at least, if $W$ is large enough, with arbitrarily good accuracy.

An early result in this direction, referring to ergodicity rather than to mixing, however, was obtained by von Neumann, 1929 ("Proof of the ergodic theorem and the H-theorem in the new mechanics") under certain assumptions on the spectrum of $H$ (no degeneracies, no resonances), in the limit $W \to \infty$ and $\operatorname{Tr}Q_i/W$ kept fixed, for "almost all" partitions $Q_i$,
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T\omega(P_tQ_i)\,dt = \omega(P)\,\omega(Q_i).$$
It is very remarkable that von Neumann's paper appeared even before modern classical ergodic theory was initiated. The latter started in 1931 only with the work of Koopman, von Neumann, and Birkhoff.

FIG. 3. Construction of the coarse-grained density matrix.

If, for some partition $Q_i$, $\omega(P_tQ_i) \to \omega(P)\,\omega(Q_i)$, then, similarly to the classical case,
$$\operatorname{Tr}\rho_tQ_i \to \omega(Q_i) \tag{1.36}$$
($\rho_t = e^{-iHt}\rho\, e^{iHt}$), and the (quantum-mechanical) coarse-grained entropy
$$S_{cg}(\rho) \equiv S(\rho_{cg}),$$
with
$$\rho_{cg} = \sum_i\frac{\operatorname{Tr}\rho Q_i}{\operatorname{Tr}Q_i}\,Q_i, \tag{1.37}$$
tends to the maximal value $\ln W$. (That this is in fact the maximal value follows from Klein's inequality below.) Setting $\operatorname{Tr}\rho Q_i = P_i$, $\operatorname{Tr}Q_i = W_i$,
$$S_{cg}(\rho) = -\sum_i P_i\ln\frac{P_i}{W_i} \tag{1.38}$$
in close analogy to Eq. (1.26). Pictorially, the coarse-grained density matrix $\rho_{cg}$ arises from the true one by cutting out "blocks" and then replacing every block by a matrix which is a constant multiple of the unit matrix with the same trace (see Fig. 3). If we meet such a situation, then an obvious modification of our previous arguments will lead us to the quantum-mechanical master equation for the $P_i$'s which of course looks like the


classical one. Also the same remarks apply as in the classical case concerning its validity.

In principle, however, all our previous statements about quantum-mechanical mixing are false on a formal basis because of the recurrence paradox ("Wiederkehreinwand"). In our case it states that, if the Hamiltonian has a discrete spectrum, the function $\operatorname{Tr}\rho_tQ$ is almost periodic in $t$. The way out is well known: the time it would take until the system gets close to the original state again is tremendously large for macroscopic systems and beyond any sensible imagination. Thus, if $t = \infty$ means something like "$t$ = age of the universe," things are certainly okay. To correct for our above considerations in a mathematically incontestable way we have to deal with strictly infinite systems. We will do this in the last section only because of the mathematical technicalities that are involved. At this point let me just mention that ergodicity and mixing make perfect sense in the infinite case.

So far a few remarks (admittedly rather superficial) have been given on the problem of approach to equilibrium. For a more careful and detailed discussion I have to refer the reader to the literature, as announced in the […].

In the rest of this section I would like to comment on some properties of equilibrium states.

It is often argued on philosophical grounds that the microcanonical state is the equilibrium state (if the energy is fixed), because, after all, there is no physical principle which would distinguish between the different energy eigenstates of the energy shell and therefore any of them must occur with the same probability. However, it is not obviously certain that this application of Laplace's principle of insufficient reason to physical systems is really legitimate; one definitely has to elaborate those physical laws which are responsible for the validity of this principle for real matter.

Equilibrium states, and only they, also enjoy remarkable stability properties which roughly may be characterized as follows: small local perturbations of the dynamics (which, of course, never can be avoided) only lead to a local, but not to a global, change of state. In contrast, if a state is a time-invariant but not an equilibrium state, an arbitrarily small perturbation may be sufficient to produce a transition to an entirely different state. Again, these phenomena can be described rigorously in the infinite case only, so that we will come back to them in Sec. IV.D.

It is, of course, best known how the other classical ensembles are obtained from the microcanonical one, and we will consider them in somewhat more detail in the next section. What we will learn and what is of relevance for the second law of thermodynamics are the following facts:

(1) The classical ensembles obey the maximum entropy principle, i.e., the density matrix has the biggest entropy among all density matrices with the same prescribed expectation value. In our presentation this principle appears a posteriori only. It does not seem to be quite clear whether this principle can be justified a priori. Of course, there are arguments in favor of it invoking Laplace's principle or using the fact that, based on the interpretation of the formula for the entropy of the beginning of this chapter, the density matrix with maximal entropy is the most probable one.

(2) Under certain conditions to be discussed in the next section, the classical ensembles are equivalent in the sense that in the thermodynamic limit they give the same expressions for the intensive thermodynamical quantities. This shows that for large systems it is not really necessary that they be in the state with maximal possible entropy but that deviations from this state that are not too big do not change the thermodynamics. Thus "not too big" means that, for instance, a difference in entropy to the maximal value of, say, order $\sqrt{N}$ does not at all matter and can be neglected. Before further discussing the classical ensembles let me state some mathematical aspects of states with maximal entropy.

5. States with maximal entropy

We study the following problem: given $E = \operatorname{Tr}\rho H$ ($H$ being a fixed Hamiltonian), what does the density matrix with maximal entropy look like? The answer is well known: it is
$$\sigma_\beta = e^{-\beta H}/\operatorname{Tr}e^{-\beta H} \equiv Z^{-1}e^{-\beta H}, \qquad Z = \operatorname{Tr}e^{-\beta H}\ \text{(partition function)}$$
(Gibbs state), where $\beta$ is chosen such that $\operatorname{Tr}\sigma_\beta H = E$.

The proof is based on Klein's inequality (see, for example, Ruelle, 1969): Let $f$ be a convex (concave) function. Then
$$\operatorname{Tr}[f(B) - f(A)] \ge (\le)\ \operatorname{Tr}(B - A)f'(A). \tag{1.40}$$
This inequality is rather powerful and we will frequently make use of it. Let us state some important special cases.

(1) $A, B \ge 0$. Take $f(x) = -x\ln x$. Then
$$\operatorname{Tr}A(\ln A - \ln B) \ge \operatorname{Tr}(A - B).$$
If, in particular, $A$ and $B$ are density matrices $\rho$, $\sigma$, then we find for the relative entropy $S(\sigma|\rho) \equiv \operatorname{Tr}\rho(\ln\rho - \ln\sigma)$
$$S(\sigma|\rho) \ge 0. \tag{1.41}$$

(2) Let $A$ be the "diagonal" of $B$. By this we mean: let $\varphi_i$ be an orthonormal basis, and define $A$ by $\langle\varphi_i|A|\varphi_k\rangle = \langle\varphi_i|B|\varphi_i\rangle\,\delta_{ik}$. Then, $f$ being convex (concave),
$$\operatorname{Tr}f(A) \le (\ge)\ \operatorname{Tr}f(B), \tag{1.42}$$
i.e.,
$$\sum_i f(\langle\varphi_i|B|\varphi_i\rangle) \le (\ge)\ \operatorname{Tr}f(B)$$
(Peierls' inequality, Peierls, 1936). This also follows, of course, directly from the concavity inequality of Sec. A.

(3) Replacing $B$ by $A + B$, $A$ by $A + \langle B\rangle$, with $\langle B\rangle \equiv \operatorname{Tr}Be^A/\operatorname{Tr}e^A$, and taking $f(x) = e^x$, one obtains
$$\operatorname{Tr}e^{A+B} - \operatorname{Tr}e^{A+\langle B\rangle} \ge \operatorname{Tr}(B - \langle B\rangle)\,e^{A+\langle B\rangle} = 0,$$
i.e.,
$$\operatorname{Tr}e^{A+B} \ge \operatorname{Tr}e^{A+\langle B\rangle} \tag{1.43}$$
(Peierls-Bogoliubov inequality). The analogous classi-

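cal inequality (1.44) is called Jensen's inequality. We now have to supply the proof of inequality (1.40). We recall the inequality
$$f(\langle\varphi|B|\varphi\rangle) \le (\ge)\ \langle\varphi|f(B)|\varphi\rangle.$$
Hence, since convexity (concavity) implies $f(y) - f(x) \ge (\le)\ (y - x)f'(x)$, for the eigenvectors $\varphi_i$ of $A$, belonging to the eigenvalues $a_i$,
$$\operatorname{Tr}[f(B) - f(A)] = \sum_i[\langle\varphi_i|f(B)|\varphi_i\rangle - f(a_i)] \ge (\le)\ \sum_i[f(\langle\varphi_i|B|\varphi_i\rangle) - f(a_i)] \ge (\le)\ \sum_i(\langle\varphi_i|B|\varphi_i\rangle - a_i)f'(a_i),$$
which is just $\operatorname{Tr}(B - A)f'(A)$. [Since $f$ is convex (concave), there exist at least both one-sided derivatives $f'_+$, $f'_-$, and the inequality is true for both of them.]

Now we return to the Gibbs state. Suppose that $\operatorname{Tr}\rho H$ is $\le E$, and that $\operatorname{Tr}\sigma_\beta H = E$. Then
$$\operatorname{Tr}\rho\ln\sigma_\beta = -\beta\operatorname{Tr}\rho H - \ln\operatorname{Tr}e^{-\beta H},$$
$$\operatorname{Tr}\sigma_\beta\ln\sigma_\beta = -\beta\operatorname{Tr}\sigma_\beta H - \ln\operatorname{Tr}e^{-\beta H},$$
and, by assumption,
$$-\operatorname{Tr}\sigma_\beta\ln\sigma_\beta \ge -\operatorname{Tr}\rho\ln\sigma_\beta,$$
hence $S(\rho) = -\operatorname{Tr}\rho\ln\rho \le -\operatorname{Tr}\rho\ln\sigma_\beta$ (inequality 1.41) $\le -\operatorname{Tr}\sigma_\beta\ln\sigma_\beta = S(\sigma_\beta)$.

In the classical case the same inequality for the entropy is obtained in a similar way, or directly by using Lagrangian multipliers.

One can look at our last inequalities in another way. Define the free energy
$$F(\rho, \beta, H) = \operatorname{Tr}\rho H - \frac{1}{\beta}S(\rho) = \frac{1}{\beta}S(\sigma_\beta|\rho) + F \tag{1.45}$$
[with $F = -\beta^{-1}\ln\operatorname{Tr}e^{-\beta H}$]. Then, for $\rho = \sigma_\beta$, $F(\rho, \beta, H)$ is minimal (namely, $= F$). One easily verifies the standard relation
$$F = E - TS(\sigma_\beta) \tag{1.46}$$
with $T = 1/\beta$.

If $H$ is replaced by several operators, say, $A, B, C, \ldots$, then the same argument as before shows: given numbers $\langle A\rangle, \langle B\rangle, \langle C\rangle, \ldots$ then the entropy of any density matrix $\rho$ with $\operatorname{Tr}\rho A = \langle A\rangle$ etc. is $\le S(\sigma)$, provided that a density matrix $\sigma$ of the form $e^{\alpha A+\beta B+\gamma C+\cdots}/\operatorname{Tr}e^{\alpha A+\beta B+\gamma C+\cdots}$ exists with $\operatorname{Tr}\sigma A = \langle A\rangle$ etc. (This need not be the case if the operators $A, B, C, \ldots$ do not commute.) A well-known example in Fock space is the grand-canonical density matrix $\rho_{gc}$, which has maximal entropy among all density matrices with given expectation value of the energy and the particle number. (Of course, here $\alpha$ and $\beta$ do not have the same meaning as above.) (See also Bayer and Ochs; Ochs and Bayer.)

At this place, the problem of the Third law should be mentioned: Does $S \to 0$ as $\beta \to \infty$? This cannot be true generally (take, for instance, a finite-dimensional Hilbert space, and let $H = 0$); on the other hand, it is to be expected for realistic systems. But there does not exist a satisfactory solution to this problem yet.

6. Some properties of Gibbs states

a. $S(\sigma_\beta)$ is decreasing in $\beta$ (or increasing in the temperature)

This is a consequence of the following

Lemma 1 (Wehrl, 1974): Let $f$ be concave (convex), with $f(0) = 0$. Then
$$S\big(f(\rho)/\operatorname{Tr}f(\rho)\big) \ge (\le)\ S(\rho).$$
This lemma is itself a consequence of the next one, as we already have seen in our discussion of properties of the solutions of the master equation.

Lemma 2: The mappings $\rho \to f(\rho)/\operatorname{Tr}f(\rho)$ (for concave $f$) and $f(\rho)/\operatorname{Tr}f(\rho) \to \rho$ (for convex $f$), all with $f(0) = 0$, are mixing-enhancing in the sense that, for the eigenvalues $p_1 \ge p_2 \ge \cdots$ of $\rho$, or $p_1' \ge p_2' \ge \cdots$ of $f(\rho)/\operatorname{Tr}f(\rho)$, the following relations are true:
$$p_1' \le p_1,\quad p_1' + p_2' \le p_1 + p_2,\ \ldots,\quad p_1' + p_2' + \cdots + p_n' \le p_1 + p_2 + \cdots + p_n,\ \ldots$$
in the first case, and $\le$ being replaced by $\ge$ in the second case.

Proof. We consider the first case only. For concave functions $f$, with $f(0) = 0$, if $x \le y$, then $yf(x) \ge xf(y)$. Therefore $p_nf(p_m) \ge p_mf(p_n)$ for $m \ge n$, and
$$[f(p_1) + \cdots + f(p_n)]\sum_{r=1}^\infty p_r \le (p_1 + \cdots + p_n)\sum_{r=1}^\infty f(p_r),$$
i.e.,
$$\frac{f(p_1) + \cdots + f(p_n)}{\sum_r f(p_r)} \le p_1 + \cdots + p_n.$$
(Remember that $\sum p_r = 1$.)

b. Kubo-Martin-Schwinger (KMS) boundary condition

Define the thermal expectation value of an observable $A$ by
$$\langle A\rangle_\beta = \operatorname{Tr}\sigma_\beta A. \tag{1.47}$$
Consider
$$\langle A_tB\rangle_\beta = \operatorname{Tr}\sigma_\beta e^{iHt}Ae^{-iHt}B = Z^{-1}\operatorname{Tr}e^{-\beta H}e^{iHt}Ae^{-iHt}B = Z^{-1}\operatorname{Tr}e^{i(t+i\beta)H}Ae^{-i(t+i\beta)H}e^{-\beta H}B = Z^{-1}\operatorname{Tr}e^{-\beta H}Be^{i(t+i\beta)H}Ae^{-i(t+i\beta)H} = \langle BA_{t+i\beta}\rangle_\beta.$$
Thus, setting $F(t) = \langle BA_t\rangle_\beta$, one has $F(t + i\beta) = \langle A_tB\rangle_\beta$.

As long as we are dealing with finite-dimensional Hilbert spaces, all operations we have performed are legitimate. On infinite-dimensional Hilbert spaces one has to worry about existence and analyticity questions but eventually arrives at the following:

KMS condition: There exists a function $F(z)$ that is analytic at least in the open strip $0 < \operatorname{Im}z < \beta$ and con-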


tinuous in the closed strip 0 «Imz «P such that, for C. The classical ensembles
real t, F'(t) = (BA,) 8, F(t + i P) = (A, B)8. The most important kinds of density matrices to be
The KMS condition is of greater importance since it
considered in statistical mechanics are the classical
extends to infinite systems (where o8 does no longer
ensembles (see again Ruelle, 1969).
exist). It turns out that it entails far-reaching con- The microcanonieal ensemble is the density matrix
sequences for the structure of infinite systems (Kubo,
in the Hilbert space H „+(V) defined by
I 957; Martin and Schwinger, 1959; cf. the last section.
'The role of the KMS-condition in infinite systems was
realized by Haag, Hugenholtz, and Winnink, 196't. )
'xqE ,~q(~) .
—characteristic function
(lt!-~, ~ ~ of the interval [E
Finally it should be mentioned that there is an in-
teresting inequality between the quantum-mechanical
—e, E]), e = "thickness" of the energy shell). Here S
the mierocanonieal entropy, is

and the classical partition function, somewhat similar
to inequality (1.5) but much more powerful. If the Ham- S, = ln TrXL-~, ~ ) (H)
iltonian is of the form H = —P", P';/2+ V(x) then the -, =logarithm of the number of energy levels
quantum -mechanic al partition function i s
Z =Tre HH=e H~, between E —e and E.
The number e is undetermined to a, certain extent. In
whereas the classical partition function is
classical mechanics it can be chosen equal to zero
d pd' (cf. Sec. B; it certainly is not necessary here to dis-
Z"= q
p g p +V(q) =e
play the corresponding classical probability distribu-
tion p",); on the contrary, in quantum mechanics it is
(1.46) very convenient to choose e = E, thus
(we put m= 0 = 1 and, for simplicity, suppose Boltzmann
statistics; F' =classical free energy). Define the con- p,= e ~& Tr O(E —II) . (1.52)
volution (We suppose H~ 0.) It turns out that in fact under cer-
tain conditions it does not matter how big e is chosen,
& (9)= — 3N /2
d' 9'('(9')e
9(— P (9, —9()) as we will indicate below.
The canonical ensemble in HN3(V) is given by

3N (1.49) (1.53)
cr8 being the Gibbs state at inverse temperature p. The
and Z'„', F~c) as Z", F"
above, but with V(q) being re- entropy is
placed by V„(q). Then, for all (d,
S, = i3(E —F)
Z (dcl «Zcl
IEq. (1.46)].
or (1.50) 'The grand-canonical ensemble is defined in Fock
sPace H~(V) by
pc «p «~c
—.Hp V —H H
The upper bound for Z relies on the Golden-Thompson gC

inequality (Golden, 1965; Thompson, 1965): Tre~'~ (n = pt). , p. = chemical potential, p=pressure. )
& Tre"e . Inserting for A =kinetic energy, B= V(x), one There are also other kinds of ensembles that are
arrives at sometimes of use in physics (for instance, the "pres-
2 sure ensemble": Lewis and Siegert, 1956. If there are
Z «Tr exp —P 2
exp more parameters, like electric and magnetic fields,
one clearly also has to consider more complicated en-
To exp( —PQP', /2) there corresponds an integral kernel sembles. ) However, I do not want to go into details
K with since these things are covered in all textbooks on
statistical mechanics.
Z(x, x) =(2~P) '"'= d3Np
)3N P t Q
2 9 I would rather like to concentrate on only a few as-
pects that are of some importance for the rest of this
which immedi. ately yields the desideratum. For the paper.
lower bound we will consider the case of one degree of
freedom only. The general case is obtained in a 1. The thermodynamic limit
straightforward manner. Using coherent states ~z) (see
This is the question whether, if a sequence of volumes
the first section)
V tends to infinity, the limits

' ' -8B)' ) dz
' —.
and by an easy explicit calculation one obtains (z~H~z) I I

=P'/2+ 3+ V, (q). That 1 can be replaced by u follows or of p as defined by Eq. (3.4), exist, provided that
from our remarks concerning the definition of coherent f(t/)iv(i9E/[V) (or e, p) are kept fixed. (We write [V[ for
states in See. A. A similar inequality holds for spin the measure of the region V if there is a risk of con-
systems (i, ieb, 19 t3a). fusion. ) The existence of such a limit only justifies the

Rev. Mod. Phys., Vol. 50, No. 2, April 1978

usual distinction between "extensive" quantities, like entropy, free energy, etc., and "intensive" ones like temperature, pressure, etc.

At first one has to make clear what is meant by "$V$ tends to infinity." The least restrictive notion in this direction is "tends to infinity in the sense of van Hove" (van Hove, 1949, 1950): Let $V(a)$, with $a \in \mathbb{R}^d$ or $\mathbb{Z}^d$, be a parallelepiped $\{x \in \mathbb{R}^d: 0 \le x_i < a_i\}$. Consider all translates of $V(a)$ of the form $\{x: x_i = n_i a_i + \xi_i,\ n_i = \text{integer},\ 0 \le \xi_i < a_i\}$. Let $n_V^-$, or $n_V^+$, respectively, be the number of all translates of $V(a)$ that are contained in a given volume $V$, or have a nonempty intersection with it, respectively (see Fig. 4). A sequence of volumes is said to tend to infinity in the sense of van Hove, if, for any $a$,

$$n_V^-(a) \to \infty, \qquad n_V^-(a)/n_V^+(a) \to 1.$$

FIG. 4. Definition of $n_V^-$ and $n_V^+$.

More restrictive is the tendency to infinity in the sense of Fisher. Define $V_h$ to be the set of all points that have a distance less than $h$ to the boundary of $V$. Let $D(V)$ be the diameter of $V$. Then $V \to \infty$ in the sense of Fisher if there is a function $\pi(\alpha)$, with $\pi(\alpha) \to 0$ for $\alpha \to 0$, such that, for sufficiently small $\alpha$ and all $V$,

$$|V_{\alpha D(V)}| / |V| \le \pi(\alpha)$$

and, in addition, $|V| \to \infty$.

One often considers less sophisticated kinds of limits $V \to \infty$ such as sequences of parallelepipeds $V(a)$, with all $a_i \to \infty$, or sequences of balls with radii going to infinity. However, the disadvantage of these limits is that one never can be sure that the limit is shape independent and, for instance, for a sequence of cubes it could be different from that for a sequence of balls.

Now the result concerning the thermodynamic limit is that it exists for $N$-particle Hamiltonians of the form

$$H_N = \sum_{i=1}^{N} \frac{p_i^2}{2} + \sum_{i<j} v(x_i - x_j) \tag{1.55}$$

provided that there is a lower bound for the Hamiltonian

$$H_N \ge -NC,$$

$C$ being a fixed constant, and if, for $|x|$ being sufficiently large,

$$|v(x)| \le \mathrm{const}\cdot|x|^{-\lambda}, \qquad \lambda > d. \tag{1.56}$$

(One can also formulate a similar theorem for Hamiltonians including many-body interactions.) The proof of this theorem is quite involved and shall be omitted here. (See Ruelle, 1969. The idea of using "corridors" in order to prove the existence of the thermodynamic limit is due to Yang and Lee, 1952, and van Hove. Rigorous proofs are due to Fisher, 1964, and Griffiths, 1965.)

Unfortunately, however, these considerations do not really apply to physics, because, after all, the interaction between particles is not "tempered" in the sense of Eq. (1.56) but, with great accuracy, goes as $1/|x|$: there is one contribution from gravitation and an electrostatic part. Also, one usually has to consider several species of particles (electrons, nuclei, ...). If one neglects electrostatic forces (i.e., if one considers neutron stars or something similar), then a thermodynamic limit in the usual sense no longer exists. If one takes the free energy as a function of particle number, inverse temperature, and volume, the limit

$$\lim_{N\to\infty} N^{-7/3} F(N, N^{-4/3}\beta, N^{-1}V)$$

exists instead of

$$\lim_{N\to\infty} N^{-1} F(N, \beta, NV),$$

i.e., the scaling behavior of the thermodynamical quantities is entirely different from the usual one (Hertel and Thirring, 1971; Hertel, Narnhofer, and Thirring, 1972). This shows that it is by no means self-evident, for instance, that entropy is an extensive quantity.

If gravitation can be neglected, but there are only electric forces, then for neutral systems (and provided that at least one species of particles are fermions) it can be shown that a lower bound $H_N \ge -NC$ is true (Dyson and Lenard, 1967, 1968; Dyson, 1967. A much better method for obtaining a lower bound was presented by Lieb and Thirring, 1975) and a thermodynamic limit in the usual sense exists (Lebowitz and Lieb, 1969; Lieb and Lebowitz, 1972). This, undoubtedly, is one of the deepest results of mathematical physics: stability of matter. (For all that concerns stability of matter cf. the review article by Lieb, 1976.)

2. Equivalence of ensembles

The next problem is that of equivalence of ensembles: Are the thermodynamic quantities, computed with the various ensembles, asymptotically equal in the thermodynamic limit? This, of course, need not always be true (for instance, in the regions of phase transitions). On the other hand, it is "normally" to be expected. Let me illustrate this question by two examples only.

a. Equivalence of the microcanonical ensembles

$$s(e) = \lim \frac{1}{|V|} \ln \mathrm{Tr}\,\Theta(e|V| - H) \tag{1.57}$$


($e$ = energy density; the underlying Hilbert space is that of $N$ particles, with $N/|V|$ kept fixed). It turns out that $s$ is a concave, nondecreasing function of $e$. As long as it is strictly increasing, all the other ensembles defined by (1.51) yield the same entropy density too (see Ruelle, 1969).

b. Comparison between microcanonical and canonical ensemble

Assuming differentiability of $s$, take $\beta = \partial s/\partial e$, and define $f = e - \beta^{-1}s$. Then the limit of the density of the free energy of the canonical ensemble exists and equals this $f$. (For more details, once more Ruelle's book should be consulted.)

What one can learn from all this is that for large systems, provided that the thermodynamic limit exists, the precise structure of the density matrix is not so important. To come back to our last example, the difference of the entropies in the microcanonical and the canonical ensemble is of order $\ln N$, which is big but on the other hand is negligible in the thermodynamic limit. Therefore, our starting point of Sec. B, the maximum entropy principle, must be formulated as follows: given some intensive quantities (such as temperature, density, or energy density, etc.), the entropy density of the corresponding equilibrium state is maximal.

D. Historical remarks

I would like to conclude this introductory chapter by some remarks concerning history. There are many reviews, historical surveys as well as reprint volumes, containing the decisive papers in this field (for instance, by Brush, 1965, 1966; Klein, 1972; Koenig, 1959; Roller, 1950), that certainly describe the historical development of the subject much better than I could do. I only want to make a very few comments that concern "entropy" itself, without pretending that they are exhaustive or take into account all important steps in the past.

Thermodynamics in the modern sense has its origin in the work of Mayer (1842) and Joule (1845), to whom major credit is to be given in the recognition of the first law of thermodynamics (conservation of energy, which in times past was called "force"), and of Clausius (1850) and Thomson (Lord Kelvin) (1852, 1857), who, based on previous work of Carnot (1824), formulated the principle of dissipation of energy and the second law of thermodynamics. That this principle leads to the heat death was worked out by von Helmholtz (1854). The notion of entropy finally was introduced by Clausius in 1865.

At about the same time the kinetic theory was put forward by Maxwell (1860) and Clausius. An important step towards the understanding of irreversibility was Maxwell's Demon (Maxwell in a letter to Tait, 1867) which illustrated the statistical nature of it.

The main contributions, however, in this direction, are due to L. Boltzmann: the Boltzmann equation and the $H$ theorem (1872), the relation between entropy and probability theory (1877), etc. etc.

Boltzmann's ideas caused many controversies and there were many objections, such as the reversibility paradox (Loschmidt, 1876) and the recurrence paradox (Zermelo, 1896; based on work of Poincaré, 1890), etc., and as we have seen, there are still open problems with them, at least if one desires full mathematical rigor.

The next step towards the modern concept of entropy was taken by J. W. Gibbs (1902), who adopted the ensemble point of view and gave a definition of entropy as the average index of probability-in-phase (this probability-in-phase is just the classical probability distribution of Sec. A: the "index" is $\ln 1/\rho$, hence the average is $\int \rho \ln 1/\rho$).

How the paradox that entropy, after all, should remain constant, could be resolved, was pointed out by the Ehrenfests (1911), who recognized the role of coarse-graining.

Finally the quantum-mechanical expression for the entropy was given by von Neumann in 1927.

As far as more recent developments are concerned, I have tried to give the relevant literature in the text. There have been significant contributions concerning classical entropy and classical statistical mechanics, and there has been a strong impetus in creating such fields as ergodic theory and theory of dynamical systems. In contrast, the properties of quantum-mechanical entropy have not been investigated in detail for a very long time, and it certainly is to the credit of E. Lieb to put forward their study in the last few years.

II. PROPERTIES OF ENTROPY

A. Simple properties

In this chapter we come to the very object of this review, namely, to describe the various general properties of entropy. Let me start with a few extremely simple ones. Some of them we already have discussed in Sec. I as, for instance, the following:

Entropy is defined for every density matrix, it is always $\ge 0$, possibly $= \infty$. For the pure states, and only for them, $S = 0$.

Again this shows a weak point in a purely classical approach because in the classical case the "pure states" certainly are density distributions that are concentrated at one point (i.e., $\delta$ functions), and their entropy is $-\infty$. This does not fit into the interpretation of entropy as "lack of information."

One easily verifies that the range of $S(\rho)$ is the whole extended real half-line $[0, \infty]$, i.e., to every number $c$: $0 \le c \le \infty$ there exists a density matrix $\rho$ such that $S(\rho) = c$.

The range of the generalized Boltzmann-Gibbs-Shannon (cf. Sec. I.A) entropy is $[u, v]$ or $(u, v]$, $u = \inf \ln \mu(X)$ [$X$ = non-null measurable subset of $\Omega$, $v = \ln \mu(\Omega)$, depending on whether the infimum is attained or not (Ochs, 1976)]. For the classical Boltzmann-Gibbs entropy, in particular, it is all of $\mathbb{R}$.

An important property of entropy is invariance. Since $S(\rho) = -\sum_i p_i \ln p_i$, $p_i$ being the eigenvalues of $\rho$, $S$ only de-


pends on the (strictly) positive part of the spectrum of $\rho$. Any mapping $\rho \to \rho'$ that leaves the positive spectrum unchanged also leaves the entropy unchanged. Examples for such mappings are the following:

1. $\rho' = U^*\rho U$ ($U$ = unitary). (cf. Sec. I.B.) By the way, for the coarse-grained entropy the above invariance property is not true, hence, it may change with time. If, in particular, $U$ is a permutation matrix with respect to the eigenvectors of $\rho$, then the invariance property is also called symmetry. In other words: $S$ is a symmetric function of the $p_i$'s.

2. Let $\mathcal{H}' = \mathcal{H} \oplus \mathcal{H}''$, $\rho' = \rho \oplus 0$. In that case the invariance property is called expansibility. In graphical language one adds zeros to $\rho$ as seen in Fig. 5.

FIG. 5. Construction of $\rho' = \rho \oplus 0$.

Another simple property is insensitivity. $S$ is a continuous function of finitely many eigenvalues $p_i$, provided that the rest of them are kept fixed. (This, however, does not mean that $S$ is a continuous function of $\rho$, cf. Sec. D.)

Let us give an example of a function of a density matrix for which the insensitivity does not hold: The quantum-mechanical version of the Hartley entropy $S_0(\rho)$ is defined by $S_0(\rho)$ = logarithm of the number of eigenvalues that are $\ne 0$. If, for instance, $p_1 = \dots = p_n = 1/n$, $p_{n+1} = p_{n+2} = \dots = 0$, then $S_0(\rho) = S(\rho)$. If now $p_3 = p_4 = \dots = 0$, then $S(\rho) = -p_1 \ln p_1 - p_2 \ln p_2$ is continuous in $p_1$ and $p_2$, whereas $S_0(\rho) = \ln 2$ if $0 < p_1, p_2 < 1$, otherwise $= 0$, hence $S_0(\rho)$ is discontinuous in $p_1$ and $p_2$. However, it should be remarked that, apart from insensitivity, many properties of the Hartley entropy are shared in common with the "right" entropy, such as additivity, subadditivity, and the above invariance property.

If $\rho$ is of finite rank, i.e., if only finitely many eigenvalues are $\ne 0$, then $S(\rho) < \infty$. Now let $\rho$ be an arbitrary density matrix. The canonical approximations are obtained as follows:

$$\rho^{(N)} = \sum_{k=1}^{N} p_k |k\rangle\langle k| \Big/ \sum_{k=1}^{N} p_k .$$

Since

$$S(\rho^{(N)}) = -\sum_{k=1}^{N} \frac{p_k}{\sum_{j=1}^{N} p_j} \ln \frac{p_k}{\sum_{j=1}^{N} p_j} = -\frac{\sum_{k=1}^{N} p_k \ln p_k}{\sum_{k=1}^{N} p_k} + \ln \sum_{k=1}^{N} p_k ,$$

this tends to $S(\rho)$ because $-\sum_{k=1}^{N} p_k \ln p_k \to S(\rho)$ and $\sum_{k=1}^{N} p_k \to 1$. This fact may be of use in generalizing theorems that are first established for finite-dimensional matrices only, to the general case.

In the next section we now want to turn to less trivial properties of entropy.

B. Concavity

Concavity states that for the density matrix $\rho = \lambda_1\rho_1 + \lambda_2\rho_2$ ($\rho_1, \rho_2$ density matrices, $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 = 1$)

$$S(\rho) \ge \lambda_1 S(\rho_1) + \lambda_2 S(\rho_2) \tag{2.1}$$

(see Lieb, 1975, or Ruelle, 1969). [There is an equality for $\lambda_1, \lambda_2 > 0$ only if $\rho_1 = \rho_2$ or $S(\rho_1)$, or $S(\rho_2)$, respectively, is equal to $\infty$.]

The proof is very simple. Let $\rho = \sum_k p_k |k\rangle\langle k|$. Then, because of the concavity of the function $s(x) = -x \ln x$,

$$S(\rho) = -\sum_k p_k \ln p_k = \sum_k s(\langle k|\rho|k\rangle) \ge \lambda_1 \sum_k s(\langle k|\rho_1|k\rangle) + \lambda_2 \sum_k s(\langle k|\rho_2|k\rangle) \ge \lambda_1 \sum_k \langle k|s(\rho_1)|k\rangle + \lambda_2 \sum_k \langle k|s(\rho_2)|k\rangle = \lambda_1 S(\rho_1) + \lambda_2 S(\rho_2).$$

Of course, we did not use any special property of $s(x)$ besides concavity; thus, for any concave function $f$, the mapping $\rho \to \mathrm{Tr}\, f(\rho)$ is concave.

Why is concavity considered to be important? Entropy is a measure of lack of information. Hence if two ensembles are fitted together (what in mathematical language is described by the convex combination $\lambda_1\rho_1 + \lambda_2\rho_2$), one loses the information that tells from which ensemble a special sample stems, and therefore entropy increases (Wigner and Yanase, 1963).

Let us illustrate things by a simple example. Let $\rho_1, \rho_2$ be one-dimensional projections, i.e., pure states. In a case in which $\lambda_1, \lambda_2 > 0$, $\rho = \lambda_1\rho_1 + \lambda_2\rho_2$ is a mixed state (unless $\rho_1 = \rho_2$). Therefore, $S(\rho) > \lambda_1 S(\rho_1) + \lambda_2 S(\rho_2) = 0$. By the way, in that case it is no longer possible to reconstruct $\rho_1, \rho_2$ from $\rho$.

Mixing of pure states (the forming of convex combinations of them) yields a mixed state. More generally, it seems to be plausible to argue: if we are given two mixed states that are unitarily equivalent ($\rho_1 = U^*\rho_2 U$), then mixing of them yields a new state that is more strongly mixed than the two original ones, and $S(\rho) \ge \lambda_1 S(\rho_1) + \lambda_2 S(\rho_2) = S(\rho_1)$. We will make this consideration precise in the next section.

It should be anticipated that concavity is a consequence of subadditivity (Sec. F).

Clearly concavity generalizes as follows: let $\rho_1, \rho_2, \dots, \rho_n$ be density matrices, $\lambda_1, \lambda_2, \dots, \lambda_n$ numbers $\ge 0$, with $\sum_i \lambda_i = 1$. Then

$$S\Big(\sum_i \lambda_i\rho_i\Big) \ge \sum_i \lambda_i S(\rho_i). \tag{2.2}$$

Let us now take a fixed density matrix $\rho$ and let $\rho_t$ be its time evolution $\rho_t = e^{-iHt}\rho\, e^{iHt}$. The time-averaged density matrix $(1/T)\int_0^T \rho_t\, dt$ shall be denoted by $\bar\rho_T$. Then $S(\bar\rho_T) \ge S(\rho)$ by a straightforward modification of our previous proof. If $\bar\rho_\infty = \lim_{T\to\infty} \bar\rho_T$ exists, then, for all $T$, $S(\bar\rho_T) \le S(\bar\rho_\infty)$.

In the next section we will see that the mapping $\rho \to \bar\rho_T$ is even mixing-enhancing.

Since the entropy of a time-averaged density matrix


always is bigger than the entropy of a density matrix at a fixed time, one should not be surprised that many physical measurements yield a value for the entropy that is bigger than the correct one, because they last rather long compared with the "relaxation time." One should bear this in mind in discussions about the second law of thermodynamics.

There is an upper bound for the convex combination $\rho = \sum_i \lambda_i\rho_i$ given by

$$S(\rho) \le \sum_i \lambda_i S(\rho_i) - \sum_i \lambda_i \ln \lambda_i \tag{2.3}$$

(see Lanford and Robinson, 1968). Let us first assume (2.3) in the special case that all $\rho_i$ are one-dimensional projections, $\rho_i = P_i$:

$$S(\rho) \le -\sum_i \lambda_i \ln \lambda_i . \tag{2.4}$$

Inequality (2.3) in the general case follows from the special case (2.4) because if one decomposes the $\rho_i$ into one-dimensional spectral projections,

$$\rho_i = \sum_k p_k^{(i)} P_k^{(i)},$$

then

$$S(\rho) = S\Big(\sum_{i,k} \lambda_i p_k^{(i)} P_k^{(i)}\Big) \le -\sum_{i,k} \lambda_i p_k^{(i)} \big(\ln \lambda_i + \ln p_k^{(i)}\big) = -\sum_i \lambda_i \ln \lambda_i - \sum_i \lambda_i \sum_k p_k^{(i)} \ln p_k^{(i)} = -\sum_i \lambda_i \ln \lambda_i + \sum_i \lambda_i S(\rho_i).$$

Now let us turn to the inequality (2.4). Let $p_1 \ge p_2 \ge \dots$ be eigenvalues of $\rho$, arranged in decreasing order, and let also the numbers $\lambda_1 \ge \lambda_2 \ge \dots$ be arranged in decreasing order. We will show that

$$p_1 \ge \lambda_1, \quad p_1 + p_2 \ge \lambda_1 + \lambda_2, \quad \dots, \quad p_1 + p_2 + \dots + p_n \ge \lambda_1 + \lambda_2 + \dots + \lambda_n, \quad \dots \tag{2.5}$$

As already discussed in Sec. I.B, this implies that $-\sum_i p_i \ln p_i \le -\sum_i \lambda_i \ln \lambda_i$. In order to prove the inequalities (2.5) we will use Ky Fan's inequality (Ky Fan, 1949): for any set of pairwise orthogonal, normed vectors $\phi_1, \dots, \phi_n$,

$$p_1 + \dots + p_n \ge \sum_{i=1}^{n} \langle \phi_i|\rho|\phi_i\rangle . \tag{2.6}$$

One can prove this as follows: let $\psi_1, \dots, \psi_n$ be another set of pairwise orthogonal, normed vectors, all of them contained in the subspace spanned by $\phi_1, \dots, \phi_n$. Then it is immediately seen that $\sum_i \langle \phi_i|\rho|\phi_i\rangle = \sum_i \langle \psi_i|\rho|\psi_i\rangle$. Now choose $\psi_n \perp |1\rangle, |2\rangle, \dots, |n-1\rangle$ (remember that $|k\rangle$ = eigenvector of $\rho$ belonging to the eigenvalue $p_k$). This is always possible since the dimension of the subspace spanned by $\phi_1, \dots, \phi_n$ is $> n - 1$. Continue by choosing $\psi_{n-1} \perp |1\rangle, |2\rangle, \dots, |n-2\rangle, \psi_n$; $\psi_{n-2} \perp |1\rangle, \dots, |n-3\rangle, \psi_{n-1}, \psi_n$; and so on. Since then $\langle \psi_k|\rho|\psi_k\rangle \le p_k$ for every $k$, one arrives at inequality (2.6).

Define $\mathcal{H}^{(n)}$ as the subspace generated by $P_1\mathcal{H}, \dots, P_n\mathcal{H}$, and denote by $d$ its dimension. Let $e_1, \dots, e_d$ be an orthonormal basis for $\mathcal{H}^{(n)}$. Because of Ky Fan's inequality,

$$p_1 + \dots + p_n \ge p_1 + \dots + p_d \ge \sum_{i=1}^{d} \langle e_i|\rho|e_i\rangle \ge \sum_{i=1}^{d} \sum_{k=1}^{n} \lambda_k \langle e_i|P_k|e_i\rangle = \sum_{k=1}^{n} \lambda_k .$$

Remark: Whereas inequalities (2.1) and (2.2) extend to the continuous case, the continuous analogs of (2.3) and (2.4) are false in general. For instance, consider the projections onto coherent states, $|z\rangle\langle z|$, which are pure states, with entropy $= 0$, and let $f(z)$ be a non-negative function with $\int dz\, f(z)/\pi = 1$. Let $\rho = \int (dz\, f(z)/\pi)\, |z\rangle\langle z|$. Then

$$S(\rho) \ge -\int \frac{dz}{\pi} f(z) \ln f(z)$$

rather than $\le$ (Wehrl, 1977).

The term $-\sum_i \lambda_i \ln \lambda_i$ occurring on the right-hand side of inequality (2.4) is called mixing entropy. It is most important if the ranges of the $\rho_i$ are pairwise orthogonal; in that case, there is equality.

It should be remarked that this fact allows an axiomatic characterization of entropy: Let $\Phi$ be a mapping of the set of density matrices into the extended real half-line, which fulfills the invariance and continuity properties of Sec. A. Also let $\mathcal{H} = \mathcal{H}_1 \oplus \dots \oplus \mathcal{H}_n$, $\rho_i$ = density matrices in $\mathcal{H}_i$, $\rho = \lambda_1\rho_1 \oplus \dots \oplus \lambda_n\rho_n$. If $\Phi(\rho)$ always satisfies

$$\Phi(\rho) = \sum_i \lambda_i \Phi(\rho_i) + \Phi(\Lambda),$$

$\Lambda$ being a diagonal matrix in the Hilbert space $\mathbb{C}^n$ with entries $\lambda_1, \dots, \lambda_n$, then $\Phi(\rho)$ is a constant multiple of $S(\rho)$. [This is a quantum-mechanical version of the characterization theorem of Faddeev and Khintchine (see Renyi, 1966). The above form of the theorem was written down by Thirring, 1975.] Here is a sketch of the proof: Because of our remarks of the last section we can suppose that all $\mathcal{H}_i$ are finite-dimensional. Let $\rho$ be an $n \times n$ density matrix. Because of invariance, $\Phi(\rho)$ is a symmetric function of its eigenvalues only, i.e., $\Phi(\rho) = I_n(p_1, \dots, p_n)$. Now the mixing property implies that $I_2(1, 0) = I_1(1) + 0 \cdot I_1(1) + I_2(1, 0)$, i.e., $I_1(1) = 0$. (Take $\mathcal{H}_1, \mathcal{H}_2$ to be one-dimensional.) Furthermore we have (using $p' = p_1 + p_2$)

$$I_n(p_1, \dots, p_n) = p' I_2(p_1/p', p_2/p') + (1 - p')\, I_{n-2}\big(p_3/(1 - p'), \dots, p_n/(1 - p')\big) + I_2(p', 1 - p')$$

and

$$I_{n-1}(p', p_3, \dots, p_n) = p' I_1(1) + I_2(p', 1 - p') + (1 - p')\, I_{n-2}\big(p_3/(1 - p'), \dots, p_n/(1 - p')\big),$$

hence

$$I_n(p_1, \dots, p_n) = I_{n-1}(p_1 + p_2, p_3, \dots, p_n) + (p_1 + p_2)\, I_2\big(p_1/(p_1 + p_2), p_2/(p_1 + p_2)\big).$$

FIG. 6. Illustration of the preceding identity.

See Fig. 6; the left-hand side refers to the first equality,


the right-hand side to the second one. We want to show that the "information function" $I_2(p, 1 - p) = c\,(-p \ln p - (1 - p) \ln(1 - p))$. Once this is established, one proceeds by induction: $I_3(p_1, p_2, p_3) = I_2(p_1 + p_2, p_3) + (p_1 + p_2)\, I_2(p_1/(p_1 + p_2), p_2/(p_1 + p_2)) = c\,(-p_1 \ln p_1 - p_2 \ln p_2 - p_3 \ln p_3)$ and so on. Now let $\alpha_n = I_n(1/n, \dots, 1/n)$. One easily verifies that $\alpha_{nm} = \alpha_n + \alpha_m$. It remains to rule out all other solutions of this equation except $\alpha_n = c \ln n$. There is a theorem of Erdős which states that this is the case if $\lim_{n\to\infty} (\alpha_n - \alpha_{n-1}) = 0$. In order to prove the latter relation, we note that, similarly as before,

$$\alpha_n = I_2(1/n, 1 - 1/n) + (1 - 1/n)\,\alpha_{n-1} .$$

Let $\beta_n = \alpha_n - \alpha_{n-1}$, $\gamma_n = I_2(1/n, 1 - 1/n)$. Here $\gamma_n \to 0$ as $n \to \infty$ by continuity (insensitivity). On the other hand,

$$\gamma_n = \beta_n + \frac{\beta_2 + \beta_3 + \dots + \beta_{n-1}}{n},$$

so by Mercer's theorem also $\beta_n \to 0$. Turning back to the information function, it suffices, of course, to show that $I_2(p, 1 - p) = c\,[-p \ln p - (1 - p) \ln(1 - p)]$ for rational $p$. Let $p = q/r$; $q, r$ being integers. Take $\dim \mathcal{H}_1 = q$, $\dim \mathcal{H}_2 = r - q$, and let $\rho = (1/r)\mathbb{1}$. Then

$$\alpha_r = I_2(q/r, 1 - q/r) + \frac{q}{r}\,\alpha_q + \Big(1 - \frac{q}{r}\Big)\alpha_{r-q},$$

and inserting $\alpha_n = c \ln n$,

$$I_2(q/r, 1 - q/r) = c\Big[-\frac{q}{r} \ln \frac{q}{r} - \Big(1 - \frac{q}{r}\Big) \ln\Big(1 - \frac{q}{r}\Big)\Big],$$

which completes the proof (see Fig. 7).

FIG. 7. How to obtain $I_2$ from $\alpha_n$.

C. Uhlmann's theory

On several occasions we have already met the notion of mixing-enhancement. It states that for the eigenvalues of two density matrices $\rho$ and $\rho'$, arranged in decreasing order, the inequalities

$$p_1(\rho) \le p_1(\rho'), \quad p_1(\rho) + p_2(\rho) \le p_1(\rho') + p_2(\rho'), \quad \dots, \quad p_1(\rho) + \dots + p_n(\rho) \le p_1(\rho') + \dots + p_n(\rho'), \quad \dots$$

hold. One then says that $\rho$ is more mixed than $\rho'$ (or more chaotic), or that $\rho'$ is purer than $\rho$, and writes $\rho \succ \rho'$, or $\rho' \prec \rho$, respectively. Properties of this relation were first investigated by Uhlmann (1971, 1972, 1973); therefore we propose the name "Uhlmann's theory" for this field. [Uhlmann's original papers referred to the finite-dimensional case only; the generalization to the infinite-dimensional case as well as quite a few of the theorems presented below are due to Wehrl (1974).]

Of course, this notion also makes sense in the classical discrete case. There, "density matrix" is replaced by "discrete probability distribution," and "biggest eigenvalue" by "biggest probability," etc. (See, for instance, Ruch, 1975.)

Let us recall where mixing-enhancement has appeared so far:

(1) In our discussion of the master equation.

(2) The mapping $\rho \to f(\rho)/\mathrm{Tr}\, f(\rho)$, $f$ being concave and non-negative, is mixing-enhancing (lemma 2 of Sec. I.B).

(3) "Deleting off-diagonal matrix elements" implies mixing-enhancement because of Ky Fan's inequality: Let $\rho$ be any density matrix, $\phi_i$ be an orthonormal basis, and define $\rho^{d}$ by

$$\rho^{d} = \sum_i \langle \phi_i|\rho|\phi_i\rangle\, |\phi_i\rangle\langle \phi_i| .$$

Then the eigenvalues of $\rho^{d}$ are the diagonal elements $\langle \phi_i|\rho|\phi_i\rangle$ and, by Ky Fan's inequality, the sum of, say, $n$ of them, in particular of the $n$ biggest ones, is $\le p_1(\rho) + \dots + p_n(\rho)$.

(4) If $\rho = \sum_i \lambda_i P_i$ (the $P_i$ being one-dimensional projections, $\lambda_i \ge 0$, $\sum_i \lambda_i = 1$), then $\rho \prec \rho'$ where $\rho' = \sum_i \lambda_i Q_i$, the $Q_i$ being any family of pairwise orthogonal one-dimensional projections.

Other examples of Uhlmann's relation will be given below. This relation plays a role quite frequently, one reason being its connection with monotone increase of entropy (see Sec. I.B).

Here $\rho \succ \rho'$ implies that $S(\rho) \ge S(\rho')$, but even more, namely that $\mathrm{Tr}\, f(\rho) \ge \mathrm{Tr}\, f(\rho')$ for every concave function $f$ (we have seen this in Sec. I.B already). This yields, by the way, a characterization of "$\rho \succ \rho'$": $\rho \succ \rho'$ if, and only if, for every concave function $f$, $\mathrm{Tr}\, f(\rho) \ge \mathrm{Tr}\, f(\rho')$.

The proof is obtained by considering functions of the form $f(x) = x$ if $0 \le x \le c$, $f(x) = c$ if $x \ge c$. Suppose that $\rho \succ \rho'$ is not true. Then there exists a smallest integer $n$ such that $p_1(\rho) + \dots + p_n(\rho) > p_1(\rho') + \dots + p_n(\rho')$. Now choose $c = p_n(\rho')$. Then

$$\mathrm{Tr}\, f(\rho') = n\, p_n(\rho') + \sum_{k=n+1}^{\infty} p_k(\rho') > n\, p_n(\rho') + \sum_{k=n+1}^{\infty} p_k(\rho) \ge \mathrm{Tr}\, f(\rho)$$

since $p_n(\rho') < p_n(\rho)$.

Similarly, $\rho \succ \rho'$ if, and only if, for every convex function $f$, $\mathrm{Tr}\, f(\rho) \le \mathrm{Tr}\, f(\rho')$. (Unfortunately it is not true that $\mathrm{Tr}\, \rho^p \le \mathrm{Tr}\, \rho'^p$ for all $p$: $1 \le p < \infty$ implies that $\rho \succ \rho'$.)

Now let us consider another example of mixing-enhancement:

(5) Let $U_1, \dots, U_n, \dots$ be unitary operators, $\lambda_i \ge 0$, $\sum_i \lambda_i = 1$. Then the convex combination $\rho = \lambda_1 U_1^*\rho' U_1 + \dots + \lambda_n U_n^*\rho' U_n + \dots$ is more mixed than $\rho'$.

This is again a consequence of Ky Fan's inequality. Let $\phi_k$ be the eigenvectors belonging to the $p_k(\rho)$. Then

$$\sum_{k=1}^{n} p_k(\rho) = \sum_{k=1}^{n} \sum_i \lambda_i \langle \phi_k|U_i^*\rho' U_i|\phi_k\rangle = \sum_{k=1}^{n} \sum_i \lambda_i \langle U_i\phi_k|\rho'|U_i\phi_k\rangle \le \sum_i \lambda_i \sum_{k=1}^{n} p_k(\rho') = \sum_{k=1}^{n} p_k(\rho') .$$

This explains the word "mixing-enhancement." Remember our remark of Sec. B: $\rho'$ and $U^*\rho' U$, $U$ being an arbitrary unitary transformation, have to be considered as equally mixed; although they do not contain the same information, they certainly contain the same


amount of information in any sensible interpretation. (Cf. also Sec. G.) Now $\rho$ is obtained from the $U^*\rho' U$ by a mixing procedure, hence there is a loss of information. (Note that the constituents of $\rho$ cannot be reconstructed.)

By the way, in example 5 it is not necessary that the $U_i$ be unitary; they may be only isometric (i.e., $U_i^* U_i = 1$).

Clearly the relation $\succ$ is transitive and reflexive, i.e., a preorder. Thus it generates an equivalence relation: $\rho \sim \rho'$ if, and only if, $\rho \succ \rho'$ and $\rho' \succ \rho$, hence if $\rho$ and $\rho'$ have the same positive spectrum. This equivalence relation may be regarded as the most general concept of invariance (Sec. A): from the entropy, or more generally from the information-theoretical point of view, density matrices with the same positive spectra are equally good.

Uhlmann's theorem states that, in essence, mixing-enhancement is always produced by the mechanism described in example 5: $\rho \succ \rho'$ if, and only if, $\rho$ is in the (weak) closure of the convex hull of $\{U^*\rho' U: U \text{ unitary}\}$.

We will only sketch the proof. It consists of four steps:

(1) The set $\mathcal{A}$ of all operators $A \ge 0$ such that $p_1(A) \le p_1(\rho')$, $p_1(A) + p_2(A) \le p_1(\rho') + p_2(\rho')$, ... is by virtue of Ky Fan's inequality convex and weakly closed, hence weakly compact.

(2) Its extremal elements are exactly those $A$ for which $p_i(A) = p_i(\rho')$ for all $i$ or $p_i(A) = p_i(\rho')$ for $i \le n$, $p_i(A) = 0$ otherwise.

(3) Apply the theorem of Krein and Milman: $\mathcal{A}$ = closed convex hull of the extremal $A$.

(4) All extremal $A$ are in the weak closure of $\{U^*\rho' U\}$.

In the finite-dimensional case, there is another way of proving the theorem invoking Birkhoff's theorem mentioned in Sec. I.B. Namely, for two sequences of numbers $\alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_n$, or $\beta_1 \ge \beta_2 \ge \dots \ge \beta_n$ ($\alpha_i \ge 0$, $\beta_i \ge 0$, $\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i = 1$) the relations $\alpha_1 \le \beta_1$, $\alpha_1 + \alpha_2 \le \beta_1 + \beta_2$ etc. are true if and only if there is a doubly stochastic matrix $T$ such that $\alpha_i = \sum_k T_{ik}\beta_k$ (see Hardy, Littlewood, and Pólya, 1934). Then a straightforward application of Birkhoff's theorem yields Uhlmann's theorem.

Let us now make a short remark on the order structure of density matrices (this expression is due to Thirring, 1975. The lattice structure of density matrices was recognized by Wehrl, 1974):

For any two density matrices $\rho_1, \rho_2$ there exist (up to equivalence) a "purest" density matrix $\succ \rho_1, \rho_2$ and a "most mixed" one $\prec \rho_1, \rho_2$. Thus the equivalence classes of density matrices form a lattice. Its "purest" element clearly is the equivalence class of the pure states. A most mixed element does not exist in infinite-dimensional Hilbert spaces, but it does in the finite-dimensional case, namely, $\rho = \mathbb{1}/\dim \mathcal{H}$.

Next, let us generalize example 3: let $P_i$ be a family of pairwise orthogonal projections (not necessarily one-dimensional) with $\sum_i P_i = 1$. Then $\sum_i P_i\rho P_i \succ \rho$.

This means intuitively that deleting off-diagonal matrix elements reduces the information and increases the degree of mixture. The proof is easily obtained by means of Ky Fan's inequality.

The coarse-grained density matrix $\rho^{c} = \sum_i \lambda_i P_i$ ($\lambda_i\, \mathrm{Tr}\, P_i = \mathrm{Tr}\, \rho P_i$, cf. Sec. I.B) is $\succ \sum_i P_i\rho P_i$, hence $\succ \rho$; therefore not only $S(\rho^{c}) \ge S(\rho)$, but also $\mathrm{Tr}\, f(\rho^{c}) \ge \mathrm{Tr}\, f(\rho)$ for any concave function $f$. (Remember Fig. 3: both mappings are mixing-enhancing.)

It is worth mentioning that Uhlmann's theory has been generalized to arbitrary von Neumann algebras by Wehrl (1975), Alberti (thesis, 1973), and Uhlmann himself. It turns out that this theory provides a powerful tool in the investigation of the structure of von Neumann algebras and, in a certain sense, is the "dual" of the von Neumann-Murray dimension theory.

D. Continuity properties

In infinite-dimensional Hilbert spaces, entropy, as a function of density matrices, is discontinuous in the usual topologies. There are only a few restricted continuity properties. The problems that arise in this connection may be divided into two groups:

(1) Those which are of more mathematical interest and which we will not treat in great detail here.

(2) Technical considerations that are of use in extending theorems that can be proven for finite-dimensional matrices, to the general case. (Cf. the end of Sec. A. For a typical example, see Sec. III.A.)

From section A we already know insensitivity. Other restricted continuity properties are:

Lower Semicontinuity. (This fact seems to have been well known for a long time, but was written down only by Naudts, 1969; Wehrl, 1976. For other proofs, cf. Secs. III.B and IV.B). Let $\rho_n, \rho$ be density matrices, such that $\mathrm{Tr}\,|\rho - \rho_n| \to 0$. Then $S(\rho) \le \liminf S(\rho_n)$.

Ky Fan's inequality tells us that, for the eigenvalues of $\rho_n$, or $\rho$, respectively, arranged in decreasing order,

$$p_1(\rho_n) + \dots + p_k(\rho_n) \le \mathrm{Tr}\,|\rho_n - \rho| + p_1(\rho) + \dots + p_k(\rho)$$

since $\rho_n = (\rho_n - \rho) + \rho$. Vice versa,

$$p_1(\rho) + \dots + p_k(\rho) \le \mathrm{Tr}\,|\rho_n - \rho| + p_1(\rho_n) + \dots + p_k(\rho_n),$$

hence $|(p_1(\rho) - p_1(\rho_n)) + \dots + (p_k(\rho) - p_k(\rho_n))| \to 0$ and eventually $p_k(\rho_n) \to p_k(\rho)$. Thus

$$-\sum_{i=1}^{k} p_i(\rho) \ln p_i(\rho) = \lim_n \Big[-\sum_{i=1}^{k} p_i(\rho_n) \ln p_i(\rho_n)\Big]$$

and

$$S(\rho) = \sup_k \sum_{i=1}^{k} \big(-p_i(\rho) \ln p_i(\rho)\big) \le \liminf_n\, \sup_k \sum_{i=1}^{k} \big(-p_i(\rho_n) \ln p_i(\rho_n)\big) .$$

Remark: It is also true that, if $\rho_n \to \rho$ weakly, $S(\rho) \le \liminf S(\rho_n)$, provided that $\rho$ is a density matrix (in this case, also $\mathrm{Tr}\,|\rho_n - \rho| \to 0$). If $\rho$ is not a density matrix, it can happen that $S(\rho) > \liminf S(\rho_n)$ (Wehrl, 1976; see also Davies, 1972, and dell'Antonio, 1967).
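Lower semicontinuity without continuity can be made concrete by a small numerical sketch. The construction below is my own illustration, not taken from the references above: the function names, the constant `c`, and the particular sequence are assumptions chosen for convenience. Take $\rho$ to be a pure state and let $\rho_n$ carry weight $1 - \varepsilon$ on that state and weight $\varepsilon$ spread uniformly over roughly $e^{c/\varepsilon}$ orthogonal states. Then $\mathrm{Tr}\,|\rho_n - \rho| = 2\varepsilon \to 0$, while $S(\rho_n)$ stays near $c > 0 = S(\rho)$, so the entropy drops in the limit, in agreement with $S(\rho) \le \liminf S(\rho_n)$.

```python
import math


def shannon_term(p):
    """s(p) = -p ln p, with the convention s(0) = 0."""
    return -p * math.log(p) if p > 0.0 else 0.0


def perturbed_state(eps, c):
    """Hypothetical sequence member (illustrative only).

    Spectrum: one eigenvalue 1 - eps (the original pure state) and
    n_levels eigenvalues eps / n_levels on orthogonal states, where
    n_levels is about exp(c / eps).
    Returns (trace distance to the pure state, entropy of the perturbed state).
    """
    n_levels = math.ceil(math.exp(c / eps))
    # |1 - (1 - eps)| + n_levels * (eps / n_levels) = 2 * eps
    trace_distance = 2.0 * eps
    entropy = shannon_term(1.0 - eps) + n_levels * shannon_term(eps / n_levels)
    return trace_distance, entropy


if __name__ == "__main__":
    c = 0.5
    for eps in (0.1, 0.01, 0.005):
        dist, s = perturbed_state(eps, c)
        print(f"eps={eps:<6} Tr|rho_n - rho|={dist:.3f}  S(rho_n)={s:.4f}  S(rho)=0")
```

The entropy is evaluated in closed form from the spectrum, so no large matrix is ever built; only the number of levels grows, which is why such a sequence needs an infinite-dimensional Hilbert space to live in.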


Unboundedness in Every Neighborhood. Let $\rho$ be a density matrix and $\varepsilon > 0$ be an arbitrary number. Then there always exists another density matrix $\rho'$ with $\mathrm{Tr}\,|\rho - \rho'| < \varepsilon$ and $S(\rho') = \infty$. (Clearly this implies that $S(\rho)$ is discontinuous.)

For this one only has to change, beginning from a sufficiently large index $l$, the eigenvalues of $\rho$ in such a manner that $p_1' = p_1, \dots, p_l' = p_l$, and

$$p_k' \propto \frac{1}{k(\ln k)^2} \quad \text{for } k > l .$$

Entropy is Almost Always Infinite. Due to lower semicontinuity, the sets $\{\rho: S(\rho) \le n\}$ are closed; also they are nowhere dense because of our above statement, hence $\{\rho: S(\rho) < \infty\} = \bigcup_n \{\rho: S(\rho) \le n\}$ is a set of first category (i.e., the topological analog of a set of measure 0).

The set of density matrices with finite entropy also has an interesting algebraic property: its finite linear combinations are a two-sided ideal in the set of all bounded operators (G. and G. Lassner, 1977).

The proof is obtained by using the following criterion (see Dixmier, 1957): A set of positive operators, say $J^+$, is the positive part of a two-sided ideal $J$, if and only if the following conditions are fulfilled:

(i) If $A \in J^+$, then $U^*AU \in J^+$ for all unitaries $U$.

(ii) If $A \in J^+$ and $0 \le B \le A$, then $B \in J^+$.

(iii) If $A, B \in J^+$, then also $A + B \in J^+$.

(Note that any element of an ideal is a finite linear combination of elements of its positive part.) Now if $S(A) < \infty$, then $S(U^*AU) = S(A) < \infty$. Furthermore, let $B \le A$. Denote by $\beta_1 \ge \beta_2 \ge \dots$, or $\alpha_1 \ge \alpha_2 \ge \dots$, the eigenvalues of $B$, or $A$, respectively. We have $\beta_i \le \alpha_i$, hence for all except possibly a finite number of indices, $s(\beta_i) \le s(\alpha_i)$, thus $S(B) < \infty$. Concerning condition (iii), due to the fact that $S(\lambda A) = \lambda S(A) - (\lambda \ln \lambda)\, \mathrm{Tr}\, A$ for $\lambda \ge 0$, we can assume that $\mathrm{Tr}(A + B) = 1$. (One easily verifies that $S(A) < \infty$ implies that $\mathrm{Tr}\, A < \infty$.) Define $A' = A/\mathrm{Tr}\, A$, $B' = B/\mathrm{Tr}\, B$. By assumption, $S(A') < \infty$ and $S(B') < \infty$. Then $S(A + B) = S((\mathrm{Tr}\, A)A' + (\mathrm{Tr}\, B)B') \le (\mathrm{Tr}\, A)S(A') + (\mathrm{Tr}\, B)S(B') - \mathrm{Tr}\, A\, (\ln \mathrm{Tr}\, A) - \mathrm{Tr}\, B\, (\ln \mathrm{Tr}\, B)$ [inequality (2.3)], and, consequently, is $< \infty$.

One can even dispense with the requirement that $\mathrm{Tr}\, \rho_n H \to \mathrm{Tr}\, \rho H$ if $\mathrm{Tr}\, e^{-\beta H} < \infty$ for all $\beta$: $0 < \beta < \infty$. Then $S(\rho)$ is continuous on the sets $\{\rho: \mathrm{Tr}\, \rho H \le C < \infty\}$, even if $\mathrm{Tr}\, \rho_n H \not\to \mathrm{Tr}\, \rho H$. Namely, $\mathrm{Tr}\, \rho(\beta H) - S(\rho) \le \liminf [\mathrm{Tr}\, \rho_n(\beta H) - S(\rho_n)]$; hence $-S(\rho) \le \liminf [-S(\rho_n)] + \limsup |\mathrm{Tr}\, \rho_n(\beta H)| + \mathrm{Tr}\, \rho(\beta H)$ for all $\beta > 0$. Since the sum of the last two terms is $\le 2\beta C$, we have as above $-S(\rho) \le \liminf [-S(\rho_n)]$ and consequently $S(\rho) = \lim S(\rho_n)$.

Now let us turn to the second group of theorems concerning continuity properties, those that are of use for practical computations. Let me mention two of them (Simon, 1973):

1. Let $A_n, A$ be compact operators, $\ge 0$ (not necessarily being density matrices), $\mathrm{w\text{-}lim}\, A_n = A$. Suppose that for the eigenvalues, arranged in decreasing order, $p_k(A_n) \le p_k(A)$ for all $k$. Then $S(A_n) \to S(A)$.

2. Dominated convergence theorem for entropy. Again let $A_n, A$ as above, and $\mathrm{w\text{-}lim}\, A_n = A$. If there is a compact operator $B$ such that $A_n \le B$ and $S(B) < \infty$, then $S(A_n) \to S(A)$.

E. Additivity

Additivity states the following: let $\mathcal{H}_1, \mathcal{H}_2$ be two Hilbert spaces, and $\rho_1, \rho_2$ be two density matrices. Then the entropy of the density matrix $\rho_1 \otimes \rho_2$ in $\mathcal{H}_1 \otimes \mathcal{H}_2$ is

$$S(\rho_1 \otimes \rho_2) = S(\rho_1) + S(\rho_2) . \tag{2.9}$$

Of course, this generalizes to

$$S(\rho_1 \otimes \rho_2 \otimes \dots \otimes \rho_n) = S(\rho_1) + S(\rho_2) + \dots + S(\rho_n) . \tag{2.9}$$

The proof is very simple: let $\phi_k$ or $\psi_j$ be the eigenvectors of $\rho_1$, or $\rho_2$, respectively, belonging to the eigenvalues $p_k$, or $q_j$, respectively. Then the $\phi_k \otimes \psi_j$ are the eigenvectors of $\rho_1 \otimes \rho_2$, and the corresponding eigenvalues are $p_k q_j$.

$$S(\rho_1 \otimes \rho_2) = -\sum_{k,j} p_k q_j \ln(p_k q_j) = -\sum_k p_k \ln p_k - \sum_j q_j \ln q_j = S(\rho_1) + S(\rho_2) .$$
From these results one may doubt that entropy is a,
sensible concept in infinite-dimensional Hilbert spaces. From the information-theoretical point of view, this
But fortunately these theorems do not really affect property is quite clear: if we are given two independent
physics. Let B be a, reasonable Hamiltonian such that systems described by Hi pi or H2 p2 respectively
Tre "& ~, and let o 8 e "/Tre
be the Gibbs state. then the information about the total system, described
Suppose that the energy Tro&JI is finite. Whenever, for by H, H» p, p» equals the sum of the information
some density matrix p, TrpH & TroBH, then S(p) &S(vs) about its constituents. As concerns physics, additivity
Hence a density matrix p with S(p) = ~ would also must not be confused with the scaling behavior of en-
have infinite energy. The assertion that S(p) is mostly tropy. It is often thought to be of ajodictical t~th that
infinite therefore is as good as the assertion that Trp~ the entropy of a piece of rnatter composed of two parts
is mostly infinite. However, this is a trivial fact equals the sum of the entropies of these parts. However,
since II is an unbounded operator, but it has no physical it is not, as we have seen in Sec. I.C. It is approximate-
significance at all. ly true for "normal" matter, and exactly true only if
Also the discontinuity of S(p) is not as bad as it may there are no correlations at all between the parts. If
look at first. We will show in Sec. III.B that the relative there are considerable correlations between them, the
entropy S(o ~p) = Trp (lnp —inn), and, consequently, the entropy of the total system may be much smaller (even
free energy E(p, p, H) = TrpH —B 'S(p) (cf. Sec. I.B) is 0).
lower semicontinuous in p. Therefore if not only In the classical case additivity reads as follows: let
Tr~p„—p~- 0 but also Trp„H- TrpH, i.e. , if the energy Q„Q, be two phase spaces (with elements zo„go, ), and
expectation values converge, then S(p) &lim infS(p„), let p, (w, ), p, (w, ) be two probability distributions. For
-S(p) &lim inf[ —S(p„)]= —lim supS(p„), hence S(p) = limS(p„). p(w„w, ) —
= P, (m, )p, (so, ), S(P) =S(P, ) + S(P, )
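The additivity property (2.9) is easy to check numerically in finite dimensions. The following is a minimal sketch, assuming NumPy; the helper names `entropy` and `random_density_matrix`, the dimensions 2 and 3, and the random seed are illustrative assumptions and do not come from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr rho ln rho (natural logarithm)."""
    p = np.linalg.eigvalsh(rho)        # eigenvalues of a Hermitian matrix
    p = p[p > 1e-12]                   # convention 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

def random_density_matrix(dim, rng):
    """Hypothetical helper: a random full-rank density matrix of size dim."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T               # positive semidefinite by construction
    return rho / np.trace(rho).real    # normalize to unit trace

rng = np.random.default_rng(0)
rho1 = random_density_matrix(2, rng)   # state on H_1
rho2 = random_density_matrix(3, rng)   # state on H_2
rho12 = np.kron(rho1, rho2)            # product state on H_1 (x) H_2

lhs = entropy(rho12)
rhs = entropy(rho1) + entropy(rho2)
print(lhs, rhs)                        # the two numbers agree up to rounding
```

The Kronecker product has exactly the eigenvalues $p_k q_j$ used in the proof, so the agreement is a direct numerical restatement of that argument.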


In the earlier sections the special role of entropy did not appear; rather, the trace of any concave function was taken to be more or less as good as entropy. However, the property of additivity distinguishes entropy among all functionals of the form $\rho \to \mathrm{Tr} f(\rho)$, where $f$ is a measurable function: if $\mathrm{Tr} f(\rho_1 \otimes \rho_2) = \mathrm{Tr} f(\rho_1) + \mathrm{Tr} f(\rho_2)$, then $f(\rho) = \mathrm{const}\,\rho \ln \rho$.

Due to the assumption, $\sum f(p_k q_j) = \sum f(p_k) + \sum f(q_j) = \sum [q_j f(p_k) + p_k f(q_j)]$ for all sequences $p_k$, $q_j$, hence $f(p_k q_j) = q_j f(p_k) + p_k f(q_j)$. For $g(x) = f(x)/x$, $g(p_k q_j) = g(p_k) + g(q_j)$, i.e., $g(x) = \mathrm{const} \ln x$, $f(x) = \mathrm{const}\, x \ln x$.

There are, of course, other additive functionals of $\rho$, but they are not of the form $\rho \to \mathrm{Tr} f(\rho)$. An example is provided by the so-called $\alpha$ entropies. [In the classical case they were introduced by Renyi. See Renyi, 1966. For the quantum-mechanical case, see Wehrl, 1976; Thirring, 1975. The case $\alpha = 0$ was (classically) invented by Hartley, 1928. The case $\alpha = 2$ has been considered occasionally in the past, for instance by Fano, 1957, and Prigogine, 1972.] $S_\alpha(\rho) = (1 - \alpha)^{-1} \ln \mathrm{Tr}\rho^\alpha$ for $\alpha \ne 0, 1, \infty$; $S_0(\rho)$ = Hartley entropy (see Sec. A), $S_1(\rho) = S(\rho)$, $S_\infty(\rho) = -\ln\|\rho\|$. (One verifies easily that, for fixed $\rho$, the mapping $\alpha \to S_\alpha(\rho)$ is continuous for $1 \le \alpha \le \infty$.) In infinite-dimensional Hilbert space, these $\alpha$ entropies, for $\alpha > 1$, are not concave, however (and also not subadditive; this property only holds for $S_0$ and $S$. Cf. next section). On the contrary, they are compatible with Uhlmann's relation: $\rho \succ \rho' \Rightarrow S_\alpha(\rho) \ge S_\alpha(\rho')$.

F. Subadditivity

Like additivity, subadditivity refers to a system composed of two subsystems; however, they are no longer supposed to be independent. That means that, in the Hilbert space $H = H_1 \otimes H_2$, we consider now density matrices of a more general kind than $\rho_1 \otimes \rho_2$.

What information about, say, the first subsystem is contained in $\rho$? It is given by the partial trace (reduced density matrix) $\rho_1 = \mathrm{Tr}_{H_2}\rho$. $\rho_1$ is a density matrix in $H_1$ with matrix elements $\langle\varphi|\rho_1|\psi\rangle$ ($\varphi, \psi \in H_1$) defined as follows: let $e_1, e_2, \ldots$ be an orthonormal basis in $H_2$. Then

$$\langle\varphi|\rho_1|\psi\rangle = \sum_i \langle\varphi \otimes e_i|\rho|\psi \otimes e_i\rangle. \tag{2.10}$$

It can be shown that the right side of (2.10) is independent of the $e_i$ basis. ($\rho_1$ is, so to say, some average over the second system.) In an analogous way, $\rho_2 = \mathrm{Tr}_{H_1}\rho$ is constructed. Of course, for density matrices of the form $\rho = \rho_1 \otimes \rho_2$, $\mathrm{Tr}_{H_2}\rho = \rho_1$, and $\mathrm{Tr}_{H_1}\rho = \rho_2$. [The notion of partial trace is a special case of a so-called "conditional expectation" (Umegaki, 1954; cf. Takesaki, 1972; Guichardet, 1974).]

If $T \in B(H)$ is an operator of the form $T_1 \otimes 1$ ($T_1 \in B(H_1)$), then $\mathrm{Tr}\rho T = \mathrm{Tr}\rho_1 T_1$. (This property may equally be used as a definition of $\rho_1$.) Since the state of a system is determined by the knowledge of all expectation values of observables, one can say that in that sense $\rho_1$ contains all the information about the first system, and $\rho_2$ all the information about the second one.

In the classical case one constructs the reduced distributions $\rho_1(w_1)$, or $\rho_2(w_2)$, from a distribution $\rho(w_1, w_2)$, by integrating over the other variable:

$$\rho_1(w_1) = \int dw_2\, \rho(w_1, w_2) \tag{2.11}$$

and vice versa.

Now subadditivity states that

$$S(\rho) \le S(\rho_1) + S(\rho_2) = S(\rho_1 \otimes \rho_2). \tag{2.12}$$

This appears plausible since, when forming $\rho_1$ and $\rho_2$, one loses the information about the correlations. (Also, one cannot reconstruct $\rho$ from $\rho_1$ and $\rho_2$.) However, it is false that $\rho_1 \otimes \rho_2 \succ \rho$. (Lieb, private communication. This follows also from the fact that the $\alpha$ entropies for $\alpha \ne 0, 1$ are not subadditive; see below.)

The proof is obtained from the inequality for the relative entropy (1.41), $S(\rho_1 \otimes \rho_2|\rho) \ge 0$. Now $S(\rho_1 \otimes \rho_2|\rho) = \mathrm{Tr}\rho[\ln\rho - \ln(\rho_1 \otimes \rho_2)] = \mathrm{Tr}\rho(\ln\rho - \ln\rho_1 \otimes 1 - 1 \otimes \ln\rho_2) = \mathrm{Tr}\rho\ln\rho - \mathrm{Tr}\rho_1\ln\rho_1 - \mathrm{Tr}\rho_2\ln\rho_2$. In the classical case the proof is quite analogous, even for the generalized Boltzmann-Gibbs-Shannon entropy.

Subadditivity is a stronger property than concavity. In fact, our next example shows that concavity is a consequence of subadditivity.

In order to verify this, take two Hilbert spaces $H_1$ and $H_2$, and let $e_1, e_2, \ldots$ be an orthonormal basis in $H_2$. Then $H = H_1 \otimes H_2 = \bigoplus_i H^{(i)}$, $H^{(i)} = H_1 \otimes e_i$. Now let $\rho$ be the density matrix $\bigoplus_i \lambda_i\rho_i$ ($\rho_i$ = density matrix in $H_1$, $\lambda_i \ge 0$, $\sum\lambda_i = 1$). $\mathrm{Tr}_{H_2}\rho = \sum\lambda_i\rho_i$, and $\mathrm{Tr}_{H_1}\rho$ is a matrix in $H_2$ that is diagonal with respect to the basis $e_i$, and its entries are the $\lambda_i$. Then $S(\rho) = \sum\lambda_i S(\rho_i) - \sum\lambda_i\ln\lambda_i$ by subadditivity is $\le S(\rho_1) + S(\rho_2) = S(\sum\lambda_i\rho_i) - \sum\lambda_i\ln\lambda_i$, hence $S(\sum\lambda_i\rho_i) \ge \sum\lambda_i S(\rho_i)$. One can argue in the opposite direction: subadditivity is implied by the properties (a) that deleting off-diagonal matrix elements increases entropy (example 3 of Sec. C), (b) the mixing property $S(\bigoplus\lambda_i\rho_i) = \sum\lambda_i S(\rho_i) - \sum\lambda_i\ln\lambda_i$, if the ranges of the $\rho_i$ are orthogonal, and (c) concavity (Wehrl, 1976).

We already have indicated in the preceding section that the Hartley entropy is subadditive too. There are no other functionals than linear combinations of $S$ and $S_0$ that are invariant (in the sense of Sec. A), additive, and subadditive (except for some trivial possibilities like "entropy" = 0 if $\rho$ has finite rank, otherwise $= \infty$) (characterization theorem of Aczel, Forte, and Ng, 1974. The quantum-mechanical version is due to Ochs, 1975). If, in addition, insensitivity is required, we are left with the constant multiples of $S$ only.

The proof is rather involved, therefore we can give here only an outline. As in the proof of the characterization theorem of Khintchine and Faddeev (Sec. B), it suffices to consider the finite-dimensional case only. Then $\Phi(\rho)$ is a symmetric function of the eigenvalues of $\rho$; $\Phi(\rho) = I_n(p_1, \ldots, p_n)$. Because of expansibility (Sec. A),

$$I_n(p_1, \ldots, p_n) = I_{n+1}(p_1, \ldots, p_n, 0).$$

Additivity implies that for $n$ numbers $p_1, \ldots, p_n$ with $\sum p_i = 1$ and $m$ numbers $q_1, \ldots, q_m$ with $\sum q_j = 1$


Inm(P1 q1»
. Paqf». Pn qm) or, with P, =y, P, =x,
= I„(p„.. . , p„)+ I (q„. . . , q.), (&-~&f( ) +g(~&=(&-»f(& ) +&(»
whereas subadditivity states that
Differentiating with respect to x and y yields
I„(r„, . . . , 2„,, . . . , 2„)&X„(p„.. . , p„)+I (q, , . . . , q ) .
In the last relation, the x~z are a double sequence of
non-negative numbers with Z„, J22f =1, and P; =Z22';~,
(1 —x)' 1 —x 1 —y2 -(, )
(One can prove that all derivatives exist. ) For s =y/
q; =Z„2». As in Sec. B, the first part of the proof con- 1 —x, t=x/1 —y, this becomes
sists in showing that, for 0&p&1, the information func-
tion I,(p, 1-p) =aS(p, 1 —p)+A, ; with S(p, 1 —p) = —plnp f
s(1 —s) "(s) = t(1 —t) "(t) f .
-(1 —p) ln(1 —p) and a and A2 being constants. For this, The left-hand side depending on s only, and the right-
we need three lemmas:
hand side depending on t only, both sides must be a con-
(1) Let f(p) =I (p, 1 —p). We note that f(p) =f(l —p). stant, and by integration one arrives at
From the next lemma it follows that is nondecreasing f
in [0, 1/2] and nonincreasing in [ 1/2, 1], and that it is f(t) = —a[ t (ln t —1) + (1 —t) (ln(l —t) —1)] + bt+ C .
Because of the symmetry, 5=0, and thus we have ob-
(2) Symmetry, additivity, and subadditivity can be
used to obtain the inequality tained the expression for f(p) asserted above, with A. ,
=C+a. How do we get from I, to I ? Suppose all P,-&0
I.(1 —q, q) —g(1 —P)(1 —q)+P(1 —~), (1 —P)q+P~) and consider the expression
I (p(l —q), pq, p„. . . , p

—I (P(1 —2'), P2', P ) P„~,

4(p, +P„P.. .P )= ~,(p, +P-. , p. . . .p )+(P, +P.)A.
—aS(p, + p„p„.. . , pm)
&I.(p(1 —q)+(1-P)(1 —2"), Pq+(1-P)2')-I2(1 —2 2') .
It is symmetric in p„. . . , p, and
Pm (Pl +P2& P3& P4» ' ' ' Pm) (m(P 1 +P3» P2& p4& ~ ' ' p»!)

I2(1 —q, q) —I2((l —p)(1 —q) + p(1 —2.), (1 —p)q+ pz) From this one concludes easily that it is a constant, A
- I.(P(1 —q)+(1-P)(1 —3), pq+(1-P)2') - I.(1 —2", 2) . Therefore

Inserting ~=1 —q, one arrives at I (p„. . . , p )= (p, +p, )as

Pl P2 Pl P2
f(q) &f((p(1 —q)+(1 P)q) . -
Given q, the set ( p(l —q) + (1 —p)q: 0 &p & 1) is just
+p„p„.. . , p )+A
[q, 1 —q], hence f(p) is nondecreasing in [0, 1/2] and =as(p„p„. . . , p )+A. .
nonincreasing in [1/2, 1] . Inserting P =1/2, we obtain
It is immediately seen that A „=A +A„. Inequality
f(2 q+ -'~) -2 f(q) + 2f(~), (2) and our last result yield
hence is weakly concave, and, because of monotonicity,
liminf(A „—
A ) ~ 0.
(3) The above lemma is also the essential ingredient in This relation is similar to the one we have obtained in
proving the following recursive relation: For 0& q& 1, Sec. B. There we used a theorem of Erdos to conclude
the difference that 4 = blnyn. In our case this is guaranteed by a
I stronger theorem of Katai, 1967. The proof is now
((1 —q)P, Pq, P. . .P )-PI.(1 —q, q) completed by a simple application of expansibility'.
is independent of q,
I (p„. . . , p ) =as(p„. . . , p )+A„,
I.( ~ ~ ~ )-PI.( ~ ~ ~ )=~. ,(P, P. , ",P. ) . where k =number of p's that are 40. A final remark ap-
Now plies: there are various other characterization theo-
rems (see Aczel and Daroczy, 1974). However, from the
+ 2 1+ 2
physical point of view, the two theorems of this chapter
seem to be the only "natural" ones.
Let g(x) —
= Z2(l —x, x). Because of the symmetry of I, one
obtains the functional equation Before discussing some applications of subadditivity, let
me make some remarks on the question of monotonicity,
(p, + p, )f ' +a(p. ) =I.(p„p., p. )
Neither quantum-mechanically nor in the classical con-
1 2 tinuous case is it true that S(p, ) &S(p)! In the classical
case, one could, for instance, have S(p, ) &0, then S(p),
=I,(P„P„P,) for p(3(&„w'2) = p, (3(&, )p2(24&2), is &, S(p, ). (One can easily
give other examples not involving negative values of en-
=(P, +P, ) f 'p +g(P.), tropy. ) In quantum mechanics, p may be pure, but p,
p j. 3 may not be, hence S(p, )&O=S(p).
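The failure of monotonicity in the quantum case, together with subadditivity (2.12), can be checked numerically in a small example. The following is a minimal sketch, assuming NumPy; the helper names `entropy` and `partial_trace` and the particular entangled vector $(e_0 \otimes e_0 + e_1 \otimes e_1)/\sqrt{2}$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr rho ln rho (natural logarithm)."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]                      # convention 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

def partial_trace(rho, d1, d2, keep):
    """Reduced density matrix on H_1 (keep=1) or H_2 (keep=2), as in (2.10)."""
    r = rho.reshape(d1, d2, d1, d2)       # indices (i, k, j, l) of |i,k><j,l|
    if keep == 1:
        return np.einsum('ikjk->ij', r)   # sum over the basis of H_2
    return np.einsum('kikj->ij', r)       # sum over the basis of H_1

# Pure entangled state on C^2 (x) C^2 (illustrative choice of vector).
chi = np.zeros(4)
chi[0] = chi[3] = 1.0 / np.sqrt(2.0)
rho = np.outer(chi, chi)

rho1 = partial_trace(rho, 2, 2, keep=1)
rho2 = partial_trace(rho, 2, 2, keep=2)
s, s1, s2 = entropy(rho), entropy(rho1), entropy(rho2)
print(s, s1, s2)   # S(rho) vanishes while S(rho_1) = S(rho_2) = ln 2
```

Here the total state is pure, so its entropy vanishes, while each reduced state is the maximally mixed state on $\mathbb{C}^2$; subadditivity $S(\rho) \le S(\rho_1) + S(\rho_2)$ holds, but $S(\rho_1) \le S(\rho)$ does not.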


Take, as an example, $H_1 = H_2 = \mathbb{C}^2$, $\varphi_1, \varphi_2$ (or $\psi_1, \psi_2$, respectively) = orthonormal basis in $H_1$ (or $H_2$, respectively). Let $\rho$ be the projection onto an entangled vector, for instance $2^{-1/2}(\varphi_1 \otimes \psi_1 + \varphi_2 \otimes \psi_2)$; then $\rho$ is pure, whereas $\rho_1 = \tfrac{1}{2}\cdot 1$ is not.

This is a strange phenomenon, even though its formal origin is quite simple. In Sec. III.A we will discuss why it is not observed in "real matter."

In connection with the heat death, it has been noted that one could imagine that the universe is in a pure state, with entropy = 0, but nevertheless the entropy of sufficiently small subsystems (earth, galaxies, ...) increases (Lieb, 1975). After all, such a possibility is not excluded. However, this field certainly is very speculative and I do not want to proceed further in this direction.

Monotonicity is valid for the classical discrete case and (in the opposite direction) for the relative entropy (even in the quantum-mechanical case): $S(\sigma_1|\rho_1) \le S(\sigma|\rho)$. (For a proof, see Sec. III.B.) It is also valid for the "right" classical continuous entropy (not the conventional one), i.e., for $\rho^{cl}$ as defined in Sec. I.A (Wehrl, 1977).

Let us come back to our example above ($\rho$ pure, $\rho_1$ not). Two remarks are appropriate:

(1) If $\rho$ is pure, then $S(\rho_1) = S(\rho_2)$; moreover, the positive spectra of $\rho_1$ and $\rho_2$ coincide.

Let $\varphi_i$, $\psi_k$ be orthonormal bases in $H_1$, or $H_2$, respectively. Let $\chi$ be the vector $\sum c_{ik}\,\varphi_i \otimes \psi_k$, $\rho = |\chi\rangle\langle\chi| = \sum c_{ik}\bar{c}_{jl}\,|\varphi_i \otimes \psi_k\rangle\langle\varphi_j \otimes \psi_l|$. Let $C$ be an infinite matrix with entries $c_{ik}$. The eigenvalues of $\rho_1$ equal those of $CC^*$. Similarly, one finds that the eigenvalues of $\rho_2$ equal those of $C^*C$. But it is well known that $CC^*$ and $C^*C$ have the same positive spectrum.

(2) Given $\rho_1$, one always can find a Hilbert space $H_2$ and a pure density matrix $\rho$ in $H_1 \otimes H_2$ such that $\rho_1 = \mathrm{Tr}_{H_2}\rho$.

Let $\rho_1 = \sum p_i|\varphi_i\rangle\langle\varphi_i|$. Take for $H_2$ a Hilbert space with the same dimension as $H_1$, and with an orthonormal basis $\psi_1, \psi_2, \ldots$, and set

$$\rho = |\chi\rangle\langle\chi|, \qquad \chi = \sum_i \sqrt{p_i}\,\varphi_i \otimes \psi_i.$$

From remarks 1 and 2 one can derive the triangle inequality (Araki and Lieb, 1970) which gives a partial compensation for the failure of monotonicity:

$$|S(\rho_1) - S(\rho_2)| \le S(\rho) \le S(\rho_1) + S(\rho_2). \tag{2.13}$$

(Of course, the right-hand side is merely subadditivity.) We want to prove the inequality $S(\rho_1) \le S(\rho) + S(\rho_2)$; interchanging the roles of 1 and 2 yields the rest. $\rho$ is a density matrix in $H_1 \otimes H_2$. Due to remark 2 there exists a Hilbert space $H_3$ and a pure density matrix $\sigma$ in $H_1 \otimes H_2 \otimes H_3$ such that $\rho = \mathrm{Tr}_{H_3}\sigma$. Let $\sigma_{12} = \rho$. Then $S(\sigma_3) = S(\rho)$ because of remark 1. Furthermore $\rho_1 = \mathrm{Tr}_{H_2 \otimes H_3}\sigma$, so that, again by remark 1, $S(\rho_1) = S(\sigma_{23})$, $\sigma_{23} = \mathrm{Tr}_{H_1}\sigma$. By subadditivity, $S(\rho_1) = S(\sigma_{23}) \le S(\rho_2) + S(\sigma_3) = S(\rho_2) + S(\rho)$.

It should be noted that the triangle inequality is false in the classical continuous case, because the analog of remark 2 above does not hold.

Now let us describe some applications of subadditivity.

1. Existence of mean entropy for translationally invariant systems

Let $V \subset \mathbb{R}^d$ (or $\mathbb{Z}^d$) be a bounded region. We attach to it the Fock space $H(V)$ (Sec. I.A; as concerns statistics, we do not care for $\pm$; our results are independent of whether there are Fermi or Bose statistics). Remember that $H(V \cup V') = H(V) \otimes H(V')$ if $V \cap V' = \emptyset$ (in the sense of Sec. I.A).

A state on $H(V)$ is described by a density matrix $\rho_V$. It is plausible to require that all these density matrices are consistent in the sense that $\rho_V = \mathrm{Tr}_{H(V')}\rho_{V \cup V'}$ if $V \cap V' = \emptyset$. Note: $\rho_V$ is not necessarily a Gibbs state. The entropy of a subvolume is defined by

$$S(V) = S(\rho_V). \tag{2.14}$$

Subadditivity implies that $S(V \cup V') \le S(V) + S(V')$ if $V \cap V' = \emptyset$. The problem we want to study is the following: Does there exist a limit of $S(V)/|V|$ ($|V|$ being the volume of $V$, or the number of lattice points in $V$, respectively), as $V \to \infty$ in some suitable sense (for instance, in the sense of van Hove; see Sec. I.C), provided that the system is translationally invariant, i.e., $S(V + a) = S(V)$ for all $a \in \mathbb{R}^d$, or $\mathbb{Z}^d$, respectively?

To begin with let us consider the case of a one-dimensional lattice system. Let $V$ be an interval of length $l$: $V = \{k, k+1, \ldots, k+l-1\}$. Because of translational invariance, $S(V)$ is a function of $l$ only: $S(V) = F(l)$, $l = |V|$. By subadditivity of the entropy, $F$ is also a subadditive function of $l$:

$$F(l_1 + l_2) \le F(l_1) + F(l_2). \tag{2.15}$$

Using a classical theorem of analysis (e.g., Polya and Szego, 1970) one concludes that the limit of $F(l)/l$ exists:

$$\lim_{|V| \to \infty} \frac{S(V)}{|V|} = \lim_{l \to \infty} \frac{F(l)}{l} = s, \quad \text{with } s = \inf_l \frac{F(l)}{l}. \tag{2.16}$$

The same argument would work in the continuous case ($\mathbb{Z}$ being replaced by $\mathbb{R}$) too, provided that one would have some bound on $F(l)$. However, subadditivity is not sufficient to provide such a bound.

As an example, consider for $V$ intervals $[a, b)$ and define "$S$"$([a, b)) = 0$ if $b - a$ is rational, $= \infty$ if $b - a$ is irrational.

We will see in Sec. III.A that such a bound is in fact, for quantum systems, provided by strong subadditivity, which is a sharpening of the subadditivity property discussed in this section. Namely, strong subadditivity yields, for $l' < l_0$, the inequality

$$F(l') + F(2l_0 - l') \le 2F(l_0), \tag{2.17}$$

hence in the quantum case, $F(l') \le 2F(l_0)$, because the quantum-mechanical entropy is always $\ge 0$.

But inequality (2.17), which also holds in the classical case (see Sec. III.A), allows us to prove the existence of the mean entropy even for classical continuous systems, where it is not true that $F(l') \le 2F(l_0)$. We have,


with $s = \inf F(l)/l$,

$$F(l') \le 2F(l_0) - (2l_0 - l')s.$$

Choose $l_0$ such that $F(l_0) \le l_0(s + \varepsilon)$. Any $l$ can be written as $l = nl_0 + l'$ ($n$ = integer, $l' < l_0$). Therefore $F(l) \le nF(l_0) + F(l')$ (by subadditivity) and

$$\frac{F(l)}{l} \le \frac{(2 + n)F(l_0) - (2l_0 - l')s}{nl_0 + l'}$$

and, since for $l \to \infty$ also $n \to \infty$,

$$\limsup_{l \to \infty} \frac{F(l)}{l} \le \frac{F(l_0)}{l_0} \le s + \varepsilon,$$

thus $\lim F(l)/l = s$.

2. Monotonicity

If $S$ were monotonic there would be no need to refer to strong subadditivity in order to establish the existence of $\lim F(l)/l$. But monotonicity is generally true only in the classical discrete case. It will turn out, however, that it is also true in our case of translationally invariant systems; again, this result relies on strong subadditivity.

Nevertheless there are some instances where some sort of monotonicity can be proved even without any knowledge of strong subadditivity. We will discuss two of them:

(a) Quantum lattice systems. (This, of course, also comprises the case of classical lattice systems, cf. Sec. I.A.) Let us drop the assumption $d = 1$. Remember that $H(V) = \bigotimes_{x \in V} H_x$, $H_x$ being Hilbert spaces of fixed finite dimension, say $\nu$. One readily verifies that $S(V) \le |V| \ln\nu$.

Let $V \subset V'$. By subadditivity, $S(V') \le S(V) + S(V'')$ ($V'' = V' \setminus V$), hence

$$S(V') - S(V) \le S(V'') \le \ln\nu^{|V''|} = (|V'| - |V|)\ln\nu.$$

In the classical case, also $S(V) \le S(V')$. In the quantum-mechanical case, let

$$\tilde{S}(V) = S(V) - |V|\ln\nu.$$

Then

$$\tilde{S}(V) \le 0, \qquad \tilde{S}(V') \le \tilde{S}(V) \ \text{ if } V' \supset V \tag{2.18}$$

and

$$\tilde{S}(V') \le \tilde{S}(V) + \tilde{S}(V' \setminus V).$$

(b) Configurational entropy of classical statistical mechanics (Robinson and Ruelle, 1967). In classical statistical mechanics it is often sufficient to consider only the probability distribution in configuration space instead of the complete distribution in phase space. Taking into account the possibility of a variable number of particles moving in the (bounded) volume $V$, one thus arrives at a family of symmetric distributions

$$\rho_V^{(N)}(q_1, \ldots, q_N), \qquad N = 0, 1, \ldots \tag{2.19}$$

Since in a classical theory there is no need of introducing the "correct" normalization condition (1.3) one chooses it as follows:

$$\sum_{N=0}^{\infty} \frac{e^{-|V|}}{N!} \int dq_1 \cdots dq_N\, \rho_V^{(N)} = 1 \tag{2.20}$$

and defines as the configurational entropy

$$S_{\rm conf}(V) = -\sum_{N=0}^{\infty} \frac{e^{-|V|}}{N!} \int dq_1 \cdots dq_N\, \rho_V^{(N)} \ln\rho_V^{(N)}. \tag{2.21}$$

Note that this kind of entropy is defined in some sort of "classical Fock space." It is very different from the grand-canonical entropy of Sec. I for two reasons: (i) the normalization condition, and (ii) the kinetic energy is omitted.

One can show, extending our previous arguments, that the analog of (2.18) holds:

$$S_{\rm conf}(V) \le 0, \qquad S_{\rm conf}(V') \le S_{\rm conf}(V) \ \text{ if } V' \supset V, \tag{2.22}$$

and also that subadditivity (and even strong subadditivity) holds.

$$S_{\rm conf}(V) = \sum_{N} \frac{e^{-|V|}}{N!} \int dq_1 \cdots dq_N\, \rho_V^{(N)} \left(\ln 1 - \ln\rho_V^{(N)}\right) \le \sum_{N} \frac{e^{-|V|}}{N!} \int dq_1 \cdots dq_N \left(1 - \rho_V^{(N)}\right)$$

(by Klein's inequality; cf. Secs. I.A and I.B) $= 1 - 1 = 0$.

The second inequality follows from the first one and subadditivity: For $V \subset V'$,

$$S_{\rm conf}(V') \le S_{\rm conf}(V) + S_{\rm conf}(V' \setminus V) \le S_{\rm conf}(V).$$

Turning to subadditivity, we first have to define $S_{\rm conf}(V)$ and $\rho_V^{(N)}$ in terms of $\rho_{V'}^{(N)}$:

$$\rho_V^{(N)}(x_1, \ldots, x_N) = \sum_{M=0}^{\infty} \frac{e^{-|V''|}}{M!} \int dy_1 \cdots dy_M\, \rho_{V'}^{(N+M)}(x_1, \ldots, x_N, y_1, \ldots, y_M),$$

etc. ($V'' = V' \setminus V$, $x_i \in V$, $y_j \in V''$). Similarly, $S_{\rm conf}(V'')$ and $\rho_{V''}^{(M)}$ are defined. We have


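$$\sum_{K=0}^{\infty} \frac{e^{-|V'|}}{K!} \int dq_1 \cdots dq_K\, \rho_{V'}^{(K)} \ln\rho_{V'}^{(K)} = \sum_{N,M=0}^{\infty} \frac{e^{-|V'|}}{N!\,M!} \int dx_1 \cdots dx_N \int dy_1 \cdots dy_M\, \rho_{V'}^{(N+M)}(x_1, \ldots, x_N, y_1, \ldots, y_M) \ln\rho_{V'}^{(N+M)}(\cdots),$$

since every point $q \in V'$ must either belong to $V$ or to $V''$, and, for symmetric integrands,

$$\int_{V'^K} dq_1 \cdots dq_K = \sum_{N+M=K} \frac{K!}{N!\,M!} \int_{V^N} dx_1 \cdots dx_N \int_{V''^M} dy_1 \cdots dy_M.$$

Furthermore,

$$S_{\rm conf}(V) = -\sum_{N} \frac{e^{-|V|}}{N!} \int dx_1 \cdots dx_N\, \rho_V^{(N)}(x_1, \ldots, x_N) \ln\rho_V^{(N)}(x_1, \ldots, x_N)$$

$$= -\sum_{N,M} \frac{e^{-|V|}\, e^{-|V''|}}{N!\,M!} \int dx_1 \cdots dx_N\, dy_1 \cdots dy_M\, \rho_{V'}^{(N+M)}(x_1, \ldots, x_N, y_1, \ldots, y_M) \ln\rho_V^{(N)}(x_1, \ldots, x_N).$$

Note that $|V| + |V''| = |V'|$. A similar formula holds for $S_{\rm conf}(V'')$, and therefore

$$S_{\rm conf}(V) + S_{\rm conf}(V'') - S_{\rm conf}(V')$$

$$= \sum_{N,M} \frac{e^{-|V'|}}{N!\,M!} \int dx_1 \cdots dx_N\, dy_1 \cdots dy_M\, \rho_{V'}^{(N+M)}(x_1, \ldots, x_N, y_1, \ldots, y_M) \left[\ln\rho_{V'}^{(N+M)}(x_1, \ldots, y_M) - \ln\rho_V^{(N)}(x_1, \ldots, x_N) - \ln\rho_{V''}^{(M)}(y_1, \ldots, y_M)\right]$$

$$\ge \sum_{N,M} \frac{e^{-|V'|}}{N!\,M!} \int dx_1 \cdots dx_N\, dy_1 \cdots dy_M \left(\rho_{V'}^{(N+M)}(x_1, \ldots, y_M) - \rho_V^{(N)}(x_1, \ldots, x_N)\,\rho_{V''}^{(M)}(y_1, \ldots, y_M)\right)$$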

(again by Klein's inequality) ~ 1 —1 =0. To illustrate this let us consider 9 or the configura-
Let us return to our example and indicate how mono- tional entropy. For simplicity we will use the same
tonicity can be used in order to establish the existence letter 8 for both S itself and the configurational entropy.
of Iim~(V)/IVI or IimS„,r(V)/IV (Of course, forquantum Let a =(a„.. . , a~) be a vector in Z ~ or R~, and let V(a)
lattice systems the existence of limS(V)/ V also implies

be the box $xc Z~ or lR~: 0&x,. & a,. Let us also ).
the existence of limS(V)/ V I.) We shall consider the
above case (a) only (for the configurational entropy things
work in quite the same manner) since for our following
! V(a)!
arguments we need only relations (2. 18), or (2.22), re-
spectively. Suppose now that a sequence of volumes V tends to in-
Choose &, l„
l, and n as before in example 1, re- finity in the sense of van Hove (cf. Sec. I.C). We choose
placing S, however, by S in the definition of P(E), thus a, such that B(V(aQ)) ~ IV(aQ) I(s+&). Define nv, or nv,
defining s = infP(l)/l, etc. Inequality (2.1V) is then re- respectively. , as in Sec. I.C. By assumption, nv/n'„- l.
placed by Monotonicity and subadditivity imply in the same way as
before that
Z(nt, ) nl, Z(l, )
l 2 20 n-„-
S(V} ~ —"
(s+e) . (2.24)
It remains to show an inequality of the kind
&(I) . I'(I)
) n'„-V
which would be a consequence of the inequality
3. Dimensions & 't S(1' ) - s I V( . I,) (2.26)
In the case of dimension &1, subadditivity is definite- where I"v denotes the union of the n+„ translates of V(a, )
ly too weak to establish the existence of the mean en- that cover V. However, the latter inequality can only be
tropy, even in lattice systems. Also monotonicity is not obtained by invoking strong subadditivity (cf. Sec.
suf f icient. III.A).
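The one-dimensional argument of example 1, namely that a subadditive $F(l)$ has a mean value $F(l)/l$ converging to $\inf F(l)/l$, can be illustrated on a classical lattice system. The following is a minimal sketch, assuming NumPy; the two-state stationary Markov chain, its transition matrix `P`, and the helper name `block_entropy` are illustrative assumptions, not taken from the text.

```python
import itertools
import numpy as np

# Translationally invariant classical lattice state: a stationary
# two-state Markov chain (illustrative choice).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])      # transition probabilities
pi = np.array([0.8, 0.2])       # stationary distribution, pi @ P == pi

def block_entropy(l):
    """Entropy F(l) of l consecutive sites, by brute-force enumeration."""
    total = 0.0
    for word in itertools.product((0, 1), repeat=l):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a, b]     # probability of the configuration `word`
        if prob > 0:
            total -= prob * np.log(prob)
    return total

F = {l: block_entropy(l) for l in range(1, 11)}
ratios = [F[l] / l for l in range(1, 11)]

for l, r in zip(range(1, 11), ratios):
    print(l, round(r, 4))       # F(l)/l decreases toward the mean entropy s
```

For this particular chain $F(l) = F(1) + (l - 1)h$ with $h$ the entropy rate, so $F$ is subadditive and $F(l)/l$ decreases monotonically to $s = h$; in general only the existence of the limit is guaranteed by (2.15).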


4. The Kolmogorov-Sinai invariant

The construction of the Kolmogorov-Sinai invariant of classical ergodic theory also makes use of subadditivity (see Sec. IV.A).

Let us once more come back to the mean entropy. One easily can show that the mean entropy, should it exist, is affine on the set of translationally invariant states. Again let us consider case (a) only. Since, by Eq. (2.3), for $\rho_V = \lambda\rho_{1,V} + (1 - \lambda)\rho_{2,V}$,

$$\lambda S(\rho_{1,V}) + (1 - \lambda)S(\rho_{2,V}) \le S(\rho_V) \le \lambda S(\rho_{1,V}) + (1 - \lambda)S(\rho_{2,V}) + \ln 2,$$

in the limit $|V| \to \infty$, after division by $|V|$, the mean entropy of $\rho$ equals $\lambda$ times the mean entropy of $\rho_1$ plus $(1 - \lambda)$ times the mean entropy of $\rho_2$.

G. Entropy and information theory

In principle we should now treat strong subadditivity. However, since it is

(1) closely related to the concepts of relative entropy and skew entropy, and
(2) requires quite a lot of nontrivial mathematical preparations,

we prefer to devote an extra part of this review to these problems and to close this second part with a rough account of the connection between entropy and information.

The principle that entropy is a measure of our ignorance about a given physical system was recognized very early (see, for example, Weaver, Appendix to Shannon and Weaver, 1949; v. Smoluchowski, 1914); Boltzmann was also aware of it.

On the other hand, the mathematical theory of information (Shannon and Weaver, 1949) originally was intended as a theory of communication. The simplest problem it deals with is the following: take any message (for instance consisting of words or of digits). One can represent it as a sequence of binary digits and thus, if the length of the "word" is $n$, one needs $n$ digits to characterize it. The set $E_n$ of all words of length $n$ contains $2^n$ elements, therefore the amount of information needed to characterize one element of it is $\log_2$ of (the number of elements of $E_n$) $= \log_2 N$, with $N = 2^n$. Elaborating on this a little bit, one arrives at the result that the amount of information which is needed to characterize an element of any set of power $N$ (not necessarily of the form $N = 2^n$) is $\log_2 N$. Now let $E$ be a union $E_1 \cup \cdots \cup E_k$ of pairwise disjoint sets, $N_i$ = number of elements of $E_i$. Let $p_i = N_i/N$, $N = \sum N_i$. If one knows that an element of $E$ belongs to some $E_i$, one needs $\log_2 N_i$ additional information in order to determine it completely. Hence the average amount of information needed to determine an element, provided that one already knows to which $E_i$ it belongs, is $\sum (N_i/N)\log_2 N_i = \sum p_i \log_2 Np_i = \sum p_i \log_2 p_i + \log_2 N$. Now we just have seen that $\log_2 N$ is the information that is needed if one does not know to which $E_i$ a given element belongs, hence the corresponding average lack of information is

$$-\sum_i p_i \log_2 p_i. \tag{2.28}$$

This is usually called Shannon's formula, although it was discovered by Wiener independently.

From here there is a short way to physics: if the set $E$ is interpreted as a set of $N$ measurements, and the $p_i$ are the probabilities of finding the system in the pure state $|i\rangle$, then, except for an irrelevant factor $\ln 2$, Shannon's expression equals the definition of entropy. This enables one to apply quite a few mathematical results from information theory to entropy, and we already have done this on several occasions. Examples are, for instance, the characterization theorems of Secs. B and F.

However, one has to bear in mind that information theory does not contain any quantum mechanics, so that it can be applied directly to the classical discrete case only, i.e., if there is no noncommutativity involved at all. If one wants to apply it to the general quantum-mechanical case, one usually has to worry about problems arising from noncommutativity, so that not every result of information theory has a quantum-mechanical "translation."

As concerns noncommutativity, it seems quite natural to ask for quantities that measure the amount of noncommutativity of two operators rather than the amount of information contained in one density matrix. We will do this in Secs. III.B and C; as we just have said, information theory does not cover that subject.

Turning back to Shannon's formula, it has to be added that one can conceive of many other measures of the amount of information contained in a probability distribution or a density matrix. These measures usually have only little importance, as was already pointed out in the introduction. In the previous section we occasionally were concerned with the quantum analogs of Renyi's $\alpha$ entropies. Other quantities one could think of were, for instance, … , $f$ being an increasing convex or concave function, or

$$f^{-1}[\mathrm{Tr}\,\rho f(-\ln\rho)]$$

(Renyi, 1965; Aczel and Daroczy, 1963) or

$$(\mathrm{Tr}\,\rho^\alpha - 1),$$

etc. (Daroczy, 1970).

What one can learn from considering these "entropies" is that mixing-enhancement means loss of information in the worst possible way because not only does entropy increase but also all the other measures of lack of information increase.

By means of information theory it is possible to rephrase the maximum entropy principle in other terms: suppose that for some system you know only a few macroscopic quantities, and you have no further knowledge of it. Then the system is expected to be in the state with maximal entropy, because if it were in a state with lower entropy it would also contain more information than previously specified (Jaynes' principle, Jaynes,


1957). However, as we have already discussed in Sec. I.B, one has to be careful with such arguments because they only make plausible, but do not actually prove, the maximum entropy principle.

On the other hand it is amusing to note that in practical applications of information theory, such as in technology, biology, etc., the second law of thermodynamics has been adopted and there called the negentropy principle (see Brillouin, 1962). Thus we find a mutual interaction between physics and information theory rather than a perfect understanding of statistical mechanics on the grounds of information theory.

III. STRONG SUBADDITIVITY AND LIEB'S THEOREM

A. Strong subadditivity

In Sec. II.F it turned out that mere subadditivity often is too weak a property, and that strong subadditivity is needed. By this, the following is meant: given three Hilbert spaces $H_1, H_2, H_3$, let $\rho$ be a density matrix in $H_1 \otimes H_2 \otimes H_3$. Define the partial traces $\rho_{12} = \mathrm{Tr}_{H_3}\rho$, etc. (In order to have a less cumbersome notation, from now on instead of $\rho$ we will write $\rho_{123}$, instead of $\mathrm{Tr}_{H_2 \otimes H_3}$ we will write $\mathrm{Tr}_{23}$, instead of $\mathrm{Tr}_{H_1}$ we write $\mathrm{Tr}_1$, etc.) Then

$$S(\rho_{123}) + S(\rho_2) \le S(\rho_{12}) + S(\rho_{23}). \tag{3.1}$$

If $H_2$ is one-dimensional, this reduces to normal subadditivity. The same inequality holds in the classical case, there being given three "phase spaces" $\Omega_1, \Omega_2, \Omega_3$; $\rho_{123}$ is a probability distribution in $\Omega_1 \times \Omega_2 \times \Omega_3$,

$$\rho_1(w_1) = \int dw_2\, dw_3\, \rho_{123}(w_1, w_2, w_3), \qquad \rho_{12}(w_1, w_2) = \int dw_3\, \rho_{123}(w_1, w_2, w_3),$$

etc. In the classical case, the proof of inequality (3.1) is very simple: one only has to use the inequality

$$\int dw_1\, dw_2\, dw_3\, \rho_{123}\,(\ln\rho_{123} - \ln\sigma) \ge 0,$$

valid for every probability distribution $\sigma$, and to take $\sigma = \rho_{12}\rho_{23}/\rho_2$. Then

$$\int dw_1\, dw_2\, dw_3\, \rho_{123}\,(\ln\rho_{123} + \ln\rho_2 - \ln\rho_{12} - \ln\rho_{23}) \ge 0,$$

which is just the assertion.

On the other hand, in quantum mechanics the proof is extremely difficult. Therefore, before turning to it, let us consider how strong subadditivity can be used in physical problems.

Let us first put inequality (3.1) into another form: consider two volumes $V$, $V'$ (not necessarily disjoint) and the associated Fock spaces $H(V)$, $H(V')$, $H(V \cap V')$, $H(V \cup V')$ (or the Hilbert spaces for lattice systems as indicated in Sec. I.A). Then, with the notation of Sec. II.F,

$$S(V \cap V') + S(V \cup V') \le S(V) + S(V'). \tag{3.2}$$

Now we are in a position to state some applications.

(1) In Sec. II.F we were concerned with the problem of the existence of $\lim S(V)/|V|$ for one-dimensional, translationally invariant, continuous systems and saw that some bound on the function $F(l)$ would suffice for this purpose. Strong subadditivity provides this bound in the quantum-mechanical case: let $l' < l$. Then,

$$F(l') \le 2F(l). \tag{3.3}$$

This is true because any interval of length $l'$ can be represented as the intersection of two intervals of length $l$; thus by strong subadditivity

$$F(l') \le F(l') + F(2l - l') \le F(l) + F(l).$$

On the other hand, as we have seen in Sec. II.F, the second inequality of the latter relation is also sufficient to prove the existence of the mean entropy in the classical continuous case (see Sec. II.F).

The following remark, due to E. Lieb, applies: let $x = 2l - l'$, $y = l'$ in the last formula. Then $2F((x+y)/2) \ge F(x) + F(y)$, i.e., $F$ is weakly concave. To show that $F$ is concave, i.e., $F(\lambda x + (1-\lambda)y) \ge \lambda F(x) + (1-\lambda)F(y)$, it is sufficient to have $F$ bounded above in any interval. Conversely, if $F$ is concave, this implies strong subadditivity.

(2) This problem is closely related to the problem of monotonicity of the quantum-mechanical entropy, i.e., of proving that $F(l') \le F(l)$ (cf. our remarks of Sec. II.F). If there is no translational invariance, we already have seen that this need not be true. However, if the system is translationally invariant, then one can use strong subadditivity to show that

$$F(l) - F(l') \ge F(l + m) - F(l' + m)$$

for every $m \ge 0$, in particular for $m = n(l - l')$, $n$ being an integer. Consequently,

$$F(l) - F(l') \ge \frac{1}{N}\sum_{n=1}^{N}\bigl[F(l + n(l - l')) - F(l' + n(l - l'))\bigr] = \frac{1}{N}\bigl[F(l + N(l - l')) - F(l)\bigr].$$

If $N \to \infty$, the right-hand side tends to $(l - l')\,\inf F(l'')/l''$ (we already know this infimum to be $\lim F(l'')/l''$), which, in quantum mechanics, is $\ge 0$.

(3) Now let us consider translationally invariant systems in dimensions $> 1$. If we are given a lattice system and consider a sequence of boxes whose lengths tend to infinity, then again, as in example 1 of Sec. II.F, $\lim S(V)/|V|$ exists. $V$ being a parallelepiped $\{x: 0 \le x_i < a_i\}$, $S(V)$ is a function $F$ of $a_1, \dots, a_\nu$ and $|V|$ is $a_1 \cdots a_\nu$. By subadditivity alone, $F$ is a subadditive function of every variable $a_i$ separately. A straightforward modification of the theorem used before shows that

$$\lim_{a_i \to \infty} \frac{F(a_1, \dots, a_\nu)}{a_1 \cdots a_\nu} = \inf \frac{F(a_1, \dots, a_\nu)}{a_1 \cdots a_\nu}.$$

On the other hand, if we are given a continuous system, again we have to make use of strong subadditivity. The following argument is due to Araki and Lieb (1970). Choose the box $V(a_0)$ with edges $a_1^{(0)}, \dots, a_\nu^{(0)}$ so that

$$\frac{S[V(a_0)]}{|V(a_0)|} \le s + \varepsilon, \qquad s \equiv \inf_a \frac{S[V(a)]}{|V(a)|}.$$

Now, as in Sec. II.F, for large boxes $V(a)$ with lengths $a_1, \dots, a_\nu$ there are integers $n_1, \dots, n_\nu$ such that $a_i = n_i a_i^{(0)} + b_i$, $0 \le b_i < a_i^{(0)}$. Then


$$\frac{S[V(a)]}{|V(a)|} \le \frac{S[V(a)]}{|V(na_0)|}$$

and $S[V(a)] \le S[V(na_0)]$ + contributions of smaller boxes; see Fig. 8. [$V(na_0)$ is the box with lengths $n_1 a_1^{(0)}, \dots, n_\nu a_\nu^{(0)}$.] By subadditivity,

$$\frac{S[V(na_0)]}{|V(na_0)|} \le s + \varepsilon.$$

As concerns the smaller boxes, in quantum mechanics, due to strong subadditivity, the entropy of any one of them is $\le 2S[V(a_0)]$, because they can be represented as the intersection of two translates of $V(a_0)$. The number of these smaller boxes being of the order of the surface of $V(a)$ only, one concludes as in Sec. II.F that

$$\lim \frac{S[V(a)]}{|V(a)|} = s.$$

FIG. 8. Construction of the mean entropy for boxes.

(4) One might think it easily possible to generalize this proof to volumes that are not boxes but arbitrary, provided that they tend to infinity in the sense of van Hove. Unfortunately strong subadditivity does not give a bound for the entropy of the "surface terms" of a covering of $V$ by translates of $V(a_0)$, i.e., those translates of $V(a_0)$ that have a nonempty intersection with $V$ but are not entirely contained in $V$. Namely, in general these volumes cannot be represented as an intersection of two translates of $V(a_0)$ and therefore strong subadditivity does not help. (As E. Lieb has remarked, the previous argument does work for states that are also rotationally invariant, if, in three dimensions, the surface is composed of finitely many flat polygonal pieces. In this case the shaded part of the volume, according to Fig. 9, can be decomposed into tetrahedrons, and every tetrahedron can be represented as an intersection of four boxes. Of course, a similar statement holds in arbitrary dimensions $d > 1$.)

One rather needs some monotonicity, which, for instance, is the case for the configurational entropy, as we have discussed already in example 2 of Sec. II.F. This provides a bound for $S(V)$ in one direction, but remember that we were left with the necessity of proving a bound in the other direction, namely that

$$S(\Gamma_n) \ge n\, s\, |V(a_0)|.$$

[One checks easily that the configurational entropy is strongly subadditive too; in fact, even the generalized Boltzmann-Gibbs-Shannon entropy is strongly subadditive (Ochs, 1976).] This inequality, which is due to Robinson and Ruelle, is obtained by a more elaborate version of the method we used in example 2, and is based on a combinatorial argument that is a little bit tricky. Note that in the case of boxes there was no need of proving such an estimate because by definition, always $S[V(a)]/|V(a)| \ge s$; however, since $\Gamma_n$ need not itself be a box, a priori it is not impossible that $S(\Gamma_n) < s|\Gamma_n|$, and this is the point where strong subadditivity comes in to exclude this possibility.

Now, after having indicated some applications, which, as I hope, show why strong subadditivity is an important property and what it is good for, I should like to make a few remarks about history.

Strong subadditivity was known for many years in information theory, but not generally called that. For statistical mechanics, at least, it was Robinson and Ruelle (1967) who coined the word and who first realized that it was important. They proved it in the classical case. Then Lanford and Robinson (1968) conjectured it in quantum mechanics.

For five years this conjecture was an open problem. There were several attempts either to prove or to disprove it but only two partial results: (a) a proof by Baumann and Jost (1969) for 2x2 matrices and (b) a weak version of strong subadditivity by Araki and Lieb (1970), which was powerful enough to establish the existence of the mean entropy for translationally invariant states of continuous quantum systems.

Finally, the "Lanford-Robinson conjecture" was proven by Lieb and Ruskai (1973), using the results of Lieb concerning the so-called "Wigner-Yanase-Dyson conjecture" (see Sec. C).
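Before turning to the quantum proof, the classical form of inequality (3.1) can be checked numerically. The following is only an illustrative sketch and not part of the original argument: it draws random probability distributions on a product of three finite "phase spaces" and evaluates the difference between the two sides of (3.1). The function names, the sizes of the spaces, and the sample count are assumptions made for the illustration.

```python
import math
import random

def entropy(probs):
    """Discrete entropy -sum p ln p of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def random_joint(n1, n2, n3, rng):
    """Random probability distribution rho[i][j][k] on a finite product space."""
    w = [[[rng.random() for _ in range(n3)] for _ in range(n2)] for _ in range(n1)]
    total = sum(x for a in w for b in a for x in b)
    return [[[x / total for x in b] for b in a] for a in w]

def marginal(rho, keep):
    """Sum rho over all indices not listed in `keep` (a subset of (0, 1, 2))."""
    out = {}
    for i, a in enumerate(rho):
        for j, b in enumerate(a):
            for k, x in enumerate(b):
                key = tuple((i, j, k)[m] for m in keep)
                out[key] = out.get(key, 0.0) + x
    return list(out.values())

def ssa_gap(rho):
    """S(12) + S(23) - S(123) - S(2); inequality (3.1) says this is >= 0."""
    s123 = entropy(marginal(rho, (0, 1, 2)))
    s12 = entropy(marginal(rho, (0, 1)))
    s23 = entropy(marginal(rho, (1, 2)))
    s2 = entropy(marginal(rho, (1,)))
    return s12 + s23 - s123 - s2

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
gaps = [ssa_gap(random_joint(2, 3, 2, rng)) for _ in range(200)]
print("smallest gap found:", min(gaps))
```

The gap vanishes for a product distribution, which is a convenient sanity check of the helper functions.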
Let us turn to the proof of strong subadditivity for quantum-mechanical entropy. The crucial quantity to be considered is the conditional entropy (Lieb, 1975)

$$S(2|1) = S(\rho_{12}) - S(\rho_1).$$

For simplicity we will from now on write $S_{12}$ for $S(\rho_{12})$, $S_1$ for $S(\rho_1)$, etc.

We will later prove that the conditional entropy is concave in $\rho_{12}$ (Lieb and Ruskai, 1973). This is true both quantum-mechanically and classically, but we will consider the quantum case only. Lieb (1975) uses the expression "relative entropy." For finite-dimensional Hilbert spaces, it differs from our relative entropy, see next section, by a term $\ln \dim H_2$.
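The classical half of the concavity statement can be illustrated with a short numerical sketch. It is not taken from the text: it assumes finite probability distributions `joint[i][j]` in place of density matrices, and the helper names and the mixing weight are hypothetical choices for the example. It checks that the conditional entropy of a mixture is not smaller than the mixture of the conditional entropies.

```python
import math
import random

def entropy(probs):
    """Discrete entropy -sum p ln p of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def conditional_entropy(joint):
    """Classical analog of S_12 - S_1 for a joint distribution joint[i][j]."""
    s12 = entropy(x for row in joint for x in row)
    s1 = entropy(sum(row) for row in joint)
    return s12 - s1

def random_joint(n1, n2, rng):
    """Random probability distribution on an n1 x n2 product space."""
    w = [[rng.random() for _ in range(n2)] for _ in range(n1)]
    total = sum(x for row in w for x in row)
    return [[x / total for x in row] for row in w]

def mix(lam, a, b):
    """Convex combination lam * a + (1 - lam) * b of two joint distributions."""
    return [[lam * x + (1.0 - lam) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def concavity_gap(lam, a, b):
    """f(mixture) - mixture of f, with f the conditional entropy; concavity means >= 0."""
    return (conditional_entropy(mix(lam, a, b))
            - lam * conditional_entropy(a)
            - (1.0 - lam) * conditional_entropy(b))

rng = random.Random(1)  # arbitrary fixed seed
gaps = [concavity_gap(0.3, random_joint(3, 2, rng), random_joint(3, 2, rng))
        for _ in range(200)]
print("smallest concavity gap found:", min(gaps))
```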
FIG. 9. "Surface terms" of a covering of $V$ by translates of $V(a_0)$.

The concavity of $S_{12} - S_1$ implies the following inequality: for a density matrix $\rho_{123}$ in $H_1 \otimes H_2 \otimes H_3$,


$$S_1 + S_2 \le S_{13} + S_{23} \tag{3.5}$$

[$S_1 = -\mathrm{Tr}\,\rho_1 \ln\rho_1$ (Lieb and Ruskai, 1973), etc.] This statement, somewhat similar to monotonicity, is obtained by considering

$$\Delta = (S_{13} - S_1) + (S_{23} - S_2).$$

The mapping $\rho_{123} \to \rho_{13}$ being linear, $S_{13} - S_1$ is concave in $\rho_{123}$, and, similarly, the same is true for $S_{23} - S_2$. Hence $\Delta$ is concave. For pure states, $\Delta = 0$ since $S_{13} = S_2$, $S_{23} = S_1$ (remark 1 of Sec. II.F). By concavity, for mixed states $\Delta$ must be $\ge 0$. [It should be remarked that inequality (3.5) is false in the classical continuous case.]

Let us now proceed by choosing a fourth Hilbert space $H_4$ such that, according to remark 2 of Sec. II.F, there is a pure $\rho_{1234}$ in $(H_1 \otimes H_2 \otimes H_3) \otimes H_4$ such that $\rho_{123} = \mathrm{Tr}_4\,\rho_{1234}$. Then

$$S_4 + S_2 \le S_{12} + S_{14}$$

by (3.5), which establishes strong subadditivity, because $S_4 = S_{123}$ and $S_{14} = S_{23}$ for the pure state $\rho_{1234}$.

One might think that there are other inequalities of the type of the above ones, for instance between $S_{123} + S_1 + S_2 + S_3$ and $S_{12} + S_{13} + S_{23}$, but this is not the case. Also $S_{12} - S_1 - S_2$ is neither concave nor convex. For a further discussion of which inequalities are true and which are not, see Lieb (1975).

The above is the original proof of Lieb and Ruskai of strong subadditivity. There is another way, due to Uhlmann, of proving strong subadditivity from the concavity of $S_{12} - S_1$. Let all Hilbert spaces under consideration be finite-dimensional. Now, as above, $S_{123} - S_{23}$ is concave. Denote by $dU_3$ the Haar measure of the group of unitary operators in $H_3$. Then

$$\int dU_3\, U_3^{*}\,\rho_{123}\,U_3 = \rho_{12} \otimes \frac{1}{d_3}$$

($d_3$ = dimension of $H_3$; $U_3$ is identified with $1 \otimes U_3$). Thus

$$\int dU_3\,(S_{123} - S_{23})(U_3^{*}\rho_{123}U_3) \le (S_{123} - S_{23})\Bigl(\rho_{12} \otimes \frac{1}{d_3}\Bigr),$$

or

$$S_{123} - S_{23} \le (S_{12} + \ln d_3) - (S_2 + \ln d_3).$$

So what we have to do is to prove the concavity of $S_{12} - S_1$. We will do this for the finite-dimensional case; the general case follows from an application of our results of Sec. II.D. To make things more transparent, let us also abuse language and write $\rho_1$ instead of $\rho_1 \otimes 1$ in $H_1 \otimes H_2$.

The essential ingredient of the proof is the following.

Lemma (Lieb, 1973b). For finite-dimensional matrices, the mapping $A \to \mathrm{Tr}\exp(K + \ln A)$ (for $A > 0$, $K$ self-adjoint) is concave.

Now let $\rho_{12} = \lambda\rho'_{12} + (1 - \lambda)\rho''_{12}$ ($0 \le \lambda \le 1$). Define

$$\Delta = \mathrm{Tr}\,\rho_{12}(\ln\rho_{12} - \ln\rho_1) - \lambda\,\mathrm{Tr}\,\rho'_{12}(\ln\rho'_{12} - \ln\rho'_1) - (1 - \lambda)\,\mathrm{Tr}\,\rho''_{12}(\ln\rho''_{12} - \ln\rho''_1),$$

$$\Delta' = \mathrm{Tr}\,\rho'_{12}(\ln\rho_{12} - \ln\rho_1 - \ln\rho'_{12} + \ln\rho'_1),$$

and $\Delta''$ similarly. Then $\Delta = \lambda\Delta' + (1 - \lambda)\Delta''$. We want to show that $\Delta \le 0$, or $e^{\Delta} \le 1$. Because of the convexity of the exponential function,

$$e^{\Delta} \le \lambda e^{\Delta'} + (1 - \lambda)e^{\Delta''}.$$

On the other hand, the Peierls-Bogoliubov inequality (Sec. I.B)

$$e^{\langle B\rangle} \le \frac{\mathrm{Tr}\,e^{A+B}}{\mathrm{Tr}\,e^{A}}, \qquad \langle B\rangle \equiv \frac{\mathrm{Tr}\,B e^{A}}{\mathrm{Tr}\,e^{A}},$$

with $A = \ln\rho'_{12}$, $B = \ln\rho_{12} - \ln\rho_1 - \ln\rho'_{12} + \ln\rho'_1$, yields $e^{\Delta'} \le \mathrm{Tr}\,e^{K + \ln\rho'_1}$, with $K \equiv \ln\rho_{12} - \ln\rho_1$, and in the same way $e^{\Delta''} \le \mathrm{Tr}\,e^{K + \ln\rho''_1}$. By Lieb's lemma,

$$\lambda\,\mathrm{Tr}\,e^{K + \ln\rho'_1} + (1 - \lambda)\,\mathrm{Tr}\,e^{K + \ln\rho''_1} \le \mathrm{Tr}\,e^{K + \ln(\lambda\rho'_1 + (1 - \lambda)\rho''_1)} = \mathrm{Tr}\,e^{K + \ln\rho_1} = \mathrm{Tr}\,e^{\ln\rho_{12}} = 1.$$

So we have got a proof of the concavity of the conditional entropy by assuming the validity of the lemma. Unfortunately, the proof of the latter is not easy at all (see Sec. C).

B. Relative entropy

We have met the concept of relative entropy, which in general form is due to Umegaki (1962) and Lindblad (1973), on several occasions already, the first being in Secs. I.A (as a special case of the generalized Boltzmann-Gibbs-Shannon entropy) and I.B [in our discussion of the free energy $F(\rho, \beta, H)$].

Remember that it was defined as $S(\sigma|\rho) \equiv \mathrm{Tr}\,\rho(\ln\rho - \ln\sigma)$. We have proven that $S(\sigma|\rho) \ge 0$ for all density matrices $\sigma$, $\rho$; by the way, going through our proof of Klein's inequality, one sees that $S(\sigma|\rho) = 0$ if and only if $\sigma = \rho$.

The second important property is joint convexity: for density matrices $\rho_1, \rho_2, \sigma_1, \sigma_2$ and $\lambda$: $0 \le \lambda \le 1$,

$$S(\sigma|\rho) \le \lambda S(\sigma_1|\rho_1) + (1 - \lambda)S(\sigma_2|\rho_2), \tag{3.6}$$

where $\sigma \equiv \lambda\sigma_1 + (1 - \lambda)\sigma_2$, $\rho \equiv \lambda\rho_1 + (1 - \lambda)\rho_2$.

Joint convexity arises from Lieb's concavity theorem, which we will discuss in the next section. The latter states that $\mathrm{Tr}\,K^{*}A^{t}KB^{1-t}$, for $0 \le t \le 1$ and $A \ge 0$, $B \ge 0$, and any $K$, is jointly concave in $A$ and $B$. Hence setting $K = 1$, taking the derivative for $t = 0$, one finds that

$$\frac{d}{dt}\,\mathrm{Tr}(A^{t}B^{1-t})\Big|_{t=0} = \mathrm{Tr}\,B(\ln A - \ln B) \tag{3.7}$$

is concave, or $S(\sigma|\rho)$ is convex.

As a consequence, the conditional entropy $S_{12} - S_1$ is concave: suppose all Hilbert spaces to be finite-dimensional. Then

$$S_{12} - S_1 = -S\Bigl(\rho_{12}\,\Big|\,\rho_1 \otimes \frac{1}{d_2}\Bigr) + \ln d_2$$

($d_2$ = dimension of $H_2$), observing that $\ln(\rho_1 \otimes 1/d_2) = \ln\rho_1 \otimes 1 - 1 \otimes (\ln d_2)$. (The transition to the infinite-dimensional case follows the methods indicated in Sec. II.D.)

There is a representation of $S(\sigma|\rho)$ in which the argument of the trace does not contain a product of two noncommuting operators:

$$S(\sigma|\rho) = \sup_{\lambda} S_{\lambda}(\sigma|\rho), \qquad S_{\lambda}(\sigma|\rho) \equiv \frac{1}{\lambda}\bigl[S(\lambda\sigma + (1 - \lambda)\rho) - \lambda S(\sigma) - (1 - \lambda)S(\rho)\bigr] \tag{3.8}$$

($0 < \lambda < 1$). First note that $\frac{d}{d\lambda}(\lambda S_{\lambda})(\lambda = 0) = S(\sigma|\rho)$. Second, $\lambda \to \lambda S_{\lambda}(\sigma|\rho)$ is concave by Eq. (2.1); hence the differences $(\lambda S_{\lambda} - 0 \cdot S_0)/\lambda$, where clearly $0 \cdot S_0$ is understood to be $= 0$, are decreasing (Lindblad, 1974).

Our aim is to show the lower semicontinuity of $S(\sigma|\rho)$. Let us, as a preparation, give another proof of lower semicontinuity of the usual entropy. The function $s(x) = -x\ln x$ being continuous on the compact interval $[0, 1]$, one finds easily that $\mathrm{Tr}|\rho_n - \rho| \to 0$ implies that $\|s(\rho_n) - s(\rho)\| \to 0$ since $\|A\| = \sup_{\varphi} |\langle\varphi|A|\varphi\rangle| / \langle\varphi|\varphi\rangle$. Thus, for every finite-dimensional projection $P$, $\mathrm{Tr}\,P[s(\rho_n) - s(\rho)] \to 0$ because of the standard inequality $|\mathrm{Tr}\,PA| \le \mathrm{Tr}\,P\,\|A\|$ (see, for instance, Dixmier, 1957). On the other hand,

$$\mathrm{Tr}\,PA \le \|P\|\,\mathrm{Tr}\,A$$

and

$$\mathrm{Tr}\,A = \sup_{P} \mathrm{Tr}\,PA.$$

Therefore

$$S(\rho) = \sup_{P} \mathrm{Tr}\,P s(\rho) \le \liminf_{n}\,\bigl[\sup_{P} \mathrm{Tr}\,P s(\rho_n)\bigr] = \liminf S(\rho_n).$$

In the same way,

$$S(\sigma|\rho) = \sup_{P,\lambda} \frac{1}{\lambda}\,\mathrm{Tr}\,P\bigl[s(\lambda\sigma + (1 - \lambda)\rho) - \lambda s(\sigma) - (1 - \lambda)s(\rho)\bigr]$$

$$\le \liminf_{n}\,\Bigl(\sup_{P,\lambda} \frac{1}{\lambda}\,\mathrm{Tr}\,P\bigl[s(\lambda\sigma_n + (1 - \lambda)\rho_n) - \lambda s(\sigma_n) - (1 - \lambda)s(\rho_n)\bigr]\Bigr) = \liminf S(\sigma_n|\rho_n),$$

i.e., the relative entropy is, like the usual entropy, lower semicontinuous. We have already used this fact in Sec. II.D.

There are other theorems similar to those at the end of Sec. II.D. For instance, define, for general $A, B \ge 0$, not necessarily being density matrices,

$$S(A|B) = \mathrm{Tr}\,[A(\ln A - \ln B) + (B - A)].$$

Then, if $P_n \uparrow 1$ ($P_n$ = finite-dimensional projections), $S(P_n A P_n | P_n B P_n) \to S(A|B)$.

We may use convexity to prove monotonicity. Equation (3.6) generalizes to $S(\sigma|\rho) \le \sum_i \lambda_i S(\sigma_i|\rho_i)$ ($\lambda_i$, etc., being defined as usual), in finite dimensions. Also, noting that

$$S(U^{*}\sigma U | U^{*}\rho U) = S(\sigma|\rho) \tag{3.10}$$

for unitary operators $U$, we have for density matrices $\sigma$, $\rho$ in a tensor product $H_1 \otimes H_2$ of finite-dimensional Hilbert spaces, using a representation similar to that used in the last section,

$$\sigma_1 \otimes \frac{1}{d_2} = \int dU_2\,(1 \otimes U_2^{*})\,\sigma\,(1 \otimes U_2), \qquad \rho_1 \otimes \frac{1}{d_2} = \int dU_2\,(1 \otimes U_2^{*})\,\rho\,(1 \otimes U_2) \tag{3.11}$$

and

$$S(\sigma_1|\rho_1) = S\Bigl(\sigma_1 \otimes \frac{1}{d_2}\,\Big|\,\rho_1 \otimes \frac{1}{d_2}\Bigr) \le \int dU_2\, S\bigl((1 \otimes U_2^{*})\,\sigma\,(1 \otimes U_2)\,\big|\,(1 \otimes U_2^{*})\,\rho\,(1 \otimes U_2)\bigr) = S(\sigma|\rho) \tag{3.12}$$

(cf. Sec. II.F. In particular, $S - |V|\ln d$ for lattice systems is just $-S(\sigma|\rho)$, with $\sigma = 1/d^{|V|}$.)

Equation (3.12) is a special case of a theorem of Lindblad (1975), which states that $S(\Phi\sigma|\Phi\rho) \le S(\sigma|\rho)$ for every completely positive, trace-preserving mapping $\Phi$, which maps $B(H)$, the bounded operators in the Hilbert space $H$, into $B(H_1)$, the bounded operators in another space $H_1$.

The notion of complete positivity was introduced by Stinespring (1955). It means the following: let $A$ be an $n \times n$ matrix with entries $A_{ik} \in B(H)$, and let $\Phi_n A$ be the $n \times n$ matrix with entries $\Phi(A_{ik})$. If $\Phi_n$ is positive for all $n$, then $\Phi$ is called completely positive. Important special cases are: (a) the partial traces, and (b) doubly stochastic mappings of finite-dimensional spaces (cf. Sec. I.B). In the latter case, $S(\rho) = -S(\sigma|\rho) + \ln r$, with $\sigma = 1/r$ ($r$ = dimension of the space). Thus we recover our result of Sec. I.B: $S(M\rho) = -S(\sigma|M\rho) + \ln r = -S(M\sigma|M\rho) + \ln r \ge -S(\sigma|\rho) + \ln r = S(\rho)$. For a discussion of the physical meaning of complete positivity see, for example, Lindblad (1977) or Kraus (1970).

In the classical case Lieb's theorem is not needed for a proof but one can argue directly:

$$S(\sigma|\rho) - S(\sigma_1|\rho_1) = \int dw_1\, dw_2\, \rho(w_1, w_2)\,\ln\frac{\rho}{\sigma} - \int dw_1\, dw_2\, \rho\,\ln\frac{\rho_1}{\sigma_1} = \int dw_1\, dw_2\, \rho\,\ln\frac{\rho\,\sigma_1}{\rho_1\,\sigma} \ge \int dw_1\, dw_2\, \rho\,\Bigl(1 - \frac{\rho_1\,\sigma}{\rho\,\sigma_1}\Bigr) = 0,$$

due to the inequality $\ln x \ge 1 - 1/x$.

Taking three Hilbert spaces, application of Eq. (3.12) yields $S(\rho_{123}|\rho_1 \otimes \rho_{23}) = S(\rho_1) + S(\rho_{23}) - S(\rho_{123}) \ge S(\rho_{12}|\rho_1 \otimes \rho_2) = S(\rho_1) + S(\rho_2) - S(\rho_{12})$, hence $S(\rho_{123}) + S(\rho_2) \le S(\rho_{12}) + S(\rho_{23})$, i.e., strong subadditivity. Hence the latter is a special case of the monotonicity of the relative entropy, which, in turn, follows from the convexity.

As an application which, of course, we could already have mentioned in Sec. II.F, let us consider a variational principle for lattice systems. Let $\rho_V$ be an arbitrary family of consistent density matrices as in Sec. II.F. Define $S(V)$ as usual, and $E(V) = \mathrm{Tr}\,\rho_V H_V$, $H_V$ being the Hamiltonian of the volume $V$. Also define $P_V = (1/|V|)\ln\mathrm{Tr}\,e^{-\beta H_V}$. The limits, for $V \to \infty$ in some appropriate sense (cf. Sec. II.F), of $S(V)/|V|$, $E(V)/|V|$, and $P_V$ are known to exist (Araki, 1975; Ruelle, 1969). The usual techniques establish, by combining the methods of Sec. II.F and our arguments about the Gibbs state which, as one might remember, were just expressing


the positivity of $S(\rho_\beta|\rho)$ (Sec. I.B), the variational principle

$$P \ge s - \beta u, \tag{3.13}$$

with $P = \lim P_V$, $s = \lim S(V)/|V|$, $u = \lim E(V)/|V|$. The left side of (3.13) is the "true" pressure (it does not depend on $\rho_V$). The right side is a function of $\rho_V$.

It is possible to think, at least in the classical discrete case, of generalizations of relative entropy that are analogous to the generalizations of entropy to $\mathrm{Tr}\,f(\rho)$, $f$ being a concave function. Namely one could consider expressions like

$$F(\sigma, \rho) = \sum_i q_i\, f\Bigl(\frac{p_i}{q_i}\Bigr),$$

$f$ being a concave function, and $p_i$, $q_i$ being the values of the probability distributions $\rho$ and $\sigma$. Based on this observation, it has turned out to be possible to modify Uhlmann's theory and to define a relation $\succ$ roughly by comparing the sums $\sum_i q_i f(p_i/q_i)$ and $\sum_i q'_i f(p'_i/q'_i)$. However, these ideas have not been completely worked out yet (Uhlmann, 1977; Ruch and Mead, 1976).

C. Skew entropy and the Wigner-Yanase-Dyson conjecture

In 1963, Wigner and Yanase proposed a measure for the noncommutativity between a density matrix $\rho$ and a fixed observable $K$, which they called "skew information":

$$I(\rho, K) \equiv -\tfrac{1}{2}\,\mathrm{Tr}\,[\rho^{1/2}, K]^2. \tag{3.14}$$

The "skew entropy" is its negative:

$$S(\rho, K) = \tfrac{1}{2}\,\mathrm{Tr}\,[\rho^{1/2}, K]^2. \tag{3.15}$$

They were able to prove that a fundamental property that is valid for ordinary entropy also holds for skew entropy, namely, concavity in $\rho$. [Of course, it cannot be expected that invariance holds, except for the trivial statement that, for $U$ = unitary, $S(U^{*}\rho U, U^{*}KU) = S(\rho, K)$.] (Actually, they did not suppose $K$ to be bounded, so one usually has to worry about whether $[\rho^{1/2}, K]$, etc., makes sense, but we will neglect these problems and henceforth suppose $K$ to be bounded.) Of course, $S(\rho, K) \le 0$ for all $\rho$, $K$, and it is exactly $= 0$ if, and only if, $\rho$ and $K$ commute.

Later on, Dyson generalized the Wigner-Yanase entropy to

$$S_p(\rho, K) = \tfrac{1}{2}\,\mathrm{Tr}\,([\rho^{p}, K][\rho^{1-p}, K]) \tag{3.16}$$

($0 < p < 1$) (the degenerate case $p = 0$ becomes $\tfrac{1}{2}\,\mathrm{Tr}\,([\rho, K][\ln\rho, K])$), and conjectured that it was concave too. Since

$$S_p(\rho, K) = -\mathrm{Tr}\,\rho K^2 + \mathrm{Tr}\,\rho^{1-p}K\rho^{p}K, \tag{3.17}$$

which follows from $\mathrm{Tr}(AB \cdots C) = \mathrm{Tr}(CAB \cdots)$, the assertion that $S_p$ is concave in $\rho$ is equivalent to the assertion that $\mathrm{Tr}\,\rho^{1-p}K\rho^{p}K$ is concave in $\rho$. [$-\mathrm{Tr}\,\rho K^2$ is linear, hence concave.]

No one had any idea that the Wigner-Yanase-Dyson conjecture was related to strong subadditivity until Lieb (1973) realized this connection. In fact, the concavity in $\rho$ of $\mathrm{Tr}\,\rho^{p}K^{*}\rho^{1-p}K$ constitutes the key to the quantum-mechanical strong subadditivity problem. Certainly, without Lieb's work only a few experts would have known of the Wigner-Yanase-Dyson conjecture. The Wigner-Yanase-Dyson conjecture was proven by Lieb (1973). (A proof of the conjecture for 2x2 matrices was given by Baumann, 1971.) Lieb proved even more, namely that $\mathrm{Tr}\,[\rho^{p}, K^{*}][\rho^{1-p}, K]$ is also concave for $K$ not necessarily being self-adjoint. This statement follows from the fact that, for $0 < p < 1$,

$$\mathrm{Tr}\,\rho^{p}K^{*}\rho^{1-p}K$$

is concave in $\rho$. But Lieb even succeeded in proving that, for $p, q \ge 0$, $p + q \le 1$, the mapping

$$(\mathrm{Tr}\,A^{p}K^{*}B^{q}K)^{1/r},$$

where $r \ge p + q$, is jointly concave in $A$ and $B$ (for $A, B \ge 0$) and, for $r \le 2$, convex in $K$. This is Lieb's concavity theorem. (We already have used the special case $K = 1$ to derive the convexity of the relative entropy and the concavity of the conditional entropy from it.) It has to be mentioned that shortly after the appearance of Lieb's proof, Epstein (1973), motivated by Lieb's work, found another proof. Epstein's method is based on the theory of Herglotz functions and is very powerful, because it also provides quite a few other examples of concave maps.

The following elementary proof is a variant, due to Simon (1977), of Uhlmann's proof (1977). The original Lieb proof is shorter, but uses complex interpolation and so is less direct.

Lemma. Let, for $i = 1, 2$, $R_i, S_i, T_i \ge 0$, $[R_1, R_2] = [S_1, S_2] = [T_1, T_2] = 0$, and $R_i \ge S_i + T_i$. Then

$$R_1^{1/2}R_2^{1/2} \ge S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2} \quad (\equiv Q).$$

Proof:

$$|\langle\varphi|Q|\psi\rangle| \le \|S_1^{1/2}\varphi\|\,\|S_2^{1/2}\psi\| + \|T_1^{1/2}\varphi\|\,\|T_2^{1/2}\psi\| \le \langle\varphi|(S_1 + T_1)|\varphi\rangle^{1/2}\,\langle\psi|(S_2 + T_2)|\psi\rangle^{1/2}$$

because of Schwarz's inequality $|s_1 s_2 + t_1 t_2| \le (s_1^2 + t_1^2)^{1/2}(s_2^2 + t_2^2)^{1/2}$ ($s_1 \equiv \|S_1^{1/2}\varphi\|$, etc.). Hence $\|R_1^{-1/2}QR_2^{-1/2}\| \le 1$, by taking $\varphi = R_1^{-1/2}x$, $\psi = R_2^{-1/2}y$. Consequently

$$\|R_1^{-1/4}R_2^{-1/4}\,Q\,R_2^{-1/4}R_1^{-1/4}\| = \text{largest eigenvalue of } R_1^{-1/4}R_2^{-1/4}\,Q\,R_2^{-1/4}R_1^{-1/4} = \mathrm{spr}\,(R_1^{-1/4}R_2^{-1/4}\,Q\,R_2^{-1/4}R_1^{-1/4})$$

$$= \mathrm{spr}\,(R_1^{-1/2}QR_2^{-1/2}) \le \|R_1^{-1/2}QR_2^{-1/2}\| \le 1$$

(where "spr" denotes the spectral radius). (We have used that $\mathrm{spr}(AB) = \mathrm{spr}(BA)$, $\mathrm{spr}(A) \le \|A\|$, and the commutativity of $R_1$ and $R_2$.)

Consider now the space of Hilbert-Schmidt operators, i.e., the operators with $\mathrm{Tr}\,A^{*}A < \infty$. It is well known that they become a Hilbert space with the scalar product $\langle A|B\rangle = \mathrm{Tr}\,A^{*}B$. Define $R_1$, $R_2$, etc. by

$$\langle X|R_1 Y\rangle = \mathrm{Tr}\,X^{*}(\lambda A_1 + (1 - \lambda)A_2)\,Y,$$

$$\langle X|R_2 Y\rangle = \mathrm{Tr}\,X^{*}\,Y(\lambda B_1 + (1 - \lambda)B_2),$$

$$\langle X|S_1 Y\rangle = \lambda\,\mathrm{Tr}\,X^{*}A_1 Y,$$


etc. Then the lemma tells us that

$$\langle X|R_1^{1/2}R_2^{1/2}X\rangle = \mathrm{Tr}\,X^{*}(\lambda A_1 + (1 - \lambda)A_2)^{1/2}\,X\,(\lambda B_1 + (1 - \lambda)B_2)^{1/2}$$

$$\ge \mathrm{Tr}\,X^{*}(\lambda A_1)^{1/2}X(\lambda B_1)^{1/2} + \mathrm{Tr}\,X^{*}((1 - \lambda)A_2)^{1/2}X((1 - \lambda)B_2)^{1/2}$$

$$= \lambda\,\mathrm{Tr}\,X^{*}A_1^{1/2}XB_1^{1/2} + (1 - \lambda)\,\mathrm{Tr}\,X^{*}A_2^{1/2}XB_2^{1/2},$$

i.e., the lemma proves Lieb's theorem for $p = 1/2$.

Our method shows even more. Define, more generally,

$R_1$: $\langle X|R_1 Y\rangle = \mathrm{Tr}\,X^{*}A^{p}YB^{1-p}$,
$R_2$: $\langle X|R_2 Y\rangle = \mathrm{Tr}\,X^{*}A^{q}YB^{1-q}$,
$S_1$: $\langle X|S_1 Y\rangle = \lambda\,\mathrm{Tr}\,X^{*}A_1^{p}YB_1^{1-p}$,
$S_2$: $\langle X|S_2 Y\rangle = \lambda\,\mathrm{Tr}\,X^{*}A_1^{q}YB_1^{1-q}$,
$T_1$: $\langle X|T_1 Y\rangle = (1 - \lambda)\,\mathrm{Tr}\,X^{*}A_2^{p}YB_2^{1-p}$,
$T_2$: $\langle X|T_2 Y\rangle = (1 - \lambda)\,\mathrm{Tr}\,X^{*}A_2^{q}YB_2^{1-q}$

($A = \lambda A_1 + (1 - \lambda)A_2$, $B = \lambda B_1 + (1 - \lambda)B_2$). The validity of Lieb's concavity theorem at $p$, or $q$, states that $R_1 \ge S_1 + T_1$, $R_2 \ge S_2 + T_2$, hence by the lemma $R_1^{1/2}R_2^{1/2} \ge S_1^{1/2}S_2^{1/2} + T_1^{1/2}T_2^{1/2}$, i.e., its validity at $(p + q)/2$. The rest follows from a simple induction and continuity argument. (Since $\langle X|R_1 X\rangle \ge 0$, a standard Schwarz inequality-type argument shows convexity in $X$.)

We have seen in the last sections that Lieb's theorem $\Rightarrow$ convexity of the relative entropy and concavity of the conditional entropy $\Rightarrow$ strong subadditivity. Remember, however, that our first proof of concavity of the conditional entropy was based on a lemma which looked rather innocent, namely on the concavity of the function $A \to \mathrm{Tr}\,e^{K + \ln A}$. We did not give a proof there and also will not do this now since it is surprisingly complicated. I want to indicate only that the concavity of $\mathrm{Tr}\,e^{K + \ln A}$ can be obtained from Lieb's concavity theorem through a sequence of lemmas. (See Lieb's Advances in Mathematics paper, 1973b. Epstein's proof of Lieb's theorem also gives, among other results, a direct proof of the concavity of $\mathrm{Tr}\,e^{K + \ln A}$.) So, in conclusion, Lieb's theorem is in fact the essential tool in all the considerations of this section.

Let us come back to skew entropy. In consideration of another most important property of ordinary entropy, Wigner and Yanase proposed the following generalization of subadditivity:

$$S_p(\rho_{12}, L) \le S_p(\rho_1, K_1) + S_p(\rho_2, K_2) \tag{3.18}$$

with $L = K_1 \otimes 1 + 1 \otimes K_2$. This assertion can be rewritten

$$\mathrm{Tr}\,\rho_1^{1-p}K_1\rho_1^{p}K_1 + \mathrm{Tr}\,\rho_2^{1-p}K_2\rho_2^{p}K_2 \ge -2\,\mathrm{Tr}\,\rho_{12}\,[K_1 \otimes K_2] + \mathrm{Tr}\,\rho_{12}^{1-p}L\rho_{12}^{p}L. \tag{3.19}$$

This is true if $\rho_{12} = \rho_1 \otimes \rho_2$, or if $K_1$, or $K_2$, respectively, $= 0$ (Lieb, 1973). Also it can be proven for $p = 1/2$ provided that $\rho_{12}$ is pure. It is an open question whether it is generally true. Anyway, after all, skew entropy is a sort of "relative $\alpha$ entropy" (cf. Sec. II.E), and for $\alpha$ entropies subadditivity does not hold, although they also have the property that $S_\alpha(\rho_1 \otimes \rho_2) \le S_\alpha(\rho_1) + S_\alpha(\rho_2)$ (in fact, both sides are equal), and for $\rho_{12}$ pure, $S_\alpha(\rho_{12}) \le S_\alpha(\rho_1) + S_\alpha(\rho_2)$.

A. Dynamical entropies

Let me start this last part with a description of the celebrated Kolmogorov-Sinai invariant of classical dynamical systems (Kolmogorov, 1959; Sinai, 1961).

In classical mechanics, one is given the phase space $\Omega$ and a time evolution, which is a one-parameter group of mappings $\Phi_t: \Omega \to \Omega$. By Liouville's theorem, these mappings are measure-preserving; in addition, they are diffeomorphisms.

The Kolmogorov-Sinai (KS) invariant (or KS entropy) is constructed as follows: take a partition $\{\Omega_i\}$ of $\Omega$, i.e., let $\Omega = \bigcup \Omega_i$, the $\Omega_i$ being measurable subsets of $\Omega$, $\Omega_i \cap \Omega_k = \emptyset$ for $i \ne k$. (We do not concern ourselves with sets of measure 0 with respect to the Liouville measure introduced in Sec. I.B.) The entropy of the partition $\omega = \{\Omega_i\}$ is defined as

$$S(\omega) \equiv -\sum_i W(\Omega_i)\ln W(\Omega_i) \tag{4.1}$$

(remember our notation of Sec. I.B).

Now consider two partitions $\omega_1 = \{\Omega_i^{(1)}\}$, $\omega_2 = \{\Omega_k^{(2)}\}$. They generate a partition $\omega_1 \vee \omega_2$, which consists of all intersections $\Omega_i^{(1)} \cap \Omega_k^{(2)}$. Setting $p_i = W(\Omega_i^{(1)})$, $q_k = W(\Omega_k^{(2)})$, $r_{ik} = W(\Omega_i^{(1)} \cap \Omega_k^{(2)})$, $S(\omega_1 \vee \omega_2) = -\sum r_{ik}\ln r_{ik}$, hence, by subadditivity,

$$S(\omega_1 \vee \omega_2) \le S(\omega_1) + S(\omega_2).$$

Let $\Phi$ be one of the mappings $\Phi_t$ for fixed $t$ (i.e., some sort of "discrete" time evolution). Since $W(\Phi A) = W(A)$ for any measurable subset $A \subset \Omega$, one has

$$S(\Phi\omega) = S(\omega) \tag{4.3}$$

for every partition (with $\Phi\omega \equiv \{\Phi\Omega_i\}$). Therefore our arguments of Sec. II.F show the existence of

$$\lim_{n \to \infty} \frac{1}{n}\,S(\omega \vee \Phi\omega \vee \cdots \vee \Phi^{n-1}\omega). \tag{4.4}$$

There is another way of looking at this limit. Define the conditional entropy $S(\omega_1, \omega_2)$ of two partitions as

$$S(\omega_1, \omega_2) = -\sum_{i,k} r_{ik}\ln r_{ik} + \sum_k q_k\ln q_k = -\sum_k q_k \sum_i \frac{r_{ik}}{q_k}\ln\frac{r_{ik}}{q_k} = S(\omega_1 \vee \omega_2) - S(\omega_2), \tag{4.5}$$

i.e., as the classical analog of the conditional entropy we frequently were concerned with in Sec. III. (The quantity

$$\frac{r_{ik}}{q_k} = \frac{W(\Omega_i^{(1)} \cap \Omega_k^{(2)})}{W(\Omega_k^{(2)})}$$

is called "conditional expectation" in probability theory; this explains the word "conditional entropy.") Note that $S(\omega_1, \omega_2) \ge 0$ since entropy is monotonic in the classical discrete case.

Therefore the difference $\Delta_n$ equals


$$\Delta_n = S(\omega \vee \cdots \vee \Phi^{n}\omega) - S(\omega \vee \cdots \vee \Phi^{n-1}\omega) = S(\Phi^{n}\omega,\ \omega \vee \cdots \vee \Phi^{n-1}\omega). \tag{4.6}$$

Due to strong subadditivity,

$$S(\alpha, \beta) \ge S(\alpha, \beta \vee \gamma) \tag{4.7}$$

for any partitions $\alpha, \beta, \gamma$ [because of Eq. (4.5)], hence $\Delta_{n+1} \le \Delta_n$; consequently, $\lim \Delta_n$ exists, and since

$$S(\omega \vee \cdots \vee \Phi^{n-1}\omega) = S(\omega) + \Delta_1 + \cdots + \Delta_{n-1},$$

$$\lim \frac{1}{n}\,S(\omega \vee \cdots \vee \Phi^{n-1}\omega) = \lim S(\Phi^{n}\omega,\ \omega \vee \cdots \vee \Phi^{n-1}\omega) = \lim \Delta_n = s(\omega, \Phi). \tag{4.8}$$

The entropy of $\Phi$ (KS invariant) is defined as

$$s(\Phi) \equiv \sup_\omega s(\omega, \Phi), \tag{4.9}$$

the sup being taken over all finite partitions $\omega$.

It should be noted that, in contradistinction to usual notions of entropy, this kind of entropy is not a function of a state but rather a function of the dynamics of the system.

The Kolmogorov-Sinai invariant has the following important properties:

(1) It is an invariant of the dynamical system in the following sense: the system is described by $\Omega$, the Liouville measure $\mu$, and the measure-preserving one-to-one mapping $\Phi: \Omega \to \Omega$. Suppose there is another triple $\Omega'$, $\mu'$, $\Phi'$ with the same properties, and an isomorphism $f: \Omega \to \Omega'$, etc., such that the diagram

$$\begin{array}{ccc} \Omega & \xrightarrow{\ \Phi\ } & \Omega \\ {\scriptstyle f}\downarrow & & \downarrow{\scriptstyle f} \\ \Omega' & \xrightarrow{\ \Phi'\ } & \Omega' \end{array}$$

is commutative. Then $s(\Phi') = s(\Phi)$.

(2) Kolmogorov's theorem. The partition $\omega$ is called a generator if the $\sigma$ algebra generated by the sets $\Phi^{m}(A)$ ($m = 0, \pm 1, \pm 2, \dots$, $A \in \omega$) is all measurable subsets of $\Omega$. Then

$$s(\Phi) = s(\omega, \Phi) \tag{4.10}$$

(Kolmogorov, 1953, 1959).

Before stating the next important property, one remark should be made. Namely, all our considerations above apply to abstract dynamical systems too, where $\Omega$ need not be phase space or even any smooth manifold, but can be any set. Also $\Phi$ need not have anything to do with time evolution but can be any automorphism, for instance, space translation, or any symmetry operation. There is the following theorem relevant to classical dynamical systems.

Kouchnirenko's theorem. The KS entropy of finite classical dynamical systems is finite. (For abstract systems it may be infinite.) (Kouchnirenko, 1965, 1967.)

The construction of the KS entropy is very similar to the one of the mean entropy in Sec. II.F. It can be shown that for classical lattice systems one can find a transformation such that they become an abstract dynamical system (clearly, $\Phi$ then means "space translation"; Robinson and Ruelle, 1967). Then, if $s$ (the mean entropy) is $< \infty$, KS and mean entropy coincide. This means that the KS entropy, in essence, is a mean entropy. (By the way, it is possible to generalize KS entropy by replacing the group of discrete time translation by more general groups, for instance, $\mathbb{Z}^{\nu}$. Many of the important results then, after obvious modifications, remain valid.)

There is a serious problem with the KS entropy because it refers to a discrete time evolution. In the more realistic case of a continuous one-parameter group $\Phi_t$ of time evolutions the construction presented above does not work for two reasons: (a) it is not obvious by what quantity $S(\omega \vee \Phi\omega \vee \cdots \vee \Phi^{n-1}\omega)$ has to be replaced. In particular, there may arise measurability questions because in the continuous case uncountable unions and intersections of the sets $\Phi_t\Omega_i$ are involved which need not be measurable. (b) If we adopt the view that the KS entropy is a mean entropy, then, certainly, in the continuous case strong subadditivity enters in a very essential way. Thus, in any case, the construction of an analog of the KS entropy in the continuous case must be much more sophisticated.

As concerns quantum mechanics, one could think of imitating the original method of Kolmogorov and Sinai according to our "translation table" of Sec. I.B. However, this does not work in general. The difficulty lies in the possible noncommutativity. In quantum mechanics, clearly a partition $\omega$ has to be defined as a set of pairwise orthogonal projections $P_i$, with $\sum P_i = 1$. However, if we are given two partitions $\omega_1 = \{P_i^{(1)}\}$ and $\omega_2 = \{P_k^{(2)}\}$, then it is unclear how to define $\omega_1 \vee \omega_2$, since the products $P_i^{(1)}P_k^{(2)}$ in general will not be projections; they will not even be Hermitian. Also the dimension of the algebra generated by $\omega_1$ and $\omega_2$ can be exceedingly large, so that in any case subadditivity arguments cannot be used.

There is partial success in constructing a KS entropy for quantum-mechanical K systems (Emch, 1976). They are analogs of the classical K systems (Kolmogorov, 1953), which are systems with a mixing property that is much stronger than the mixing property we have used in Sec. I.B. Unfortunately, this is rather lengthy to describe and demands a good knowledge of the theory of von Neumann algebras, so I must refer the reader to the original papers. There is also a construction for Bernoulli shifts on the hyperfinite II$_1$ factor by Connes and Størmer (1975).

Recently, Lindblad (1977) succeeded in giving a definition of a quantum analog of the KS entropy which is not based on a noncommutative generalization of partitions but is rather analogous to the definition of the mean entropy for quantum lattice systems.

Besides its interpretation as a mean entropy, KS entropy can also be taken as a measure of the strength of mixing of $\Phi$. Remember that

$$s(\omega, \Phi) = \lim\,[S(\omega \vee \cdots \vee \Phi^{n}\omega) - S(\omega \vee \cdots \vee \Phi^{n-1}\omega)].$$

Let $n = 1$; the following argument can easily be transferred to the general case. If $S(\omega \vee \Phi\omega) = S(\omega)$, then $\Phi\omega = \omega$, i.e., $\Phi$ leaves the sets $\Omega_i$ unchanged. If, on the other hand, the difference $S(\omega \vee \Phi\omega) - S(\omega)$ is big, this


means that the intersection of every set $\Phi\Omega_i$ with the original $\Omega_k$ must be quite significant. (See Fig. 10. The shaded area is one of the sets $\Phi\Omega_i$; we have not represented the other sets $\Phi\Omega_i$ for reasons of clearness.) Thus big KS entropy means that the sets of any partition $\omega$ get rapidly distributed over the whole phase space, and that the system exhibits strong mixing properties.

FIG. 10. Interpretation of the Kolmogorov-Sinai invariant.

Similar to KS entropy are Kouchnirenko's A entropies: let $A$ be a sequence of integers $a_1 < a_2 < a_3 < \cdots$. Then

$$s_A(\omega, \Phi) = \limsup_{n} \frac{1}{n}\,S(\Phi^{a_1}\omega \vee \cdots \vee \Phi^{a_n}\omega),$$

$$s_A(\Phi) = \sup_\omega s_A(\omega, \Phi).$$

They also are invariants of dynamical systems (cf. Arnold and Avez, 1969).

[i.e., $S_{IU}(\rho, \omega)$ is just the $S(\omega)$ of the last section, but with $\omega$ now being a partition of one-dimensional projections rather than a finite partition.] The effect of a measurement may be described by transforming the original density matrix $\rho$ into

$$\rho_\omega = \sum_i P_i\,\rho\,P_i. \tag{4.12}$$

We know that $\rho_\omega \succ \rho$, hence $S(\rho_\omega) \ge S(\rho)$. Performing another measurement corresponding to another partition $\omega'$, one obtains in the same way a density matrix whose entropy is $\ge S(\rho_\omega)$, i.e., again a loss of information. For details see Wehrl, 1977, and Staszewski, 1977. It also arises in other situations. Let $\omega$ be the set of spectral projections of a Hamiltonian $H$ (and let us assume that there are no degeneracies). Then, using the notation after Eq. (2.2),

$$S_{IU}(\rho, \omega) = S(\rho_\omega).$$

IU entropy has (of course, besides invariance) many properties in common with classical discrete entropy, for instance concavity, additivity, and subadditivity (the latter ones in some appropriate sense). There are also continuous analogs of it (Grabowski, 1977). Since $S(\rho) = \inf_\omega S_{IU}(\rho, \omega)$, and $= S_{IU}(\rho, \omega)$ if and only if $\omega$ consists of the spectral projections of $\rho$, the quantity $S_{IU}(\rho, \omega) - S(\rho)$ may be considered as a measure of noncommutativity between $\rho$ and the partition $\omega$.
Some concepts measuring the amount of information
have been described. The list is not exhaustive and it
B. Various other concepts is left to everyone to invent new such quantities. How-
ever, it will be very hard to establish their physical
On several occasions we already have met entropylike meaning.
concepts that were of a certain use, either directly in
physics, as, for instance, the coarse-grained entropy,
or in order to show that certain properties of the "right" C. Systems with infinitely many degrees of freedom
entropy were not as obvious as one might think at first.
Many theorems of statistical mechanics refer to the
Let me write down a short list of these concepts, as
infinite case, i. e. , systems with infinitely many particles
far as we were concerned with them, or as they seem
moving in an infinite volume. %e have seen that only in
to have a certain relevance for physics.
this case phenomena such as quantum-mechanical. er-
(1) Coarse-grained entropy (see Sec. I.B).
godicity, etc. can be expected to hold in a rigorous man-
(2) n entropies (see See. II. E). One property of n en-
tropies should be added: for n&1, they are continuous,
i. e. , Tr~ p„—p~ -0
implies S (p„)-S(n). For fixed p,
the mappings n-S (p) are convex and decreasing; since 1. Description of infinite systems
S(p) = sup~» S (p), this provides a third proof of lower In Sec. II. F we obtained a description of infinitely ex-
semicontinuity of entropy.
tended systems by attaching to every bounded region V
(3) Daroczy and other entropies (see Sec. I.G). a Hilbert space H~ and a density matrix p~; thus we sup-
(4) Measures of noncommutativity (see Sec. III. C). posed the family of density matrices to be compatible.
(5) Inga. rden —Urbanik (IU) entropy (Ingarden and Ur- Remember that in the continuous case H~ was the Fock
banik, 1962; Ingarden, 1965, 1973). This concept in fact
space SH~(V), with H~~(V) being the space of symmetric
appeared very early, namely in the papers of the Ehren-
fests (1911), Pauli (1928), and von Neumann (1929), but (or, antisymmetric, respectively) square-integrable
functions g(x„. . . , xz), where the arguments x, were re-
was intensively studied in the 1960s. It arises in con-
nection with conslderatlons about the measul ement pro- stricted to V.
cess. Let ur =(P, } be a partition of one-dimensional pro- It seems to be quite natural to describe an infinitely
extended system in d dimensions simply by replacing V
jections, i. e. , commensurable "counters" in physical
language. Then a measurement yields the numbers p& by R". The Hilbert space then would be
= Trp P&, and the amount of information obtained by this H'(R~) = C 6 L'(R~) 6 [L'(R~) L'(R )] S ~ ~
. (4. 13)
measurement clearly is s(a)
This construction makes perfect sense. The unfortunate
(4. 11) thing, however, is that,in general, there is no density

matrix in this space describing the state. To be more precise, for any bounded region $V$, as in Eq. (1.9),

$H^{\pm}(V) \otimes H^{\pm}(\mathbb{R}^d \setminus V) = H^{\pm}(\mathbb{R}^d)$, (4.14)

but there is no density matrix $\rho$ in $H^{\pm}(\mathbb{R}^d)$ such that

$\mathrm{Tr}_{H^{\pm}(\mathbb{R}^d \setminus V)}\, \rho = \rho_V$.

If such a density matrix existed, then, for instance, the particle density $n = \lim N(V)/|V|$ would be $= 0$. This means that the Hilbert space (4.13) cannot be the right one for the description of the system.

The algebraic approach (see Ruelle, 1969; Eckmann and Guenin, 1969; Emch, 1972) now essentially proposes the following procedure:

Since it is at first unclear what the right Hilbert space of the system is, one should not worry too much about it. One should rather concentrate on the operators representing the observables of the system. Let $\mathcal{A}(V)$ be the algebra of all bounded operators on $H(V)$, for $V$ bounded. If $V' \supset V$, then every operator $T \in \mathcal{A}(V)$ can be identified with the operator $T' = T \otimes 1$ on $H(V') = H(V) \otimes H(V' \setminus V)$, hence $\mathcal{A}(V) \subset \mathcal{A}(V')$ (isotony). Define

$\mathcal{A} = \bigcup_V \mathcal{A}(V)$. (4.15)

This is again an algebra since sums and products make sense. Also there is a norm defined on it, namely, if $T \in \mathcal{A}(V)$, just the usual operator norm. Hence $\mathcal{A}$ is a normed algebra, and, if we take its norm completion (which by abuse of language we also will denote by $\mathcal{A}$), it becomes a C* algebra.

Every family of density matrices $\rho_V$ defines a state on this algebra: let $T \in \mathcal{A}(V)$. Define

$\omega(T) = \mathrm{Tr}\,\rho_V T$.

We know that, for $V' \supset V$,

$\mathrm{Tr}\,\rho_{V'}(T \otimes 1) = \mathrm{Tr}\,\rho_V T = \omega(T)$,

$T \otimes 1$ being an operator on $H(V) \otimes H(V' \setminus V) = H(V')$. Hence, this definition of $\omega$ extends to every element of all of $\mathcal{A}$ and makes sense. Note that $\omega(1) = 1$, and, for $T \ge 0$, $\omega(T) \ge 0$. Clearly, $\omega$ also is linear, i.e., in the language of mathematics, it is a positive, normed, linear functional on $\mathcal{A}$.

There is now a canonical way of constructing a Hilbert space for the system: the so-called Gelfand-Naimark-Segal (GNS) construction (Segal, 1951; cf. also Dixmier, 1964). It tells us that (up to isomorphisms) there is exactly one Hilbert space $H_\omega$, a representation $\pi_\omega$ of $\mathcal{A}$ [i.e., a homomorphism $\pi_\omega: \mathcal{A} \to B(H_\omega)$], and a unique cyclic vector $\Omega_\omega \in H_\omega$ such that

$\omega(A) = (\Omega_\omega | \pi_\omega(A) | \Omega_\omega)$

for all $A \in \mathcal{A}$. By "cyclic vector" is meant that the set $\pi_\omega(\mathcal{A})\,\Omega_\omega$ is dense in $H_\omega$.

Of course, as a rule, this Hilbert space will be entirely different from Fock space. The mathematical reason for this is that, in the case of infinitely many degrees of freedom, there are infinitely many inequivalent representations of the canonical commutation (or anticommutation) relations (cf. Emch, 1972). It is important to note that the Hilbert space $H_\omega$ depends explicitly on $\omega$. For a Gibbs state this means that it depends on the temperature.

We have not yet said anything about time evolution. Regarding that, from the algebraic point of view, the time evolution in $B[H(V)]$, the bounded linear operators on $H(V)$,

$T \to e^{iHt}\, T\, e^{-iHt}$,

is nothing else than an automorphism of the algebra $\mathcal{A}(V) = B[H(V)]$; it is natural to consider the time evolution in $\mathcal{A}$ also as an automorphism of $\mathcal{A}$ (or, better, as a one-parameter group of automorphisms $\tau_t: \mathcal{A} \to \mathcal{A}$). In general there will not be a Hamiltonian $H \in \mathcal{A}$ such that

$\tau_t A = e^{iHt} A\, e^{-iHt}$.

However, in the GNS construction performed with a time-invariant state $\omega$, i.e., an $\omega$ such that $\omega(A) = \omega(\tau_t A)$ for all $t$, there exists an $H_\omega$ such that

$\pi_\omega(\tau_t A) = e^{iH_\omega t}\, \pi_\omega(A)\, e^{-iH_\omega t}$. (4.17)

In that case, $\Omega_\omega$ is invariant:

$e^{iH_\omega t}\, \Omega_\omega = \Omega_\omega$. (4.18)

[Note that $H_\omega$ in general neither belongs to $\pi_\omega(\mathcal{A})$ nor can be constructed from it by some limiting procedure.]

Admittedly the above scheme looks a little bit complicated but on the other hand it is very powerful because it not only gives a description of infinite continuous quantum systems but also covers the cases of finite continuous quantum systems, finite and infinite quantum lattice systems, and all sorts of classical systems. [Once more, Ruelle's book (1969) should be consulted for these questions.]

2. Mixing

Utilizing the algebraic approach we are now in the position to deal with the problems of ergodicity and mixing (cf. Sec. I.B) in quantum mechanics. We have discussed in Sec. I.B the fact that mixing means

$\omega(P_t Q) \to \omega(P)\,\omega(Q)$ as $t \to \infty$

[Eq. (2.23a)]. Whereas this turned out to be impossible for finite quantum systems, it is very well possible in infinite systems that there is an invariant state $\omega$ such that, for any two elements $A, B \in \mathcal{A}$,

$\lim_{t\to\infty} \omega(\tau_t A \cdot B) = \omega(A)\,\omega(B)$. (4.19)

(The concept of ergodicity for invariant states on C* algebras was introduced by Segal, 1951.)

In the GNS representation this reads

$(\Omega_\omega | \pi_\omega(A)\, e^{-iH_\omega t}\, \pi_\omega(B) | \Omega_\omega) \to (\Omega_\omega | \pi_\omega(A) | \Omega_\omega)\,(\Omega_\omega | \pi_\omega(B) | \Omega_\omega)$ (4.20)

for $t \to \infty$, by virtue of Eq. (4.18). $H_\omega$ does not have any other eigenvector than $\Omega_\omega$, because $H_\omega \psi = \lambda\psi$, $\psi$ orthogonal to $\Omega_\omega$, implies that $(\psi | e^{-iH_\omega t} \psi) = e^{-i\lambda t}\,\|\psi\|^2$ does not tend to $|(\Omega_\omega | \psi)|^2 = 0$, in contradiction to (4.20), which extends to arbitrary vectors because $\pi_\omega(\mathcal{A})\,\Omega_\omega$ is dense. Hence $H_\omega$ has a continuous spectrum on the orthogonal complement of $\Omega_\omega$.

It is usually supposed that the time evolution is asymptotically Abelian (Doplicher, Kastler, and Robinson, 1966; Ruelle, 1966; Doplicher, Kastler, Kadison, and Robinson, 1967), i.e., that the commutator $[\tau_t A, B]$ vanishes in some appropriate sense as $t \to \infty$. (There are different notions of this property which we do not want to discuss in detail here. So let us for simplicity suppose that $\|[\tau_t A, B]\| \to 0$ as $t \to \infty$.) This is certainly true for free systems, where the commutator decays as an inverse power of $t$. For systems with repulsive forces only, one can expect even stronger commutation properties, and for attractive forces, asymptotic Abelianness will presumably hold as long as the attraction is not too strong.

If asymptotic Abelianness is true, then $\lim \omega(C\,(\tau_t A)\,B) = \lim \omega(\tau_t A \cdot CB) = \omega(A)\,\omega(CB)$, hence in the GNS construction $\pi_\omega(\tau_t A) \to \omega(A)$ times the unit operator in $H_\omega$. Now let $\omega'$ be a state that is normal with respect to $\omega$, by which we mean that there exists a density matrix $\rho'$ in $H_\omega$ such that $\omega'(A) = \mathrm{Tr}\,\rho'\,\pi_\omega(A)$. Then $\omega'(\tau_t A) = \mathrm{Tr}\,\rho'\,\pi_\omega(\tau_t A) \to \omega(A)$; i.e., states not too far from a mixing state converge towards the latter. This is a rigorous result concerning approach to equilibrium (Sec. I.B).

Now take a bounded subvolume $V$. For any $A \in \mathcal{A}(V)$, $\omega'(\tau_t A) \to \omega(A)$. Let $\rho'_V$, $\rho_V$ be density matrices on $H(V)$ defined by

$\omega'(A) = \mathrm{Tr}\,\rho'_V A$,

$\omega(A) = \mathrm{Tr}\,\rho_V A$,

and let $\rho'_V(t)$ be the time evolution of $\rho'_V$, defined in an obvious manner. Then

$\mathrm{Tr}\,\rho'_V(t) A \to \mathrm{Tr}\,\rho_V A$,

and, consequently, $\mathrm{Tr}|\rho'_V(t) - \rho_V| \to 0$ (see Davies, 1972; Wehrl, 1976). As we have seen in Sec. II.D, this does not necessarily imply that $S(\rho'_V(t)) \to S(\rho_V)$, but under some weak additional assumptions (which in general can be expected to be fulfilled) this will in fact be true.

It usually is not possible to define the entropy of the state $\omega$ of the whole system; any sensible definition would give $S = \infty$. However, in addition to the mean entropy (Secs. II.F and III.A), one can define the relative entropy of two states by

$S(\omega' | \omega) = \lim_{V\to\infty} S(\rho'_V | \rho_V)$. (4.21)

This concept turns out to be very useful for infinite systems too; however, due to mathematical complications (one has to know about Tomita-Takesaki theory), we have to refer the reader to the literature (Araki, 1975).

3. KMS states

In general, for an infinite system there exists no operator $H$ belonging to $\mathcal{A}$ or which can be constructed as a limit of elements of $\mathcal{A}$ such that the time evolution is given by $A \to e^{iHt} A\, e^{-iHt}$. Therefore one also cannot use Eq. (1.39) to describe Gibbs states. But one can use the KMS condition (Sec. I.B) in order to obtain an analog of them: A state $\omega$ is called a KMS state at inverse temperature $\beta$ if there is a function $F(z)$ with the analyticity properties stated after Eq. (1.47), namely

$\omega(B\,\tau_t A) = F(t)$, $\omega(\tau_t A \cdot B) = F(t + i\beta)$. (4.22)

KMS states have attracted great interest in recent years both from the physical and the mathematical side, and there is a rich literature about them (the study of KMS states was initiated by Haag, Hugenholtz, and Winnink, 1967). Let me just mention a few results.

(1) KMS states are automatically time invariant.

(2) To a given state $\omega$ there is exactly one group of time automorphisms $\tau_t$ such that $\omega$ is KMS for them. (However, there may be more than one KMS state for a given time evolution.)

(3) KMS states can be decomposed into extremal ones, i.e., those that cannot be written as a genuine convex combination of two other KMS states. These extremal KMS states are factorial, i.e., $\pi_\omega(\mathcal{A})''$ is a factor. [$\pi_\omega(\mathcal{A})'$ = set of all operators on $H_\omega$ that commute with all of $\pi_\omega(\mathcal{A})$, $\pi_\omega(\mathcal{A})''$ = all operators that commute with $\pi_\omega(\mathcal{A})'$. $\pi_\omega(\mathcal{A})'$ is called the commutant of $\pi_\omega(\mathcal{A})$, $\pi_\omega(\mathcal{A})''$ the bicommutant. "Factor" means that the center $\pi_\omega(\mathcal{A})' \cap \pi_\omega(\mathcal{A})''$ consists of the multiples of the identity only.]

(4) Factorial states (whether they are KMS or not) are always mixing.

(5) $\pi_\omega(\mathcal{A})'$ and $\pi_\omega(\mathcal{A})''$ are anti-isomorphic. There is a deep theory studying this symmetry, the so-called Tomita-Takesaki theory, which is one of the most fruitful recent concepts in the field of operator algebras (Takesaki, 1970).

4. Stability

We already have mentioned stability properties of equilibrium states: small perturbations of the dynamics do not lead to global changes of the state. Let me sketch one result in this direction (Haag, Kastler, and Trych-Pohlmeyer, 1974; Haag and Trych-Pohlmeyer, 1977; for another approach, cf. Araki and Sewell, 1977).

A small, local perturbation of the dynamics may be described by changing $\tau_t$ to $\tau_t^{\lambda h}$, where $\tau_t^{\lambda h}$ is defined via its infinitesimal generator (which is in mathematical language a derivation of the algebra $\mathcal{A}$) as

$\frac{d}{dt}\,\tau_t^{\lambda h} A\,\big|_{t=0} = \frac{d}{dt}\,\tau_t A\,\big|_{t=0} + i\lambda\,[h, A]$

with $h \in \mathcal{A}$. (One can also write down $\tau_t^{\lambda h}$ directly as an infinite series involving time-ordered integrals of multicommutators.) Let $\omega$ (or $\omega^{\lambda h}$, respectively), where $\omega^{\lambda h}$ is defined in a similar way to $\tau^{\lambda h}$, be a time-invariant state of the unperturbed, or perturbed, system, respectively, and suppose that for every $h$

$\|\omega^{\lambda h} - \omega\| \to 0$ as $\lambda \to 0$. (4.23)

If $\int \|[\tau_t A, B]\|\,dt < \infty$, $\int \|[\tau_t^{\lambda h} A, B]\|\,dt < \infty$ (for small $\lambda$), then, for factorial states $\omega$, they turn out to be KMS for some $\beta$. (The $\beta$ comes in as some "modulus of stability.") On the other hand, every factorial KMS state has the stability property (4.23).

Let me close with a few words about a very general concept of entropy that refers to von Neumann algebras, i.e., weakly closed *-algebras of operators containing the identity. [Examples are $\pi(\mathcal{A})'$, $\pi(\mathcal{A})''$; in fact von Neumann algebras are exactly those operator algebras $N$ for which $N = N''$.]
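
As a finite-dimensional illustration of the KMS condition (4.22), the following Python sketch checks numerically that a Gibbs state satisfies it. This is only a sanity check under stated assumptions, not part of the argument above: the system is finite, so a Hamiltonian exists; it is taken to be diagonal with hypothetical energies; the time evolution is assumed to be $\tau_t A = e^{iHt} A e^{-iHt}$; and the function names (gibbs_expectation, evolve, kms_function, kms_defect) are invented for this example.

```python
import cmath
import math


def gibbs_expectation(energies, beta, X):
    """omega(X) = Tr(exp(-beta H) X) / Tr(exp(-beta H)) for diagonal H."""
    weights = [math.exp(-beta * e) for e in energies]
    partition_sum = sum(weights)
    return sum(w * X[j][j] for j, w in enumerate(weights)) / partition_sum


def evolve(energies, z, A):
    """tau_z(A) = exp(iHz) A exp(-iHz) for diagonal H; z may be complex."""
    n = len(energies)
    return [[cmath.exp(1j * z * (energies[j] - energies[k])) * A[j][k]
             for k in range(n)] for j in range(n)]


def matmul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kms_function(energies, beta, A, B, z):
    """F(z) = omega(B tau_z(A)), continued to complex z."""
    return gibbs_expectation(energies, beta, matmul(B, evolve(energies, z, A)))


def kms_defect(energies, beta, A, B, t, shift=None):
    """|omega(tau_t(A) B) - F(t + i*shift)|; shift defaults to beta."""
    if shift is None:
        shift = beta
    lhs = gibbs_expectation(energies, beta, matmul(evolve(energies, t, A), B))
    rhs = kms_function(energies, beta, A, B, t + 1j * shift)
    return abs(lhs - rhs)


# Hypothetical three-level system and two arbitrary observables.
energies = [0.0, 0.7, 1.9]
A = [[1, 2j, 0], [0.5, -1, 1], [3, 0, 2]]
B = [[0, 1, 1j], [2, 1, 0], [-1, 0.5, 1]]
beta = 1.3

defect = kms_defect(energies, beta, A, B, t=0.4)
print("KMS defect at the correct inverse temperature:", defect)
```

With the shift equal to the inverse temperature of the state, the defect is at the level of rounding error; passing a different shift (for instance `shift=2 * beta`) gives a defect of order one, which shows that the condition singles out $\beta$.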


5. Segal entropy (Segal, 1960)

We have remarked at the end of Sec. I.A that this is in some sense the most general concept of entropy. It is defined as follows: let $N \subset B(H)$ be a von Neumann algebra. Let $\Phi$ be a faithful normal semifinite trace on $N$, i.e., a mapping of the positive part of $N$ into $[0, \infty]$ such that $\Phi(R) > 0$ if $R > 0$, $\Phi(\lambda R) = \lambda\Phi(R)$ ($\lambda \ge 0$), $\Phi(R + S) = \Phi(R) + \Phi(S)$, $\Phi(U^* R U) = \Phi(R)$ for $U$ unitary, $\in N$; furthermore if $R_n \uparrow R$, then $\Phi(R_n) \uparrow \Phi(R)$; and finally that, to every $R$ there exists $S \ne 0$, $\le R$, with $\Phi(S) < \infty$.

(The usual trace Tr fulfills all requirements; however, there are algebras $N$ such that $\mathrm{Tr}\,T = \infty$ for every positive $T \ne 0$ in $N$.)

If $\varphi$ is a normal state that can be written as $\varphi(\cdot) = \Phi(\rho\,\cdot)$, for some $\rho$ (the set of all those $\varphi$ is dense), then let $\rho = \int_0^\infty \lambda\, dE(\lambda)$ be its spectral decomposition. [All $E(\lambda)$ belong to $N$.] The Segal entropy is defined as

$S(\varphi | \Phi) = -\int_0^\infty \lambda \ln\lambda \; d\Phi(E(\lambda))$. (4.24)

The most important special cases are:

(1) Let $(\Omega, \mu)$ be a measure space. Take $H = L^2(\Omega, \mu)$, $N = L^\infty(\Omega, \mu)$ (which is a von Neumann algebra). The mapping $f \to \Phi(f) = \int f\, d\mu$, for $f \in L^\infty$, is a trace in the above sense. Let $\varphi(f) = \int f g\, d\mu$. $g$ is the Radon-Nikodym derivative of the measure $\nu$ defined by

$\int f\, d\nu = \int g f\, d\mu$,

and hence

$S(\varphi | \Phi) = S(\nu | \mu)$,

i.e., the generalized BGS entropy (Sec. I.A).

(2) Let $N = B(H)$, $\Phi$ = the usual trace Tr, $\varphi$ be the state given by $\varphi(\cdot) = \mathrm{Tr}\,\rho\,\cdot$. Then $S(\varphi | \Phi)$ = the usual quantum-mechanical entropy $S(\rho)$.

Many of the properties of classical and quantum-mechanical entropy have generalizations to Segal entropy, and there are also characterization theorems similar to those of Sec. II.B and F (Ochs and Spohn, 1976).

ACKNOWLEDGMENTS

I am most grateful to Professors S. B. Treiman and E. H. Lieb for suggesting this paper and for their encouragement. Moreover, I have obtained invaluable aid from Professor Lieb during the preparation of this work. I also acknowledge useful comments by H. Spohn. I have learned many facts about entropy verbally from my teacher, Professor W. Thirring. Several theorems were communicated to me by Professor A. Uhlmann. Finally I want to thank my colleagues Heide Narnhofer, A. Pflug, and G. Siegl, as well as Franzi Wagner.

REFERENCES

Aczel, J., and Z. Daroczy, 1963, C. R. Acad. Sci. Paris 257, 1581.
Aczel, J., 1974, On Measures of Information and Their Characterizations (Academic, New York).
Aczel, J., B. Forte, and C. T. Ng, 1974, Adv. Appl. Prob. 6,
Alberti, P., 1973, thesis, Leipzig.
Alberti, P., 1977, to be published in Wiss. Z. Karl-Marx-Univ., Leipzig.
dell'Antonio, G. F., 1967, Commun. Pure Appl. Math. 20, 413.
Araki, H., 1975, preprint RIMS 190, Kyoto.
Araki, H., and E. Lieb, 1970, Commun. Math. Phys. 18, 160.
Araki, H., and G. L. Sewell, 1977, Commun. Math. Phys. 52, 103.
Arnold, V., and A. Avez, 1969, Ergodic Problems of Classical Mechanics (Benjamin, New York).
Baumann, F., and R. Jost, 1969, in The Problems of Theoretical Physics: Essays Devoted to Bogoliubov (Nauka, Moscow).
Baumann, F., 1971, Helv. Phys. Acta 44, 95.
Bayer, W., and W. Ochs, 1973, Z. Naturforsch. A 28a, 693.
Berezin, F., 1972a, Izv. Akad. Nauk SSSR Ser. Mat. 36, 5.
Berezin, F., 1972b, Mat. Sb. Nov. Ser. 86 (128), 4 (12); 88 (130); 2 (6).
Billingsley, P., 1965, Ergodic Theory and Information (Wiley, New York).
Birkhoff, G. D., Proc. Natl. Acad. Sci. USA 17, 656.
Blau, J. M., 1959, Prog. Theor. Phys. 22, 745.
Boltzmann, L., 1872, Wiener Ber. 66, 275.
Boltzmann, L., 1877a, Wiener Ber. 75, 67.
Boltzmann, L., 1877b, Wiener Ber. 76, 373.
Boltzmann, L., 1896, Vorlesungen über Gastheorie (Barth, Leipzig).
Brillouin, L., 1962, Science and Information Theory (Academic, New York).
Brush, S., 1965, 1966, Kinetic Theory I, II (Pergamon, Oxford).
Burbury, S., 1890, Philos. Mag. 30, 301.
Carnot, S., 1824, Réflexions sur la puissance motrice du feu (Bachelier, Paris).
Clausius, R., 1850, Über die bewegende Kraft der Wärme (Ostwalds Klassiker, Leipzig, 1898).
Clausius, R., 1865, Ann. Phys. 125, 353.
Cohen, E. G. D., and W. Thirring, eds., 1972, The Boltzmann Equation (Springer, Vienna).
Connes, A., and E. Størmer, 1975, Acta Math. 134, 289.
Daroczy, Z., 1970, Inf. Control 16, 36.
Davies, E., 1972, Commun. Math. Phys. 27, 309.
Davies, E., 1976, Quantum Theory of Open Systems (Academic, London).
Dixmier, J., 1957 (1969), Les algèbres d'opérateurs dans l'espace Hilbertien (Gauthier-Villars, Paris).
Dixmier, J., 1964, Les C*-algèbres et leurs représentations (Gauthier-Villars, Paris).
Doplicher, S., D. Kastler, R. Kadison, and D. Robinson, 1967, Commun. Math. Phys. 6, 101.
Doplicher, S., D. Kastler, and D. Robinson, 1966, Commun. Math. Phys. 3, 1.
Dyson, F. J., 1967, J. Math. Phys. 8, 1538.
Dyson, F. J., and A. Lenard, 1967, J. Math. Phys. 8, 423.
Eckmann, J. P., and M. Guenin, 1969, Méthodes algébriques dans la mécanique statistique (Springer, Berlin).
Ehrenfest, P. and T., 1911, Encyclopädie der math. Wiss. 4, Article 32.
Einstein, A., 1914, Verh. Dtsch. Phys. Ges. 12, 820.
Emch, G., 1972, Algebraic Methods in Statistical Mechanics and Quantum Field Theory (Wiley, New York).
Emch, G., 1976a, Acta Phys. Austriaca, Suppl. XV, 79.
Emch, G., 1976b, Commun. Math. Phys. 49, 191.
Epstein, H., 1973, Commun. Math. Phys. 31, 317.
Erdos, P., 1946, Ann. Math. 47, 1.
Fano, U., 1957, Rev. Mod. Phys. 29, 74.
Feller, W., 1950, An Introduction to Probability Theory and Its Applications (Wiley-Chapman, New York).
Fierz, M., 1955, Helv. Phys. Acta 28, 705.


