This Content Downloaded From 146.199.60.115 On Mon, 19 Dec 2022 20:16:43 UTC
and Wiley are collaborating with JSTOR to digitize, preserve and extend access to International
Economic Review
BY BENOIT MANDELBROT
SUMMARY
1. INTRODUCTION
1.1. The strong Pareto law. Let us begin with a basic variant,
which will be referred to as the strong Pareto law. Let P(u) be
the percentage of individuals with an income U (over some fixed period
of reference) exceeding some number u (u is assumed to be a continuous
variable). The strong Pareto law asserts that:
$$P(u) = \begin{cases} (u/u_0)^{-a}, & \text{when } u > u_0, \\ 1, & \text{when } u < u_0. \end{cases}$$
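As a minimal illustration of the definition above, the following Python sketch evaluates the strong Pareto law. The function name `strong_pareto_tail` and the sample values a = 1.5 and u0 = 1 are illustrative assumptions, not taken from the text.

```python
def strong_pareto_tail(u, a, u0):
    """Return P(u), the fraction of incomes exceeding u under the strong Pareto law."""
    if a <= 0 or u0 <= 0:
        raise ValueError("a and u0 must be positive")
    if u < u0:
        return 1.0           # every income exceeds a level below the minimum u0
    return (u / u0) ** (-a)  # Pareto decrease above u0


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not from the source).
    for level in (0.5, 1.0, 4.0, 100.0):
        print(level, strong_pareto_tail(level, a=1.5, u0=1.0))
```

With these assumed values, quadrupling u above u0 divides P(u) by 4 to the power 1.5, that is by 8.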
This associates Pareto's name with that of Paul Lévy, wh
tributions intensively. (See in particular Lévy [10], [11], [12].) To the best of our
knowledge the application of these laws to economics is entirely new, as are the definition
and properties of the "Pareto-Lévy" processes of [18], [19], [20].
$$P(u) \sim (u/u_0)^{-a}, \quad \text{as } u \to \infty.$$
That is:
$$\frac{P(u)}{(u/u_0)^{-a}} \to 1, \quad \text{as } u \to \infty,$$
or,
planation. We shall see that the weak Pareto law (b = 0) has some
crucial properties, which will be used as a basis for a theory, and
which disappear if b ≠ 0. We shall therefore disregard the case b ≠ 0
(the more so since the formula above provides no improvement for
middle values of u). (See 2.4 for a further discussion of b = 0.)
B. The best known statement about income distribution, apart from
the weak Pareto law, is the log-normal law [8], [1], which claims that
the variable log U [or perhaps better log (U-u'), where u'>0] is well
represented by the Gaussian distribution. The empirical evidence for
this is that the graph of (P, u) on log-normal paper seems to be straight.
Such a graph emphasizes a different range of values of u from that
of the Pareto graph, so that the graphical evidence for the two laws
is not contradictory. The motivation for the log-normal law is, how-
ever, largely theoretical (see 1.5). Roughly speaking, a log-normal U
can be explained by assuming that log U is the sum of many additive
components.
1.5. Some existing models of income distribution considered as
"thermodynamic" theories. There is a great temptation to consider
the exchanges of money which occur in economic interaction as analo-
gous to the exchanges of energy which occur in physical shocks between
gas molecules. In the loosest possible terms, both kinds of interactions
"should" lead to "similar" states of equilibrium. That is, one "should"
be able to explain the law of income distribution by a model similar
to that used in statistical thermodynamics: many authors have done
so explicitly, and all the others of whom we know have done so im-
plicitly.
Unfortunately, the Pareto P(u) decreases much more slowly than
any of the usual laws of physics, so that if one wants to apply the
physical theory mechanically, one must somehow argue that U is a less
intrinsic variable than some slowly increasing function V(U). For
that, one must renounce the additive properties of U. The seemingly
universal choice for V is V = log U [or V = log (U - u')]. This choice
is suggested by the fact that one plots log u empirically. But it can
also be traced back to the "moral wealth" of Bernoulli, and it
apparently can be justified by some law of proportionate effect, or by
some kind of Weber-Fechner law. If this choice of V is granted, one
has to explain the normal law for the middle zone of v's, and the
exponential law P(v) = exp {- a(v - v0)} for the upper zone of v's.
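The link between the Pareto law for U and the exponential law for V = log U can be checked numerically. The sketch below is only an illustration under assumed values (a = 1.5, u0 = 2); the function names are hypothetical and v0 is taken to be log u0.

```python
import math


def pareto_tail(u, a, u0):
    """P(u) for the strong Pareto law in the upper zone u >= u0."""
    return (u / u0) ** (-a) if u >= u0 else 1.0


def exponential_tail_in_v(v, a, v0):
    """Tail of V = log U in the upper zone: exp{-a (v - v0)}."""
    return math.exp(-a * (v - v0)) if v >= v0 else 1.0


def max_relative_gap(a, u0, levels):
    """Largest relative difference between the two tails over the given income levels."""
    v0 = math.log(u0)  # assumption: v0 is the logarithm of u0
    return max(
        abs(pareto_tail(u, a, u0) - exponential_tail_in_v(math.log(u), a, v0))
        / pareto_tail(u, a, u0)
        for u in levels
    )


if __name__ == "__main__":
    print(max_relative_gap(1.5, 2.0, [2.0, 5.0, 50.0, 1e4]))
```

The gap is at the level of rounding error, since exp{-a(log u - log u0)} is the same quantity as (u/u0) to the power -a.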
Indeed, many existing models of the Pareto distribution are reduci-
ble to the observation that (for any a) exp (- av) is a possible bar-
ometric density distribution in the atmosphere. Alternatively, con-
latter case, the stable law may be called positive, an abbreviation for
"maximally skewed in the positive direction."
Since the usual terminology of probability (stable distributions) does
not serve our purpose, we propose to refer to "positive" stable dis-
tributions with 1 < a < 2, as "Pareto-Levy distributions."
It is not strictly true that the distribution of income is invariant
over the whole range of u, with respect to a change in definition of
income. We argue, however, that if the variable U is not Gaussian
and is very skewed, the most reasonable "first order" assumption
about income is that it is a P-L variable.
The density p(u) of the P-L law unfortunately cannot be expressed
in a closed analytic form, but is determined by its bilateral Laplace
transform (valid for b > 0):
[Figure: curves of the P-L density, with axis label p(u); tick values not recoverable.]
Finally, near E(U) and the most likely value of U, the P-L curve
has the kind of skewness which one finds in the empirical data and
which one hopes to derive in a theoretical curve.
Let a now approach 2. At the limit, the P-L density will tend
toward a Gaussian density. Close to the limit, the P-L density already
resembles a Gaussian one. Only for large values of u, (which have
a very small probability) is the Gaussian decrease of P(u) replaced by
a Paretian decrease.
In other terms, the P-L density is worth considering only if a is
not "too close" to 2. Even in this range, however, the prediction for
intermediate values of u is quite inadequate to cover the complete
curve of incomes, as described for example by Miller [22]. But as
part-time and unskilled workers are eliminated we more nearly approx-
imate such a curve. On the other hand, the income distribution of
unskilled workers is fairly symmetric and has a fairly small dispersion.
Since the distribution of unskilled wages differs from that of high
incomes, the P-L law would a priori explain only the latter incomes.
If we examine any income category as defined by the Bureau of Census,
we can seldom tell in advance whether its income mechanism is closer
to that of the unskilled workers or to that of the largest income re-
cipients. Therefore, if we wish to determine how wide are the cate-
gories to which the P-L law applies, the only solution is to start from
the P-L interpolation of high incomes, and then to see how many
other incomes (and which) must be added to obtain a good fit. This
establishes a distinction between two kinds of income categories. The
reasonableness of this distinction should be checked on independent
grounds. For the time being we shall be content to observe that for
a far enough from 2 the prediction made by the P-L law is "reasona-
ble." (cf. 2.6.)
We shall conclude this section with a few statements concerning
non-P-L stable distributions. The Gaussian distribution is of course
stable, and it is the only stable distribution with a finite variance.
Further, all the stable variables with a finite mean are differences of
P-L variables scaled by arbitrary positive coefficients. The bilateral
generating function is no longer defined, and we must consider the
usual characteristic function (c.f.). For a P-L variable, the c.f. is
immediately obtained as
Every stable law is such a limit. To prove this, assume that the variables U_i themselves
are stable; by induction of stability, the ⊕-sum $\sum_{i=1}^{N} U_i$ will be a variable
of the form a(N)U + b(N), and hence $\{a(N)\}^{-1}\{(\sum_{i=1}^{N} U_i) - b(N)\}$ will have the same
distribution function as U. Conversely, assume that a certain normed sum of the variables
U_i has a limit. Write the N-th normed sum, taken in the sense of ⊕, as:
$$A(N) \sum_{i=1}^{N} U_i - B(N)$$
(cf. Lévy [10], [11], [12], Gnedenko and Kolmogoroff [8]). We might,
therefore, have introduced the P-L law without special motivation
other than the fact that the known asymptotic behavior of P(u) and
its recently computed behavior for intermediate u make it an attrac-
tive interpolation for the income data (Mandelbrot [18]). Unfortunately,
the behavior of the P-L law is in many other ways quite different
from that to which most statisticians are accustomed from their con-
stant handling of the Gaussian law. The only way of really showing
how far this law is suited to the present problem is to sketch its
theory, starting with a heuristic introduction to its properties (from
the viewpoint of addition and of the study of extremal values), and
ending with some rigorous results.
We see how the three main drawbacks of the classical theories
vanish. Many variants of the same argument all lead to the same
result; no change of scale of U is necessary, the behavior of P(u)
being exactly what is needed, and a is "near" 3/2. In fact, a should
not be too close to 2, and cannot be greater than 2, a point which
requires some elaboration.
2.4. Concerning the sign of a - 2 and the variance of U. A few
empirical data claim to lead to a > 2. To explain this by a limit argu-
ment requires most specific and unlikely assumptions. Similarly, the
observational invariance of the stable distribution cannot hold if a > 2,
unless it is weakened (2.5) to require that the sum of a few weak
Pareto variables with a > 2 is a weak Pareto variable with a > 2; as
the number of addends increases (2.8), the weak Pareto law holds for
a decreasing zone of values of u and finally the sum tends towards
a Gaussian variable.
We find that the sign of a - 2 distinguishes two sets of different
random variables. The occurrence of a > 2, or even a < 2 but close
to 2, may mean a predominance of salaries (Lydall's income model [13]
is valid for all values of a). (Another explanation may be found in
Mandelbrot [19].) On the other hand, several of the examples where
a > 2 are encountered occur in communities of Oceania which are
very much less self-contained than, for example, Great Britain or the
U.S.A. Their distributions of U may perhaps be truncated. In any
case, the sign of a - 2 raises an important empirical problem: Is the
fit of the weak law equally good regardless of this sign, and does this
law represent a higher percentage of the data where a < 2 ?
A second consequence of the limitation 1 < a < 2 is that the second
moment of U is then infinite, the first moment being finite. This is
easily seen, because by integration by parts,
(We assume that the behavior of P(u) for negative u does not lead
to a convergence problem for E(U).)
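The statement about the two moments can be illustrated in closed form for the strong Pareto law. The sketch below assumes the strong Pareto density p(u) = a u0^a u^-(a+1) for u > u0 and integrates only up to a finite cutoff; the function name and the values a = 1.5, u0 = 1 are assumptions made for the example.

```python
def truncated_moment(k, a, u0, cutoff):
    """Integrate u**k * p(u) from u0 to cutoff for the strong Pareto density
    p(u) = a * u0**a * u**(-(a + 1)). The closed form below requires k != a."""
    if k == a:
        raise ValueError("closed form used here assumes k != a")
    return a * u0 ** a * (cutoff ** (k - a) - u0 ** (k - a)) / (k - a)


if __name__ == "__main__":
    a, u0 = 1.5, 1.0  # illustrative values with 1 < a < 2
    for cutoff in (1e2, 1e4, 1e8, 1e12):
        first = truncated_moment(1, a, u0, cutoff)
        second = truncated_moment(2, a, u0, cutoff)
        print(cutoff, first, second)
```

Under these assumptions the first truncated moment settles near a u0/(a - 1) as the cutoff grows, while the second keeps growing like the cutoff raised to the power 2 - a.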
The last result holds for both strong and weak Pareto variables if
1 < a < 2, but it would, for example, cease to hold for the density
p(u) = k u^{-(a+1)} exp(-bu) of 1.4. This provides a new and important
test of our conjecture that b = 0. The finite E(U) means that if the
u_i are samples from a Pareto distribution, the empirical mean
$E_N = \sum_{i=1}^{N} u_i / N$ tends to E(U) with probability 1
(Kolmogoroff's strong law of large numbers); in addition, E_N is a good
estimate for E(U), if N is large. Now consider
$D_N = \sum_{i=1}^{N} (u_i - E_N)^2 / N$. If a > 2, D_N tends to a
finite limit D(U), of which it is a good estimate, and the suitably
normed sum $\sum_{i=1}^{N} (u_i - E)$ tends to a Gaussian variable.
This result changes little if one adds an exponential factor with small
b. However, if a < 2 and b = 0, the limit of D_N is infinity (one may
show that D_N grows without limit like $N^{2/a - 1}$), but if b becomes
> 0, D(U) is finite. Therefore, the usefulness of the exponential
factor exp(-bu) may be tested by checking whether D_N keeps increasing
with N in the case of the largest of the samples available. We could
not make the direct test, but an indirect test results from the
following observation: the ordering of the different populations by
"increasing inequality" should presumably be identical with their
ordering by decreasing a; on the other hand, more usual measures of
inequality are given by D_N, D_N/E_N or $\sqrt{D_N}/E_N$. These two
methods of ordering populations have been compared and found entirely
contradictory. This result ceases to be absurd if one takes account of
the fact that the values of N in the different samples which were
compared range from 10^2 to 10^8; and it seems that even if D(U) were
finite, it would not be approached, even with the largest samples. In
that case, irrespective of any theory, it is preferable to take b = 0,
and D(U) = ∞. Further, no function of D_N is adequate to compare
degrees of inequality, except perhaps between samples of identical
size.
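The dependence of the empirical dispersion on the sample size can be illustrated with a small computation. To keep the example reproducible, the sketch below replaces random sampling by the strong Pareto quantiles at the levels (i + 0.5)/N; this substitution, the function names, and the values a = 1.5, a = 3 and u0 = 1 are all assumptions of the example, not part of the argument above.

```python
def pareto_quantile_sample(n, a, u0=1.0):
    """Deterministic stand-in for n draws: strong Pareto quantiles at (i + 0.5) / n."""
    return [u0 * (1.0 - (i + 0.5) / n) ** (-1.0 / a) for i in range(n)]


def empirical_mean_and_dispersion(values):
    """Return the empirical mean and the mean squared deviation from it."""
    n = len(values)
    e_n = sum(values) / n
    d_n = sum((u - e_n) ** 2 for u in values) / n
    return e_n, d_n


def dispersion_by_sample_size(a, sizes):
    """Map each sample size N to the dispersion of the quantile sample with exponent a."""
    return {
        n: empirical_mean_and_dispersion(pareto_quantile_sample(n, a))[1]
        for n in sizes
    }


if __name__ == "__main__":
    for a in (1.5, 3.0):  # one exponent below 2, one above
        print(a, dispersion_by_sample_size(a, (10**2, 10**3, 10**4)))
```

For the exponent below 2 the dispersion keeps increasing with N, whereas for the exponent above 2 it stays bounded, which is the contrast the proposed test relies on. Two samples of different sizes are therefore not comparable through this quantity.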
Another test of the usefulness of the approximation D = ∞ is provided
by the relative contribution to D_N of the largest of the u_i: this
is very large (close to 1/2 for the Wisconsin incomes [23]), as might be
predicted from the theory of the Pareto-Lévy law. The usual procedure
$$p_2(u) = \int_0^u p(x)\,p(u - x)\,dx = C^2 \int_0^u \exp(-bx)\exp[-b(u - x)]\,dx = C^2 \int_0^u \exp(-bu)\,dx = C^2 u \exp(-bu).$$
P(X)= e exp 2x
It is known that
P2(U) =exp
[Figure: curves labelled p(u') and p(u''), plotted against u; the graph itself is not recoverable.]
sume that u' can only be ≥ u0, so that its most probable value is u0.
In that case, p(x)p(u - x) will have two maxima, for x = u0 and for
x = u - u0. If they are strong enough, p2(u) will be composed mostly
of the contributions of the neighborhoods of these maxima. Compare
then the following two expressions:
These two expressions differ only in the small contributions of x's very
different from x = 0; that is,
$$p_2(u) \sim 2p(u).$$
Hence,
$$1 - P_m(u) = [1 - P(u)]^2,$$
$$P_m(u) \sim 2P(u).$$
That is, the sum of two independent and identical strong Pareto
variables is a weak Pareto variable, with the same a and a scale
factor u0 multiplied by 2^{1/a}.
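Two steps of this argument are easy to verify numerically: the tail of the larger of two independent strong Pareto variables approaches 2P(u), and doubling the tail is the same as multiplying the scale factor by 2^{1/a}. The sketch below checks only these two steps, not the convolution itself; the function names and the values a = 1.5, u0 = 1 are assumptions of the example.

```python
def tail(u, a, u0=1.0):
    """P(u) for the strong Pareto law."""
    return (u / u0) ** (-a) if u >= u0 else 1.0


def tail_of_larger(u, a, u0=1.0):
    """Probability that the larger of two independent copies exceeds u:
    1 - [1 - P(u)]**2, written as 2P - P**2 to limit rounding error."""
    p = tail(u, a, u0)
    return 2.0 * p - p * p


def rescaled_tail(u, a, u0=1.0):
    """Strong Pareto tail with the same a and the scale factor u0 * 2**(1/a)."""
    return tail(u, a, u0 * 2.0 ** (1.0 / a))


if __name__ == "__main__":
    a = 1.5  # illustrative exponent
    for u in (10.0, 100.0, 1000.0):
        print(u, tail_of_larger(u, a) / (2.0 * tail(u, a)), rescaled_tail(u, a) / tail(u, a))
```

The first ratio tends to 1 as u grows, and the second equals 2 up to rounding for every u above the rescaled threshold.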
Likewise, any weak Pareto distribution will be invariant in addition
up to the value of u0. The proof requires an easy refinement of the
previous argument, to cover the case where -log p(x) is not concave
or convex all through the range of x, but d² log p(x)/dx² becomes and
stays negative for large values of x (which implies a quite regular
behavior for -log p(x) in that region). One can show in this way that
the weak Pareto law is preserved in the addition of two (or of a few)
independent random variables: there is no self-contradiction in the
observed fact that this law holds for parts of income as well as for
the whole. That is, the exact definition of the term "income" may
not be a matter of great concern. But, conversely, it is unlikely that
the observed data on P(u) for large u will be useful in discriminat-
ing among several different definitions of "income."
The weak Pareto and the Gaussian are the only laws strictly having
the above invariance ("stability") property. They will be distinguished
by a criterion of "equality" versus "inequality" between u' and u'',
when u = u' + u" is known and large. (2.6.) In 2.8 we shall cite a
further known result concerning stable probability distributions.
We may also need to know the behavior of p2(u), when the density
p'(u') of U' decreases slowly and p"(u") decreases rapidly. In that case,
a large u is likely to be equal to u', plus some "small fluctuation."
In particular, a Gaussian error of observation concerning a weak Pareto
variable is quite negligible for large u.
2.6. The problem of addition and of division into two, in the case
of stable variables. We have shown that the behavior of the sum of
two variables is determined mainly by the convexity of -log p(u): we
shall later show that this criterion is in general insufficient to study
the addition of many variables. However, if we limit ourselves to
stable random variables, the convexity of -log p(u') is sufficient to
distinguish between the case of the Gaussian and that of all other stable
distributions. That is, these two cases may be distinguished by the
criterion of approximate equality of the parts of a Gaussian sum con-
trasted with the great inequality between these parts in the case of
all other distributions, in particular the Pareto-Levy one.
This distribution has been used so far only to derive the distribution
of U' ⊕ U''. Suppose now that the value u of U is given and that
we wish to study the distribution of u' or of u'' = u - u'. If the
a priori distribution of U' is Gaussian, with mean M and variance σ²,
$$p(u' \mid u) = \frac{(2\pi)^{-1}\sigma^{-2}\exp\{-[(u' - M)^2 + (u - u' - M)^2]/(2\sigma^2)\}}{2^{-1/2}(2\pi)^{-1/2}\sigma^{-1}\exp\{-(u - 2M)^2/(4\sigma^2)\}}.$$
This is Gaussian, with mean value u/2 and variance σ²/2. A striking
feature of the result is that the law of u' depends on u only through
the mean value of U' (see the right-hand side of Figure 2).
Let us now consider the non-Gaussian case. Insofar as our theory
is adequate in income studies, the problem of division will appear in
such questions as the following: if we know the sum of the agricul-
tural and industrial incomes of an individual and if the a priori dis-
tributions of both these quantities are Pareto-Levy with the same a,
then what is the distribution of the agricultural income? This case
is more involved than the Gaussian one, because we do not know any
explicit analytic form for the distribution of u' or u"; we can, how-
ever, do some numerical plotting (see the left-hand side of Figure 2).
If the sum u is very large, we find that the distribution of u' has
two very sharp maxima, near u'max and u - u'max. As u decreases,
the shape of this distribution of u' will change, instead of being simply
translated, as in the Gaussian case. When u becomes small, more
maxima may appear. They will then all merge, and the distribution
of u' will differ little from that which is valid in the Gaussian case.
Finally, as u becomes negative and very large, the distribution of u'
will remain one with a single maximum.
Hence, bisection provides a very sharp distinction in this respect
between the Gaussian and all other stable laws.
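The contrast drawn by bisection can be sketched numerically by evaluating p(x)p(u - x) on a grid and normalizing it. Since the P-L density has no closed form, the sketch below uses the strong Pareto density as a stand-in for the non-Gaussian case; that substitution, the function names, the grids, and the parameter values are all assumptions of the example.

```python
import math


def gaussian_density(x, mean=0.0, sigma=1.0):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def strong_pareto_density(x, a=1.5, u0=1.0):
    """Strong Pareto density, used here only as a stand-in for the P-L density."""
    return a * u0 ** a * x ** (-(a + 1.0)) if x >= u0 else 0.0


def part_given_sum(density, total, low, high, points=4001):
    """Grid approximation to the distribution of one part x, given that the two parts sum to total."""
    xs = [low + (high - low) * i / (points - 1) for i in range(points)]
    weights = [density(x) * density(total - x) for x in xs]
    norm = sum(weights)
    return xs, [w / norm for w in weights]


def mean_and_variance(xs, probs):
    """Mean and variance of a discrete distribution on the grid xs."""
    mean = sum(x * p for x, p in zip(xs, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    return mean, variance


if __name__ == "__main__":
    xs, probs = part_given_sum(gaussian_density, total=6.0, low=-5.0, high=11.0)
    print("Gaussian part: mean and variance", mean_and_variance(xs, probs))
    xs, probs = part_given_sum(strong_pareto_density, total=100.0, low=1.0, high=99.0)
    middle = len(probs) // 2
    print("Pareto part: weight at the ends versus the middle", probs[0], probs[-1], probs[middle])
```

In the Gaussian case the part is concentrated around half of the sum, with half the variance of a single term. With the Pareto stand-in the weight piles up at the two ends of the range, so one part is small and the other is close to the whole.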
Now, consider a fairly small number N; what is the distribution of
(1/N)-th of a stable variable? In the Gaussian case this (1/N)-th re-
mains Gaussian, whatever N may be; its mean value is u/N, and its
variance (N - 1)σ²/N. In the non-Gaussian case, each part of a large
u may be small or it may be close to u; most of the N parts will be
small, but the largest of them will be likely to be close to the whole.
The situation is less intuitive when N becomes very large. How-
ever, Levy has proved that the necessary and sufficient condition for
the limit of the ⊕-sum Σ U_i to be Gaussian is that the value u_i
If, on the contrary, a < 1, the above argument fails, because Nl-l ten
therefore WfN-1/0 could have a limit distribution on a non-decreasing r
of VU. whatever the origin of U (anyway E(U) = ∞).
| d Sd(u8) = d(u/u*) |
dS.RB
the fluctuation of the contribution of far away stars has the charac-
teristic function
R2
cp(I) = exp (D * dS) (eu" 1 -iu)} I d(zu3/2)
which converges and tends to zero as R → ∞. For the sake of convenience,
the same correction itu may also be used for all other values
of u, since its effect for large u is only to add a finite translation to
U. Hence, the difference between the attraction of the stars in the
pencil dS, and the mean value of this attraction, is a "positive"
stable variable, or "Pareto-Lévy" variable, with a = 3/2.
The meaning of the stability of this attraction is that if two clouds
of stars are distinguished by their colors (say red and blue) but have
the same density and fill the same pencil dS, then the forces exerted
on O by red or blue stars alone, or by both together, differ only by
a scale factor and not in the analytic form of their distributions.
It is also evident why large negative values of u are unlikely com-
pared to large positive values. A large negative u can occur only if
there is an abnormally small number of stars. Moreover, the absence
of any stars near O is a quite likely event, but alone it can at best
give a bounded negative u. Therefore, a large negative u must also
contain the negative contributions of stars missing far away from O;
each of these stars contributes little to U, so that the number of
missing stars must be very large, and this is very unlikely.
On the contrary, an unboundedly large positive u may be obtained
from the presence of a single star near O, irrespective of the density
of stars elsewhere; such an event is far more likely than the combi-
nation of events required for a negative u. It is easy to check the
fact that the distribution of U has the same asymptotic behavior for
large u as the distribution of the attraction of the nearest star.
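The exponent 3/2 for the nearest star can be recovered from a short computation. The sketch below rests on assumptions that are not spelled out in the passage above: stars are scattered as a Poisson process in the pencil, the expected number within distance r is c r³ (the constant c absorbing the density and the geometry), and a star at distance r exerts an attraction r⁻². The function names are hypothetical.

```python
import math


def nearest_star_tail(u, c=1.0):
    """Probability that the attraction of the nearest star exceeds u.
    The attraction exceeds u exactly when some star lies within r = u**-0.5,
    which under the Poisson assumption has probability 1 - exp(-c * r**3)."""
    return -math.expm1(-c * u ** (-1.5))


def local_log_slope(u_low, u_high, c=1.0):
    """Slope of log-tail against log-u between two levels of u."""
    rise = math.log(nearest_star_tail(u_high, c)) - math.log(nearest_star_tail(u_low, c))
    return rise / (math.log(u_high) - math.log(u_low))


if __name__ == "__main__":
    print(local_log_slope(1e4, 1e6))
```

For large u the slope approaches -3/2, so under these assumptions the nearest star alone already produces a Paretian tail with a = 3/2.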
3. CONCLUSION
APPENDIX I
The behavior of the P-L density for large negative arguments is not classical. To derive
it, we note first that the form of the bilateral generating function G(b) (which is not com-
monly used) may be deduced from the commonly used characteristic function, with the
help of some standard theorems on Fourier transforms in the complex plane. From the
existence of G(b) it follows that p(u) must decrease faster than any form exp(-|bu|),
when u → -∞.
To show exactly how fast p(u) decreases, write for convenience v = -u and f(v) =
-log p(v); f(v) will increase faster than any linear form of v. Write further:
APPENDIX II
We can exhibit the invariance of the weak Pareto variables in a specific example. Let
0 < a < 1, and consider the discrete variable having the following Z(b) as discrete
(one-sided) generating function:
All p(n) are positive and less than 1, and $\sum_n p(n) = 1$. For large n,
$$p(n) \sim \frac{-C\,n^{-(a+1)}}{\Gamma(-a)}.$$
The sum of two variables of this type has the generating function:
so that
For large n, the second part of the right-hand term becomes negligible compared to the
first. If a = 1/2, it is zero, so that the range of values of n in which it may be neglected
increases as a tends to 1/2.
If 1 < a < 2, we must add a factor in (1 - e^{-b}) to obtain an acceptable generating
function; consider for example:
For large n, the second and third terms become negligible for all a. The ratio of the
coefficients of the first and of the second term depends little upon a; but the ratio of
the coefficients of the third and first terms is ruled by Γ(-a)/Γ(-2a), which is zero for
a = 3/2, but may become large elsewhere. As a result, the range of values of n over
which the third term is important may be large.
Each time a increases past an integral value, the sign of C(1 - e^{-b}) must be changed,
and another polynomial term must be added to Z(b), to keep it a generating function.
The number of corrective terms of p2(n) - 2p(n) increases, as well as the range of values
of n in which the corrective terms are appreciable.
Similarly, as more than two terms are added, pN(n) - Np(n) is vitiated over an
increasing range of values of n. Let N → ∞, and observe the weighted sums of the
variables U_i, whose values are the integers n. (Σ remains a summation in the sense of ⊕.)
For 0 < a < 1, it is sufficient to consider the expression $W_N = N^{-1/a} \sum U_i$, since its
g.f. is $Z^N(N^{-1/a} b)$, which tends to $\exp(-C b^a)$ when N → ∞, as it should.
If 1 < a < 2, one must consider the expression $W_N = N^{-1/a} \sum (U_i - M)$, where M, the
mean value of U_i, is easily found to be C. The g.f. of W_N is clearly $\exp(N^{1-1/a} C b)\,Z^N(N^{-1/a} b)$
and when N → ∞, it tends to $\exp(C b^a)$, as it should. It is easily seen that if we choose
for M a value different from C, the g.f. of WN will not tend to a non-degenerate ex-
pression.
If the value of a is higher, no linear renorming of U_i can eliminate from log[Z(b)]
the square term Kb², and hence, the best normed sum of the U_i is the classically normed
$N^{-1/2} \sum (U_i - M)$, which tends to a Gaussian, whatever the value of a.
APPENDIX III
This appendix concerns the transformation V=log U, which is used in most classical
theories of income distribution.
The strong objections against this transformation do not apply at all in the case of a law
due to Estoup and Zipf, which is formally similar to that of Pareto, but relates to word
frequencies, and which we have studied since 1951 (cf. [14], [16]). In the formal expression
of that law U is replaced by the "rank" of a word, where words are ordered by decreasing
frequencies. Hence log U has an intrinsic meaning as "cost of coding"; in particular,
the addition of "costs" is quite meaningful.
In the case of income, the transformation V=log U is justified in our [18] and [19],
and in a more detailed article which ought to appear soon.
REFERENCES
[9] HOLTSMARK, J., Annalen der Physik, Vol. 58 (1919), pp. 577 ff.
[10] LÉVY, PAUL, Calcul des Probabilités, Paris: Gauthier-Villars, 1925.
[11] ----, Théorie de l'addition des variables aléatoires, Paris: Gauthier-Villars, 1937
(second edition: 1954).
[12] ----, Processus stochastiques et mouvement Brownien, Paris: Gauthier-Villars,
1948.
[13] LYDALL, H. F., Econometrica, Vol. 27 (1959), pp. 110 ff.
[14] MANDELBROT, B., in Communication Theory, ed. Willis Jackson, London:
Butterworth, 1953.
[15] ----, Memorandum of the University of Geneva Mathematical Institute, January
16, 1956.
[16] ----, in Logique, Langage et Théorie de l'Information, Apostel, L., B. Mandelbrot
and R. Morf, Paris: Presses Universitaires de France, 1957.
[17] ----, Memorandum of the University of Lille Mathematical Institute, November,
1957.
[18] ----, Comptes Rendus de l'Académie des Sciences, Paris, Vol. 249 (1959), pp.
613-615.
[19] ----, Comptes Rendus de l'Académie des Sciences, Paris, Vol. 249 (1959), pp.
2153-2155.
[20] ----, Comptes Rendus de l'Académie des Sciences, Paris, Vol. 250 (1960), pp.
451-453.
[21] MANDELBROT, B. AND F. ZARNFALLER, to be published.
[22] MILLER, H. P., The Income of the American People, New York: J. Wiley, 1955.
[23] HANNA, F. A., J. A. PECKHAM, AND S. M. LERNER, Analysis of Wisconsin Income,
(Vol. 9, Studies in Income and Wealth) New York: National Bureau of Economic
Research, 1948.
[24] PARETO, V., Cours d'Economie Politique, Lausanne and Paris: 1897.
[25] WOLD, H. A. O. AND P. WHITTLE, Econometrica, Vol. 25 (1957), pp. 591 ff.