Inoua Simple Economic Complexity Version 2021
Sabiou M. Inoua
Chapman University
All content following this page was uploaded by Sabiou M. Inoua on 04 October 2021.
Sabiou Inoua
Chapman University
strongly correlated with economic development [1-6]. The “richest” countries make
almost all types of products, from the most rudimentary to the most sophisticated
ones; while the “poorest” countries make comparatively fewer and more
Table 1. The World’s Most and Least Diversified Economies (2018, 4-digit HS).¹

Ten Most Diversified Economies (2018)           Ten Least Diversified Economies (2018)
Country          Diversification   Rank         Country              Diversification   Rank
United States    1224              1            Gambia               180               162
China            1221              2            Maldives             178               163
India            1219              3            Saint Lucia          169               164
Japan            1211              3            Equatorial Guinea    165               165
Germany          1223              5            Bhutan               143               166
Russia           1204              6            Central Afr. Rep.    130               167
Brazil           1196              7            Chad                 118               168
Indonesia        1168              8            Comoros              105               169
United Kingdom   1221              9            Guinea-Bissau        29                170
France           1220              10           Small Islands²       [3, 28]           171
islands, which mostly export natural products (naturally occurring goods: fruits,
materials (notably oil); these countries have higher incomes despite their low
1 See the data description and sources in Subsection 4.1. The number of products a country makes
depends, of course, on the product nomenclature used (usually 2-, 4-, 5-, or 6-digit product codes),
notably the SITC (Standard International Trade Classification) and the HS (Harmonized System).
The results are very similar across the two nomenclatures and across aggregation levels;
hence we present no systematic comparison of the results by product nomenclature.
2 Examples of these islands are Bouvet (BVT), Netherlands Antilles (ANT), Kiribati (KIR), Northern Mariana
(MNP), Micronesia (FSM), Pitcairn (PCN), South Georgia & Sandwich (SGS), Tuvalu (TUV), and Wallis
& Futuna (WLF).
product diversity (Figure 1). All in all, about 80% of countries’ GDP ranking can be
resources, which, as documented throughout, are the main source of bias in this
even greater outliers than oil exporters, are not included in Figure 1 because of missing
GDP data.)
These associations are not just correlations and can be explained from a basic
in theory, the spectrum of this knowhow content of products ranges from zero, for
naturally occurring goods (a natural resource sold in the raw, for example), to a
maximum value when all the available knowhows are involved in the making of
output (or income) the country can produce from any combination of aggregate
labor and capital, where the function F is homogeneous of degree 1, so that income per worker equals
F(Capital/Labor, 1).⁵ But as Solow’s seminal contribution established [10, 11], capital
and labor accumulation cannot account for much of economic growth, which Solow
multiplicative factor (denoted “A” and later named “Total Factor Productivity”),
substance and identity to the “Solow residual” (in other words, to “endogenize” the
part of economic development not explained by the level of capital per worker), the
capabilities their production requires, and countries to the capabilities they possess
(Figure 2). Thus, the core problem of the network approach to production was
projection [16, 17]. Researchers conceived algorithms to that effect, notably the
Economic Complexity Index (ECI), which, as its authors argue, predicts economic
growth better than traditional variables such as human capital [16, 17]. The ECI is
computed jointly with the Product Complexity Index (PCI) by an algorithm akin to
the one the web search engine Google uses to rank webpages.⁶ Another
named country Fitness (F) and product Quality (Q). Both algorithms are
presented shortly. (Section 4 offers a step-by-step derivation of the metrics and the
basic logic underlying them, for readers not familiar with this literature.)⁷
Formally, the primary data of the network view of production is the binary country-product
matrix $M = [M_{cp}]$ connecting countries to the products they make:
$M_{cp} = 1$ if country $c$ makes product $p$, and $M_{cp} = 0$ otherwise. Such product-list
data are not available, however; one therefore takes countries’ export lists as a proxy for
their product lists. (More on the data in Subsection 4.1.)
Given the matrix $M$, the product diversity of country $c$ (the number of its products)
and the ubiquity of product $p$ (the number of its producers) are, respectively:⁸

$$D_c = \sum_p M_{cp}, \tag{1}$$

$$U_p = \sum_c M_{cp}. \tag{2}$$
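The complexity metrics are (up to norming) the solutions to the equations:

$$D_c\,\mathrm{ECI}_c \propto \sum_p M_{cp}\,\mathrm{PCI}_p, \tag{3}$$

$$U_p\,\mathrm{PCI}_p \propto \sum_c M_{cp}\,\mathrm{ECI}_c, \tag{4}$$

$$F_c \propto \sum_p M_{cp}\,Q_p, \tag{5}$$

$$Q_p \propto \Big[\sum_c M_{cp}\,F_c^{-1}\Big]^{-1}. \tag{6}$$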
6 More precisely, the ECI-PCI algorithm is closer in spirit to an algorithm developed by J.
Kleinberg [18, 19] and used by Ask.com. It is an eigenvector problem, as (3)-(4) show.
7 The complexity metrics are analyzed in various studies, some of which offer critiques, alternatives,
or refinements of the metrics [21-27], including the one presented here (in an earlier draft).
8 The natural concept is not ubiquity per se, but its inverse, which can be called product rarity.
2 Results and Discussions
applied a set of knowhows to turn them into an economically valuable outcome; and
more and more sophisticated knowledge. The results presented in this paper derive
from these definitions and two simple assumptions about the constraints on
1. Any S capabilities can be put together to transform raw materials into a valuable
product only with probability $\alpha^S$ (uniform across countries and products).
2. A country finds the raw materials needed for making a product involving S
capabilities only with probability $\beta^S$ (uniform across countries and products).
The two assumptions imply that a product tends to appear in a country’s product
list with a probability that decays exponentially with the product’s sophistication.
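The two assumptions can be simulated directly. The sketch below is a hypothetical Monte Carlo illustration, not the author's code: it writes the two probabilities as $\alpha^S$ and $\beta^S$ (parameters `alpha` and `beta`), assumes the two events are independent, treats every subset of a country's K capabilities as a candidate product (the empty subset, S = 0, standing for a raw natural product), and compares the average simulated diversity with the closed form $(1+\alpha\beta)^K$ obtained by summing $\binom{K}{S}(\alpha\beta)^S$ over S.

```python
import math
import numpy as np

def simulate_diversity(K, alpha, beta, trials=2000, seed=0):
    """Average number of products made by simulated countries with K capabilities.

    Each of the C(K, S) combinations of S capabilities becomes a product only if it
    is technically valuable (probability alpha**S) and its raw materials are
    available (probability beta**S), assumed independent."""
    rng = np.random.default_rng(seed)
    gamma = alpha * beta                    # joint per-capability probability
    D = np.zeros(trials)
    for S in range(K + 1):
        # number of realized products of sophistication S in each simulated country
        D += rng.binomial(math.comb(K, S), gamma ** S, size=trials)
    return D.mean()

K, alpha, beta = 10, 0.8, 0.75
print(simulate_diversity(K, alpha, beta))   # Monte Carlo estimate
print((1 + alpha * beta) ** K)              # closed form: sum_S C(K, S) (alpha*beta)**S
```

Because a product of sophistication S survives with probability $(\alpha\beta)^S$, the expected count per level decays exponentially in S, which is the property stated above.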
Moreover, one can show (Subsection 4.2) that the total number of capabilities in an
The model predicts the following relationships between knowhow, fitness, and ECI,
which fit the data well up to the bias related to natural products (Figures 3-4):¹⁰
9 The derivation is straightforward if we assume away the model’s two constraints (assumptions 1-2): a
country possessing K capabilities then makes $D = 2^K$ products (one for each subset of its capabilities),
whose sophistication ranges from 0 (for unprocessed natural resources sold in the raw) to K. Thus, K is
given by log D (up to a scaling constant).
10 Notation: mean(X) denotes the cross-country average of X (the average of X across all countries); std(X)
denotes the cross-country standard deviation of X; later, we use mean(X|c) for the average of
X in country c; and in equations (16)-(17), we write mean(X|K) for the average of X in a country with
K capabilities. Unless the scale of a plot indicates otherwise, the metrics are compared in standardized
form (as z-scores) in the figures below.
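$$\mathrm{ECI} = \frac{\log D - \mathrm{mean}(\log D)}{\mathrm{std}(\log D)}. \tag{9}$$

$$\frac{\log F}{\mathrm{mean}(\log F)} \approx \frac{\log D}{\mathrm{mean}(\log D)}. \tag{10}$$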
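Prediction (9) says that, under the model, ECI carries the same information as log-diversity. A short sketch, assuming population standard deviations (NumPy's default `ddof=0`), since the text does not specify the estimator; the capability counts and the probability 0.6 are illustrative values:

```python
import numpy as np

def zscore(x):
    """Cross-country standardization: (x - mean(x)) / std(x), with ddof=0."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def predicted_eci(D):
    """Model-implied ECI, eq. (9): the z-score of log-diversity."""
    return zscore(np.log(D))

# If D = (1 + gamma)**K, log D is proportional to K, so the z-score of log D
# equals the z-score of K (the standardized knowhow of eq. (14)).
K = np.array([3.0, 5.0, 8.0, 12.0])
D = (1 + 0.6) ** K
print(np.allclose(predicted_eci(D), zscore(K)))
```

The base of the logarithm does not matter here, because a z-score is invariant to positive rescaling.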
More specifically (Subsection 4.5), the model predicts the following relationships
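$$D = (1+\gamma)^K, \tag{11}$$

$$F = 2^K F(0), \tag{12}$$

$$Q = \Big(\frac{1}{\gamma}\Big)^{S} Q(0), \tag{13}$$

$$\mathrm{ECI} = \frac{K - \mathrm{mean}(K)}{\mathrm{std}(K)}, \tag{14}$$

$$\mathrm{PCI} = \frac{S - \mathrm{mean}(S)}{\mathrm{std}(S)}, \tag{15}$$

$$\mathrm{mean}(S|K) = \frac{\gamma}{1+\gamma}\,K, \tag{16}$$

$$\mathrm{mean}(Q|K) \propto \Big(\frac{2}{1+\gamma}\Big)^{K}, \tag{17}$$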
where F(0) and Q(0) are merely normalizing constants, and the notation mean(X|K)
stands for the (conditional) average of X in a country with K capabilities. The last
two predictions (16)-(17) are particularly nontrivial and fit the empirical
data equally well (Figure 5). They offer a simple criterion for assessing the accuracy of each
complexity algorithm.¹¹ The F-Q metrics fit the model better because this
algorithm deals better with the bias related to natural products, whose rarity has natural
causes rather than knowhow-related ones, and which are mostly exported by island
the estimation of $Q_p$, namely (6). (See Subsection 4.3.) In contrast, island countries
are major outliers of the ECI versus log-diversity theoretical fit [Figure 3, bottom
panel, (a)].
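The closed forms (11), (12), (16), and (17) follow from summing over sophistication levels. As a numerical check, the sketch below assumes what the derivation in Section 4 uses: a country with K capabilities makes $\binom{K}{S}\gamma^S$ products of sophistication S, each of quality $Q(0)\gamma^{-S}$, where $\gamma$ denotes the joint probability of the two assumptions; setting $Q(0) = 1$ is purely illustrative.

```python
import math

def model_moments(K, gamma, Q0=1.0):
    """Expected diversity, fitness, mean sophistication and mean quality of a
    country with K capabilities, by exact summation over S = 0..K."""
    counts = [math.comb(K, S) * gamma ** S for S in range(K + 1)]   # products with S capabilities
    D = sum(counts)                                                   # diversity
    F = sum(n * Q0 * gamma ** (-S) for S, n in enumerate(counts))     # fitness = total quality, eq. (5)
    mean_S = sum(S * n for S, n in enumerate(counts)) / D
    mean_Q = F / D
    return D, F, mean_S, mean_Q

K, gamma = 12, 0.4
D, F, mean_S, mean_Q = model_moments(K, gamma)
print(math.isclose(D, (1 + gamma) ** K))                 # eq. (11)
print(math.isclose(F, 2 ** K))                           # eq. (12) with F(0) = Q(0) = 1
print(math.isclose(mean_S, gamma / (1 + gamma) * K))     # eq. (16)
print(math.isclose(mean_Q, (2 / (1 + gamma)) ** K))      # eq. (17), up to Q(0)
```

The fitness identity is the binomial theorem at work: the factors $\gamma^S$ and $\gamma^{-S}$ cancel, leaving $\sum_S \binom{K}{S} = 2^K$.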
11 In an earlier draft (arxiv.org, 2016) we suggested that a regression between log D and log F is not
enough to assess the accuracy of the F-Q algorithm: since fitness is average product quality multiplied
by diversity, a strong correlation between log F and log D may be spurious (it holds even with random
data, as the bottom panel of Figure 5 confirms). The nontrivial prediction (17) is the right criterion
in this respect.
The model’s two parameters come down effectively to one, the joint probability

$$\gamma = \alpha\beta. \tag{18}$$
Once this probability is known, all variables are determined (including the scales or
norming constants). This parameter is simply related to the slope between log-diversity
and log-fitness, by virtue of the predictions (11) and (12), which imply

$$\log D = \frac{\log(1+\gamma)}{\log 2}\,\log F - \frac{\log(1+\gamma)}{\log 2}\,\log F(0). \tag{19}$$
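Relation (19) suggests a simple estimator. The sketch below assumes ordinary least squares of log D on log F (via `np.polyfit`), which the text does not prescribe, and uses synthetic, noise-free data generated from (11)-(12); the slope b gives $\gamma = 2^b - 1$, the intercept a gives $F(0) = \exp(-a/b)$, and (11) then yields each country's capability count.

```python
import numpy as np

def estimate_gamma(D, F):
    """Fit log D = b*log F + a by least squares (eq. 19), then invert
    b = log(1+gamma)/log 2 and a = -b*log F(0)."""
    b, a = np.polyfit(np.log(F), np.log(D), 1)   # returns [slope, intercept]
    gamma = 2.0 ** b - 1.0
    F0 = np.exp(-a / b)
    K = np.log(D) / np.log(1.0 + gamma)           # capabilities implied by eq. (11)
    return gamma, F0, K

# Synthetic data generated from eqs. (11)-(12) (illustrative values only)
K_true = np.array([4, 7, 9, 12, 15], dtype=float)
gamma_true, F0_true = 0.35, 0.02
D = (1 + gamma_true) ** K_true
F = F0_true * 2 ** K_true
gamma_hat, F0_hat, K_hat = estimate_gamma(D, F)
print(gamma_hat, F0_hat)
```

On real data the residuals would also absorb the raw-product bias discussed in footnote 12, so the recovered values should be read as approximate.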
Thus, we can estimate the model’s key probability parameter and the norming
The number of capabilities in each country is then estimated through either one of
The probability and the spread of the distribution of K (notably the maximum
12 If one can find F(0) explicitly in terms of the norming constants involved in the F-Q
algorithm (Subsection 4.3), then one can sharpen the regression equation by regressing log D on
log(F/F(0)) and split the error (or residual) term into a mean term, which would measure the bias due
to raw products, and a pure noise term.
The model’s two assumptions also amount effectively to one: the probability that a
country c makes a product p, which we write simply prob(p|c), decays exponentially

$$\mathrm{prob}(p|c) = \frac{M_{cp}}{D_c} \propto \gamma^{S_p}. \tag{23}$$
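world economy (all countries combined) is, by the law of total probability (with each of the
C countries weighted equally, prob(c) = 1/C):

$$\mathrm{prob}(p) = \sum_{c=1}^{C}\mathrm{prob}(p|c)\,\mathrm{prob}(c) = \frac{1}{C}\sum_{c=1}^{C}\mathrm{prob}(p|c).$$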
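Given (23), we get

$$\gamma^{S_p} \propto \mathrm{prob}(p) = \frac{1}{C}\sum_{c=1}^{C}\frac{M_{cp}}{D_c}. \tag{24}$$

$$\gamma^{-S_p} \propto \Big[\frac{1}{C}\sum_{c=1}^{C}\frac{M_{cp}}{D_c}\Big]^{-1}. \tag{25}$$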
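Equation (25) and its harmonic-mean form (26) can be checked numerically. A minimal sketch under stated assumptions: `gamma` is an illustrative value, every product has at least one producer ($U_p > 0$), sophistication is computed only up to its additive norming constant, and the function names are hypothetical. The last lines also check that plugging diversity in place of fitness into (6) gives qualities proportional to $\gamma^{-S_p}$.

```python
import numpy as np

def sophistication(M, gamma):
    """Eqs. (24)-(25): gamma**S_p is proportional to prob(p) = (1/C) sum_c M_cp / D_c,
    so S_p = log(1/prob(p)) / log(1/gamma), up to an additive norming constant."""
    M = np.asarray(M, dtype=float)
    C = M.shape[0]
    D = M.sum(axis=1)                            # diversity D_c
    prob_p = (M / D[:, None]).sum(axis=0) / C    # law of total probability, prob(c) = 1/C
    return np.log(1.0 / prob_p) / np.log(1.0 / gamma)

def sophistication_harmonic(M, gamma):
    """Eq. (26): the same quantity via the harmonic mean of the producers' diversities."""
    M = np.asarray(M, dtype=float)
    C = M.shape[0]
    D = M.sum(axis=1)
    U = M.sum(axis=0)                            # ubiquity U_p
    S = np.empty(M.shape[1])
    for p in range(M.shape[1]):
        producers = D[M[:, p] == 1]              # diversities of the countries making p
        h = len(producers) / np.sum(1.0 / producers)   # harmonic mean
        S[p] = np.log(h * C / U[p]) / np.log(1.0 / gamma)
    return S

# Hypothetical toy data (rows: countries, columns: products)
M = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
gamma = 0.5
S = sophistication(M, gamma)
print(np.allclose(S, sophistication_harmonic(M, gamma)))
# Replacing fitness by diversity in eq. (6): Q_p = [sum_c M_cp / D_c]**(-1)
Q = 1.0 / (M / M.sum(axis=1)[:, None]).sum(axis=0)
print(np.allclose(Q, gamma ** (-S) / M.shape[0]))   # Q(0) = 1/C in this normalization
```

The two functions agree because $\frac{1}{C}\sum_c M_{cp}/D_c = \frac{U_p}{C}\,/\,\mathrm{mean}_H$ of the producers' diversities.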
Thus, the model predicts that product sophistication is (up to norming) given by the
formula:
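$$S_p = \frac{1}{\log(1/\gamma)}\,\log\Big[\mathrm{mean}_H\{M_{cp}D_c\}\,\frac{C}{U_p}\Big], \tag{26}$$

where mean_H stands for the (cross-country) harmonic mean, taken here over the producers of p
(the countries with M_cp = 1). By the same token, we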
$$(F, Q) \longleftrightarrow (D, \gamma^{-S}). \tag{27}$$

That is, if we replace fitness by diversity in the F-Q algorithm, then the latter yields
$Q_p = Q(0)\,\gamma^{-S_p}$. The three product complexity measures (S, PCI, and log Q) are
call into question the very dependence between these concepts (and hence
everything we said so far). Intuitively, diversity and complexity are related notions.
complexity. One can show mathematically that the dependence cannot be a linear
one anyway, at least if complexity is measured by the ECI, or more precisely by its
network: Subsection 4.2), which we denote k2 below. This follows from a basic yet
diversity vector, which we denote $d = [D_c]$. By symmetry one can similarly establish
which we denote s2 .
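The orthogonality can be verified numerically. One standard way to see it: $d^T (WW^*) = d^T$, so $d$ is a left eigenvector of $WW^*$ with eigenvalue 1, and any right eigenvector with a different eigenvalue (such as k2) must satisfy $d \cdot k_2 = 0$. The sketch below assumes a connected, hypothetical toy matrix and NumPy's `np.linalg.eig`; it also shows that orthogonality reduces the covariance of k2 and d to $-\mathrm{mean}(k_2)\,\mathrm{mean}(d)$, so it does not rule out a nonzero dependence between them.

```python
import numpy as np

def country_eigvec2(M):
    """Eigenvector of W W* (eqs. 34-36) for its second largest eigenvalue."""
    M = np.asarray(M, dtype=float)
    d = M.sum(axis=1)                       # diversity vector d = [D_c]
    u = M.sum(axis=0)                       # ubiquity vector u = [U_p]
    W = M / d[:, None]                      # rows sum to 1
    W_star = M.T / u[:, None]               # rows sum to 1
    vals, vecs = np.linalg.eig(W @ W_star)
    i = np.argsort(-vals.real)[1]           # skip the largest eigenvalue, 1
    return vecs[:, i].real, d

# Hypothetical toy matrix (connected: country 0 makes every product)
M = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
])
k2, d = country_eigvec2(M)
print(np.isclose(d @ k2, 0.0))             # orthogonality: sum_c D_c k2_c = 0
# With d . k2 = 0, the covariance of k2 and d reduces to -mean(k2) * mean(d)
cov = np.mean(k2 * d) - k2.mean() * d.mean()
print(np.isclose(cov, -k2.mean() * d.mean()))
```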
the two vectors, be it reminded, merely implies that the two vectors are not linearly
complexity and diversity is not incompatible with positive dependence between the
two vectors: to the contrary, the orthogonality combined with the positive
mean(k2 ) 0, (28)
mean(s2 ) 0. (29)
Since the orthogonality is true mathematically, the sign condition (28)-(29) simply
already know, and which the sign conditions confirm (Figure 8).
The results, in the final analysis, suggest that both the empirical data and the
combine, with some probability γ, to make more and more sophisticated knowhow.
across countries and across products; it is more accurate, however, to assume some
cross-country and cross-product variability of γ to account for the bias due to raw
Fundamentally, the model rests entirely on the assumption that knowledge comes in
informational interpretation of the model that seems to be the natural language for
the complexity view on economic development more generally. For our limited
That is, the amount of information that an information source conveys (a system to
effective number of basic states (those states that can carry information). A basic
information state can be the realization of an event, for example: the information that
of the event, and the total information conveyed by the whole system (here a
13 Information theory becomes more intuitive (compared with its usual formulation, which takes probability
as a primitive concept) if the combinatorial foundation of information is emphasized more explicitly,
or even taken as the primitive concept, as Kolmogorov [28] seems to suggest. If indeed the basic
nature of information is that it comes in discrete units and expands combinatorially (or
exponentially, in the simplest case), then it is natural to measure the amount of information of an
information system by the logarithm of its effective number of information states.
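The idea of measuring information by the logarithm of an effective number of states connects to Shannon's formula through a standard identity: $\exp(H)$ equals the number of states when they are equally likely and is smaller otherwise. A minimal sketch with natural logarithms (the function names and distributions are illustrative):

```python
import math

def shannon_entropy(probs):
    """H = -sum p log p (natural log), over events with nonzero probability."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def effective_number_of_states(probs):
    """exp(H): the number of states if they are equally likely, fewer otherwise."""
    return math.exp(shannon_entropy(probs))

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(uniform), math.log(4))    # equal for equally likely events
print(effective_number_of_states(skewed))       # between 1 and 4
```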
the basic events (Shannon’s entropy formula); if the events are equally likely, then
the overall information content is measured by the logarithm of the total number of
possible events. Indeed, Shannon’s entropy formula [29] can be viewed as a special
case of the general information formula: in this sense, Shannon’s formula defines the
knowledge content of the products it makes; and think of the country’s products as
think of the set of all countries potentially producing a product as the information
source; then each producer reveals partial information about the knowledge content
(or sophistication) of the product: the product’s knowledge content is then obtained
However, this is only a rough measure that implicitly assumes uniform probability
of basic states or events: hence the need for generalized (or effective) diversity and
ubiquity measures, theoretically given by the model’s predictions (8) and (26). The
economic development is beyond this paper’s scope, since we choose to center the
question is whether a country can make a product or not): more generally, a country
can be considered rich either because of its productive knowhow (as reflected
in its product diversity) or because of the intensity of its production (or the average amount
of output the country is able to sell: the quantitative aspect of production determined
14 A discussion of the economic implications of the model is postponed to a follow-up work, which
contains a development accounting in terms of the two dimensions of production (diversity versus
intensity of output), sketched in an earlier draft (arxiv.org, 2016) but expanded there in subtlety.
4 Method: Data and Model
In principle, the complexity view on growth requires very simple data (for each
country, the list of products it makes), which are not yet available, however; hence
one takes countries’ export lists as a proxy for their product lists. While centering the
analysis on export data (for lack of detailed production data) inevitably introduces some
error, the bias proves minor a posteriori, given the accuracy of the results (apparently,
a country’s export mix is representative of the composition of its total output). The
results presented throughout this paper are based on the proxy matrix:
$$M_{cp} = \begin{cases} 1 & \text{if } X_{cp} > 0, \\ 0 & \text{if } X_{cp} = 0, \end{cases} \tag{30}$$
where $X_{cp}$ is the amount country $c$ exported of product $p$, using the Comtrade data
in HS (revision 2007), available for the years 1995-2018 [30].¹⁵ For comparison, we also use
the Comtrade data in SITC (revision 2) as compiled and corrected for mistakes by
Feenstra et al., available for the years 1962-2000 [31].
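A minimal sketch of the proxy (30), assuming the export values are already arranged as a country-by-product array (the numbers are hypothetical). Dropping products that no country exports, so that every ubiquity $U_p$ is positive, is an extra step assumed here rather than one the text describes:

```python
import numpy as np

def binary_matrix(X):
    """Eq. (30): M_cp = 1 if country c exported a positive amount of product p."""
    X = np.asarray(X, dtype=float)
    return (X > 0).astype(int)

# Hypothetical export values (rows: countries, columns: products)
X = np.array([
    [120.0, 0.0, 3.5],
    [0.0, 0.0, 980.0],
])
M_full = binary_matrix(X)
# Drop products nobody exports, so that every ubiquity U_p is positive
M = M_full[:, M_full.sum(axis=0) > 0]
print(M_full)
print(M)
```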
Unlike in this paper, the standard practice in the economic complexity literature is to
$$M_{cp} = \begin{cases} 1, & \mathrm{RCA}_{cp} \ge 1, \\ 0, & \mathrm{RCA}_{cp} < 1, \end{cases} \tag{31}$$
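For comparison, the RCA-based matrix (31) can be sketched as below. The text does not restate how $\mathrm{RCA}_{cp}$ is computed; the sketch assumes Balassa's revealed comparative advantage, the usual choice in this literature (a country's export share in p divided by p's share of world exports), and that every country and product has positive total exports. The data are hypothetical.

```python
import numpy as np

def rca(X):
    """Balassa's revealed comparative advantage (assumed definition):
    RCA_cp = (X_cp / sum_p X_cp) / (sum_c X_cp / sum_{c,p} X_cp)."""
    X = np.asarray(X, dtype=float)
    country_share = X / X.sum(axis=1, keepdims=True)   # share of p in c's exports
    world_share = X.sum(axis=0) / X.sum()               # share of p in world exports
    return country_share / world_share

def rca_matrix(X, threshold=1.0):
    """Eq. (31): M_cp = 1 if RCA_cp >= threshold (1 by convention), else 0."""
    return (rca(X) >= threshold).astype(int)

# Hypothetical exports: 2 countries x 3 products
X = np.array([
    [10.0, 0.0, 90.0],
    [50.0, 50.0, 0.0],
])
print(rca(X))
print(rca_matrix(X))
```

Unlike (30), this rule drops the first country's small but positive exports of product 0, which is why the two proxies can yield different product lists.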
15 The trade data are accessible through the Atlas of Economic Complexity Dataverse (Harvard
University): https://dataverse.harvard.edu/dataverse/atlas. The income data are countries’ GDP in
PPP (purchasing power parity) from the Penn World Table (PWT8); we use the RGDPO variable (an
output-oriented GDP estimate), though the other measures give very similar results. The PWT is
accessible through the GGDC (Groningen Growth and Development Centre, University of
Groningen): https://www.rug.nl/ggdc/productivity/pwt/.
The ECI-PCI algorithm [16, 17] assumes that an economy’s knowhow is proportional
to the average knowledge content of its products, and, vice versa, a product’s
product p, then
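$$K_c = \lambda \sum_p W_{cp}\,S_p, \tag{32}$$

$$S_p = \lambda^* \sum_c W^*_{pc}\,K_c, \tag{33}$$

$$W_{cp} = \frac{M_{cp}}{\sum_p M_{cp}}, \tag{34}$$

$$W^*_{pc} = \frac{M_{cp}}{\sum_c M_{cp}}. \tag{35}$$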
Collecting the variables and weights into the vectors and matrices $k = [K_c]$, $s = [S_p]$,

$$(WW^*)\,k = (\lambda\lambda^*)^{-1}\,k. \tag{36}$$
That is, the complexities of countries and products are given by eigenvectors of the
Because the averaging weights sum to 1, it is easy to see that any (positive) uniform
vectors $k = [K, \ldots, K]^T$ and $s = [S, \ldots, S]^T$ solve this eigenvector problem; these
are the eigenvectors associated with the largest eigenvalue, which is 1 (by a known
linear algebra result, the Perron-Frobenius theorem). Since these uniform solutions carry no
information about differences in complexity, the authors of this algorithm choose the
eigenvectors associated with the second largest eigenvalue. Let
k2 and s2 be these eigenvectors: then ECI and PCI are (up to the sign) the elements of
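$$\mathrm{ECI} = \mathrm{sign}[\mathrm{corr}(k_2, d)]\,\frac{k_2 - \mathrm{mean}(k_2)}{\mathrm{std}(k_2)}, \tag{38}$$

$$\mathrm{PCI} = -\,\mathrm{sign}[\mathrm{corr}(s_2, u)]\,\frac{s_2 - \mathrm{mean}(s_2)}{\mathrm{std}(s_2)}. \tag{39}$$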
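signs are correct; this is simply because the orientation of an eigenvector is arbitrary,
so the standardization specifies the metrics only up to the sign: for example, any chosen

$$\frac{\theta k - \mathrm{mean}(\theta k)}{\mathrm{std}(\theta k)} = \frac{\theta}{|\theta|}\,\frac{k - \mathrm{mean}(k)}{\mathrm{std}(k)}. \tag{40}$$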
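The ECI-PCI recipe (34)-(39) fits in a few lines. This is a sketch under stated assumptions, not the original authors' code: it uses `np.linalg.eig` on $WW^*$ and $W^*W$, takes the eigenvector of the second largest eigenvalue, standardizes with `ddof=0`, and orients PCI so that it falls with ubiquity (the usual convention, assumed here). The perfectly nested toy matrix is hypothetical, and every country and product must have at least one 1.

```python
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

def second_eigenvector(A):
    """Eigenvector of A for its second largest eigenvalue (the largest is 1)."""
    vals, vecs = np.linalg.eig(A)
    i = np.argsort(-vals.real)[1]
    return vecs[:, i].real

def eci_pci(M):
    """Sketch of eqs. (34)-(39): standardized second eigenvectors of W W* and W* W."""
    M = np.asarray(M, dtype=float)
    d = M.sum(axis=1)                  # diversity vector d = [D_c]
    u = M.sum(axis=0)                  # ubiquity vector u = [U_p]
    W = M / d[:, None]                 # eq. (34): W_cp = M_cp / D_c
    W_star = M.T / u[:, None]          # eq. (35): W*_pc = M_cp / U_p
    k2 = second_eigenvector(W @ W_star)
    s2 = second_eigenvector(W_star @ W)
    # Orient the arbitrary eigenvector signs: ECI rises with diversity, PCI falls with ubiquity
    eci = np.sign(np.corrcoef(k2, d)[0, 1]) * zscore(k2)
    pci = -np.sign(np.corrcoef(s2, u)[0, 1]) * zscore(s2)
    return eci, pci

M = np.tril(np.ones((5, 5)))           # hypothetical perfectly nested toy matrix
eci, pci = eci_pci(M)
print(np.round(eci, 2))
print(np.round(pci, 2))
```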
In essence, this algorithm [20] measures the complexity of an economy by the total
complexity of its products; and the complexity of a product, by the product’s inverse
That is, the two metrics are jointly computed recursively as follows:
$$F_c^{(n+1)} = \frac{1}{\mathrm{mean}(Q_p^{(n)})}\sum_p M_{cp}\,Q_p^{(n)}, \tag{41}$$

$$Q_p^{(n+1)} = \frac{1}{\mathrm{mean}(F_c^{(n)})}\,\frac{1}{\sum_c M_{cp}\,\frac{1}{F_c^{(n)}}}. \tag{42}$$
The means are averages across all countries and all products, respectively, and the
initial conditions are unit complexities for all countries and all products. The
$$F = \frac{F^{(\infty)}}{\mathrm{mean}(F^{(\infty)})}, \tag{43}$$

$$Q = \frac{Q^{(\infty)}}{\mathrm{mean}(Q^{(\infty)})}. \tag{44}$$
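The recursion (41)-(42) and the final normalization (43)-(44) can be sketched as follows. A fixed number of iterations (`n_iter`, a hypothetical parameter) stands in for the limit $n \to \infty$, which is an assumption of this sketch; every country and product must have at least one 1, otherwise the divisions fail. The nested toy matrix is hypothetical.

```python
import numpy as np

def fitness_quality(M, n_iter=100):
    """Sketch of eqs. (41)-(44): iterate from unit complexities, then normalize."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])                           # F_c^(0) = 1
    Q = np.ones(M.shape[1])                           # Q_p^(0) = 1
    for _ in range(n_iter):
        F_new = (M @ Q) / Q.mean()                    # eq. (41)
        Q_new = 1.0 / (F.mean() * (M.T @ (1.0 / F)))  # eq. (42)
        F, Q = F_new, Q_new
    return F / F.mean(), Q / Q.mean()                 # eqs. (43)-(44)

M = np.tril(np.ones((5, 5)))   # hypothetical nested matrix: country c makes products 0..c
F, Q = fitness_quality(M)
print(np.round(F, 3))
print(np.round(Q, 3))
```

Because quality enters through $\sum_c M_{cp}/F_c$, a single low-fitness producer dominates the sum and pulls the product's quality down, which is the harmonic-mean logic discussed next.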
The crucial novelty of this algorithm is the following ingenious observation: if a low-
the producers of a product barely reveals any information about the product’s
complexity (since such a country makes almost all product types). Thus, highly
what the harmonic mean does, whose following bounds are known:¹⁶
We know from the theoretical model why the harmonic mean is the natural choice.
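The behavior that makes the harmonic mean suitable is easy to see on toy numbers. The sketch below uses Python's `statistics.harmonic_mean` and checks two elementary bounds that hold for any positive data, $\min(x) \le H \le n \min(x)$ (stated here as general facts, not as a reproduction of the paper's bounds). A single poorly diversified producer drags the harmonic mean of the producers' diversities far below the arithmetic mean. The diversity values are hypothetical.

```python
from statistics import harmonic_mean, mean

# Hypothetical diversities of the producers of two products
producers_a = [1200, 1150, 1100]        # only highly diversified exporters
producers_b = [1200, 1150, 1100, 30]    # the same, plus one poorly diversified exporter

for ds in (producers_a, producers_b):
    h = harmonic_mean(ds)
    # Elementary bounds for positive data: min(x) <= H <= n * min(x)
    assert min(ds) <= h <= len(ds) * min(ds)
    print(round(mean(ds)), round(h))
```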
As usual, we index an empirical country and product by c and p, and we index the
16 See https://en.wikipedia.org/wiki/Harmonic_mean.
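$$\mathrm{prob}(S|K) = \binom{K}{S}\frac{\gamma^{S}}{(1+\gamma)^{K}}, \quad S = 0, \ldots, K. \tag{46}$$

$$\mathrm{mean}(S|K) = \frac{\gamma}{1+\gamma}\,K. \tag{47}$$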
This theoretical result justifies the measurement of a country’s output complexity by
its average product complexity (up to scaling): it explains why ECI works as a
measure of knowhow. We can check the extent to which the ECI-PCI algorithm does
where $K_c^{(2)}$ is the $c$th entry of the country complexity eigenvector k2 and $S^{(2)}|c$ is the
the scale of measurement of sophistication, and an error term that should average
out), then
And if in addition the combinatorial model of production is accurate, then the ECI-
$$Q(S) = Q(0)\,\gamma^{-S}, \tag{52}$$
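The model also predicts that the average product quality in a K-country is

$$\mathrm{mean}(Q|K) = Q(0)\sum_{S=0}^{K}\mathrm{prob}(S|K)\,\gamma^{-S} = D^{-1}Q(0)\sum_{S=0}^{K}\binom{K}{S}\gamma^{S}\gamma^{-S} = D^{-1}Q(0)\,2^{K}.$$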
That is, since fitness is diversity times average product quality (F = D mean(Q|K)),

$$F = F(0)\,2^{K}. \tag{54}$$
References
[19] J.M. Kleinberg, M. Newman, A.-L. Barabási, D.J. Watts, Authoritative sources in
a hyperlinked environment, Princeton University Press, 2011.
[20] A. Tacchella, M. Cristelli, G. Caldarelli, A. Gabrielli, L. Pietronero, A new
metrics for countries' fitness and products' complexity, Scientific Reports, 2 (2012).
[21] G. Caldarelli, M. Cristelli, A. Gabrielli, L. Pietronero, A. Scala, A. Tacchella, A
network analysis of countries’ export flows: firm grounds for the building blocks of
the economy, PLoS ONE, 7 (2012).
[22] F. Battiston, M. Cristelli, A. Tacchella, L. Pietronero, How metrics for economic
complexity are affected by noise, Complexity Economics, 3 (2014) 1-22.
[23] E. Kemp-Benedict, An interpretation and critique of the Method of Reflections,
(2014).
[24] P. Mealy, J.D. Farmer, A. Teytelboym, Interpreting economic complexity,
Science Advances, 5 (2019) eaau1705.
[25] C. Sciarra, G. Chiarotti, L. Ridolfi, F. Laio, Reconciling contrasting views on
economic complexity, Nature Communications, 11 (2020) 1-10.
[26] A. Van Dam, K. Frenken, Variety, complexity and economic development,
Research Policy, (2020).
[27] C.A. Hidalgo, Economic complexity theory and applications, Nature Reviews
Physics, (2021) 1-22.
[28] A.N. Kolmogorov, Combinatorial foundations of information theory and the
calculus of probabilities, Russian Mathematical Surveys, 38 (1983) 29-40.
[29] C.E. Shannon, A Mathematical Theory of Communication, Bell System Technical
Journal, 27 (1948) 379-423.
[30] G. Gaulier, S. Zignago, BACI: international trade database at the product-level
(the 1994-2007 version), (2010).
[31] R.C. Feenstra, R.E. Lipsey, H. Deng, A.C. Ma, H. Mo, World trade flows: 1962-2000,
National Bureau of Economic Research, 2005.