
A First Course in Network Theory

Professor Ernesto Estrada


Professor in Mathematics and Chair in Complexity Science, University of Strathclyde, UK

Dr Philip A. Knight
Lecturer in Mathematics, University of Strathclyde, UK

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Ernesto Estrada and Philip A. Knight 2015
The moral rights of the authors have been asserted
First Edition published in 2015
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2014955860
ISBN 978–0–19–872645–6 (hbk.)
ISBN 978–0–19–872646–3 (pbk.)
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Dedication

To Gisell, Doris, Puri, Rowan, and Finlay


Preface

The origins of this book can be traced to lecture notes we prepared for a class
entitled Introduction to Network Theory offered by the Department of Mathemat-
ics and Statistics at the University of Strathclyde and attended by undergraduate
students in the Honours courses in the department. The course has since been
extended, based on experience gained in teaching the course to graduate students
and postdoctoral researchers from a variety of backgrounds around the world.
To mathematicians, physicists, and computer scientists at Emory University in
Atlanta. To postgraduate students in biological and environmental sciences on
courses sponsored by the Natural Environmental Research Council in the UK. To
Masters students on intensive short courses at the African Institute of Mathemat-
ical Sciences in both South Africa and Ghana. And to mathematicians, computer
scientists, physicists, and more at an International Summer School on Complex
Networks in Bertinoro, Italy.
Designing courses with a common thread suitable for students with very dif-
ferent backgrounds represents a big challenge. For example, the balance between
theory and application will vary significantly between students of mathematics
and students of computer sciences. An even greater challenge is to ensure that
those students with an interest in network theory, but who lack the normal quan-
titative backgrounds expected on a mathematics course, do not become frustrated
by being overloaded by seemingly unnecessary theory. We believe in the interdis-
ciplinary nature of the study of complex networks. The aim of this book is to
approach our students in an interdisciplinary fashion and as a consequence we
try to avoid a heavy mathematical bias. We have avoided a didactic ‘Theorem–
Proof ’ approach but we do not believe we have sacrificed rigour and the book is
replete with examples and solved problems which will lead students through the
theory as constructively as possible.
This book is written with senior undergraduate students and new graduate
students in mind. The major prerequisite is elementary algebra at a level one
would expect in the first year of an undergraduate science degree. To make this
book accessible for students from non-quantitative subjects we explain most of
the basic concepts of linear algebra needed to understand the more specific topics
of network theory. This material should not be wasted on students coming from
more quantitative subjects. As well as providing a reminder of familiar concepts,
we expect they will encounter a number of simple results which are not typically
presented in undergraduate linear algebra courses. We insist on no prerequisites
in graph theory for understanding this book since we believe it contains all the
necessary basic concepts in that area to allow progress in network theory. Based
on our accumulated experience in teaching courses in network theory in different
environments, we have also included chapters which address generic skills with
which students often have difficulties. For example, we include instructions on
how to manipulate and present data from simulations carried out in network the-
ory; and how to prove analytic results in this field. Knowing how useful network
theory is becoming as a tool for physicists, we have also included three chap-
ters which draw analogies between different branches of physics and networks.
Some background in physics at undergraduate level will be useful for fully ap-
preciating these chapters but they are not necessary for understanding the rest of
the book.
Every chapter of this book is written using the following common scheme: (i)
the aims of the chapter are clearly stated at the beginning; (ii) a short introduction
or motivation of the key topics is presented; (iii) the concepts and formulae to be
used are defined with clarity; (iv) concepts are illustrated through examples, both
by using small, artificial networks and also by employing real-world networks;
(v) a few solved problems are given to train the student in how to approach
typical problems related to the principal topics of the chapter. Predominantly
we focus on simple networks—almost all of our edges will be bidirectional, un-
weighted, and will connect a unique pair of adjacent nodes—but we will highlight
significant variations in theory and practice for a wide range of more general
networks.
This book will be useful for both lecturers and researchers working in the area
of complex networks. The book provides researchers with a reference of some
of the most commonly used concepts in network theory, good examples of their
applications in solving practical problems, and clear indications on how to analyse
their results. We would also like to highlight some significant features of the book
which teachers should find particularly attractive. One of the most common prob-
lems encountered by teachers is how to select appropriate illustrative exercises in
the classroom. Because of the large size of many complex networks, solving prob-
lems in this field is frequently left to computers. If a student is just trained to work
with a black box, they miss out on properly contextualizing theory. Frequently,
this inhibits the student’s ability to learn how to prove results analytically. In this
book, we give solved problems which teachers can easily modify and adapt to their
particular objectives. We hope that some of the examples and solved problems in
this book will find their way into more conventional courses in linear algebra and
graph theory, as they provide stimulating practical examples of the application of
abstract concepts.
In closing, we reiterate that this book is aimed at senior undergraduate and
new postgraduate students with or without quantitative backgrounds. For most
of the book, the only prerequisites are a familiarity with elementary algebra and
some rudiments of linear algebra. Teachers of courses in network theory, linear
algebra, and graph theory—as well as researchers in these fields—should find this
book attractive.

Finally, we would like to thank the colleagues and students who have helped
us and inspired us to write this book. In particular, we would like to thank Mary
McAuley for her patience and skill in organizing our material and Eusebio Vargas
for lending us his talents to produce the high quality illustrations of networks
which you will find in the book.
Ernesto Estrada
Philip A. Knight
Contents

1 Introduction to Network Theory
1.1 Overview of networks
1.2 History of graphs
1.3 What you will learn from this book
Further reading

2 General Concepts in Network Theory
2.1 Formal definition of networks
2.2 Elementary graph theory concepts
2.3 Networks and matrices
2.4 Network connectivity
2.5 Graph structures
Further reading

3 How To Prove It
3.1 Motivation
3.2 Draw pictures
3.3 Use induction
3.4 Try to find a counterexample
3.5 Proof by contradiction
3.6 Make connections between concepts
3.7 Other general advice
Further reading

4 Data Analysis and Manipulation
4.1 Motivation
4.2 Sources of error
4.3 Processing data
4.4 Data statistics and random variables
4.5 Experimental tools
Further reading

5 Algebraic Concepts in Network Theory
5.1 Basic definitions of networks and matrices
5.2 Eigenvalues and eigenvectors
Further reading

6 Spectra of Adjacency Matrices
6.1 Motivation
6.2 Spectral analysis of simple networks
6.3 Spectra and structure
6.4 Eigenvectors of the adjacency matrix
Further reading

7 The Network Laplacian
7.1 The graph Laplacian
7.2 Eigenvalues and eigenvectors of the graph Laplacian
Further reading

8 Classical Physics Analogies
8.1 Motivation
8.2 Classical mechanical analogies
8.3 Networks as electrical circuits
Further reading

9 Degree Distributions
9.1 Motivation
9.2 General degree distributions
9.3 Scale-free networks
Further reading

10 Clustering Coefficients of Networks
10.1 Motivation
10.2 The Watts–Strogatz clustering coefficient
10.3 The Newman clustering coefficient
Further reading

11 Random Models of Networks
11.1 Motivation
11.2 The Erdős–Rényi model of random networks
11.3 The Barabási–Albert model
11.4 The Watts–Strogatz model
Further reading

12 Matrix Functions
12.1 Motivation
12.2 Matrix powers
12.3 The matrix exponential
12.4 Extending matrix powers
12.5 General matrix functions
Further reading

13 Fragment-based Measures
13.1 Motivation
13.2 Counting subgraphs in networks
13.3 Network motifs
Further reading

14 Classical Node Centrality
14.1 Motivation
14.2 Degree centrality
14.3 Closeness centrality
14.4 Betweenness centrality
Further reading

15 Spectral Node Centrality
15.1 Motivation
15.2 Katz centrality
15.3 Eigenvector centrality
15.4 Subgraph centrality
Further reading

16 Quantum Physics Analogies
16.1 Motivation
16.2 Quantum mechanical analogies
16.3 Tight-binding models
16.4 Some specific quantum-mechanical systems
Further reading

17 Global Properties of Networks I
17.1 Motivation
17.2 Degree–degree correlation
17.3 Network reciprocity
17.4 Network returnability
Further reading

18 Global Properties of Networks II
18.1 Motivation
18.2 Network expansion properties
18.3 Spectral scaling method
18.4 Bipartivity measures
Further reading

19 Communicability in Networks
19.1 Motivation
19.2 Network communicability
19.3 Communicability distance
Further reading

20 Statistical Physics Analogies
20.1 Motivation
20.2 Thermodynamics in a nutshell
20.3 Micro-canonical ensembles
20.4 The canonical ensemble
20.5 The temperature in network theory
Further reading

21 Communities in Networks
21.1 Motivation
21.2 Basic concepts of communities
21.3 Network partition methods
21.4 Clustering by centrality
21.5 Modularity
21.6 Communities based on communicability
21.7 Anti-communities
Further reading

Index
1 Introduction to Network Theory

In this chapter

We start with a brief introduction outlining some of the areas where we find
networks in the real world. While our list is far from exhaustive, it highlights
why they are such a fundamental topic in contemporary applied mathematics.
We then take a step back and give a historical perspective of the contribution
of mathematicians in graph theory, to see the origins of some of the terms
and ideas we will use. Finally, we give an example to demonstrate some of the
typical problems a network analyst can be expected to find answers to.

1.1 Overview of networks


One cannot ignore the networks we are part of, that surround us in everyday
life. There’s our network of family and friends; the transport network; the
telephone network; the distribution network shops use to bring us things to buy;
the banking network—it does not take much effort to come up with dozens of
examples. Analysis of networks, particularly the huge networks that drive the global
economy (directly or indirectly), is a vital science, and mathematicians have been
contributing for hundreds of years.
Initially, this contribution might have been considered frivolous, and for a long
time network theory was the preserve of the pure and the recreational mathem-
atician. But more recently there have been significant theoretical and practical
achievements. These lend weight to the idea that every applied mathematician
should include network analysis in his or her toolkit.

1.1.1 Why are networks so ubiquitous?


One answer can be that ‘being networked’ is a fundamental characteristic of com-
plex systems. If we exclude ‘the science of the very large, i.e. cosmology, the
study of the universe’ and ‘the science of the very small, the elementary particles
of matter’, everything remaining forms the object of study of complexity sciences,
‘which includes chemistry, condensed-matter physics, materials science, and
principles of engineering through geology, biology, and perhaps even psychology
and the social and economic sciences’.1
In addition, the abstract concept of a network represents a wide variety of
structures in which the entities of the complex system are represented by the
nodes of the network, and the relations or interactions between these entities are
captured by means of the edges of the network. Examples of some of these diverse
concepts are listed below.

• Edges representing physical links


Pairs of nodes can be physically connected by a tangible link, such as a
cable, a road, or a vein. We include the physical network behind the inter-
net, urban street networks, road/underground networks, water or electricity
supply networks, neural and vascular networks in this category.
• Edges representing physical interactions
Pairs of nodes can be considered to be connected if there is an interaction
between them which is determined by a physical force, such as the inter-
actions among protein residues, or through biological interactions such
as correlated behaviour between pairs of proteins to particular stimuli in
protein–protein interaction networks.
• Edges representing ‘ethereal’ connections
Pairs of nodes may be connected by intangible connections, such as the
fact that ‘information’ is sent from one node and it is received at another,
irrespective of the ‘physical’ trajectory followed by this ‘information’ such
as in the Web or in a network of airports.
• Edges representing geographic closeness between nodes
Nodes can represent regions of a surface and may be connected by means
of their geographic proximity, such as when we connect countries in a map,
patches on a landscape connected by corridors, or cells connected to each
other in tissues.
• Edges representing mass/energy exchange
Pairs of nodes can be connected by relations that indicate the interchange of
mass and/or energy between them, such as in reaction networks, metabolic
networks, food webs, and trade networks.
• Edges representing social connections
If nodes are connected by means of any kind of social tie, e.g. friendship,
collaboration, or familial ties.
• Edges representing conceptual linking
Pairs of nodes may be conceptually connected to each other as in dictionaries
and citation networks.
• Edges representing functional linking
Pairs of nodes can be connected by means of a functional relation, such as if
one gene activates another; or if a brain region is functionally connected to
another; or even when the work of a part of a machine activates the function
of another.

1 Cottrell, A.H. and Pettiford, D.G., Models of Structure. In: Structure in Science and Art. Pullman, W., Bhadeshia, H. Cambridge University Press, 2000, pp. 37–47.

You may have noticed that these concepts are not completely disjoint and it is
certainly the case that we may want to interpret one network from many differ-
ent points of view. Some examples of these classes of networks are illustrated in
Figures 1.1–1.4.

Figure 1.1 An airline transportation network in Europe
Figure 1.2 An urban street network
Figure 1.3 Relational network of concepts in network theory
Figure 1.4 A network of reactions in the Earth’s atmosphere

1.2 History of graphs


We present a brief history of graphs and highlight the emergence of some of the
key concepts which will prove useful in our analysis of networks.

1.2.1 Euler and the Königsberg Bridge


Our story starts (as many in mathematics do) with Leonhard Euler. He had
heard of a problem that the people of eighteenth century Königsberg amused
themselves with.2 The main river through the city, the Pregel, surrounds an island
and branches as it flows towards the Baltic Sea. At the time, seven bridges
connected the various parts of the city divided by the river. Euler’s diagram of the
city is shown in Figure 1.5.3

2 The city is now known as Kaliningrad and is in Russia.
3 Picture taken from Euler, L., Solutio problematis ad geometriam situs pertinentis, Commentarii academiae scientiarum Petropolitanae, 8:128–140, 1741.

Figure 1.5 The seven bridges of Königsberg

The problem that the city’s residents tried to solve was the following: is it possible
to walk through the city and cross each bridge once and only once? It is not
clear how long this problem had been taxing the populace, but nobody had found
a suitable path when Euler entered the picture. He described the problem and his
solution in a paper published in 1736. He was not interested in the problem per
se, but in the fact that there was no mathematical equipment for tackling the prob-
lem despite its geometric flavour (Euler described it as an example of ‘geometry
of position’). He wanted to avoid exhaustively listing all the possible paths and
he found a way of recasting the problem that stripped it of all irrelevant features
(such as the distances between points).
Euler spotted that the key to solving the problem lay in the number of bridges
connected to each piece of land; in particular, whether the number is even or
odd. Consider the north bank of the river, labelled C, which is reached by three
bridges. Suppose a path starts on C. The first time we cross one of its bridges we
leave C, the second time we cross one of its bridges we re-enter C and the final
crossing takes us out again. On the other hand, if a path does not start on C, after
the three bridge crossings it must end on C. The south bank, B, can be reached
by three bridges, and the same argument applies: a walk must either start or finish on B.
We can conclude that each parcel of land that has an odd number of bridges
must either be the start or finish of a valid route. But there are five bridges on A,
three on C, and three on D. Since it is impossible to have a route with three end
points, no valid route exists.
Euler’s insight could be extended to the more general problem involving any
number of bridges linking any number of pieces of land. He stated the solution
as follows.

If there are more than two areas to which an odd number of bridges lead, then such a
journey is impossible.
If, however, the number of bridges is odd for exactly two areas, then the journey
is possible if it starts in either of these areas.
If, finally, there are no areas to which an odd number of bridges leads, then the
required journey can be accomplished starting from any area.
With these rules, the problem can always be solved.

Euler had over-generalized in his statement: for example, suppose there is an
island on the itinerary which has no bridges connected to it. Then there can never
be a route including this island, no matter the number of even and odd areas
elsewhere. But Euler’s approach showed how network problems could be framed
in such a way that they could be analysed with mathematical rigour. Implicit in
his approach were the notions of graph, vertex, edge, vertex degree, and path: the
atoms of network theory.
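Euler’s rules are easy to mechanize. The following sketch is our own illustration (not from the book): it stores the bridge counts read off Figure 1.5 in a Python dictionary and applies the three cases of Euler’s statement, bearing in mind the connectivity caveat just mentioned.

# A minimal sketch (ours) of Euler's parity rule for the Königsberg bridges.
# Bridge counts are read off Figure 1.5: five bridges touch A and three
# each touch B, C, and D.
bridges = {"A": 5, "B": 3, "C": 3, "D": 3}

odd = [area for area, count in bridges.items() if count % 2 == 1]

if len(odd) > 2:
    print("No journey crossing every bridge exactly once is possible.")
elif len(odd) == 2:
    print(f"A journey is possible if it starts in {odd[0]} or {odd[1]}.")
else:
    print("A journey is possible starting from any area.")
# Caveat from the text: these rules implicitly assume the areas and
# bridges form a connected network.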

1.2.2 The knight’s tour


While Euler was concerned with paths that visit every edge once and only once,
the knight’s tour is an example of a problem where we look for a path that visits
every vertex once and only once.
If you know anything about chess, you can show very easily that it is possible
to move a king around an empty chessboard so that it visits every square of the
board once (using valid moves for a king in chess). The same is true for a rook
and the queen; but not for a bishop (which can only visit half the squares) or a
pawn, which on an empty chessboard can only move one way. But what about
a knight? This is not so easy to determine, and it is quite tricky to keep track
of a potential route. The problem of finding a knight’s tour has been known for
more than a thousand years, and solutions have been known for nearly as long
(although the exact number of different tours is still unknown4). One such tour is
given in Figure 1.6.

Figure 1.6 A knight’s tour

4 There are known to be over 26 trillion different closed tours on an 8 × 8 board which start and finish at the same square.
Euler gave a systematic treatment of the knight’s tour in 1759, and this ap-
proach was extended by Alexandre-Théophile Vandermonde in an attempt to
demonstrate a notation for the concept of the geometry of position. Essentially
he reduced the chessboard to a set of 64 coordinates (m, n) with 1 ≤ m, n ≤ 8.
A knight’s tour is a list of all of these coordinate pairs arranged so that if (m1, n1)
follows (m0, n0) then |m1 – m0| = 2 and |n1 – n0| = 1, or |m1 – m0| = 1 and
|n1 – n0| = 2.
Vandermonde showed how to exploit this notation along with some simple
symmetry properties to find tours, and that by adding additional coordinates one
could extend the idea to similar problems: he had in mind the possible shapes one
could make by braiding threads.
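Vandermonde’s coordinate condition can be checked mechanically. The sketch below is our own illustration of it: is_knights_tour tests whether a list of 64 coordinate pairs visits every square once, with consecutive pairs one knight’s move apart.

# A sketch (ours) of the knight's-move condition in Vandermonde's notation.
def is_knight_move(a, b):
    (m0, n0), (m1, n1) = a, b
    return sorted((abs(m1 - m0), abs(n1 - n0))) == [1, 2]

def is_knights_tour(squares):
    # A tour must list all 64 squares exactly once...
    board = [(m, n) for m in range(1, 9) for n in range(1, 9)]
    if sorted(squares) != board:
        return False
    # ...with consecutive squares one knight's move apart.
    return all(is_knight_move(a, b) for a, b in zip(squares, squares[1:]))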
The knight’s tour is probably the earliest example of searching for a circuit in a
network. A circuit that visits every vertex once and only once is known as a Ham-
iltonian circuit after the Irish mathematician William Hamilton: in the nineteenth
century he had invented a board game based on finding such circuits on a do-
decahedron. While it did not prove to be a spectacular success, it did popularize
the notion of circuits and led to a more general analysis.

1.2.3 Trees
Network theory would have been of little interest today if it was limited to recre-
ational applications. During the nineteenth century, though, it became apparent
that network analysis could inform many areas of mathematics and science. One
of the first such applications was in calculus, for which Arthur Cayley showed
a connection between certain partial derivatives and trees. A tree is a connected
network which has no circuits. We will see several in Chapter 2.
Cayley was particularly interested in rooted trees in which one particular vertex
is designated as the root. Different choices of root can lead to different representa-
tions of the same tree. Cayley derived a method for finding the number of rooted
trees with exactly n edges in terms of polynomial algebra. We omit the details but
it is worth emphasizing that Cayley’s algebraic approach was a key step along the
way to modern methods of network analysis.
Another early analysis of trees was performed by Camille Jordan. He intro-
duced the concept of the centre of a network. If one takes a tree and prunes the
leaves (the vertices which have only one edge), one is left with another tree. Prune
this tree and keep pruning and eventually you are left with a single vertex, or a
tree with two vertices. These vertices can be viewed as the centre of the tree. Other
rules for looking at trees can lead to alternative centres: we will visit the idea of
centrality at length in Chapters 14 and 15.

1.2.4 Kirchhoff and the algebra of graphs


Matrix algebra is going to be our most useful tool for network analysis. All of
the mathematicians we have mentioned so far used algebraic techniques to prove
results about graphs but the methods used were introduced on a fairly ad hoc basis
and were not easily transferable to general problems. The person responsible for
introducing matrices to the picture was Gustav Kirchhoff, a Prussian physicist
(born in Königsberg).
You may have encountered Kirchhoff’s laws for electrical networks. Kirchhoff
formulated these laws as part of his student project when he was 21. The laws
govern the way electricity flows through a network: subject to some loose con-
straints, the current flowing into any point must equal the flow going out and the
total (directed) voltage in a closed network must sum to zero.
Kirchhoff established his law on currents by solving a set of simultaneous
equations: each circuit in the network leads to an equation, but not every circuit
leads to an independent equation. In the network illustrated in Figure 1.7 there
are three circuits: xy, yz, and xz, but it should be clear that there is some
dependence between them: define the sum of two circuits as the set of the edges that
belong to one but not both of them and we see that each circuit is the sum of
the other two. Kirchhoff found an algebraic technique for determining and
enumerating the independent circuits: a set of circuits no member of which is a sum
of any other pair. The set of linear equations he was interested in involved this
fundamental set.

Figure 1.7 A simple circuit
While we can cast Kirchhoff’s analysis in the language of matrix algebra, this
option was not available to him—the word ‘matrix’ did not enter the mathematical
vocabulary until the 1850s (introduced by the British mathematician James
Sylvester). Kirchhoff’s algebraic innovation was not appreciated by his contem-
poraries and it was not until the twentieth century that much of the pioneering
work in network theory was revisited and recognized as an application of matrix
algebra.

1.2.5 Chemistry
In this book, the terms ‘network’ and ‘graph’ are synonymous. The use of the
word ‘graph’ in this context can be dated precisely to February 1878, when it ap-
peared in a paper by James Sylvester entitled Chemistry and Algebra. By 1850, it
was well known that molecules were formed from atoms; for example, that ethanol
had the chemical formula C2 H5 OH. Sylvester was contributing to ensuing devel-
opments into understanding the possible arrangements of the atoms in molecules.
The modern notation for representing molecules was essentially introduced by the
Scottish chemist Alexander Crum Brown in 1864. He highlighted the concept of
valency, an indication of how many bonds each atom in the molecule must
be part of—directly related to the vertex degree.
A representation of ethanol is given in Figure 1.8. The valency of each of the
atoms is clear and we can easily represent double and triple bonds by drawing
multiple edges between two atoms. Notice the underlying network in this picture.

Figure 1.8 Ethanol

These diagrams made it plain that for some chemical formulae several different
arrangements of atoms are possible: for example, propanol can be configured in
two distinct ways as shown in Figure 1.9.

Figure 1.9 Two forms of propanol

The graphical representation made it clear that such isomers were an important
topic in molecular chemistry. Mathematicians made a significant contribution to
this embryonic field. For instance, Cayley was able to exploit his earlier work
on trees in enumerating the number of isomers of alkanes.5 Each isomer has a
carbon skeleton which is augmented with hydrogen atoms. Cayley realized that
the number of isomers was equal to the number of different trees with n vertices
(so long as none of these vertices had a degree of more than four). Sylvester built
on this work by showing that even more abstract algebraic ideas could be related
to the graphical structure of molecules.
The different motives of chemists and mathematicians for understanding the
structures that were being uncovered meant that the disciplines diverged soon
afterwards; but more recently the ties have become closer again and network
theory can be used to design novel molecules with particular properties.

5 These are molecules with the formula CnH2n+2.

1.3 What you will learn from this book


In many situations in your professional life you will find problems for which
network theory will be essential. In general you will be confronted
with data concerning the connections between pairs of nodes of a network from
which you are expected to answer specific questions. These may be about the
organization of that structure; its functionality; possible mechanisms of evolution or
growth; and potential strategies for improving its efficiency. Suppose for instance
that you are analysing the network represented in Figure 1.10, which describes
the social communication among a group of 36 individuals in a sawmill.

Figure 1.10 Employee relationships in a sawmill
You are informed that the employees were asked to indicate the frequency with
which they discussed work matters with each of their colleagues. Two nodes are
connected if the corresponding individuals frequently discuss work matters. After
studying this book you will be able to make the following conclusions.

1. The network forms a connected structure in which every employee
discusses work matters with, on average, three or four other employees.
2. The contacts among employees do not follow a Gaussian-like distribution;
instead the contacts are distributed in a rather skewed manner, with
very few employees having a large number of contacts and most of the
others having relatively few.
3. There is one employee (identified as Juan) who is very central in the
communication with others: ten per cent of all the contacts in the network
include him. A group of four employees, including Juan, is fundamental in
passing information around the network.
4. No pair of employees is separated by more than eight steps in the network
and, on average, every pair of employees is separated by only three steps.
5. The contacts among the employees are highly transitive. That is, if em-
ployee A discusses work matters with both B and C, there is a high chance
(about 1/3 in this case) that B also discusses work matters with C.
6. The contacts among the employees do not resemble the pattern expected
if they were created at random. Even a random model which mimics em-
ployees’ preferences for limited interactions with others does not explain
all the organizational complexity of this small network.
7. Employees with a higher number of contacts do not interact much among
themselves. Instead, they prefer to discuss work matters with employees
having very few contacts. That is, there is a certain disassortativity in the
way in which employees communicate.
8. There is a central core of employees who ‘dominate’ the communicability
in the social group.
9. The network can be divided into two almost disjoint groups of employees,
in which there are preferential connections between the two groups and
very few links inside each group.
10. Although the employees are Spanish-speaking (H) or English-speaking
(E), which is important for communication purposes, the network is not
split into such ethnic-driven communities. Instead, there are two clusters
of employees, one formed by all the employees working in a section where
the logs are planed (11H + 3E) and the other employees who work either
in the mill, the yard, or in management (22H + 5E).
With this analysis in hand you will be able to understand how this small firm
is organized; where the potential bottlenecks in the communication among the
employees occur; how to improve the structural organization towards greater
efficiency and functionality; and also how to develop a model that will allow you
to simulate your proposed changes before they are implemented by the firm.
Enjoy the journey through network-land!

..................................................................................................

FURTHER READING

Barabási, A.-L., Linked: The New Science of Networks, Perseus Books, 2003.
Biggs, N.L., Lloyd, E.K., and Wilson, R.J., Graph Theory 1736–1936, Clarendon
Press, 1976.
Caldarelli, G. and Catanzaro, M., Networks: A Very Short Introduction, Oxford
University Press, 2012.
Estrada, E., The Structure of Complex Networks. Theory and Applications, Oxford
University Press, 2011.
2 General Concepts in Network Theory

In this chapter

We introduce the formal definition of a network and some of the key terms
and ideas which we will use throughout the book. We look at matrix
representations of networks and make use of the adjacency matrix to identify certain
network properties.

2.1 Formal definition of networks


We have seen that networks can appear in a variety of guises but they can always
be thought of as a collection of items and the connections between them. In order
to analyse networks, we need to turn this loose statement into formal mathematical
language. To do this, we first introduce some basic set notation.
Let V be a finite set and let E ⊆ V ⊗ V, whose elements are not necessarily all
distinct. E is reflexive if (v, v) ∈ E for all v in V, it is anti-reflexive if (v, v) ∉ E for
all v in V, and it is symmetric if (v1, v2) ∈ E ⇐⇒ (v2, v1) ∈ E.

Definition 2.1 A network, G, is a pair (V, E). Networks are also known as
graphs. V is called the vertex set of G; its elements are the vertices of G
(also known as nodes).

• If E is symmetric then G is an undirected network.


• If E is symmetric and anti-reflexive and contains no duplicate edges then G is
a simple network.
• If E is nonsymmetric then G is a directed network (or digraph).

2.2 Elementary graph theory concepts

Example 2.1

We introduce two very simple networks which we will use to illustrate some of the concepts in this chapter. They can
be represented diagrammatically as in Figure 2.1.
A natural vertex set to use for both networks is V = {1, 2, 3, 4}. Applying these vertex labels to the nodes from left
to right, the edge set of Gl is then

El = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 4), (4, 3)},

and that of Gr is

Er = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 3), (2, 3), (3, 1), (3, 2), (3, 2), (3, 2), (3, 4), (4, 3)}.

Both networks are undirected. The one on the left is simple.

(a) Gl (b) Gr

Figure 2.1 A simple graph (Gl ) and a pseudo-multigraph (Gr )

Neither the labelling nor the diagrammatic representation of a network is unique. For example, Gl can also be expressed
as (V , Em ) where

Em = {(1, 2), (1, 3), (1, 4), (2, 1), (2, 4), (3, 1), (4, 1), (4, 2)}.

Or as in Figure 2.2.

Figure 2.2 An equivalent representation of Gl
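Readers who like to experiment can transcribe Example 2.1 directly into code. The sketch below is our own illustration: the edge set is stored as a list of ordered pairs so that the duplicate edges of Gr survive, and the symmetry and simplicity conditions of Definition 2.1 are tested naively.

# A sketch (ours) of Example 2.1: networks as a vertex set and edge list.
V = {1, 2, 3, 4}
E_l = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 4), (4, 3)]
E_r = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 3), (2, 3),
       (3, 1), (3, 2), (3, 2), (3, 2), (3, 4), (4, 3)]

def is_symmetric(E):
    # Each pair (u, v) must occur as often as its reverse (v, u).
    return all(E.count((u, v)) == E.count((v, u)) for (u, v) in E)

def is_simple(E):
    no_loops = all(u != v for (u, v) in E)
    no_duplicates = len(set(E)) == len(E)
    return is_symmetric(E) and no_loops and no_duplicates

print(is_symmetric(E_l), is_simple(E_l))   # True True: Gl is undirected and simple
print(is_symmetric(E_r), is_simple(E_r))   # True False: Gr is undirected only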
Sometimes when we analyse a network we are interested in the properties of
individual vertices or edges—in a rail network we may be interested in the capacity
of a particular station and the traffic that passes through. Sometimes, though, we
are primarily interested in the overall structure of the network—how closely does
the rail network resemble the road network, for example. To distinguish between
the primacy of individual vertices/edges and structure as a whole we can view
networks as labelled or unlabelled, although in many cases we use labels to simply
be able to identify particular parts of a network.
Before proceeding we list some features of graphs that are worth naming.

Definition 2.2

• Gs = (Vs, Es) is a subgraph of G = (V, E) if Vs ⊆ V and Es ⊆ (Vs ⊗ Vs) ∩ E.
• A loop in a network is an edge of the form (v, v). A simple network has no
loops.
• Suppose e = (v1, v2) is an edge in the network G = (V, E). v1 is incident to
and v2 is incident from e and v1 is adjacent to v2.
Note that if v1 is adjacent to v2 in an undirected network then v2 is
adjacent to v1.
• The networks G1 = (V1, E1) and G2 = (V2, E2) are isomorphic if there
is a one-to-one correspondence f : V1 → V2 such that for all u, v ∈ V1, the
number of edges between u and v matches the number of edges between f(u)
and f(v).
• If G = (V, E) is a simple network then its complement Ḡ = (V, Ē) is
the simple network with the same vertices as G and where (u, v) ∈ Ē ⇐⇒
(u, v) ∉ E.
• Let G1 = (V1, E1) and G2 = (V2, E2) and suppose that V1 ∩ V2 = ∅. The
union of these two networks, G1 ∪ G2, has vertex set V1 ∪ V2 and edge set
E1 ∪ E2. This union will have (at least) two disjoint components.
G1 and G2 are subgraphs of G1 ∪ G2.
• Suppose that G = (V, E) and F ⊆ E. Then H = (V, E – F) is the subgraph
we obtain by removing the edges in F from G. If G′ = (V, F) then we write
H = G – G′.
• The out degree of a node is the number of edges that it is incident to and its
in degree is the number of edges that it is incident from.
There is no difference between in and out degree in an undirected network:
we call this common value the degree of a node.

Examples 2.2

(i) Gl = (V , El ) is a subgraph of Gr = (V , Er ).
(V , El ) and (V , Em ) are isomorphic under the mapping f : V → V where f (1) = 3, f (2) = 2, f (3) = 4, f (4) = 1.
Ḡl = (V, {(1, 4), (4, 1), (2, 4), (4, 2)}), illustrated in Figure 2.3.
Letting F = {(1, 1), (2, 3), (2, 3), (3, 2), (3, 2)} gives (V, El) = (V, Er) – (V, F).
Figure 2.3 The complement, Ḡl, of the graph Gl


(ii) The degrees of the vertices in Gl are 2, 2, 3, and 1. In Gr they are 3, 4, 5, and 1.
(iii) A vertex in a network which has degree 0 is referred to as an isolated vertex. If a vertex has degree 1 it is called
an end vertex or a pendant vertex.
For example, in Figure 2.3, 3 is an isolated vertex of Ḡl while 1 and 2 are end vertices.
(iv) In a complete network (see Figure 2.4) there is an edge between every pair of distinct nodes. In a complete network with
n nodes, the degree of each node is n – 1. We will use the notation Kn to denote this network.

Figure 2.4 The complete networks K4 and K5

(v) A network with n nodes but no edges is called a null graph. Every node is isolated. We write Nn to denote this
network. It is not particularly exciting, but it arises in many theorems.
(vi) A network is regular if all nodes have the same degree. If this common degree is k, the network is called k-regular
or regular of degree k. The null graph is 0-regular and a single edge connecting two nodes is 1-regular. Kn is
(n – 1)-regular. For 0 < k < n – 1 there are several different k-regular networks with n nodes.
(vii) In a cycle graph with n nodes, Cn , the nodes can be ordered so that each is connected to its immediate neighbours.
It is 2-regular. If we remove an edge from Cn we get the path graph, Pn–1 . If we add a node to Cn and connect it
to every other node we get the wheel graph, Wn+1 . Examples are given in Figure 2.5.

(a) C6 (b) P5 (c) W6

Figure 2.5 A cycle, a path, and a wheel with six nodes



2.3 Networks and matrices


To record the properties of an unlabelled network, matters are often simplified
if we add a generic set of labels to the nodes. For example, in a network with n
vertices we can label each one with an element from the set V = {1, 2, . . . , n}.

Definition 2.3 Suppose G = (V, E) is a simple network where V = {1, 2, . . . , n}.
For 1 ≤ i, j ≤ n define

aij = 1 if (i, j) ∈ E, and aij = 0 if (i, j) ∉ E.

Then the square matrix A = (aij) is called the adjacency matrix of G.

The adjacency matrix gives an unambiguous representation of any simple
network. In much of the rest of this book we will look at how concepts in matrix
algebra can be applied to networks and how to interpret the results. Directed networks
and networks with loops have adjacency matrices, too.
One or two of our examples so far have had duplicate edges. We can represent
these by letting the appropriate entries of the adjacency
matrix equal the number of connections between nodes.
Any square matrix can be interpreted as a network: call aij the weight of an
edge and the network induced by A is a weighted network. Weighted networks are
very useful if we want to assign a hierarchy to edges in a network, but in this book
their appearances are very rare.
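As a small illustration (ours, using numpy), Definition 2.3 translates into a few lines of code; incrementing entries, as suggested above, also handles duplicate edges and loops such as those in Gr.

import numpy as np

def adjacency_matrix(n, edges):
    # Vertices are labelled 1..n; each listed pair adds one connection,
    # so duplicate edges accumulate as counts.
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u - 1, v - 1] += 1
    return A

E_r = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 3), (2, 3),
       (3, 1), (3, 2), (3, 2), (3, 2), (3, 4), (4, 3)]
print(adjacency_matrix(4, E_r))   # reproduces the matrix for Gr in Example 2.3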

Example 2.3

(i) Adjacency matrices for Gl and Gr (see Figure 2.1) are


⎡0 1 1 0⎤     ⎡1 1 1 0⎤
⎢1 0 1 0⎥     ⎢1 0 3 0⎥
⎢1 1 0 1⎥ and ⎢1 3 0 1⎥.
⎣0 0 1 0⎦     ⎣0 0 1 0⎦

(ii) If G is a simple network with adjacency matrix A then its complement, Ḡ, has adjacency matrix E – I – A, where
E is a matrix of ones.
(iii) Starting with the cycle graph Cn we can add edges so that each node is linked to its k nearest neighbours clockwise
and anticlockwise. The resulting network is called a circulant network and its adjacency matrix is an example of
a circulant matrix. For example, if n = 7 and k = 2 the adjacency matrix is
⎡0 1 1 0 0 1 1⎤
⎢1 0 1 1 0 0 1⎥
⎢1 1 0 1 1 0 0⎥
⎢0 1 1 0 1 1 0⎥ .
⎢0 0 1 1 0 1 1⎥
⎢1 0 0 1 1 0 1⎥
⎣1 1 0 0 1 1 0⎦
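A sketch (ours) that generates such circulant adjacency matrices for any n and k:

import numpy as np

def circulant_adjacency(n, k):
    # Join node i to its k nearest neighbours clockwise and anticlockwise.
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for s in range(1, k + 1):
            A[i, (i + s) % n] = 1
            A[i, (i - s) % n] = 1
    return A

print(circulant_adjacency(7, 2))   # reproduces the matrix above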

By taking an edge-centric view of a network, one can come up with another
way of representing a network in matrix form.

Definition 2.4 Suppose G = (V, E) is a network where V = {1, 2, . . . , n} and
E = {e1, e2, . . . , em} with ei = (ui, vi).
For 1 ≤ i ≤ m and 1 ≤ j ≤ n define

bij = 1 if ui = j, bij = –1 if vi = j, and bij = 0 otherwise.

Then the rectangular matrix B = (bij) is called the incidence matrix of G. If
ei is a loop then we set bi,ui = 1 and every other element of the row is left as zero.

If the edge (u, v) is in a simple network then so is (v, u). We will only include
one of each of these pairs (it doesn’t matter which) in the incidence matrix. In this
case, you may find in some references that the incidence matrix is defined so that
all the nonzero entries are set to one and our definition of the incidence matrix is
known as the oriented incidence matrix. There are many different conventions
for including loops in incidence matrices. Since we are primarily concerned with
simple networks it doesn’t really matter which convention we use.
We will look more at the connections between the adjacency and incidence
matrices when we look at the spectra of networks.
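As a quick sketch (ours), the oriented incidence matrix of Definition 2.4 for a simple network can be built by choosing one orientation per edge, as the text allows.

import numpy as np

def oriented_incidence_matrix(n, edges):
    # One row per edge: +1 at the first vertex of the pair, -1 at the second.
    B = np.zeros((len(edges), n), dtype=int)
    for i, (u, v) in enumerate(edges):
        B[i, u - 1] = 1
        B[i, v - 1] = -1
    return B

# Gl, keeping one of each pair (u, v), (v, u):
print(oriented_incidence_matrix(4, [(1, 2), (1, 3), (2, 3), (3, 4)]))

Each row of this matrix sums to zero, which is one reason the oriented form is convenient when the incidence matrix reappears in the study of network spectra.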

Example 2.4

Consider, again, Gl and Gr. Gl is simple and we can write its incidence
matrix as

⎡1 –1  0  0⎤
⎢1  0 –1  0⎥
⎢0  1 –1  0⎥ .
⎣0  0  1 –1⎦

For (V , Er ) the incidence matrix is


⎡ 1  0  0  0⎤
⎢ 1 –1  0  0⎥
⎢ 1  0 –1  0⎥
⎢–1  1  0  0⎥
⎢ 0  1 –1  0⎥
⎢ 0  1 –1  0⎥
⎢ 0  1 –1  0⎥ .
⎢–1  0  1  0⎥
⎢ 0 –1  1  0⎥
⎢ 0 –1  1  0⎥
⎢ 0 –1  1  0⎥
⎢ 0  0  1 –1⎥
⎣ 0  0 –1  1⎦

We favour matrix representations of networks in this book as they are particularly
amenable to analysis and offer such a simple way of representing a network.
Diagrammatic representations have a role to play—a picture can often convey
information in an enlightening, evocative, or provocative way—but they can be
very confusing if the network is large and they can also mislead.

2.3.1 Walks and paths


Networks highlight direct connections between nodes but indirect connections
implicit in a network are frequently just as important, if not more so. If nodes are
not linked by an edge, but have one (or more) neighbours in common, or if there are
several intermediate nodes in the way, we will usually assume that information can
be passed from one to the other.

Definition 2.5 A walk in a network is a series of edges (not necessarily distinct)

(u1 , v1 ), (u2 , v2 ), . . . , (up , vp ),

for which vi = ui+1 (i = 1, 2, . . . , p – 1). If vp = u1 then the walk is closed.


A trail is a walk in which all the edges are distinct. A path is a trail in
which all the ui are distinct. A closed path is called a cycle or circuit. A graph
with no cycles is called acyclic. A cycle of length 3 is called a triangle. The
walk/trail/path length is given by p.
We can enumerate walks using the adjacency matrix, A. The entries of Aᵖ tell
us the number of walks of length p between each pair of nodes.

Example 2.5

(i) A path of length p in a network induces a subgraph Pp and a cycle of
length p induces the subgraph Cp.
(ii) In Gl we can identify a single triangle 1 → 2 → 3 → 1 (see Figure 2.1).
The walk 4 → 3 → 2 → 1 → 3 is a trail. 4 → 3 → 2 → 1 is a path.
Successive powers of its adjacency matrix, A, are

A² = ⎡2 1 1 1⎤   A³ = ⎡2 3 4 1⎤   A⁴ = ⎡7 6  6 4⎤
     ⎢1 2 1 1⎥        ⎢3 2 4 1⎥        ⎢6 7  6 4⎥
     ⎢1 1 3 0⎥        ⎢4 4 2 3⎥        ⎢6 6 11 2⎥
     ⎣1 1 0 1⎦        ⎣1 1 3 0⎦        ⎣4 4  2 3⎦

Note that the diagonal of A² gives the degree of each node. Can you
explain why this is true for all simple graphs? Can you find all four walks
of length three from node 1 to 3? How about all seven closed walks of
length four from node 1?

Note that the shortest walk in a network between distinct nodes is also the
shortest path. This can be formally established by an inductive proof.
A walk of length one must also be a path. For a longer walk between nodes u
and v simply consider the walk without the last edge (that connects w and v, say).
This must be the shortest such walk from u to w (why?) and therefore it is a path.
This path does not contain the node v (why?) and the edge (w, v) is not a part of
this path, so the walk from u to v must also be a path.
Note that the shortest closed walk is not necessarily the shortest cycle. In a
simple network one can move along any edge back and forth to find a closed
walk of length two from any node with positive degree. But the shortest cycle in
any simple network has length at least three. There are general techniques for finding the
shortest cycle in any network, but we will not discuss them here.
We can find the length of the shortest walk between any two nodes by looking at
successive powers of A: it will be the value of p for which the (i, j)th element of Aᵖ is
first nonzero.
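Both observations can be confirmed numerically. The sketch below (ours, using numpy) checks the degree property noted in Example 2.5 and locates shortest-path lengths in Gl as the first power of A with a nonzero (i, j) entry.

import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])                 # adjacency matrix of Gl

print(np.diag(A @ A))                        # [2 2 3 1]: the node degrees
print(np.linalg.matrix_power(A, 3)[0, 2])    # 4 walks of length 3 from node 1 to 3

def shortest_path_length(A, i, j):
    # Nodes i, j are 0-indexed here; a path has length at most n - 1.
    n = len(A)
    P = np.eye(n, dtype=int)
    for p in range(1, n):
        P = P @ A
        if P[i, j] != 0:
            return p
    return None                              # unreachable

print(shortest_path_length(A, 3, 0))         # 2: node 4 reaches node 1 via node 3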

Examples 2.6

(i) Consider C5 (Figure 2.6). Can you write down A² and A³ just by looking at the network?

    ⎡0 1 0 0 1⎤
    ⎢1 0 1 0 0⎥
A = ⎢0 1 0 1 0⎥
    ⎢0 0 1 0 1⎥
    ⎣1 0 0 1 0⎦

Figure 2.6 The cycle graph C5 and its adjacency matrix

(ii) The adjacency matrix


    ⎡0 1 0 0 0⎤
    ⎢0 0 1 0 0⎥
A = ⎢0 0 0 1 0⎥
    ⎢0 0 0 0 1⎥
    ⎣0 0 0 0 0⎦
can be sketched as in Figure 2.7. What are the powers of A? How do you interpret this?

Figure 2.7 A directed graph

(iii) Consider the network in Figure 2.8.

Figure 2.8 Ga , a network with eight nodes

Using a natural ordering of the nodes, we can write the adjacency matrix A and its powers A² and A³ as

    ⎡0 1 0 0 0 0 0 0⎤        ⎡1 0 1 0 0 1 0 0⎤        ⎡0 3 0 1 1 0 2 0⎤
    ⎢1 0 1 0 0 1 0 0⎥        ⎢0 3 0 1 1 0 2 0⎥        ⎢3 0 6 0 0 6 0 2⎥
    ⎢0 1 0 1 0 0 1 0⎥        ⎢1 0 3 0 0 2 0 1⎥        ⎢0 6 0 3 2 0 6 0⎥
A = ⎢0 0 1 0 0 0 0 0⎥,  A² = ⎢0 1 0 1 0 0 1 0⎥,  A³ = ⎢1 0 3 0 0 2 0 1⎥.
    ⎢0 0 0 0 0 1 0 0⎥        ⎢0 1 0 0 1 0 1 0⎥        ⎢1 0 2 0 0 3 0 1⎥
    ⎢0 1 0 0 1 0 1 0⎥        ⎢1 0 2 0 0 3 0 1⎥        ⎢0 6 0 2 3 0 6 0⎥
    ⎢0 0 1 0 0 1 0 1⎥        ⎢0 2 0 1 1 0 3 0⎥        ⎢2 0 6 0 0 6 0 3⎥
    ⎣0 0 0 0 0 0 1 0⎦        ⎣0 0 1 0 0 1 0 1⎦        ⎣0 2 0 1 1 0 3 0⎦

Comparing these matrices, we see that the only entries that are always zero are (1, 8), (4, 5), and their symmetric
counterparts. These are the only ones which cannot be connected with paths of length three or fewer. We also see, for
example, that the minimum path lengths from node two to the other nodes are 1, 1, 2, 2, 1, 2, and 3, respectively.
All of this information is readily gleaned from the diagrammatic form of the network, but the adjacency matrix
encodes all this information in a way that can be manipulated algebraically.
Notice that every entry of the diagonal of A² is nonzero. None of these entries represents a circuit: they are ‘out and
back again’ walks along edges. Since the diagonal of A³ is zero, there are no triangles in the network: the only circuits
are permutations of 2 → 3 → 7 → 6 → 2.

2.4 Network connectivity


A network is connected if there is a path connecting any two nodes. Clearly, con-
nectivity is a significant property of a network. In an undirected network it is
simple to identify connectivity: if it is not connected then there will be a part of
the network that is completely separated from the rest. For a directed network it
can take a little more work to confirm connectivity. At first glance it may not be
obvious that there are culs-de-sac from which one cannot escape once entered.
There are a number of ways of confirming connectivity. We show how this can
be done using the adjacency matrix. To do this, we will make use of permutation
matrices: P ∈ ℝⁿˣⁿ is a permutation matrix if premultiplying/postmultiplying a
matrix by P simply permutes its rows and columns. To form a permutation matrix
we simply permute the rows of the identity matrix.

Problem 2.1
Show that a network with adjacency matrix A is disconnected if and only if there
is a permutation matrix P such that
 
A = P ⎡X  Y⎤ Pᵀ.
      ⎣O  Z⎦

First we establish sufficiency. If A has this form then, since X and Z are square,

A² = P ⎡X²  XY + YZ⎤ Pᵀ.
       ⎣O      Z²  ⎦

That is, the zero block remains and a simple induction confirms its presence in all
powers of A. This means that there is no path of any length between nodes cor-
responding to the rows of the zero block and nodes corresponding to its columns,
so by definition the network is not connected.
For necessity, we consider a disconnected network. We identify two nodes for
which there is no path from the first to the second and label them n and 1. If there
are any other nodes which cannot be reached from n then label these as nodes
2, 3, . . . , r. The remaining nodes, r + 1, . . . , n – 1 are all accessible from n. There
can be no path from a node in this second set to the first. For if such a path existed
(say between i and j) then there would be a path from n to j via i.

If there is no path between two nodes then they are certainly not adjacent.
Hence aij = 0 if r < i ≤ n and 1 ≤ j ≤ r, thus
 
A = ⎡X  Y⎤
    ⎣O  Z⎦

where X (of size r × r) and Z (of size (n – r) × (n – r)) are square.
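The zero-block argument also suggests a crude computational test, sketched below (our illustration, not the book’s): since a path between distinct nodes has length at most n – 1, every entry of (I + A)^(n–1) is positive exactly when every node can reach every other.

import numpy as np

def is_connected(A):
    # (I + A)**(n-1) sums walks of all lengths up to n - 1 (with binomial
    # weights), so a zero entry flags an unreachable pair. Fine for small
    # examples; entries grow quickly with n.
    n = len(A)
    R = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n - 1)
    return bool(np.all(R > 0))

A_gl = np.array([[0, 1, 1, 0],
                 [1, 0, 1, 0],
                 [1, 1, 0, 1],
                 [0, 0, 1, 0]])
print(is_connected(A_gl))    # True; for a directed network this tests
                             # whether every node can reach every other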

If a network is not connected then it can be divided into components, each of
which is connected. In an undirected network the components are disjoint. In
a directed network you can leave one component and enter another, but not go
back. A component in a directed network that you cannot exit is referred to as
strongly connected.
Connectivity can be associated with the number of edges in a network, since
the more edges there are, the more likely it is that one can find a path between
one node and another.

Problem 2.2
Show that if a simple network, G, has n nodes, m edges, and k components then

n – k ≤ m ≤ (n – k)(n – k + 1)/2.
The lower bound can be established by induction on m. The result is trivial if
G = Nn , the null graph. Suppose G has m edges. If one removes a single edge
then the new network has n nodes, m – 1 edges, and K components where K = k
or K = k + 1. By the inductive hypothesis, n – K ≤ m – 1 and so n – k ≤ m.
For the upper bound, we note that if a network with n nodes and k components
has the greatest possible number of edges then every one of its components is a
complete graph. We leave it to the reader to show that we attain the maximum
edge number if k – 1 of the components are isolated vertices. The number of
edges in Kn–k+1 is (n – k)(n – k + 1)/2.
Note that we can conclude that any simple network with n nodes and at least
(n – 1)(n – 2)/2 + 1 edges is connected.
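As a sanity check (our sketch), the bounds of Problem 2.2 can be verified on a concrete network by counting components with a breadth-first search.

from collections import deque

def components(n, edges):
    # Undirected breadth-first search over vertices labelled 1..n.
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u] - seen:
                seen.add(w)
                queue.append(w)
    return count

n, edges = 4, [(1, 2), (1, 3), (2, 3), (3, 4)]        # Gl, one pair per edge
m, k = len(edges), components(n, edges)
assert n - k <= m <= (n - k) * (n - k + 1) // 2        # 3 <= 4 <= 6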

Examples 2.7

(i) Of all networks with n nodes, the complete graph, Kn, has the most edges. There are n – 1 edges emerging out of
each of the n nodes. Each of these edges is shared by two nodes. Thus the total number of edges is n(n – 1)/2.
Kn has a single component.
(ii) Of all networks with n nodes, the null graph, Nn, has the most components, namely n. Nn has no edges.
(iii) Consider the network Gl ∪ Ga (where Gl and Ga were defined in Examples 2.1 and 2.6, respectively). It has
n = 12 nodes, m = 12 edges, and k = 2 components. Clearly n – k < m and m < (n – k)(n – k + 1)/2.
In Ḡl, n = 4, k = 2, and m = 2, giving n – k = m and m < (n – k)(n – k + 1)/2.
In Gl, n = 4 and m = 4. Since m > (n – 1)(n – 2)/2 we know it must be connected without any additional
information.

Consider a network representing social relationships. One would expect
certain parts of the network to be more connected than others. That is, one would
expect groups of mutual acquaintances to be linked together by more tenuous
connections. This is definitely a matter that is worth analysing, so let us formalize
some ideas about connectedness. We can compare networks by having a meas-
ure of connectivity. Note that this measure can be applied to each subgraph of a
network.

Definition 2.6 Suppose G = (V , E) is a (simple) network.

• A disconnecting set of G is any subset S ⊆ E for which the network
(V, E – S) has more components than G.
• S is a cut-set of G if it contains no proper subsets that are disconnecting sets.
• A disconnecting set is a bridge if it contains only one edge.
• The edge connectivity of a network G = (V , E) is the number of edges in
its smallest cut-set and is denoted by λ(G). If λ(G) ≥ k then G is k-edge
connected.

Examples 2.8

(i) In Ga, a disconnecting set must contain an edge incident to an end
vertex or two edges from the central cycle.
A cut-set contains either one edge or two.
(ii) In the path network Pn , every non-empty set of edges is a disconnecting
set and every individual edge forms a cut-set.
(iii) Ga and Gl both have an edge connectivity of one.
(iv) For the complete network Kn to become disconnected a set of nodes,
say {1, 2, . . . , k}, must become isolated from {k + 1, k + 2, . . . , n}. For
each of the k nodes, n – k links need to be severed, thus a cut-set must
contain k(n – k) edges.
The smallest cut-set arises when k = 1. Hence λ(Kn) = n – 1. This is
the maximal edge connectivity that can be obtained by a network
with n nodes.

One can also view connectivity from the perspective of vertices. A set of nodes
in a connected network is called a separating set if their removal (and the re-
moval of incident edges) disconnects the graph. If the smallest such set has size
k then the network is called k-connected and its connectivity, denoted κ(G), is k.
If κ(G) = 1 then a node whose removal disconnects the network is known as a
cut-vertex.

Example 2.9

(i) The left-hand network in Figure 2.9 is 1-connected. For the middle, κ =
2. k-connectivity does not really make sense in the right-hand network,
K4 . We cannot isolate any nodes without removing the whole of the rest
of the network, but by convention we let κ = n – 1 for Kn .

Figure 2.9 Three networks with different connectivity properties

Notice that for each network the minimal separating set must contain a
node of maximal degree.
(ii) Figure 2.10 shows a network representing relationships between power-
ful families in fifteenth-century Florence. Notice that the Medici family
is a cut-vertex but no other family can cut off anything other than end
vertices.

Figure 2.10 Socio-economic ties between fifteenth-century Florentine
families (the nodes are the families ACCIAIUOL, ALBIZZI, BARBARADORI,
BISCHERI, CASTELLAN, GINORI, GUADAGNI, LAMBERTES, MEDICI,
PAZZI, PERUZZI, RIDOLFI, SALVIATI, STROZZI, and TORNABUON)

(iii) In Figure 2.11, κ(G) = 2. Can you identify all the separating sets of size
two?

Figure 2.11 A network with connectivity of two

The network also has edge-connectivity of two.


(iv) If a connected network has an end vertex then κ = 1 and any node
adjacent to an end vertex is a cut-vertex.

Notice that in Figure 2.11 the network has a number of distinct parts which
are highly connected: there are a number of subgraphs that are completely con-
nected. This is a sort of structure that arises in many practical applications and
is worth naming. Any subgraph of a simple network that is completely connected
is called a clique. The biggest such clique is the maximal clique. As with many of
the concepts we have seen, we can guarantee the existence of cliques of a certain
size in networks with sufficient edges. It was established over a century ago that
a simple graph with n nodes and more than n²/4 edges must contain a triangle.
This result was extended in 1941 by the Hungarian mathematician Turán who
showed that if a simple network with n nodes has m edges then there will be a
clique of at least size k if

m > n²(k – 2) / (2(k – 1)).

Examples 2.10

(i) To prove his result, Turán devised a way of constructing the network with as many edges as possible with n nodes
and maximal clique size k.
To do this, one divides the nodes into k subsets with sizes as equal as possible. Nodes are connected by an edge
if and only if they belong to different subsets. We use Tn,k to denote this network. T5,3 is illustrated in Figure 2.12
for which m = 8.
Can you find the cliques of size four that are created by adding an extra edge?

continued

Examples 2.10 continued

Figure 2.12 T5,3

(ii) Tn,2 is the densest graph without any triangles. T6,2 is illustrated in Figure 2.13 along with the adjacency matrix,
A. T6,2 has a special structure which we will discuss in more detail.
A =
⎡ 0 0 0 1 1 1 ⎤
⎢ 0 0 0 1 1 1 ⎥
⎢ 0 0 0 1 1 1 ⎥
⎢ 1 1 1 0 0 0 ⎥
⎢ 1 1 1 0 0 0 ⎥
⎣ 1 1 1 0 0 0 ⎦
Figure 2.13 T6,2

Triangles can be identified by looking at the diagonal of the cube of the adjacency matrix. It is easy to show that
A³ = 9A. The zero diagonal of A³ confirms the absence of triangles.
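Both facts are easy to explore computationally. A minimal sketch (in Python with NumPy; the helper turan and the checks are our own illustration) builds the adjacency matrix of Tn,k and reads triangles off the diagonal of A³:

    import numpy as np

    def turan(n, k):
        # Adjacency matrix of T_{n,k}: assign the n nodes to k parts as
        # evenly as possible; join two nodes iff they lie in different parts.
        part = [i % k for i in range(n)]
        return np.array([[int(part[i] != part[j]) for j in range(n)]
                         for i in range(n)])

    A = turan(6, 2)                            # T_{6,2} = K_{3,3}
    A3 = np.linalg.matrix_power(A, 3)
    print(np.array_equal(A3, 9 * A))           # True: A^3 = 9A
    print(np.trace(A3) // 6)                   # 0 triangles (each adds 6 to the trace)

Storing the part of node i as i % k spreads the n nodes over the k parts as evenly as possible, which is exactly the balancing the construction requires.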

2.5 Graph structures

2.5.1 Trees
The word ‘tree’ evokes a similar picture for most people, and we can use it to
describe a particular structure in a network that frequently arises in practice. We
encountered trees in our history lesson.

Definition 2.7 A tree is a connected network with no cycles. A forest is a union of


trees.

There are lots and lots of trees! There are n^{n–2} distinct
labelled trees with n nodes. For n = 1, 2, 3, 4, 5, 6 this gives 1, 1, 3, 16, 125, 1296
trees before truly explosive growth sets in. Counting unlabelled trees is much
harder, and there is no known formula in terms of the number of nodes but their
abundance appears to grow exponentially in n.

Examples 2.11

(i) Figure 2.14 illustrates a number of trees.

Figure 2.14 Examples of trees

(ii) There are only two different unlabelled trees with four nodes, as illustrated in Figure 2.15. The left-hand tree
can be labelled in 4! = 24 ways, but only 12 of these are distinct since one half are just the reverses of the other.
Once we label the pivotal node of the right-hand tree (for which we have four choices) all labellings are equivalent.

Figure 2.15 Two unlabelled trees

Suppose G is a connected network with cycles. Then we can break a cycle by


removing an edge. The network is still connected. We can keep doing this until
all cycles are removed and we end up with a tree that connects all the nodes of G.
Such a tree is called a spanning tree.

Examples 2.12

(i) In Figure 2.16 we illustrate a network and two of the possible spanning trees.

Figure 2.16 A network and two of its spanning trees

(ii) The notion of a spanning tree can be expanded to disconnected networks. If we form a spanning tree for each
component and take their union, the result is a spanning forest.
(iii) The number of edges in a spanning tree/forest of G is called its cut-set rank, denoted by ξ (G). If a network has k
components ξ (G) = n – k. The number of edges removed from G to form the forest, m – n + k, is known as the
cycle rank and is denoted by γ (G).
For example, in (i) above, γ (G) = 3 and ξ (G) = 6.
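The construction of a spanning forest is easily mirrored in code, although it is simpler to grow the forest than to cut cycles; both give the same kind of object. A possible sketch (plain Python; the function and the small test network are our own invention):

    from collections import deque

    def spanning_forest(nodes, edges):
        # Grow a spanning tree in each component by breadth-first search.
        adj = {v: [] for v in nodes}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, forest = set(), []
        for root in nodes:
            if root in seen:
                continue
            seen.add(root)                     # start a new tree here
            queue = deque([root])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        forest.append((u, v))  # a forest edge
                        queue.append(v)
        return forest

    nodes = range(1, 8)
    edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4), (6, 7)]
    f = spanning_forest(nodes, edges)
    print(len(f))                              # cut-set rank xi(G) = n - k = 6
    print(len(edges) - len(f))                 # cycle rank gamma(G) = m - n + k = 2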

2.5.2 Bipartite graphs


In any big city on a Saturday afternoon, thousands of people are out in the shops.
Imagine you construct a network to record activity of business between 2pm
and 3pm. Nodes represent people and an edge between two individuals repre-
sents the relationship ‘enters into a financial transaction with’. Almost all edges in
the network will represent a transaction between a shop-worker and a customer
(exceptions would exist as some of the shop-workers may make a purchase them-
selves) but to all intents and purposes we should expect a network which contains
two groups of nodes which have many edges between them and very few within
a group.

Definition 2.8 A network G = (V , E) is bipartite if the nodes can be divided into
disjoint sets V1 and V2 such that (u, v) ∈ E ⇒ u ∈ Vi , v ∈ Vj with i ≠ j.

There are many networks in real applications that are exactly or nearly bi-
partite. In Chapter 18 we will look at how to measure how close to bipartite a
network is in order to infer other properties. For now, we briefly discuss some of
the properties an exactly bipartite network possesses.

Examples 2.13

(i) The Turán network, Tn,2 , is bipartite. Recall that T6,2 has the adjacency
matrix
A =
⎡ 0 0 0 1 1 1 ⎤
⎢ 0 0 0 1 1 1 ⎥
⎢ 0 0 0 1 1 1 ⎥
⎢ 1 1 1 0 0 0 ⎥
⎢ 1 1 1 0 0 0 ⎥
⎣ 1 1 1 0 0 0 ⎦ .

In general, if n is even, Tn,2 has adjacency matrix

A = ⎡ O E ⎤
    ⎣ E O ⎦

where E is an (n/2) × (n/2) matrix of ones.¹ It is straightforward to
show that

A^{2k} = (n/2)^{2k–1} ⎡ E O ⎤
                      ⎣ O E ⎦

and

A^{2k+1} = (n/2)^{2k} ⎡ O E ⎤ .
                      ⎣ E O ⎦

(ii) In the complete bipartite graph every node in V1 is connected to every
node in V2 . If V1 has m nodes and V2 has n we can denote this graph
as Km,n . The Turán networks Tk,2 are complete bipartite graphs, for
example T6,2 = K3,3 .

¹ If n is odd, the structure is similar:

A = ⎡ O Eᵀ ⎤
    ⎣ E O  ⎦

where E is an (n–1)/2 × (n+1)/2 matrix of ones.

(iii) The k-cube, Qk , is a network representing the connections between
vertices in a k-dimensional cube. A diagrammatic representation of Q3 is
shown in Figure 2.17.

Figure 2.17 The 3-cube Q3

The vertices of the unit k-cube have coordinates (x1 , x2 , . . . , xk ) where


xi = 0 or 1. Vertices are adjacent if their coordinates differ in only one
place. We can divide the coordinates into those whose sum is even and
those whose sum is odd. Vertices in these sets cannot be adjacent and
so the network is bipartite.
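This parity argument can be verified directly. A short sketch (Python; purely illustrative) builds Q3 and checks that every edge joins a vertex of even coordinate sum to one of odd sum:

    from itertools import product

    k = 3
    verts = list(product([0, 1], repeat=k))    # coordinates of the vertices of Q_k
    edges = [(u, v) for u in verts for v in verts
             if u < v and sum(a != b for a, b in zip(u, v)) == 1]

    # Each edge joins a vertex of even coordinate sum to one of odd sum
    print(all(sum(u) % 2 != sum(v) % 2 for u, v in edges))   # True
    print(len(verts), len(edges))                            # 8 vertices, 12 edges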

The adjacency matrix, A, of a bipartite network has a characteristic structure.


The division of the nodes into two groups means that there must be a permutation
P such that

A = P ⎡ O X ⎤ Pᵀ .
      ⎣ Y O ⎦

In an undirected network, Y = Xᵀ. It is straightforward to show that

A^{2k} = P ⎡ (XY)^k   O      ⎤ Pᵀ
           ⎣   O      (YX)^k ⎦

and

A^{2k+1} = P ⎡    O        (XY)^k X ⎤ Pᵀ .
             ⎣ (YX)^k Y       O     ⎦

The odd powers have a zero diagonal hence every cycle in a bipartite network has
even length.
Bipartivity can be generalized to k-partivity. A network is k-partite if its nodes
can be partitioned into k sets V1 , V2 , . . . , Vk such that if u, v ∈ Vi then there is no
edge between them.

Examples 2.14

(i) Trees are bipartite. To show this, pick a node on a tree and colour it black. Then colour all its neighbours white.
Colour the nodes adjacent to the white nodes black and repeat until the whole tree is coloured. This could only
break down if we encountered a previously coloured node, in which case we would have found a cycle in
the network. The nodes can then be divided into black and white sets. We show some appropriate colourings of
trees in Figure 2.18, and we sketch this colouring procedure in code after these examples.

continued

Examples 2.14 continued

Figure 2.18 A demonstration of bipartivity in trees through a 2-colouring

(ii) The maximal clique in a bipartite network has size 2, since Kn has odd cycles for n > 2.
(iii) The n node star graph, S1,n–1 has a single central node connected to all other n – 1 nodes and no other edges. S1,5
is illustrated in Figure 2.19.

Figure 2.19 The star graph S1,5

S1,n–1 is a tree, and is also the complete bipartite graph K1,n–1 .


(iv) If G is 3-partite then its adjacency matrix can be permuted into the form

⎡ O   X1  X2 ⎤
⎢ Y1  O   X3 ⎥ .
⎣ Y2  Y3  O  ⎦

Odd cycle lengths are possible.
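As promised in Examples 2.14(i), here is a sketch of the colouring procedure (plain Python; the function name and the test tree are our own). Applied to any network it either returns a 2-colouring, certifying bipartivity, or detects an odd cycle:

    from collections import deque

    def two_colour(adj):
        # Attempt the black/white colouring of Examples 2.14(i).
        # Returns a colouring if the network is bipartite, None otherwise.
        colour = {}
        for root in adj:
            if root in colour:
                continue
            colour[root] = 0                        # colour the first node 'black'
            queue = deque([root])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if v not in colour:
                        colour[v] = 1 - colour[u]   # give v the opposite colour
                        queue.append(v)
                    elif colour[v] == colour[u]:    # an odd cycle: not bipartite
                        return None
        return colour

    tree = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
    print(two_colour(tree))    # a valid 2-colouring, so the tree is bipartite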

..................................................................................................

FURTHER READING

Aldous, J.M. and Wilson, R.J., Graphs and Applications: An Introductory Approach,
Springer, 2003.
Bapat, R.B., Graphs and Matrices, Springer, 2011.
Chartrand, G. and Zhang, P., A First Course in Graph Theory, Dover, 2012.
Wilson, R.J., Introduction to Graph Theory, Prentice Hall, 2010.
How To Prove It
3

In this chapter
We motivate the necessity for rigorous proofs of results in network theory.
Then we give some advice on how to prove results by using techniques such
as induction and proof by contradiction. At the same time we encourage
the student to use drawings and counterexamples and to build connections
between different concepts to prove a result.

3.1 Motivation
3.2 Draw pictures
3.3 Use induction
3.4 Try to find a counterexample
3.5 Proof by contradiction
3.6 Make connections between concepts
3.7 Other general advice
Further reading

3.1 Motivation

You may have noticed that in this book we have not employed the ‘Theorem–
Proof ’ structure familiar to many textbooks in mathematics, and when you read
the title of this chapter maybe you thought, “Why should I care about proving
things rigorously?”. Let us start by considering a practical problem. Suppose you
are interested in constructing certain networks displaying the maximal possible
heterogeneity in their degrees. You figure out that a network in which every node
has a different degree will do it and you try by trial-and-error to construct such
a network. However, every time you attempt to draw such a network you end up
stymied by the fact that there is always at least one pair of nodes which share the
same degree. You try as hard as possible and may have even used a computer to
help you in generating such networks, but you have always failed. You then make
the following conjecture.

In any simple network with n ≥ 2 nodes, there should be at least two nodes which have
exactly the same degree.

But, are you sure about this? Is it not possible that you are missing something
and such a dreamed network can be constructed? The only way to be sure about
this statement is by means of a rigorous proof that it is true. Such a proof is just a
deductive argument that such a statement is true. Indeed it has been said that:

Proofs are to the mathematician what experimental procedures are to the experimental
scientist: in studying them one learns of new ideas, new concepts, new strategic
devices which can be assimilated for one’s own research and be further developed.¹

¹ Rav, Y., Why do we prove theorems? Philosophia Mathematica 15 (1999) 291–320.

There are many ways to establish the veracity of your conjecture. Here is a
very concise argument. First observe that in your network of n nodes, no node
can have degree bigger than n – 1. So for a completely heterogeneous set of de-
grees you may assume that the degrees of the n nodes of your imagined network
are 0, 1, 2, . . . , n – 2, n – 1. The node with degree n – 1 must be connected to all the
other nodes in the network. However, there is a node with degree 0, which contra-
dicts the previous statement. Consequently, you have proved by contradiction that
the statement is true. It has become a theorem and you can now be absolutely
convinced that such a dreamed network cannot exist.
When you read these statements and their proofs in any textbook they usually
look so beautiful, short, and insightful that your first impression is: “I will never
be able to construct something like that”. However, such a statement of a theorem
is usually the result of a long process in which hands have possibly got dirty on
the way; some not so beautiful, short, and insightful sketches of the proof were
advanced and then distilled until the last proof was produced. You, too, should
be able to produce such beautiful and condensed results if you train yourself and
know a few general rules and tricks. You can even create an algorithmic scheme
to generate such proofs. Something of the following sort.

1. Read carefully the statement and determine what the problem is asking you
to do.
2. Determine which information is provided and which assumptions are
made.
3. Try getting your hands dirty with some calculations to see for yourself how
the problem looks in practice.
4. Plan a strategy for attacking the problem and select some of the many
techniques available to prove a theorem.
5. Sketch the proof.
6. Check that your solution is the one asked for by the problem as stated.
7. Simplify the proof as much as possible by eliminating all the superfluous
statements, assumptions, and calculations.

Some of these techniques for proving results in network theory are provided
in this chapter as a guide to students for solving their own problems. Have a look
through Chapter 2 and you will see that we have used some of these techniques in
the examples and problems. Hopefully, with practice, you can use them to solve
more general problems that you find during your independent work.

3.2 Draw pictures


A well-known adage says that ‘A picture is worth a thousand words’ and network
theory is a discipline where this is particularly true. But even though network

theory is very pictorial (and this book is filled with figures of networks), it is still
as analytic and rigorous as any branch of mathematics. There are many situations
in which it is not obvious how one should start solving a particular problem. A
drawing or a sketch can help trigger an idea that eventually leads to the solution.
Let us illustrate this situation with a particular example.

Example 3.1

Suppose you have been asked the following.

Show that if there is a walk from the node v1 to v2 , such that v1 ≠ v2 , then there is a path from v1 to v2 .

1. You can start by sketching a walk as illustrated in Figure 3.1.

Figure 3.1 A walk of length seven in a network between nodes 1 and 6

2. Notice that to establish that a path between 1 and 6 exists we simply have to avoid visiting the same vertex twice.
For instance, avoiding visiting vertex 5 twice, we obtain the path shown in Figure 3.2(a) and if we avoid visiting
node 2 twice we obtain the path illustrated in Figure 3.2(b).

Figure 3.2 Two walks, (a) and (b), in a network that avoid visiting a node twice

continued

Example 3.1 continued

We can proceed to write these findings mathematically.

1. Let W1 = v0 , v1 , . . . , vp–1 , vp , . . . , vq , vq+1 , . . . , vk–1 , vk be a walk between v0 and vk .


2. If all the nodes in W1 are distinct, then W1 is a path and we are done.
3. If the vertices are not all distinct, select a pair of identical ones, say vp = vq , where p < q.
4. Write W2 = v0 , v1 , . . . , vp , vq+1 , . . . , vk–1 , vk , which is a walk between v0 and vk and it is shorter than W1 .
5. If all the vertices in W2 are distinct, then W2 is the required path. Otherwise, select another repeated pair of
nodes and proceed as before.
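These steps can be carried out mechanically. A possible sketch (plain Python; the example walk is our own and assumes its consecutive pairs are edges of some network):

    def walk_to_path(walk):
        # Cut out the detour between repeated nodes, as in steps 3-5.
        path = []
        for v in walk:
            if v in path:
                path = path[:path.index(v)]    # remove the closed detour
            path.append(v)
        return path

    print(walk_to_path([1, 2, 3, 5, 2, 4, 5, 6]))   # [1, 2, 4, 5, 6]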

3.3 Use induction


Induction is a powerful technique for solving analytic problems and you have
surely encountered it previously. In network theory, induction is one of the most
powerful tools for solving problems. The idea is that you show something for small
networks that can be inductively extended to all the networks you are studying to
prove the result.

Example 3.2

Suppose that you have been asked to do the following.

Prove that a connected network with n nodes is a tree if and only if it has exactly m = n – 1 edges.

You can proceed in the following way to obtain the proof.

1. First, gain some insights by drawing some pictures as we have recommended before.
2. For the if (⇒ ) part of the theorem we proceed as follows.
(a) Start by assuming that the network is a tree with n nodes.
(b) We can easily verify that the result is true for n = 1, because this corresponds to a network with one node
and zero edges, which is also a tree.
(c) Suppose now that the result is true for any k < n.
(d) Select a tree with n nodes and remove one edge. Because it is a tree, the result is a network with two disjoint
connected components, each of which is a tree with n1 and n2 nodes, respectively.
(e) Because n1 < n and n2 < n, by the induction hypothesis the result is true for these two trees. Hence they
have n1 – 1 and n2 – 1 edges, respectively.
(f) As n = n1 + n2 we can verify that, returning the edge that was removed, the total number of edges in the
network is m = (n1 – 1) + (n2 – 1) + 1 = n1 + n2 – 1 = n – 1, which proves the (⇒ ) part of the theorem.

3. For the only if (⇐) part of the theorem we proceed as follows.


(a) Suppose that a connected network is not a tree and it has n nodes. Since it is not a tree it has cycles and
therefore there are edges which are not bridges.
(b) Select an edge which is not a bridge and remove it.
(c) If the resulting network is not a tree repeat the previous step until you end up with a tree.
(d) Because you have ended up with a tree it must have n – 1 edges.
(e) Because you have removed k > 0 edges from the network until it had n – 1 edges you can conclude that the
original network has m > n – 1 edges and we are done.

Notice that we only used induction in the ‘if ’ part of the proof. We are free to combine as many individual techniques
as we like in creating proofs.

3.4 Try to find a counterexample


On many occasions it is good advice when trying to prove something to try finding
a counterexample. It does not mean that such a counterexample exists (particu-
larly if the result you are trying to prove is true), but in trying to build such a
counterexample you find a fundamental obstruction that guides you toward the
proof. In fact, many budding pure mathematicians attempting to prove their first
complex result are given the following advice, which we pass on to you: spend
half your time trying to prove the theorem and the other half of your time trying
to disprove it. The expectation is that this holistic view of the process of proof will
give you additional insight—that by trying to contradict yourself you will under-
stand how to overcome the hurdles preventing you from a positive proof. This will
work equally well if it turns out that the result you are trying to establish is false.

Example 3.3

Suppose that you have been asked to prove the following.

A network is bipartite if and only if it contains no cycle of odd length.

You can start by drawing a bipartite network (following some previous advice). Now try to add a triangle to the
network. You will immediately realize that to add a triangle you necessarily need to connect two nodes which are in the
same disjoint set of the bipartite network. Thus, the graph to which you have added the triangle is no longer bipartite.
The same happens if you try a pentagon or a heptagon. Thus, a key ingredient in your proof should be the fact that
the existence of odd cycles necessarily implies connections between nodes in the same set of the bipartition, which
necessarily means destroying the bipartivity of the graph. We will see how to use this fact to prove this result using a
powerful technique in Section 3.5.

3.5 Proof by contradiction


Contradiction is another powerful theoretical tool for solving problems in network
theory. It is also a very intuitive and convincing method of proving a result.

Example 3.4

Let us try to prove now the result stated in Example 3.3.

A network is bipartite if and only if it contains no cycle of odd length.

For instance, assume that G is a bipartite network and that it contains an odd cycle (contrary to the statement we
wish to prove). We can then proceed as follows.
To prove necessity (⇒)

1. Let V1 and V2 be the two disjoint sets of nodes in the bipartite network.
2. Let l = 2k + 1 be the length of that cycle in G, such that

v1 , v2 , . . . , v2k+1 , v1

is a sequence of consecutive nodes.


3. Assume that vi ∈ V1 . Then, vi+1 ∈ V2 because otherwise the edge (vi , vi+1 ) would join two nodes of V1 , which
is prohibited by the bipartivity of the network.
4. Assume that v1 ∈ V1 , then v2 ∈ V2 , which means that v3 ∈ V1 , and so on until we arrive at v2k+1 ∈ V1 .
5. Since v2k+1 ∈ V1 it follows that v1 ∈ V2 . Thus, v1 ∈ V1 and v1 ∈ V2 , which contradicts the disjointness of the
sets V1 and V2 .

We now prove sufficiency (⇐).

1. Suppose that the network G has no cycle of odd length and assume that the network is connected. This second
assumption is not part of the statement of the theorem but splitting the network and the proof into individual
components lets us focus on the important details.
2. Select an arbitrary node v1 .
3. Partition the nodes into two sets V1 and V2 so that any node at even distance from v1 (including v1 itself) is
placed in V1 and any node at odd distance from v1 is placed in V2 . In particular, there is no node connected to
v1 in V1 .
4. Now suppose that G is not bipartite. Then there exists at least one pair of adjacent nodes that lie in the same
partition Vi . Label these nodes vp and vq .
5. Suppose that the distance between v1 and vp is k and that between v1 and vq is l. Now construct a closed walk
that moves from v1 to vp along a shortest path; then from vp to vq along their common edge; and finally back
from vq to v1 along a shortest path. This walk is closed and has length k + l + 1, which must be odd.

6. Because every closed walk of odd length contains an odd cycle we conclude that the network has an odd cycle,
which contradicts our initial assumption and so the network must be bipartite.
7. If G has more than one component then we can complete the proof by constructing sets such as V1 and V2 for
each individual component and then combine them to form disjoint sets for the whole network. This final bit
of housekeeping does not require any additional contradictions to be established.

3.6 Make connections between concepts


It is obvious advice but it is worth reminding you that network theory is an inter-
disciplinary field in which you can make connections among different concepts in
order to prove a particular result. We actually proved the ⇒ direction
of the theorem in the last example in Chapter 2 using the algebraic connection
between networks and adjacency matrices.

Example 3.5

Suppose that you have been asked to prove the following.


A network is regular if and only if ∑_{j=1}^{n} λj² = nλ1 , where λ1 > λ2 ≥ · · · ≥ λn are the eigenvalues of the adjacency matrix of a
network with n nodes.

Let us prove here only the statement that if the graph is regular then the previous equality holds and we leave the
only if part as an exercise. We start by noticing that

∑_{j=1}^{n} λj² = tr(A²).

We also know that the diagonal entries of A² are equal to the number of closed walks of length two starting (and
ending) at the corresponding node, which is simply the degree of the node (ki ). That is,

tr(A²) = ∑_{i=1}^{n} ki .

Now, we make a connection with the Handshaking Lemma, which states that ∑_{i=1}^{n} ki = 2m for any graph. Thus,

∑_{j=1}^{n} λj² = tr(A²) = 2m.

continued

Example 3.5 continued

We can write the average degree as k̄ = (1/n) ∑_{i=1}^{n} ki and then we have

k̄ = (1/n) ∑_{j=1}^{n} λj² .

Finally, because the graph is regular k̄ = λ1 , which proves the result.
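The identity is easy to test numerically. A small sketch (Python with NumPy; the choice of the 2-regular cycle C5 as a test case is ours):

    import numpy as np

    n = 5
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1   # the cycle C_5

    lam = np.linalg.eigvalsh(A)                     # eigenvalues, ascending order
    print(np.isclose(np.sum(lam**2), n * lam[-1]))  # True: sum of squares = n*lambda_1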

3.7 Other general advice


There are many other techniques and tricks available for proving results in net-
work theory and it is impossible to cover them all in this chapter. Apart from the
cases we have analysed, the use of special cases and extreme examples abound in
proofs in network theory. The student should learn as many of these techniques
as possible to create an extensive arsenal for solving problems in this area. A rec-
ommendation in this direction is that the student tries to remember not only the
theorems or statements of the results but also the techniques used in their proofs.
This will allow them to make necessary connections between new problems and
some of the ‘classical’ ones which they already know how to solve. Maybe one
day one of the readers of this chapter will be able to prove a theorem that de-
serves the classification of a ‘proof from the book’. According to Erdös, one of
the founders of modern graph theory, such a book, maintained by God, would
contain a perfect proof for every theorem. These proofs are so short, beautiful,
and insightful that they make theorems instantaneously and obviously true. Good
luck with producing such a proof!

..................................................................................................

FURTHER READING

Franklin, J. and Daoud, A., Proof in Mathematics: An Introduction, Quakers Hill
Press/Kew Books, 2011.
Hammack, R.H., Book of Proof, Virginia Commonwealth University, 2013.
Data Analysis and Manipulation
4

In this chapter
We turn our attention to some of the phenomena we should be aware of when
carrying out experiments. For example, experimental data are prone to error
from many sources and we classify some of these sources. We give a brief
overview of some of the techniques that we can use to make sense of the
data. For all of these techniques, mathematicians and statisticians have devel-
oped effective computational approaches and we hope our discussion gives
the student a flavour of the issues which should be considered to perform an
accurate and meaningful analysis. We list some of the key statistical concepts
that a successful student of network analysis needs in their arsenal and give
an idea of some of the software tools available.

4.1 Motivation
4.2 Sources of error
4.3 Processing data
4.4 Data statistics and random variables
4.5 Experimental tools
Further reading

4.1 Motivation
The focus of much of this book is to present the theory behind network analysis.
Many of the networks we choose to illustrate the theory are idealized in order
to accentuate the effectiveness of the analysis. Once we have developed enough
theory, though, we can start applying it to real life networks—if you leaf through
this book you will see examples based on complex biological, social, and transport
networks, to name just three—and with a sound understanding of the theory we
can ensure we can draw credible inferences when we analyse real networks. But
we should also be aware of the limitations of any analysis we attempt. When we
build a network to represent a food chain, can we be sure we have included all
the species? When we look at a social network where edges represent friendship,
can we be certain of the accuracy of all of our links? And if we study a network
of transport connections, how do we accommodate routes which are seasonal or
temporary?
The point is that any data collected from a real life setting are subject to error.
While we may be able to control or mitigate errors, we should always ensure that
the analytical techniques we use are sufficiently robust for us to be confident in
our results. In this chapter, after presenting a brief taxonomy of experimental
error, we look at some of the techniques available to us for processing and
analysing data.

4.2 Sources of error


4.2.1 Modelling error
In almost every practical application of network theory, the network we analyse
is an idealized model of a real-world situation that relies on certain assumptions
holding either exactly or to within an accepted level of accuracy. We may disregard
certain variables which we believe have an insignificant effect on the phenomena
we are trying to understand. We may find it convenient to attach an integer value
to a quantity when it would be better represented by a real number, or vice versa.
Our model may be replacing a complex physical process with something which
we believe is analogous, but can we be sure that the analogy holds in the circum-
stances where we are applying it? These are examples of some of the potential
sources of modelling error. In order to control this error and build confidence in
your results you should be prepared to enter the modelling cycle, illustrated in
Figure 4.1. You can enter or leave the cycle at any point but it is best to go round
more than once!

Figure 4.1 The modelling cycle (design model → solve equations → compare with reality)

4.2.2 Data uncertainty


Along with the perturbations from reality introduced by the model, there are many
other factors that can lead to uncertainty in the data we are working with. An
obvious source of many of these errors is in the measurement of physical quan-
tities. If the variables we are measuring can take values from a continuous set
then it is inevitable that errors are introduced due to the finite accuracy of our
measurements. And however carefully an experiment is carried out, some sort of
underlying noise is more than likely to introduce error. These errors can usually
be treated as random variables. By repeating measurements and through good
experimental protocol we can often control the size of these effectively. But there
are also occasions when data are missing or affected by a factor that is not ran-
dom. In Section 4.3 we will introduce techniques that can mitigate against data
uncertainty.
Something that is harder to deal with after data have been collected is system-
atic error. For example, a mistake may have been made in the units we used or a
poorly thought out experimental procedure may contaminate the results. We will
assume that the data we are given do not suffer from such systematic error but you
should be aware of their pernicious effects when working with experimental data.

4.2.3 Computational error


Once we have our experimental data we are likely to rely on computational algo-
rithms to perform our analysis. Numerical analysts have designed and analysed
techniques for solving a vast array of mathematical problems, but you need to be
aware of the limitations of the algorithms you use—and to make sure you use them
appropriately. Generally, the computational techniques will give you an approxi-
mate answer to your problem. Many techniques in numerical analysis attempt to

find an approximate solution by discretizing the problem. We split the continuum


of real numbers into a discrete set of numbers separated by a step size h. We con-
trol h to achieve a balance between speed and accuracy. Whether or not you have
control over the level of discretization error that is introduced, there is a limit on
the accuracy you can expect and it is unreasonable to ask for high accuracy if the
step size h is very large.
Another source of computational error is rounding error. On a computer, real
numbers can only be assigned a finite amount of storage space, yet they may
have infinite decimal expansions. We must round numbers. Each rounding error
is (relatively) tiny, and usually they will have a tiny effect on the answer. Ideally,
we would like the size of the perturbations introduced prior to and during com-
putations not to be amplified significantly by our method of solution. But if the
problem we are trying to solve is badly conditioned then the errors introduced by
rounding and discretization can be amplified massively and it is possible for the
accumulation of only a few rounding errors to have a catastrophic effect.

Example 4.1

Suppose we wish to compute I20 where In = ∫₀¹ xⁿ e^{x–1} dx.
Note that I0 = ∫₀¹ e^{x–1} dx = 1 – e^{–1} ≈ 0.6321 and using integration by parts we find

In+1 = [x^{n+1} e^{x–1}]₀¹ – (n + 1) ∫₀¹ xⁿ e^{x–1} dx = 1 – (n + 1)In .

Notice that for 0 ≤ x ≤ 1, if m > n then 0 ≤ xᵐ ≤ xⁿ, and so the sequence {In } should be nonnegative and
monotone decreasing.

Figure 4.2 Estimating In (the computed values of In plotted against n)

Figure 4.2 shows In as computed on a computer with 16 digits of accuracy using the recurrence In = 1 – nIn–1 up
to n = 19. At the next step it gives the value I20 = –30.192.
The problem is caused by the accumulation of tiny rounding errors (by n = 20 the initial error in rounding I0
has been multiplied by 20!, that is, by roughly 2.4 × 10¹⁸). The problem can be overcome by rearranging the recurrence. For example, if we let
In = (1 – In+1 )/(n + 1) and assume I30 = 0, we calculate I20 to full accuracy.
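The contrast between the two recurrences takes only a few lines to reproduce. A sketch (Python; the exact numbers printed depend on the machine’s rounding):

    import math

    I = 1 - math.exp(-1)            # I_0 = 1 - 1/e
    for n in range(1, 21):
        I = 1 - n * I               # forward recurrence: errors grow like n!
    print(I)                        # a large negative number, clearly wrong

    J = 0.0                         # crude starting guess I_30 = 0
    for n in range(30, 20, -1):
        J = (1 - J) / n             # backward recurrence I_{n-1} = (1 - I_n)/n
    print(J)                        # approximately 0.0455: I_20 to high accuracy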

4.3 Processing data


Given that the data we generate from experiments may be prone to uncer-
tainty, it is worth considering whether we can do anything to mitigate against
it. The answer will depend on the source and type of error. We now present a
non-exhaustive introduction to some of the things we can attempt to do.

4.3.1 Dealing with missing data


There are multiple reasons why our observations from an experiment may be
missing data. If our data come from a survey or census we are reliant on the
respondents and data collectors giving complete answers. When taking meas-
urements from a biological network it may be physically impossible to record
a complete set of interactions. If data come from a series of experiments, illness
or some other random misfortune may render the series incomplete. Data can be
simply lost or mistranscribed. Or there may be more malicious reasons for data
going missing—they may have been stolen or compromised before reaching the
analyst. Whatever the reason for data going missing (and assuming we know it
has) we have to make a choice—can the missing data be ignored or should it be
replaced? The answer to this question, and the procedure we adopt if we plan to
replace the data, will depend on whether the loss of data can be treated as a ran-
dom event or not. In many cases, the simplest option of ignoring all missing data
can work. But it can, of course, weaken the confidence we have in our results and
may also introduce a bias, which can lead to seriously misleading conclusions.
In the context of networks, missing data essentially manifest themselves in one
of two ways—missing links and missing nodes. If we cannot simply ignore the
missing data, there are a number of strategies we can employ to replace it.

Example 4.2

Consider the network of sawmill employees we considered in Chapter 1. Suppose the mill is visited one year later and
the study is repeated but we find that two of the employees (who we know are still working at the mill) are not recorded
in our follow-up survey. To deal with the missing data we could choose one of the following options.

1. Ignore it. Maybe two missing participants will not skew our results.
2. Substitute by including the missing workers in our survey and connecting them to the network using the links
to fellow employees recorded in the previous study.
3. Suppose one of the missing workers had exactly the same connections in the original survey as a worker who
has not gone missing. In this case, we can substitute the missing data by assuming their links are still identical.

If there are no identical twins (in this sense) then this process could be expanded to substitute with the links of the
most similar node. Later in this book (Chapter 21) we will see ways of measuring this similarity between nodes.
4. A statistical analysis of the two networks may show some discrepancies which can be ameliorated by adding
links in particular places in the network. This is the essence of a process known as imputation. The analysis is
based on statistics such as the degree distribution, a concept we will also visit in Chapter 9.

For statistical reasons, imputation is the favoured approach for dealing with miss-
ing nodes as it can effectively remove bias. But dealing with the ‘known unknowns’
and the ‘unknown unknowns’ of data uncertainty can be fraught with danger for
even the most experienced of practitioners and you may be playing it safe by ap-
plying a simplistic approach to missing data. For particular classes of network,
sophisticated techniques exist for recovering missing data. Such techniques are
beyond the scope of this book. They are underpinned by theory on high-level
properties of networks such as degree distribution (see Chapter 9) and network
motifs (see Chapter 13).

Example 4.3

Protein–protein interaction (PPI) networks attempt to describe affinities be-


tween proteins by measuring their tendency to interact when stimulated by a
particular chemical or physical intervention. Experimental evidence of inter-
actions is inevitably subject to noise and whether an edge (representing an
interaction) should be drawn between two nodes (proteins) is not necessar-
ily an exact science. If edges are drawn as a result of a chemical change, the
simplest approach is to use a fixed threshold based on an assessment of the
level of noise. However, a number of more sophisticated approaches can be
developed. For example, researchers have noticed that PPI networks can be
embedded in a lower dimensional geometric space than one would expect in a
random network. By embedding experimental results one can thus use expert
evidence to judge the likelihood of whether edges are missing.

4.3.2 Filtering data


As well as dealing with missing data, there are a number of other reasons why
we may want to manipulate data before using it. For example, we may seek to
eliminate repeated or rogue data; or we may want to detrend or seasonally adjust
data so we can focus on particular variations of interest. Techniques for accom-
plishing these objectives are often referred to as filters but we reserve the term

for a more restrictive class of techniques for dealing with data subject to noise
(particularly unbiased noise). We define filtering to be the process of smoothing
the data. When dealing with networks, we may want to filter raw data when de-
termining whether to link nodes together (such as in constructing PPI networks
when the results of experiments are judged) but it is also an essential tool once we
start our analysis to get a smoother picture of our statistical results.

Example 4.4

In Figure 4.3(a) we show simulated data measuring the number of links


(measured by total degree) within a group of individuals within a social net-
work. The measurements are assumed to have been made indirectly and were
subject to a lot of noise. The general trend appears to be that the number of
links is increasing with time, but at the point indicated by the arrow there
appears to be a temporary drop.
Figure 4.3 Applying a data filter (degree plotted against time): (a) raw data; (b) data after smoothing

To get a better idea of whether this drop is real or just an artefact of the
noise we have plotted a moving average of the data. At every point in time we
have replaced the measured value with the average taken over (in this case) six
successive time intervals. This ‘averages out’ the noise and appears to show
that the apparent drop is a real phenomenon within the network.

The moving average typifies a filter where we replace a data series


x0 , x1 , x2 , . . . , xn

with a smoothed data set


y0 , y1 , y2 , . . . , yn
where

k
yi = aj xi–k+j . (4.1)
j=1

There are many ways of choosing the aj (in our example they were uniform)
depending on what we are trying to achieve and the suspicions we have about the
nature of the contamination of our data. For i < k an appropriate filtering must
be chosen to properly define the initial points in the filtered data (for example, by
creating fictitious data x–k , . . . , x–1 ).
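A uniform version of filter (4.1) is only a few lines of code. A sketch (Python with NumPy; the simulated series and the padding choice are ours):

    import numpy as np

    def moving_average(x, k):
        # Filter (4.1) with uniform weights a_j = 1/k; the first k-1 points
        # are handled by padding with fictitious copies of the initial value.
        x = np.concatenate([np.full(k - 1, x[0]), x])
        return np.convolve(x, np.ones(k) / k, mode='valid')

    rng = np.random.default_rng(1)
    noisy = np.linspace(0, 1, 200) + 0.2 * rng.standard_normal(200)
    smooth = moving_average(noisy, 6)           # six-point moving average
    print(len(smooth))                          # 200: same length as the input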

4.3.3 Fitting data


Given the choice, humans naturally favour a simple explanation over a complex
one. However, by assuming the role of network theorists we have to accept that
complex networks yield complex results. Nevertheless, we may be able to draw
links and analogies between what we see and much simpler situations. In par-
ticular, if we graph or tabulate the results of a particular network analysis it may
be obvious that the data follow a particular trend, or we may expect a particular
distribution of data given the analogy we have drawn. In these and many other
cases we may want to fit a simple algebraic relationship between two (or more)
variables. The most popular type of relationship to fit is a linear one. If this is not
appropriate, we may be able to manipulate our data and change variables so that
a linear fit is possible.

Example 4.5

Let x and y be two variables and a and b be constants. A simple change of variables can be used to convert some
nonlinear relationships between the variables into linear ones:

y = ax² + b  ⇒  y = aX + b,  X = x²,
y = a e^{bx}  ⇒  Y = bx + ln a,  Y = ln y,                    (4.2)
y = ax^b  ⇒  Y = bX + ln a,  Y = ln y,  X = ln x.

Suppose we are given data (x1 , y1 ), . . . , (xn , yn ) for two variables. The basic
principle of linear fitting is to find constants a and b so that the line y = ax + b
matches the data as closely as possible. We do this by minimizing the errors
ei = yi – axi – b over all choices of a and b. There are many ways of choosing
the error measure but if we assume that the errors can be modelled by a ran-
dom variable then it usually makes sense to minimize the Euclidean distance in the
errors, namely,

min_{a,b} √( ∑_{i=1}^{n} (yi – axi – b)² ) ,     (4.3)

to find the least squares solution, for which there are many efficient computational
techniques.
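For instance, a least squares line can be fitted with a couple of library calls. A sketch (Python with NumPy; the simulated data are ours):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    y = 3 * x + 1 + 0.1 * rng.standard_normal(50)   # noisy samples of y = 3x + 1

    M = np.vstack([np.ones_like(x), x]).T           # columns: constant and x
    b, a = np.linalg.lstsq(M, y, rcond=None)[0]     # solves problem (4.3)
    print(a, b)                                     # slope near 3, intercept near 1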

If the relationship between the variables (or transformed variables) cannot be


represented by a straight line then we can generalize the process and look for the
best-fitting solution from a bigger class of functions (for example, polynomials of
degree k, a sum of exponentials, or a trigonometric series). We can also look to
fit multivariate relationships, too. Once we start our network analysis you will see
that there are many occasions when many different types of relationship appear
to fit the data equally well (or badly!). Whenever you attempt to find a particular
fit between two variables you should ideally have some justification for it a priori
for the results to be truly meaningful.
Having fitted a curve to data we are then in a position to interpolate or ex-
trapolate our data. That is, we can use the curve to approximate the relationship
between variables between points, or outside the range from which they have
been collected. Alternatively, if we are confident in our data and do not wish to
apply a fit, we can use it to interpolate or extrapolate directly. Interpolation and
extrapolation can also be used to replace missing data, too.

Example 4.6

Given that the variables x and y are related, and that when x = 0, 2, and 3, y = 0, 4, and 9, respectively, we can estimate
the value of y when x = 1 in a number of ways. For example, we can assume that between x = 0 and x = 2, the
relationship between the variables is linear and hence when we interpolate to x = 1 we find y = 2. Or we notice that
our three data points lie on a quadratic curve and if we assume this relationship throughout then y = 1 at x = 1. We
could also fit other shapes to pass through the points or we could interpolate from the least squares linear fit to the
data, y = 2.86x – 0.43.
Any of our choices can be used to extrapolate outside the measured values of x but notice that these will diverge
significantly as x increases.
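The calculations in this example are easily reproduced (Python with NumPy; a sketch):

    import numpy as np

    x = np.array([0.0, 2.0, 3.0])
    y = np.array([0.0, 4.0, 9.0])

    print(np.interp(1.0, x, y))      # 2.0: piecewise linear interpolation
    c = np.polyfit(x, y, 2)          # the quadratic through all three points
    print(np.polyval(c, 1.0))        # 1.0, since the data lie on y = x^2
    print(np.polyval(c, 10.0))       # 100.0: extrapolations soon diverge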

As with all the other techniques of massaging data that we have presented
in this section, we can be most confident in our interpolated/extrapolated data
when there is an underlying justification for the method we use provided by the
theory. If we have a large amount of data we should be judicious in the use of
interpolation/extrapolation. Large amounts of data allow us to calculate unique
interpolants of great complexity (for example, with 25 pieces of data we can fit
a degree 24 polynomial). But these complicated functions can fluctuate wildly
between the given data and give outlandish results in the gaps. Extrapolation es-
pecially needs to be done carefully. Any discrepancy between our assumed fit and
the actual behaviour can be accentuated to ridiculous proportions and lead to
predictions which are physically impossible.

4.4 Data statistics and random variables


Once we have analysed a large set of data we may end up producing a huge
volume of results. And, of course, these results will have accumulated errors on

the way. To get a feeling of what the results mean it is often useful to calculate
certain representative statistics: averages, maxima, and minima. For a set of results
x, some of the statistics we will make use of the most are the maximum, xmax , the
minimum, xmin , the mean, x, the standard deviation, σx , and the variance, σx2 . If
we label the individual elements of x as x1 , x2 , . . . , xn then

1 1
n n
x= xi , and σx2 = (xi – x)2 . (4.4)
n i=1 n i=1

It is often natural (and profitable) to look for and measure correlations be-
tween variables. A simple way to compare two variables x and y is to measure the
covariance between samples {x1 , x2 , . . . , xn } and {y1 , y2 , . . . , yn }, defined as

cov(x, y) = (1/n) ∑_{i=1}^{n} (xi – x̄)(yi – ȳ) .     (4.5)

Powerful statistical techniques have been developed which boil down a large
amount of information into a number between –1 and 1 which gives a precise
value of the dependency between two variables. For example, if we assume that x
and y are linearly related then we can calculate the Pearson correlation coefficient of
two samples as

r = cov(x, y) / (σx σy) .     (4.6)

If r = 1 then we can infer a perfect linear relationship between x and y. If
r = –1 then one variable goes up as the other goes down, while r = 0 indicates
the absence of any linear relationship (though not, by itself, independence). In
practice we would expect r to take non-integer values and the strength of the
correlation should be judged against a null hypothesis.
a null hypothesis. Alternatively, a whole group of samples can be compared by
computing the covariances between each individual pair and forming a covariance
matrix.
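Equations (4.5) and (4.6) amount to one line each in practice. A sketch (Python with NumPy; the simulated samples are ours):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    y = 2 * x + 0.5 * rng.standard_normal(1000)     # strongly linearly related

    cov = np.mean((x - x.mean()) * (y - y.mean()))  # equation (4.5)
    r = cov / (x.std() * y.std())                   # equation (4.6)
    print(r, np.corrcoef(x, y)[0, 1])               # both close to 1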
Throughout our analysis of networks we will be comparing our results from
real-world networks against data generated by random variables. We are not sug-
gesting that the real-world is random, but there is still much value in a comparison
of deterministic and random phenomena. In large data sets there is often a sound
mathematical reason for the distribution of the data to match that of a benchmark
random variable and many of the simplified models that provide the motivation
for our analysis are based on assumptions of randomness. Among the distribu-
tions we will make use of are the binomial, Poisson, Gaussian, power law, and
exponential, and the student should make sure she is familiar with these. Illustra-
tions of some of these distributions can be found in Chapter 9. In the real-world
we deal with finite networks and it is natural to use discrete probability distribu-
tions with finite ranges in these models. However, for large networks it is usually

safe to approximate with continuous distributions and/or ones with infinite do-
mains such as the Gaussian. And (since we are mathematicians!) we will be taking
limits to infinity to get a complete understanding of the finite. But care should be
taken so that our use of random distributions does not give meaningless results
and the student should be prepared to use truncated approximations to idealized
distributions to avoid such misfortune.

4.5 Experimental tools


Speak to any enthusiastic mathematician for long enough and she will tell you
that the subject is not a spectator sport—you need to get out there and participate
to really appreciate the subject. The authors of this book share that sentiment
wholeheartedly and we encourage readers to get hold of their own networks to
test and develop the analytical techniques presented in this book. To that end, we
give a brief presentation of some of the tools you can use for software analysis.
The nature of software development means that the picture changes rapidly and
this section of the book is likely to become dated very quickly. To guard against
premature ageing our survey is fairly generic. We cover three general areas—
sources of data, data types, and software tools.

4.5.1 Sources of data


The first step in network analysis is to get hold of a network. You can generate
your own by simply creating an adjacency matrix but to get a feel for the proper-
ties of particular classes of techniques you may want to use networks which have
been created and curated by previous researchers. Many of the networks we study
in this book have been trawled from recent research literature and we encourage
you to look at these yourself. Of course, search engines can help you find almost
anything but you will find a targeted search much more productive, in particular
you might want to start with online network data repositories. The dynamic na-
ture of the web means that the data currently available could disappear at any time
but typing one of the following terms (or something similar) into a search engine
should be productive: ‘complex network data’, ‘complex network resources’, or
‘network data repository’. In particular, at the time of going to press, we can rec-
ommend the KONECT set at the University of Koblenz which contains a rich
variety of data and the SNAP set of large networks at Stanford University. You
will also find that many of the authors referenced in the ‘Further reading’ sections
at the end of each chapter have their own sets of data on their personal websites.

4.5.2 Data types


There is a lot of information in a large complex network and one needs to pay
consideration as to how it should be stored to promote efficient analysis. In this
regard, there is a significant difference between unlabelled and labelled networks.

Understanding unlabelled networks can give us rich insight into structure and
theory but in applications the actual labels attached to the nodes must be taken
into account if we want to name the most important node or the members of a
particular set. For these labelled graphs, a wide array of file formats have been
developed that are influenced by their author’s particular interests.
For unlabelled networks one can work exclusively with an adjacency matrix.
All mathematical software packages will have a format for storing matrices and
an initial analysis can be performed using well-established methods from linear
algebra. However, for large networks it may be necessary to use efficient storage
formats. Suppose you are analysing a simple network with n nodes and m edges.
To store this information one simply needs a list of the edges and their end points.
If we assign each node a number between 1 and n we can store all the information
as a set of m pairs of numbers. Assuming each number requires four bytes of stor-
age (which gives us around four billion numbers to play with) the whole network
therefore requires 8m bytes of room in the computer’s memory. Unless instructed
otherwise, most software packages will allocate eight bytes to store each entry of a
matrix (in so-called double precision format). Thus the adjacency matrix requires
8n² bytes of storage.

Example 4.7

Consider a social network of around 100,000 people who are each connected
on average to around 100 other individuals. We only need around 40 mega-
bytes to store the links (m ≈ 5,000,000 since each edge adds two to the total
of connections). If we form the adjacency matrix we need 8 × 10¹⁰ bytes or
80 gigabytes; a factor 2,000 times as big. While most modern computers have
room for a file of 80GB, they may not be easy to manipulate (for example, as
of 2014 very few personal computers would be able to store the whole matrix
in RAM) and the problem is exacerbated as n increases.

The point of Example 4.7 is that even if they are simple and unlabelled net-
works, thought needs to go in to the method of representing a network on a
computer. If we just want to store the edges then this can be done in a simple
two-column text file, or a spreadsheet. It is easy enough to work in a similar way
with directed networks and by adding an additional column one can also add
weights.
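Reading such a file back into a program is straightforward. A sketch (Python; the file name and the toy data are hypothetical):

    rows = ["1 2 0.5", "2 3 1.0", "3 1 2.0"]       # a toy weighted edge list
    with open("network.txt", "w") as f:            # file name is our own choice
        f.write("\n".join(rows) + "\n")

    edges = {}
    with open("network.txt") as f:
        for line in f:
            u, v, w = line.split()
            edges[(int(u), int(v))] = float(w)
    print(edges)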
If one makes use of a simple file type then there are usually tools to convert
the network into a format which can be manipulated efficiently by your software
package of choice. If the amount of storage is critical, one can consider formats
that attempt to compress information as much as possible (for example graph6
and sparse6).
Simple text files can also be used to represent labelled networks but, again,
when m and n are large it often pays to consider a format which is optimal for the

software we want to use. Formats such as GML and Pajek have been designed to
make network data portable and use flexible hierarchical structures which allow
researchers to add detailed annotations to provide context, but one can also make
use of some of the other countless file types that were developed without networks
necessarily being at the forefront of the creators’ minds.

4.5.3 Software tools


As with the formats for storing data, there is a vast array of software for analysing
networks—both specialized and generic. The useful half-life of some of these
packages is extremely short and so we restrict our attention to a few packages
which have been around for a while.
MATLAB is a popular tool in many branches of scientific computation. Its ori-
gins are in solving problems in computational linear algebra and so it is very well
suited to perform many of the operations we will discuss in this book which exploit
the representation of networks in matrix form. It is not free software but is used in
many universities and so is accessible to many students. MATLAB ‘clones’ have
been developed, such as Octave, which have much of the same functionality and
are licensed as free software. Since MATLAB was not designed as a tool for net-
work analysis, it may be missing commands to compute some of the statistics you
are interested in but there are active communities of MATLAB users who are
constantly developing new functions and there are a number of freely available
MATLAB toolboxes designed specifically for these purposes. However you ac-
quire your data, you should be able to import it into a format that MATLAB can
recognize and it has powerful graphical capabilities, albeit not necessarily tuned
for illustrating complex networks.
Python users form a very active community of software developers and there
are many packages that can be used to perform network analysis. In particular,
most (if not all) of the network measures we introduce in this book can be calcu-
lated using the package NetworkX. Python is free software, and for mathematical
computations you can use the open-source software system Sage. This can be
used anywhere with an internet connection through a web browser without in-
stalling anything onto your own machine. If you have no experience of Python,
you need a little bit of patience to get NetworkX up and running and doing what
you want it to do. But, as the authors of this book can attest, even a fool can
eventually get impressive results.
If your computing experience is limited to Microsoft Office or similar soft-
ware, you can still perform sophisticated network analysis. Spreadsheets are a
user friendly format for storing networks and Excel (and its rivals) have many
of the mathematical and statistical functions necessary to proceed. Again, these
products were not designed for working with networks, and for large sets of data,
computational accuracy and efficiency may be compromised.
Finally, another open-source option is Gephi. This is designed especially
for the purpose of exploring networks visually. It does not have some of the
capabilities of the other packages we have mentioned for analysing very large net-
works but often a purely visual approach can lead one to uncover patterns which
can then be analysed more systematically. In particular, it is an excellent package
for creating arresting illustrations of networks and we have used it extensively in
producing this book.

..................................................................................................

FURTHER READING

Clarke, G.M. and Cooke, D., A Basic Course in Statistics, Edward Arnold, 1998.
Ellenberg, J., How Not To Be Wrong: The Hidden Maths of Everyday Life, Allen
Lane, 2014.
Lyons, L., A Practical Guide to Data Analysis for Physical Science Students,
Cambridge University Press, 1991.
Mendenhall, W., Beaver, R.J., and Beaver, B.M., Introduction to Probability and
Statistics, Brooks/Cole, 2012.
5 Algebraic Concepts in Network Theory

In this chapter

The primary aim of this book is to build a mathematical understanding of
networks. A central tool in this is matrix algebra: there is a duality between
networks and matrices which we can exploit. So before we start analysing
networks, we review some results from matrix algebra and develop ideas that
will be helpful to us.

5.1 Basic definitions of networks and matrices
Throughout this book we will be exploiting the fact that we can represent net-
works with matrices: we have already defined the adjacency matrix and the
incidence matrix—and there are more to come! We can then make use of es-
tablished matrix theory to deduce properties of networks. Let us first introduce
some basic matrix quantities and notation which will prove useful. We will then
focus on eigenvalues and eigenvectors of matrices in order to prepare ourselves
for working with the spectrum of a network.
A typical matrix A will either be square (and in Rn×n ) or rectangular (and
in Rm×n ). Its (i, j)th entry will be denoted aij and its transpose is the matrix AT
whose (i, j)th entry is aji . If A = AT then we call the matrix symmetric. The trace of
a square matrix, written tr(A), is defined to be the sum of the diagonal elements
of a matrix.
Vectors of length n can be thought of as n × 1 matrices (if they are columns)
or 1 × n (for row vectors). We will use bold letters to represent vectors. The inner
product of two vectors x and y of the same dimension is the scalar quantity xT y.
If the inner product is zero then the vectors are said to be orthogonal to each other.
An orthogonal matrix is a square matrix in which all of the columns are mutually
orthogonal (and are scaled to have a Euclidean length of one). If A is orthogonal
then AT A = I , the identity matrix.
Occasionally we will deal with complex matrices. Most results for real
matrices have direct analogues for complex matrices so long as we use the
conjugate transpose, $A^*$, in place of $A^T$. The $(i,j)$th entry of $A^*$ is $\bar{a}_{ji}$, where $\bar{z}$ denotes the complex conjugate of $z$. If $A = A^*$ we
call the matrix Hermitian and if $A^*A = I$ then we call the matrix unitary.
We will need to make use of the determinant of some matrices. We use
Laplace’s definition which can be linked to salient properties of networks.

Definition 5.1 Let $A \in \mathbb{R}^{n\times n}$, then its determinant, written det(A), is the quantity defined inductively by

$$\det(A) = \begin{cases} A, & n = 1, \\ \displaystyle\sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}), & n > 1, \end{cases}$$

for any fixed i, where Aij denotes the submatrix formed from A by deleting its ith
row and the jth column.

The determinants of the $(n-1) \times (n-1)$ matrices used in the definition are known as minors. In particular, a $k \times k$ principal minor is the determinant of a $k \times k$ principal submatrix of A formed by taking the intersection of k rows of A with the same k columns.
The determinant has a host of theoretical uses that stem from the fact
that many identities can be established, such as det(AB) = det(A) det(B) and
det(AT ) = det(A). The most useful theoretical property of the determinant is that
it can be used to characterize singular matrices: a square matrix, A, is singular if
and only if det(A) = 0.

Examples 5.1

(i) If A is a 2 × 2 matrix then, letting i = 1 in the definition,

det(A) = a11 a22 – a12 a21 .

And if A ∈ R3×3 ,

det(A) = a11 (a22 a33 – a23 a32 ) – a12 (a21 a33 – a23 a31 ) + a13 (a21 a32 – a22 a31 )
= a11 a22 a33 + a21 a32 a13 + a31 a12 a23 – a13 a22 a31 – a23 a32 a11 – a33 a12 a21 .

(ii) Let

$$A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix}.$$

Then det(A) = 1 + 6 + 0 – 0 – 3 – 6 = –2. There are three 2 × 2 principal minors, $\det(A_{11}) = 1 - 3 = -2$,
$\det(A_{22}) = 1 - 0 = 1$, and $\det(A_{33}) = 1 - 6 = -5$.


(iii) Suppose that G is a simple connected network with three nodes. Then the adjacency matrix of G is either

$$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

or a permutation of

$$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$

The first matrix represents K3, a triangle, and the second is the adjacency matrix of P2. The determinant of the first matrix is two and that of the second is zero. In this very simple case we can characterize connected networks through their determinant.
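These calculations are easy to check numerically. The sketch below, assuming the numpy package, recomputes the determinant and principal minors of the matrix in Example 5.1(ii); the helper function is our own.

    import numpy as np

    A = np.array([[1., 3., 1.],
                  [2., 1., 1.],
                  [0., 3., 1.]])
    print(np.linalg.det(A))                # approximately -2

    # A 2 x 2 principal minor: delete row i and column i, then take det.
    def principal_minor(M, i):
        keep = [k for k in range(len(M)) if k != i]
        return np.linalg.det(M[np.ix_(keep, keep)])

    print([principal_minor(A, i) for i in range(3)])   # approx [-2, 1, -5]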

5.2 Eigenvalues and eigenvectors


Eigenvalues and eigenvectors are the key tools in determining the action of a linear
operator in many different branches of mathematics, not least in the study of
networks.

Definition 5.2 For any n-dimensional square matrix A there exist scalar values λ (at least one and at most n of them) and vectors x (at least one for each value of λ) such that

$$A\mathbf{x} = \lambda\mathbf{x}.$$

Any value of λ that satisfies this equation is called an eigenvalue. Any nonzero vector x that satisfies this equation is called an eigenvector.

For all values of λ we have A0 = λ0; the zero vector is not considered an eigenvector, though. Now

$$A\mathbf{x} = \lambda\mathbf{x} \iff (A - \lambda I)\mathbf{x} = \mathbf{0},$$

which has a nonzero solution x precisely when the matrix A – λI is singular. We have therefore established the important result that the eigenvalues of A are the roots of the equation det(A – λI) = 0, a polynomial of degree n.

Definition 5.3 The polynomial

$$\det(A - \lambda I) = b_0 + b_1\lambda + b_2\lambda^2 + \cdots + b_n\lambda^n$$

is known as the characteristic polynomial (or c.p.) of A. The equation det(A – λI) = 0 is known as the characteristic equation.

Once we know the eigenvalues we can factorize the c.p. so

$$\det(A - \lambda I) = (\lambda_1 - \lambda)^{p_1}(\lambda_2 - \lambda)^{p_2}\cdots(\lambda_k - \lambda)^{p_k}$$

where the λi are unique. Generally k = n and pi = 1 for all i, but sometimes we encounter repeated eigenvalues, for which pi > 1. The value of pi is known as the algebraic multiplicity of λi. For each distinct eigenvalue there are between 1 and pi linearly independent eigenvectors. The number of linearly independent eigenvectors associated with an eigenvalue is known as its geometric multiplicity.
Encoded in the characteristic polynomial are certain quantities that are very useful to us. We can link its coefficients to principal minors: $(-1)^k b_{n-k}$ is the sum of all the k × k principal minors of A. Since tr(A) is also the sum of the principal 1 × 1 minors of A, –tr(A) is the coefficient of $\lambda^{n-1}$ in the c.p. At the same time, if A has eigenvalues λ1, λ2, . . . , λn then

$$\det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$$

and multiplying out the right-hand side we immediately see that the coefficient of $\lambda^{n-1}$ is

$$-\lambda_1 - \lambda_2 - \cdots - \lambda_n,$$

which means that the sum of the eigenvalues of a matrix equals its trace.
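This identity is easy to verify numerically; a minimal sketch assuming numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))

    # The sum of the eigenvalues equals the trace (up to rounding error;
    # complex eigenvalues come in conjugate pairs so the sum is real).
    print(np.trace(A))
    print(np.linalg.eigvals(A).sum())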
For a general real matrix A, the roots of the characteristic polynomial can be complex. Notice that if A is a real matrix then

$$\det(A - \lambda I) = \det((A - \lambda I)^T) = \det(A^T - \lambda I).$$

So if λ is a root of the c.p. of A, λ is a root of the c.p. of $A^T$; that is, A and $A^T$ have the same eigenvalues. A is real, though, so its complex roots appear as conjugate pairs, and $\bar\lambda$ must be a root of the c.p. of A, too. Suppose that $A\mathbf{x} = \lambda\mathbf{x}$ and $A^T\mathbf{y} = \bar\lambda\mathbf{y}$, which means that $\mathbf{y}^*A = \lambda\mathbf{y}^*$ (why?). Then x is known as a right eigenvector of A (corresponding to λ) and $\mathbf{y}^*$ is the left eigenvector.

Problem 5.1
Show that if x and y∗ are right and left eigenvectors corresponding to different
eigenvalues then they are orthogonal and hence conclude that if A ∈ Rn×n is
symmetric then (i) the eigenvalues of A are all real and (ii) the eigenvectors of
distinct eigenvalues of A are mutually orthogonal.
Suppose that $\mathbf{x}_1$ is the right eigenvector corresponding to $\lambda_1 \ne 0$ and $\mathbf{y}_2^*$ is the left eigenvector corresponding to $\lambda_2$, where $\lambda_2 \ne \lambda_1$. Then,

$$\mathbf{y}_2^*\mathbf{x}_1 = \mathbf{y}_2^*\frac{A\mathbf{x}_1}{\lambda_1} = (\mathbf{y}_2^*A)\frac{\mathbf{x}_1}{\lambda_1} = \frac{\lambda_2}{\lambda_1}\mathbf{y}_2^*\mathbf{x}_1.$$

But $\lambda_2 \ne \lambda_1$ so $\mathbf{y}_2^*\mathbf{x}_1 = 0$. If $\lambda_1 = 0$,

$$\mathbf{y}_2^*\mathbf{x}_1 = \mathbf{y}_2^*\frac{A\mathbf{x}_1}{\lambda_2} = \frac{\lambda_1}{\lambda_2}\mathbf{y}_2^*\mathbf{x}_1 = 0.$$
If A = AT then by symmetry x = y in the above argument, hence the left and
right eigenvectors are identical.

(i) Since A is symmetric, (x∗ Ax)∗ = x∗ A∗ x = x∗ Ax, for any vector x. Now
(x∗ Ax)∗ is the complex conjugate of x∗ Ax, and for a number to equal its
complex conjugate it must be real. Now suppose that λ is an eigenvalue of
A with eigenvector x. Then x∗ Ax = x∗ (λx) = λx∗ x and this can only be
real if λ is real.
(ii) Suppose that x and y are eigenvectors with respective eigenvalues λ and
μ. Then

λyT x = yT (λx) = yT Ax = yT AT x = (Ay)T x = (μy)T x = μyT x.

If λ ≠ μ this means that yT x = 0, hence they are at right angles to each other.

The spectral radius

Definition 5.4 The spectrum of a matrix is the set of its eigenvalues and is written as σ(A). That is,

$$\sigma(A) = \{\lambda : \det(A - \lambda I) = 0\}.$$

The spectral radius of A is defined to be the maximum modulus over its eigenvalues and is written ρ(A). That is,

$$\rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.$$

The spectral radius of a matrix can be a particularly useful measure of the


properties of a matrix and we will discuss how we can estimate it later in this
chapter.
Suppose Ax = λx. Then $A^k\mathbf{x} = A^{k-1}(A\mathbf{x}) = \lambda A^{k-1}\mathbf{x}$. This identity forms the basis of an inductive proof that if λ is an eigenvalue of A then $\lambda^k$ is an eigenvalue of $A^k$. Hence $\rho(A^k) = \rho(A)^k$. If ρ(A) < 1 then $\rho(A^k) \to 0$ as $k \to \infty$ and we can conclude that $A^k \to O$, too (the details are left to the reader). If ρ(A) > 1 we can show that $A^k$ diverges. If ρ(A) = 1 then $A^k$ may or may not converge. We will study matrix powers in more detail in Chapter 12.
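A quick numerical sketch of this behaviour, assuming numpy (the test matrix is our own and has spectral radius below one):

    import numpy as np

    A = np.array([[0.5, 0.4],
                  [0.1, 0.3]])
    print(max(abs(np.linalg.eigvals(A))))   # spectral radius, < 1 here

    # Repeated multiplication: the entries of A^k shrink towards zero.
    Ak = np.eye(2)
    for _ in range(50):
        Ak = Ak @ A
    print(Ak)                               # essentially the zero matrix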
We already know that for real matrices, $\rho(A^T) = \rho(A)$. For complex matrices we know that λ is an eigenvalue of A if and only if $\bar\lambda$ is an eigenvalue of $A^*$ and so, again, $\rho(A^*) = \rho(A)$. Furthermore, if A is nonsingular and Ax = λx then $A^{-1}\mathbf{x} = \lambda^{-1}\mathbf{x}$, and we can conclude that

$$\rho(A^{-1}) = \max_{\lambda \in \sigma(A)} \frac{1}{|\lambda|}.$$

Similarity transforms
When it comes to utilizing spectral information to analyse networks, we will want
to look at both eigenvalues and eigenvectors. Furthermore, the behaviour of ma-
trix functions is intimately tied to the interplay of eigenvalues and eigenvectors.
It is worthwhile, then, looking at the basic tools for understanding this interplay.
Similarity transformations are of utmost importance in this regard.

Definition 5.5 If X is a nonsingular matrix then the mapping

$$S_X : A \to X^{-1}AX$$

is called a similarity transformation. $S_X$ is an orthogonal similarity transformation if X is orthogonal and a unitary similarity transformation if X is unitary.

Let Ax = λx and $B = C^{-1}AC$, then

$$BC^{-1}\mathbf{x} = (C^{-1}AC)C^{-1}\mathbf{x} = C^{-1}A\mathbf{x} = \lambda C^{-1}\mathbf{x}.$$

So x is an eigenvector of A with eigenvalue λ iff $C^{-1}\mathbf{x}$ is an eigenvector of B with eigenvalue λ. A 1–1 correspondence exists between the eigenpairs of A and B.

Example 5.2

Let

$$A = \begin{bmatrix} -5 & 1 & 3 \\ 18 & 2 & -9 \\ -20 & 2 & 11 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 3 \\ 2 & 2 & -1 \end{bmatrix}.$$

We can show that $A = XDX^{-1}$, given that

$$X^{-1} = \begin{bmatrix} 7 & -1 & -3 \\ -6 & 1 & 3 \\ 2 & 0 & -1 \end{bmatrix}.$$

Clearly D has eigenvalues 1, 2, and 5 with eigenvectors corresponding to the columns of I. It is straightforward to show that A has the same eigenvalues as D and the eigenvectors of A are the columns of X.
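One can confirm this numerically; a minimal sketch assuming numpy:

    import numpy as np

    A = np.array([[-5., 1., 3.], [18., 2., -9.], [-20., 2., 11.]])
    X = np.array([[1., 1., 0.], [0., 1., 3.], [2., 2., -1.]])
    D = np.diag([1., 2., 5.])

    # Confirm the similarity A = X D X^{-1} and hence the eigenvalues.
    print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # True
    print(np.linalg.eigvals(A))                       # 1, 2, 5 in some order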
This example shows that under an appropriate similarity transformation, the eigenstructure of a matrix becomes obvious. The question remains as to how simple a form a matrix can be reduced to with similarity transformations.

5.2.1 The spectral theorem


In general, it is possible to find a similarity transformation for any n × n matrix A such that

$$X^{-1}AX = J,$$

where J is a block diagonal matrix $J = \mathrm{diag}(J_1, \ldots, J_l)$ whose ith block is

$$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix},$$

and λi is an eigenvalue of A. The matrix J is known as the Jordan canonical form of A and the representation $A = XJX^{-1}$ is known as the Jordan decomposition.
If all the diagonal blocks are 1 × 1 then the matrix is called simple. Matrices that are not simple are defective: they do not have a set of n linearly independent eigenvectors, meaning that the algebraic multiplicity does not match the geometric multiplicity for at least one eigenvalue. Almost every matrix we deal with will be simple; certainly this is the case for simple networks, but defective matrices can arise for directed networks.

Example 5.3

The defective matrix

$$A = \begin{bmatrix} -1 & 8 & 0 \\ 0 & 3 & 1 \\ 1 & -4 & 1 \end{bmatrix}$$

has a single eigenvalue λ = 1. Its Jordan canonical form is

$$J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

If

$$X = \begin{bmatrix} 4 & -2 & -7 \\ 1 & 0 & -2 \\ -2 & 1 & 4 \end{bmatrix}$$

then $A = XJX^{-1}$. The first column of X is the eigenvector of A and it is the only eigenvector of the matrix. The second and third columns satisfy the equations

$$(A - I)\mathbf{x}_{i+1} = \mathbf{x}_i, \qquad i = 1, 2.$$
The diagonal blocks in the Jordan canonical form of a matrix are unique, hence
two matrices are similar if and only if their Jordan canonical forms have the same
diagonal blocks.
The Jordan decomposition is of great theoretical use but in practice it can be
difficult to compute accurately as it can be highly sensitive to small perturbations.
An alternative is to use the Schur decomposition.

Problem 5.2
Show that for any n × n matrix A there is a unitary matrix U and an upper-
triangular matrix T such that

A = UTU ∗ . (5.1)

This can be established by induction. The result is trivial for n = 1 (U = 1, T = A). Let us assume it holds for n = k.
Let A be a (k + 1) × (k + 1) matrix and choose one of its eigenvalues, λ, with associated eigenvector x (where $\|\mathbf{x}\|_2 = 1$). Let X be a unitary matrix whose first column is x (can you describe how to find one?) and write $X = \begin{bmatrix} \mathbf{x} & W \end{bmatrix}$. Note that $W^*\mathbf{x} = \mathbf{0}$. Now,

$$X^*AX = \begin{bmatrix} \mathbf{x}^* \\ W^* \end{bmatrix} A \begin{bmatrix} \mathbf{x} & W \end{bmatrix} = \begin{bmatrix} \mathbf{x}^*A\mathbf{x} & \mathbf{x}^*AW \\ W^*A\mathbf{x} & W^*AW \end{bmatrix},$$

and since Ax = λx it follows that $\mathbf{x}^*A\mathbf{x} = \lambda$ and $W^*A\mathbf{x} = \lambda W^*\mathbf{x} = \mathbf{0}$, so

$$X^*AX = \begin{bmatrix} \lambda & \mathbf{x}^*AW \\ \mathbf{0} & W^*AW \end{bmatrix}.$$

Now $W^*AW$ is a k × k matrix and so by the inductive hypothesis has a Schur decomposition $VT_1V^*$. Define

$$\widetilde{V} = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & V \end{bmatrix}.$$

Then $\widetilde{V}$ is unitary (why?) and

$$\widetilde{V}^*X^*AX\widetilde{V} = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & V^* \end{bmatrix}\begin{bmatrix} \lambda & \mathbf{x}^*AW \\ \mathbf{0} & VT_1V^* \end{bmatrix}\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & V \end{bmatrix} = \begin{bmatrix} \lambda & \mathbf{b}^* \\ \mathbf{0} & T_1 \end{bmatrix},$$

(for some vector b) which is upper triangular. Call this matrix $\widetilde{T}$ and let $\widetilde{U} = X\widetilde{V}$, which is unitary (why?). $A = \widetilde{U}\widetilde{T}\widetilde{U}^*$ is a Schur decomposition of A.

The matrix T in (5.1) is known as the Schur canonical form and UTU ∗ is the
Schur decomposition of A.
If T is triangular,

$$\det(T) = \prod_{i=1}^n t_{ii},$$

and its characteristic polynomial is

$$\det(T - \lambda I) = (t_{11} - \lambda)(t_{22} - \lambda)\cdots(t_{nn} - \lambda).$$

Hence the eigenvalues of a triangular matrix are its diagonal elements. We do not need to transform a matrix to diagonal form to find its eigenvalues, just triangular form (computing the eigenvectors requires a further step, though). However, if T is not only triangular but diagonal, the eigenvectors are trivial to find.
For a general real matrix, the Schur decomposition can be complex. But for
most of our analysis we will be working with symmetric matrices for which the
Schur decomposition simplifies and we arrive at the so-called spectral theorem.

Theorem 5.1 If $A \in \mathbb{R}^{n\times n}$ is symmetric then there is an orthogonal matrix Q and a real diagonal matrix D such that

$$A = QDQ^T \qquad (5.2)$$

where $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ contains the eigenvalues of A and the columns of Q are the corresponding eigenvectors.

The spectral theorem is a direct consequence of the existence of a Schur decomposition. Suppose A has the Schur decomposition $UTU^*$. Then, since A is symmetric,

$$T^* = (U^*AU)^* = U^*A^*U = U^*AU = T,$$

hence T is a diagonal matrix, its entries must be the eigenvalues of A, $UTU^*$ is the Jordan decomposition of A, and the columns of U are the eigenvectors of A. But we know the eigenvalues and eigenvectors of a real symmetric matrix are real, so D = T and Q = U.
So if A is symmetric, its eigenvectors are just as easy to determine as the
eigenvalues once we have found the Schur decomposition. In this case the Schur
decomposition is identical to the Jordan decomposition.
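In floating-point practice this is exactly what a symmetric eigensolver returns; a minimal sketch assuming numpy, with the triangle K3 as an arbitrary test matrix:

    import numpy as np

    A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])

    lam, Q = np.linalg.eigh(A)      # eigh is tailored to symmetric matrices
    print(lam)                      # [-1, -1, 2]: all real
    print(np.allclose(A, Q @ np.diag(lam) @ Q.T))   # A = Q D Q^T: True
    print(np.allclose(Q.T @ Q, np.eye(3)))          # Q is orthogonal: True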

5.2.2 Gershgorin’s theorem


Before we start to look for the eigenvalues of a matrix, it is useful to have some
idea of where they lie in the complex plane. The following theorem gives us some
useful information.
Eigenvalues and eigenvectors 61

Theorem 5.2 (Gershgorin) Let $A \in \mathbb{C}^{n\times n}$ and suppose that $X^{-1}AX = D + F$ where $D = \mathrm{diag}(d_1, \ldots, d_n)$ and F has no nonzero diagonal entries. Then the eigenvalues of A lie in the union of the discs $\Delta_1, \Delta_2, \ldots, \Delta_n$ where

$$\Delta_i = \Big\{ z \in \mathbb{C} : |z - d_i| \le \sum_{j=1}^n |f_{ij}| \Big\}.$$

If we pick X carefully we can often get tight bounds on the locations of the
eigenvalues. Simple choices of X can also be useful. Note that if the discs are
disjoint they each contain a single eigenvalue of A.

Example 5.4

Suppose X = I, then

$$\Delta_i = \Big\{ z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j \ne i} |a_{ij}| \Big\}.$$

For example, if

$$A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix}$$

then

Δ1 = {z : |z − 10| ≤ 5}
Δ2 = {z : |z| ≤ 3}
Δ3 = {z : |z − 1| ≤ 3}.

The actual eigenvalues of A are 10.226 and 0.3870 ± 2.216i.


If we choose X = diag(6, 1, 1) then the discs are

Δ1 = {z : |z − 10| ≤ 5/6}
Δ2 = {z : |z| ≤ 8}
Δ3 = {z : |z − 1| ≤ 8}

and we can give a much better estimate of the largest eigenvalue, as illustrated in Figure 5.1.

Figure 5.1 Gershgorin discs for (a) X = I and (b) X = diag(6, 1, 1). The eigenvalues are indicated by black circles.
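Computing the discs is straightforward; a short sketch assuming numpy (the function name is our own):

    import numpy as np

    # Return (centre, radius) for each Gershgorin disc of A (with X = I).
    def gershgorin_discs(A):
        radii = abs(A).sum(axis=1) - abs(np.diag(A))
        return list(zip(np.diag(A), radii))

    A = np.array([[10., 2., 3.], [-1., 0., 2.], [1., -2., 1.]])
    print(gershgorin_discs(A))          # [(10, 5), (0, 3), (1, 3)]

    # The scaling X = diag(6, 1, 1) shrinks the disc around 10.
    X = np.diag([6., 1., 1.])
    print(gershgorin_discs(np.linalg.inv(X) @ A @ X))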

If we want to estimate eigenvalues we can also make use of the Rayleigh quotient.

Definition 5.6 Given $A \in \mathbb{R}^{n\times n}$ and $\mathbf{x} \in \mathbb{R}^n$ the Rayleigh quotient associated with A and x is $\dfrac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$.

The Rayleigh quotient appears in a number of guises in applied mathematics. If x is an eigenvector (with associated eigenvalue λ) then

$$\frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}} = \frac{\mathbf{x}^T\lambda\mathbf{x}}{\mathbf{x}^T\mathbf{x}} = \lambda.$$
The range of Rayleigh quotients for a given matrix will thus include the whole
spectrum (if that spectrum is real). Bounds on this range can be used to bound
the spectral radius (which is an upper bound on the size of Rayleigh quotients of
a given matrix).
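The Rayleigh quotient also underlies the classical power iteration for estimating the dominant eigenvalue; a minimal sketch assuming numpy (the test matrix and starting vector are our own):

    import numpy as np

    def rayleigh(A, x):
        return (x @ A @ x) / (x @ x)

    # Power iteration: repeated multiplication steers x towards the dominant
    # eigenvector and the Rayleigh quotient towards the dominant eigenvalue.
    A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
    x = np.ones(3) + 1e-3 * np.arange(3)    # not orthogonal to the dominant one
    for _ in range(30):
        x = A @ x
        x /= np.linalg.norm(x)
    print(rayleigh(A, x))                   # approximately 2 = rho(A) for K3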

5.2.3 Perron–Frobenius theorem


Adjacency matrices are nonnegative. That is, all of their entries are greater than or
equal to zero. We use the inequality A ≥ 0 to denote that A is nonnegative. We
can say something concrete about the largest eigenvalue of nonnegative matrices
which will be useful when we analyse networks. In terms of adjacency matrices,
the key results are dependent on the connectivity of the underlying network. They
follow from results which were first proved for positive matrices. If A is a positive
matrix then we write A > 0.

Theorem 5.3 (Perron) Suppose $A \in \mathbb{R}^{n\times n}$ and A > 0. Then A has an eigenvalue λ that satisfies the following properties.
1. λ = ρ(A).
2. If μ ∈ σ (A) and μ = λ then |μ| < λ.
3. λ has algebraic multiplicity 1.
4. If Ay = λy then y = αx where x > 0 and α ∈ C.

The proof of Perron’s theorem can be found in many standard linear algebra
texts. The details are rather intricate and we omit them here, but it would be
remiss of us not to give a flavour of what the proof involves.

Problem 5.3
Show that if A > 0 then the following are true.

1. Ak > 0 for every finite k.


2. ρ(A) > 0.
3. If x ≥ 0 then Ax ≥ 0 with equality if and only if x = 0.

1. By induction. A > 0. Suppose $B = A^k > 0$. Then the (i, j)th entry of $A^{k+1}$ is $\sum_{l=1}^n b_{il}a_{lj}$ and the result follows since every term in this sum is positive.
2. If ρ(A) = 0 then $A^k = O$ for k ≥ n. But we have just shown that $A^k > 0$ for a positive matrix.
3. The ith entry of Ax is $\sum_{k=1}^n a_{ik}x_k$. Every term is nonnegative and the sum can only equal zero if every element of x is zero.

For nonnegative matrices, point one of the Perron theorem is also true. But
the uniqueness of the largest eigenvalue is not guaranteed.

Examples 5.5

(i) $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is nonnegative and has eigenvalues ±1, showing point two of the Perron theorem does not hold for all nonnegative matrices.

(ii) Suppose A > 0 and that Ax = ρ(A)x where x > 0. Now let $B = \begin{bmatrix} A & O \\ O & A \end{bmatrix}$. Then it should be obvious that ρ(B) = ρ(A) and

$$B\begin{bmatrix} \mathbf{x} \\ \mathbf{0} \end{bmatrix} = \rho(B)\begin{bmatrix} \mathbf{x} \\ \mathbf{0} \end{bmatrix}, \qquad B\begin{bmatrix} \mathbf{x} \\ -\mathbf{x} \end{bmatrix} = \rho(A)\begin{bmatrix} \mathbf{x} \\ -\mathbf{x} \end{bmatrix},$$

proving that points three and four of the Perron theorem do not hold for all nonnegative matrices.

(iii) We could have used B = I in the last example.

Many nonnegative matrices do share all the key properties of positive matrices.
Whether they do or not depends on the pattern of zeros in the matrix which for
adjacency matrices, as previously stated, can be related to connectivity in networks.

Definition 5.7 $A \in \mathbb{R}^{n\times n}$ is reducible if there exists a permutation P such that

$$A = P\begin{bmatrix} X & Y \\ O & Z \end{bmatrix}P^T$$

where X and Z are both square. $A \in \mathbb{R}^{n\times n}$ is fully decomposable if there are permutations P and Q such that

$$A = P\begin{bmatrix} X & Y \\ O & Z \end{bmatrix}Q.$$

If a square matrix is not reducible then it is irreducible. If it is not fully decomposable then it is fully indecomposable.

Theorem 5.4 (Perron–Frobenius) If A is fully indecomposable and nonnegative then the properties listed in the Perron theorem still hold. If it is irreducible then properties 1, 3, and 4 are guaranteed to hold.

Perron's theorem and the Perron–Frobenius theorem can be generalized in many ways to cover much more exotic linear operators.
Notice that if A is symmetric and reducible then it can be permuted into the block diagonal form

$$\begin{bmatrix} X & O \\ O & Z \end{bmatrix}.$$

Essentially, X and Z are completely independent of each other. If either of these matrices is reducible it can also be permuted into block diagonal form, and so on, and hence any symmetric matrix can be written in the form

$$A = P\,\mathrm{diag}(A_1, A_2, \ldots, A_k)P^T$$

where each of the diagonal blocks is irreducible. The Perron–Frobenius theorem can then be applied to each block in turn.
Notice that using reducibility we can restate the condition established in Chap-
ter 2 that characterizes connectivity. A network, G, is connected if and only if its
adjacency matrix, A, is irreducible.
If a network has more than one connected component then its adjacency
matrix must be reducible. In particular, using the argument given above, if the
network is undirected we can permute its adjacency matrix into a block diagonal
form in which each block is irreducible.
..................................................................................................

FURTHER READING

Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press,
2012.
Meyer, C.D., Matrix Analysis and Applied Linear Algebra, SIAM, 2000.
Strang, G., Linear Algebra and Its Applications, Brooks/Cole, 2004.
6 Spectra of Adjacency Matrices

In this chapter

Diverse physical phenomena can be understood by studying their spectral
properties. An understanding of the harmonies of music can be developed by
looking at characteristic frequencies that can be viewed as eigenfunctions; and
astronomers can predict the chemical composition of unimaginably distant
galaxies from the spectra of the electromagnetic radiation they emit.
In this chapter we look at some ways to define the spectrum of a network
and what we can infer from the resulting eigenvalues. We will only consider
undirected networks, which allows us to take advantage of some powerful
tools from matrix algebra.

6.1 Motivation
The obvious place to start when looking for the spectrum of a network is the ad-
jacency matrix. For now, we will focus on simple networks. Since the adjacency
matrix is symmetric, the eigenvalues are real (by the spectral theorem) and since it
is nonnegative, its largest eigenvalue is real and positive (by the Perron–Frobenius
theorem). We can compare networks through the spectra of their adjacency matri-
ces but we can also calculate some useful network statistics from them, too. We
will assume that the spectrum of the adjacency matrix A is ordered so that
λ1 ≥ λ2 ≥ · · · ≥ λn .
Since A is symmetric and the eigenvalues are real, such an ordering is possible.

6.2 Spectral analysis of simple networks


The spectrum of the adjacency matrix is an example of a graph invariant. How-
ever we decide to label a network, the eigenvalues we compute will be the
same. This is simple to see. Relabelling is equivalent to applying a similarity
transformation induced by a permutation matrix, P, and
det(PAP T – λI ) = det(P(A – λI )P T ) = det(A – λI ) det(P) det(P T )
and hence the zeros of the c.p. are unchanged by the permutation.

A network is not uniquely defined by its spectrum—there are a number of famous results in graph theory on isospectral graphs (i.e. graphs with exactly the same eigenvalues) which are otherwise unrelated—but certain properties can still be inferred. The spectra of some highly structured networks can be derived analytically through a variety of straightforward linear algebra techniques.

Examples 6.1

(i) The adjacency matrix of the complete network Kn is A = E − I. (Throughout this book e denotes a vector of ones and E a matrix of ones; their dimensions will vary but should be readily understood from the context in which they appear.) E has a zero eigenvalue of algebraic multiplicity n − 1 (why?), so A has an eigenvalue of −1 with algebraic multiplicity n − 1. Since Ae = (n − 1)e, the remaining eigenvalue is n − 1. Thus

$$\det(\lambda I - A) = (\lambda + 1)^{n-1}(\lambda - n + 1).$$

(ii) The cycle graph, Cn, has adjacency matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \ddots & & \vdots \\ \vdots & 0 & 1 & \ddots & \ddots & \vdots \\ 0 & \vdots & & \ddots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 1 & 0 \end{bmatrix}.$$

Let $\omega^n = 1$ and $\mathbf{v} = \begin{bmatrix} 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \end{bmatrix}^T$. Then, since $\omega^{n-1} = \omega^{-1}$,

$$A\mathbf{v} = \begin{bmatrix} \omega + \omega^{-1} \\ 1 + \omega^2 \\ \omega + \omega^3 \\ \vdots \\ \omega^{n-3} + \omega^{n-1} \\ 1 + \omega^{n-2} \end{bmatrix} = (\omega + \omega^{-1})\mathbf{v}.$$

Thus $\omega + \omega^{-1} = 2\,\mathrm{Re}\,\omega$ is an eigenvalue of A for any root of unity, ω. This means the spectrum of Cn is

$$\left\{ 2\cos\left(\frac{2\pi j}{n}\right),\ j = 1, \ldots, n \right\}.$$

All the eigenvalues corresponding to complex ω have algebraic multiplicity 2 since $\omega^j$ and $\omega^{n-j}$ share the same real part.

(iii) It can be shown that each repeated eigenvalue of C2n is an eigenvalue of the path graph. This can be explained by looking closely at the eigenvectors of C2n, which we will do in Section 6.4. Accordingly, the spectrum of Pn−2 is

$$\left\{ 2\cos\left(\frac{\pi j}{n}\right),\ j = 1, \ldots, n-1 \right\}.$$

(iv) If G is bipartite then its adjacency matrix can be permuted to the form

$$A = \begin{bmatrix} O & B \\ B^T & O \end{bmatrix}.$$

Suppose v is an eigenvector of A. Partition it as we have A, so $\mathbf{v} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}$. Now Av = λv and

$$A\mathbf{v} = \begin{bmatrix} O & B \\ B^T & O \end{bmatrix}\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = \begin{bmatrix} B\mathbf{y} \\ B^T\mathbf{x} \end{bmatrix},$$

so $B\mathbf{y} = \lambda\mathbf{x}$ and $B^T\mathbf{x} = \lambda\mathbf{y}$. Hence

$$A\begin{bmatrix} \mathbf{x} \\ -\mathbf{y} \end{bmatrix} = \begin{bmatrix} -B\mathbf{y} \\ B^T\mathbf{x} \end{bmatrix} = \begin{bmatrix} -\lambda\mathbf{x} \\ \lambda\mathbf{y} \end{bmatrix} = -\lambda\begin{bmatrix} \mathbf{x} \\ -\mathbf{y} \end{bmatrix}.$$

We conclude that if λ ∈ σ(A) then so is −λ and the spectrum of a bipartite network is centred around zero. In particular, a bipartite network with an odd number of nodes must have a zero eigenvalue.
(v) Let Emn represent an m × n array of ones. Then the adjacency matrix of the complete bipartite network Km,n can be written

$$A = \begin{bmatrix} O & E_{mn} \\ E_{mn}^T & O \end{bmatrix}.$$

Suppose that $\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}$ is an eigenvector of A (where x has m components) with eigenvalue λ. Then

$$\lambda\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = A\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = \begin{bmatrix} E_{mn}\mathbf{y} \\ E_{mn}^T\mathbf{x} \end{bmatrix},$$

so

$$E_{mn}E_{mn}^T\mathbf{x} = \lambda E_{mn}\mathbf{y} = \lambda^2\mathbf{x}.$$

But $E_{mn}E_{mn}^T = nE$, where E is an m × m matrix of ones. From our analysis of the complete graph, we know that the spectrum of nE is {mn, 0} and so the eigenvalues of Km,n are ±√mn and 0. We get a similar result if we consider $E_{mn}^TE_{mn}\mathbf{y}$ and if we account for all the copies of the zero eigenvalue we find that it has algebraic multiplicity of m + n − 2. Hence,

$$\det(A - \lambda I) = \lambda^{m+n-2}(\lambda^2 - mn).$$

(vi) If G is disconnected then its spectrum is simply the union of the spectra
of the individual components.
(vii) If the adjacency matrix of G has characteristic polynomial

$$\det(A - \lambda I) = (\lambda - \lambda_1)^{p_1}(\lambda - \lambda_2)^{p_2}\cdots(\lambda - \lambda_k)^{p_k}$$

then it is common practice in network theory to write its spectrum as

$$\sigma(G) = \{[\lambda_1]^{p_1}, [\lambda_2]^{p_2}, \ldots, [\lambda_k]^{p_k}\}.$$

For general networks we can either compute (parts of) the spectrum nu-
merically or estimate certain eigenvalues. The most useful eigenvalues to have
information about are λ1 , λ2 , and λn (the most negative eigenvalue). Since A is
symmetric and nonnegative we can deduce several things about λ1 and λn .
By Gershgorin’s theorem, the spectrum lies in discs centred on the diagonal
elements of A and have radius equal to the sum of off diagonal elements in a
row. Note that since the eigenvalues are real, this means all eigenvalues lie in the
interval [–n + 1, n – 1]. By the Perron–Frobenius theorem, the largest eigenvalue is
nonnegative so 0 ≤ λ1 ≤ n – 1 and by symmetry, λ1 = 0 if and only if A = O. Note
that the adjacency matrices of all networks apart from that of Nn have negative
eigenvalues.
Another consequence of the Perron–Frobenius theorem is that unless A is fully decomposable, |λn| < λ1. We saw in an earlier example that the spectrum of a bipartite network is symmetric about zero, and hence |λn| = λ1. The only other case in which A is fully decomposable for a simple network, G, is if G has several components. And unless one of these components is bipartite, the smallest eigenvalue is guaranteed to satisfy |λn| < λ1, too.
We can get a lower bound on the biggest eigenvalue using the ratio of edges to nodes in a network. Since λ1 = ρ(A) is an upper bound on the size of Rayleigh quotients, for any x ≠ 0, $\lambda_1 \ge \dfrac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}}$. Letting x = e gives

$$\lambda_1 \ge \frac{\mathbf{e}^TA\mathbf{e}}{\mathbf{e}^T\mathbf{e}} = \frac{2m}{n}.$$
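This bound is easily checked on any example; a sketch assuming numpy and NetworkX (the random test network is our own choice):

    import networkx as nx
    import numpy as np

    G = nx.erdos_renyi_graph(40, 0.15, seed=7)
    A = nx.to_numpy_array(G)
    lam1 = max(np.linalg.eigvalsh(A))       # eigvalsh: symmetric solver

    m, n = G.number_of_edges(), G.number_of_nodes()
    print(lam1, 2 * m / n)                  # lam1 is at least the mean degree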

We can get tighter bounds on eigenvalues by using additional information about the network.

Problem 6.1
Show that if kmax is the maximal degree of any node of a simple network with adjacency matrix A then $\lambda_1 \ge \sqrt{k_{max}}$.
Suppose node k has the highest degree and define x so

$$x_i = \begin{cases} \sqrt{k_{max}}, & i = k, \\ a_{ik}, & i \ne k, \end{cases}$$

and let y = Ax.
We can get lower bounds on the components of y. They are all nonnegative and if $a_{ik} = 1$, $y_i \ge a_{ik}x_k = \sqrt{k_{max}}$. Noting that A has a zero diagonal,

$$y_k = \sum_j a_{kj}x_j = \sum_j a_{kj}^2 = k_{max}.$$

Since $\mathbf{x}^T\mathbf{x} = 2k_{max}$,

$$\lambda_1 \ge \frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}} \ge \frac{\sqrt{k_{max}}\,k_{max} + k_{max}\sqrt{k_{max}}}{2k_{max}} = \sqrt{k_{max}}.$$

Note that the second largest eigenvalue, λ2 , is harder to estimate accurately.


The spectral gap λ1 – λ2 can give useful insight into a network but we will not
discuss this further.

6.3 Spectra and structure


In Section 6.2 we showed how we could use features of the network to estimate
eigenvalues. We can work the other way round, too: certain spectral distribu-
tions guarantee the existence of particular network structures, and we can use the
eigenvalues to enumerate these.
Suppose that G(V, E) is a simple network whose adjacency matrix A has characteristic polynomial

$$\det(\lambda I - A) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + c_n\lambda^n.$$

As previously stated, the coefficients of the c.p. are related to the sum of the
principal minors of A. Since A is simple its diagonal is zero and hence so is that
of any principal submatrix. All nonzeros in an adjacency matrix are ones and all

principal submatrices must be symmetric. A 1 × 1 principal minor of A is simply a diagonal element, hence $c_{n-1} = 0$.
The possible 2 × 2 principal submatrices are

$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Only the second of these has a nonzero determinant (namely −1) and there is one of these principal submatrices for every edge in A. Thus $-c_{n-2}$ equals the number of edges in G.
The non-trivial 3 × 3 principal submatrices are

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$$

Only the third of these has a nonzero determinant (it is two). Such a submatrix corresponds to a triangle in G and so $-c_{n-3}$ counts twice the number of triangles in G.
We can use the spectrum of a network to count walks. In particular, since the entries of $A^p$ denote the number of walks of length p between nodes, the number of closed walks lies along the diagonal and hence the sum of the closed walks of length p is

$$\mathrm{tr}(A^p) = \sum_{i=1}^n \lambda_i^p, \qquad (6.1)$$

where the λi are the eigenvalues of A.


We can use this result as an alternative method for counting edges and tri-
angles. Every closed walk of length two represents a trip along an edge and back.
Each edge is involved in two such trips. Every closed walk of length three repre-
sents a trip around a triangle. Each triangle is involved in six such walks. So just
divide (6.1) by 2 or by 6 when p = 2 or 3 to calculate the right values. Things get
a little trickier if we want to count the number of circuits of greater length, as we
have to eliminate closed walks which are not paths.
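A sketch of these counts in practice, assuming numpy and NetworkX (with the Zachary karate club network as an arbitrary test case):

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()
    lam = np.linalg.eigvalsh(nx.to_numpy_array(G))

    print((lam**2).sum() / 2, G.number_of_edges())   # both give m
    print((lam**3).sum() / 6)                        # number of triangles
    print(sum(nx.triangles(G).values()) / 3)         # NetworkX agrees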

Examples 6.2

(i) Recall that the c.p. of the complete graph is

$$\det(\lambda I - A) = (\lambda + 1)^{n-1}(\lambda - n + 1).$$

Expanding this polynomial gives

$$\det(\lambda I - A) = \lambda^n - \frac{n(n-1)}{2}\lambda^{n-2} - \frac{n(n-1)(n-2)}{3}\lambda^{n-3} + \cdots - (n-1).$$

One can readily confirm that the coefficients of $\lambda^{n-2}$ and $\lambda^{n-3}$ match the expected values in terms of edges and triangles.
(ii) The number of edges in the cycle graph Cn is n and there are no triangles for n ≥ 4, thus its characteristic polynomial is

$$\lambda^n - n\lambda^{n-2} + c\lambda^{n-3} + \cdots$$

where c = −2 if n = 3 and c = 0 if n > 3.
(iii) The path graph, Pn, has n edges and no triangles, so the c.p. of its adjacency matrix is

$$\det(\lambda I - A) = \lambda^{n+1} - n\lambda^{n-1} + 0\lambda^{n-2} + \cdots.$$

(iv) Since a bipartite network has no triangles, the cn – 3 coefficient in the c.p.
is zero.
(v) Suppose that the spectrum of the adjacency matrix of a network is symmetric about zero. Then if p is odd,

$$\mathrm{tr}(A^p) = \sum_{i=1}^n \lambda_i^p = 0,$$

so there are no closed walks in the network of odd length, meaning that it is bipartite. This is the converse of the result we established in Example 6.1(iv).

We can also use (6.1) to predict whether a network contains certain features.

Problem 6.2

Suppose G = (V, E) has m edges. Show that if $\lambda_1 > \sqrt{m}$, then G contains a triangle.

Since m ≥ 0, λ1 > 0. From (6.1), $2\lambda_1^3 > 2\lambda_1 m = \lambda_1\sum_{i=1}^n \lambda_i^2 \ge \sum_{i=1}^n |\lambda_i|^3$, so $\lambda_1^3 > \sum_{i=2}^n |\lambda_i|^3$.
If t is the number of triangles then $6t = \sum_{i=1}^n \lambda_i^3 \ge \lambda_1^3 - \sum_{i=2}^n |\lambda_i|^3 > 0$. Since t must take an integer value, we have established the existence of at least one triangle.

While this may not be the most practical of results, it illustrates the point that
if we know or can estimate only limited parts of the spectrum, we may be able to
determine many characteristics of the network.

Example 6.3

Suppose that G is a connected k-regular network with n nodes and that its adjacency matrix has only four distinct eigenvalues, namely,

λ1 > λ2 > λ3 > λ4,

such that

$$\sigma(G) = \{[\lambda_1]^{p_1}, [\lambda_2]^{p_2}, [\lambda_3]^{p_3}, [\lambda_4]^{p_4}\}.$$

Since G is connected, p1 = 1. And since $\sum_i p_i = n$ (algebraic multiplicities always sum to the dimension), we get

1 + p2 + p3 + p4 = n.

The Gershgorin discs for the adjacency matrix, A, of a k-regular network are all of the form |z| ≤ k so |λ1| ≤ k and since Ae = ke we know that λ1 = k.
Since the sum of the eigenvalues of A is zero (since tr(A) = 0),

$$k + p_2\lambda_2 + p_3\lambda_3 + p_4\lambda_4 = 0.$$

Similarly,

$$k^2 + p_2\lambda_2^2 + p_3\lambda_3^2 + p_4\lambda_4^2 = \mathrm{tr}(A^2) = nk,$$

since tr(A²) double counts the nk/2 edges of G.
Note also that the number of triangles, t, can be written

$$t = \frac{1}{6}\mathrm{tr}(A^3) = \frac{1}{6}\left(k^3 + p_2\lambda_2^3 + p_3\lambda_3^3 + p_4\lambda_4^3\right).$$

Problem 6.3
Calculate the number of 4-cycles in the network described in Example 6.3 in terms of k, n, the eigenvalues and their multiplicities.
Given the constraints that we have imposed, it can be shown that the number of closed walks of a particular length is the same from any node in the network. Suppose

u → v → w → x → u

is a closed walk of length four. Following the previous example, we can express the number of closed walks of length four as

$$\mathrm{tr}(A^4) = k^4 + p_2\lambda_2^4 + p_3\lambda_3^4 + p_4\lambda_4^4.$$

In some of these, the nodes are not all distinct and so do not count towards the total of 4-cycles.
There are three types of walk of length four with duplicate nodes.

1. u → v → u → v → u
2. u → v → u → x → u (x ≠ v)
3. u → v → x → v → u (x ≠ u)

Given a node u, there are k choices for v and k − 1 for x: any node adjacent to u (respectively v) that is not v (respectively u).
In total this gives k walks of type one and k(k − 1) of each of types two and three. We get the same number from any node in a k-regular graph, hence the total number of closed walks of length four which are not cycles is nk(2k − 1). Each 4-cycle represents eight different closed walks: start from one of its nodes and move clockwise or anticlockwise.
Thus the total number of distinct 4-cycles is

$$\frac{1}{8}\left(k^4 + p_2\lambda_2^4 + p_3\lambda_3^4 + p_4\lambda_4^4 - nk(2k - 1)\right).$$

Multiple eigenvalues appear to be much more common in adjacency matrices than in a matrix chosen at random. They often correspond to symmetries within
the network (which is why it is possible for regular graphs to exist with very few
distinct eigenvalues). A symmetry can be characterized as a relabelling of the
nodes which leaves the adjacency matrix unchanged—a permutation P such that
P T AP = A. The possibilities for such a P are severely restricted if there are no
repeated eigenvalues.

Example 6.4

Suppose that the adjacency matrix A has distinct eigenvalues and that x is an
eigenvector of A such that Ax = λx.
If P T AP = A then APx = PAx = λ(Px) and so Px is also an eigenvector
of A corresponding to the eigenvalue λ. This is only possible if Px = ±x
which means that P 2 x = x. Since this is true for all eigenvectors, P 2 = I (and
P = P T ).
The only permutations which have this property are the identity matrix
and ones where pairs of nodes are swapped with each other.

6.4 Eigenvectors of the adjacency matrix


In classical graph theory there seem to be very few applications of eigenvectors,
particularly those of the adjacency matrix, but when it comes to drawing out
properties of complex networks, eigenvectors can prove to be a vital tool. For
the most part, practitioners have used the eigenvectors of the graph Laplacian—
which we introduce in Chapter 7—but the eigenvectors of the adjacency matrix
have applications in areas such as community discovery (see Chapter 21). To do
this, we will associate the ith component of an eigenvector with the ith node of
the network and this value can be used as a quantifier.
The most frequently used eigenvector of the adjacency matrix is its principal one. Recall that in a nonnegative matrix, the eigenvector associated with λ1 can be normalized so that all its entries are nonnegative. The entries can then be used to score the importance of a node, an idea we will make more concrete in Chapter 15. In any case, we can use the eigenvector components to group together nodes with similar properties.

Examples 6.5

(i) Consider a bipartite network with principal eigenvalue λ. From Example 6.1(iv), we know that the signs of the
elements of the eigenvector associated with the eigenvalue –λ can be used to divide the nodes of the network into
its two parts.
(ii) We know that if ω is an nth root of 1 then

$$\mathbf{v} = \begin{bmatrix} 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \end{bmatrix}^T$$

is an eigenvector of Cn (see Example 6.1(ii)) with corresponding eigenvalue 2cos(2πj/n) for some j ∈ {1, 2, . . . , n}. In this case the principal eigenvector is e. All the nodes are identical in this network and so it is no surprise that they are indistinguishable. Note that e is an eigenvector of any regular network.
If n is even then Cn is bipartite and we can divide the network according to the signs of the elements of the eigenvector associated with the eigenvalue −2. In this case, the eigenvector is

$$\mathbf{v} = \begin{bmatrix} 1 & -1 & 1 & \cdots & -1 \end{bmatrix}^T,$$

highlighting that neighbouring nodes in the cycle belong to different parts.


All the other eigenvalues of Cn are repeated, meaning that they are associated with two-dimensional
eigenspaces. This makes it harder to use eigenvectors to differentiate elements. Orderings and splittings of nodes
may vary according to the representative vectors we pick from the eigenspace.
(iii) We can exploit the eigenspaces of the repeated eigenvalues of the cycle graph C2n to deduce the spectrum of the path graph Pn−2. First observe that removing diametrically positioned elements of C2n gives us two copies of Pn−2 (see Figure 6.1).

Figure 6.1 Splitting the cycle graph C2n into two copies of the path graph Pn−2

Using Example 6.1(ii) we can write a basis for the two-dimensional eigenspaces in the form

$$\left\{ \begin{bmatrix} 1 \\ \omega \\ \omega^2 \\ \vdots \\ \omega^{2n-1} \end{bmatrix}, \begin{bmatrix} 1 \\ \omega^{-1} \\ \omega^{-2} \\ \vdots \\ \omega^{1-2n} \end{bmatrix} \right\},$$

where $\omega^{2n} = 1$.
Now choose such an ω, for which the eigenvalue is $\lambda = \omega + \omega^{-1}$, and call the basis elements v1 and v2, respectively. Note that the first and (n + 1)th elements of x = v1 − v2 are both zero.
Since $a_{1,n+1} = a_{n+1,1} = 0$ it follows that λ is an eigenvalue and x is an eigenvector of the matrix you obtain by replacing the first and (n + 1)th rows and columns of A with zeros. For example, when n = 4 we end up with the matrix
$$\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},$$
isolating two copies of the adjacency matrix of the path graph Pn−2. Thus each repeated eigenvalue of C2n is an eigenvalue of Pn−2, as claimed in Example 6.1(iii).
(iv) Consider the network in Figure 6.2.
Figure 6.2 A graph with a hub at node 4

The principal eigenvector of the adjacency matrix is

$$\begin{bmatrix} 0.106 & 0.044 & 0.128 & 0.219 & 0.177 & 0.186 & 0.140 \end{bmatrix}^T.$$

Notice that the largest element (the fourth) is associated with the node at the hub of the network and that the
smallest element is associated with the most peripheral. The eigenvector of the next largest eigenvalue is

$$\begin{bmatrix} -0.172 & -0.091 & 0.206 & -0.202 & 0.183 & -0.039 & 0.107 \end{bmatrix}^T.$$

If we split the nodes according to the signs of the elements of this vector we find a grouping of nodes that points towards a bipartite split in the network; we only need to remove the edge between nodes 4 and 6 to achieve this.
We will revisit applications of eigenvectors frequently in later chapters.

..................................................................................................

FURTHER READING

Biggs, N., Algebraic Graph Theory, Cambridge University Press, 1993.
Cvetković, D., Rowlinson, P., and Simić, S., Eigenspaces of Graphs, Cambridge University Press, 1997.
Cvetković, D., Rowlinson, P., and Simić, S., An Introduction to the Theory of Graph Spectra, Cambridge University Press, 2010.
7 The Network Laplacian

In this chapter

The adjacency matrix is not the only useful algebraic representation of a
network, particularly when it comes to spectral analysis. We introduce the
network Laplacian, which can be defined in terms of either the adjacency
matrix or the incidence matrix. We look at the spectrum of the Laplacian,
highlighting some pertinent properties and deriving some simple bounds on
key eigenvalues. We give examples of the applications of the eigenvalues and
eigenvectors of the Laplacian matrix. We will have more to say on these later.

7.1 The graph Laplacian


You may well have encountered the Laplacian operator, $\Delta = \nabla^2$, which plays a significant role in the theory and solution of partial differential equations. For example, solutions of the equation

$$\Delta u + \lambda u = 0$$

over a bounded region tell us about characteristic modes associated with vibrations in the region and Laplace's equation $\Delta u = 0$ can be related to equilibria in a system.
The graph Laplacian extends the idea of Δ to a discrete network. Starting from the definition of a derivative as the limit of differences,

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a},$$

we note that the incidence matrix lets us find differences between nodes in a net-
work. Consider a function f which is assigned a value f (xi ) = fi at each node. Then
the differences between incident nodes can be found by calculating g = Bf, where
B is the incidence matrix. To get the second difference at a node i, as we must
to find , we need to take the differences of all the first differences incident to i,
namely BT g = BT Bf. With a little more rigour (we need to choose appropriate
denominators for our differences and make sure signs of differences are properly
matched together) we can show that the matrix L = BT B is indeed a discrete
analogue of the continuous Laplacian operator. We call it the graph Laplacian of


the network G. The graph Laplacian can also be written in terms of the adjacency
matrix. Note that for a network with n nodes and m edges, L is an n × n matrix and

$$l_{ij} = \sum_{k=1}^m b_{ki}b_{kj}.$$

Recall that row k of B contains only two nonzeros: a 1 and a −1 in the columns incident to the kth edge. There is a positive contribution to $l_{ii}$ every time $b_{ki} = \pm 1$. That is, there is a positive contribution of 1 corresponding to every edge incident to i and hence $l_{ii}$ is equal to the degree of node i. If i ≠ j then $b_{ki}b_{kj} \ne 0$ if and only if i and j are adjacent, in which case $b_{ki}b_{kj} = -1$. Therefore

$$L = D - A$$

where A is the adjacency matrix of G and D = diag(Ae) is a diagonal matrix whose entries are the degrees of each node. This representation of L makes it easy to compute.
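For example, a quick sketch assuming numpy and NetworkX:

    import networkx as nx
    import numpy as np

    G = nx.path_graph(5)          # a path with five nodes
    A = nx.to_numpy_array(G)

    # L = D - A, with D the diagonal matrix of node degrees.
    L = np.diag(A.sum(axis=1)) - A
    print(L)
    print(np.allclose(L, nx.laplacian_matrix(G).toarray()))   # True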

Examples 7.1

(i) Two networks are illustrated in Figure 7.1.

Figure 7.1 Two simple networks

Ordering nodes from left to right and top to bottom, their Laplacian matrices are

$$\begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 3 & -1 & 0 & 0 & -1 & 0 & 0 \\ 0 & -1 & 3 & -1 & 0 & 0 & -1 & 0 \\ 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & -1 & 0 & 0 & -1 & 3 & -1 & 0 \\ 0 & 0 & -1 & 0 & 0 & -1 & 3 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 0 & -1 & 0 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & -1 & 3 & -1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 4 & -1 & -1 & -1 \\ -1 & 0 & 0 & -1 & 3 & 0 & -1 \\ 0 & 0 & 0 & -1 & 0 & 2 & -1 \\ 0 & 0 & 0 & -1 & -1 & -1 & 3 \end{bmatrix}.$$


(ii) The complete network, Kn, has Laplacian nI − E. For the null network, L = O. The Laplacian of the path graph Pn−1 is an n × n matrix of the form

$$\begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix}.$$

7.2 Eigenvalues and eigenvectors of the graph Laplacian
Because of its theoretical connections with the Laplacian operator, the graph
Laplacian is a very useful tool for analysing a network. In particular, its spectrum
and its eigenvectors can reveal significant properties of a network.
Since L = BT B = D – A is symmetric we know that its spectrum is real. Its
Gershgorin discs are of the form i = {z : |z – ki | ≤ ki }, where ki is the degree
of node i. Applying Gershgorin’s theorem we find that all the eigenvalues of the
graph Laplacian lie in the interval [0, 2kmax ] where kmax is the maximal degree of
a node in the associated network. Since

Le = (D – A)e = diag(Ae)e – Ae = Ae – Ae = 0

we know that zero is an eigenvalue of any graph Laplacian. Simple bounds on


other eigenvalues of L can be attained easily.

Problem 7.1
Suppose we order the eigenvalues of the graph Laplacian L of a simple network
G so that

λ1 ≥ λ2 ≥ · · · ≥ λn – 1 ≥ λn = 0.

Show the following bounds hold.

1. λn – 1 > 0 ⇐⇒ G is connected.
2. kmax ≤ λ1 ≤ 2kmax .
3. λ1 is bounded above by the maximum of the sum of degrees of adjacent
nodes. That is,

$$\lambda_1 \le k_{max}^{(ij)} = \max_{(i,j)\,:\,a_{ij}=1} k_i + k_j.$$

For point 1, note that (n − 1)I − L is a nonnegative matrix whose eigenvalues are equal to (n − 1) − λi. By the Perron–Frobenius theorem, the largest positive
eigenvalue of this matrix, (n – 1) has algebraic multiplicity of 1 unless (n – 1)I – L
is reducible. This can only happen if A is reducible, too, in which case G has more
than one connected component.
We can use Rayleigh quotients to establish the lower bound in 2. Suppose $k_{max} = k_i$. Then $\dfrac{\mathbf{e}_i^TL\mathbf{e}_i}{\mathbf{e}_i^T\mathbf{e}_i} = k_i \le \lambda_1$, where $\mathbf{e}_i$ is the ith column of the identity matrix. The upper bound comes from considering the right hand boundary point of the Gershgorin discs of L.
A simple way to establish the final result is to use Gershgorin's theorem on the matrix $\widetilde{L} = D^{-1}LD$. The diagonal elements of this matrix are the same as those of L and the sum of the moduli of the off-diagonal elements in the ith row is

$$\frac{1}{k_i}\sum_{j=1}^n k_j a_{ij}. \qquad (7.1)$$

There are ki nonzero terms in the sum and they are bounded above by the maximum degree of the nodes adjacent to i (call this $k_{max}^{(i)}$). Thus the largest point in the ith Gershgorin disc is $k_i + k_{max}^{(i)}$. Taking the maximum over all i gives the desired bound.
Note that (7.1) gives the mean degree of the nodes adjacent to i. Thus we can improve our upper bound to

$$\lambda_1 \le \max_i\, k_i + m_i,$$

where mi is the mean degree of the nodes adjacent to node i.

The spectra of the adjacency matrix and the graph Laplacian share some
relations, but usually we can infer different information from each.

Examples 7.2

(i) For the network in Figure 7.1(a), $\lambda_1 \le 2k_{max} = k_{max}^{(ij)} = 6$. Each node of degree 3 is connected to two nodes of degree 3 and a node of degree 1. Thus the mean degree of nodes adjacent to a node of degree 3 is (in all cases) 2.333, which improves the upper bound on λ1 to 5.333. In fact, the largest eigenvalue of the Laplacian is 5.236.
For the network in Figure 7.1(b), $2k_{max} = 8$, $k_{max}^{(ij)} = 7$, and $\max_i k_i + m_i = 6.75$. In this case, λ1 = 5.449.
(ii) Recall that G is a k-regular network if all its nodes have degree k. If
this is the case, L = kI – A and λi ∈ σ (L) ⇐⇒ k – λi ∈ σ (A). The
eigenvectors of L and A are identical for regular networks.


(iii) Kn is (n − 1)-regular and Cn is 2-regular. Since we know the spectra of their adjacency matrices, we can write down the spectra of their Laplacians, too.
The characteristic polynomial of the Laplacian of Kn is $\det(\lambda I - L) = \lambda(\lambda - n)^{n-1}$ and the eigenvalues of the Laplacian of Cn are

$$\left\{ 2 - 2\cos\left(\frac{2\pi j}{n}\right),\ j = 1, \ldots, n \right\}$$

or, using standard trigonometric identities,

$$\left\{ 4\sin^2\left(\frac{\pi j}{n}\right),\ j = 1, \ldots, n \right\}.$$

(iv) As with the adjacency matrix, we can use the eigenvalues and eigenvectors of the Laplacian of the cycle graph C2n to find the spectrum of the path graph (although the details differ). First observe that grouping together pairs of entries in C2n gives us something that looks very like Pn−1 (see Figure 7.2).
Given a repeated eigenvalue $\lambda = 2 - \omega - \omega^{-1}$ ($\omega^{2n} = 1$) of the Laplacian, LC, of C2n we can form a linear combination of the eigenvectors we gave in Chapter 6 to give another eigenvector. In particular,

$$\mathbf{z} = \omega\begin{bmatrix} 1 \\ \omega \\ \omega^2 \\ \vdots \\ \omega^{2n-1} \end{bmatrix} + \begin{bmatrix} 1 \\ \omega^{2n-1} \\ \omega^{2n-2} \\ \vdots \\ \omega \end{bmatrix} = \begin{bmatrix} \omega + 1 \\ \omega^2 + \omega^{2n-1} \\ \vdots \\ \omega^{2n-1} + \omega^2 \\ 1 + \omega \end{bmatrix},$$

a vector whose ith component is the same as its (2n + 1 – i)th.

Figure 7.2 Pairing the nodes in the cycle graph C2n gives the path graph Pn−1. (Technically, we are treating Pn−1 as a quotient graph of C2n; the interested reader can find many more details in a textbook devoted to graph theory.)

Now we show that the first n components of z constitute an eigenvector of the Laplacian LP of Pn−1.

$$L_P\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1} \\ z_n \end{bmatrix} = \begin{bmatrix} z_1 - z_2 \\ -z_1 + 2z_2 - z_3 \\ \vdots \\ -z_{n-2} + 2z_{n-1} - z_n \\ -z_{n-1} + z_n \end{bmatrix} = \begin{bmatrix} 2z_1 - z_2 - z_{2n} \\ -z_1 + 2z_2 - z_3 \\ \vdots \\ -z_{n-2} + 2z_{n-1} - z_n \\ -z_{n-1} + 2z_n - z_{n+1} \end{bmatrix} = \lambda\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-1} \\ z_n \end{bmatrix},$$

since $z_1 = z_{2n}$, $z_n = z_{n+1}$ and the action of LP on the first n components of z mimics that of LC.
We conclude that each repeated eigenvalue of C2n is an eigenvalue of Pn−1. The remaining eigenvalue is, of course, 0 and so the spectrum of the Laplacian of Pn−1 is

$$\left\{ 4\sin^2\left(\frac{\pi j}{2n}\right),\ j = 0, 1, \ldots, n-1 \right\}.$$

(v) Km,n has Laplacian

$$L = \begin{bmatrix} nI & -E_{mn} \\ -E_{mn}^T & mI \end{bmatrix},$$

where Emn is an m × n matrix of all ones. We can find its spectrum with a similar technique to that used to find the spectrum of its adjacency matrix.
Let $\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}$ be an eigenvector of the eigenvalue λ. Then $L\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} = \lambda\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}$ gives

$$n\mathbf{x} - E_{mn}\mathbf{y} = \lambda\mathbf{x} \quad\text{and}\quad m\mathbf{y} - E_{mn}^T\mathbf{x} = \lambda\mathbf{y},$$

so,

$$(m - \lambda)(n - \lambda)\mathbf{x} = (m - \lambda)E_{mn}\mathbf{y} = E_{mn}E_{mn}^T\mathbf{x} = nE_m\mathbf{x}.$$

This means that (m − λ)(n − λ) is an eigenvalue of nEm, which in turn means (m − λ)(n − λ) = 0 or (m − λ)(n − λ) = mn. We conclude that the eigenvalues of L are 0, m, n, and m + n. Taking into account algebraic multiplicities we can establish that the characteristic polynomial in this case is

$$\det(\lambda I - L) = \lambda(\lambda - m - n)(\lambda - m)^{n-1}(\lambda - n)^{m-1}.$$



Notice that the largest eigenvalue of the Laplacian of Km,n is m + n, confirming that the upper bound in point 3 of Problem 7.1 can be attained. It can be shown that complete bipartite networks are the only ones for which the bound can be attained, but we omit the details.
As with the adjacency matrix, the spectrum of L can be used to calculate certain useful network quantities. The most widely quoted example is that the number of spanning trees in a network is given by λ1λ2···λn−1/n. Notice that if G is not connected then λn−1 = 0, confirming that there are no spanning trees in a disconnected network.
The eigenvalues of the Laplacian are another example of a graph invariant. For applications where invariance is a key factor, some authors prefer to use the normalized Laplacian, which can be defined as $\mathcal{L} = D^{-1/2}LD^{-1/2}$ (if a node is isolated we 'fix' the undefined entry in $D^{-1/2}$ by setting it to 1). It can be shown that the spectrum of $\mathcal{L}$ lies in the interval [0, 2] and that for bipartite networks the spectrum is symmetric around 1. For the applications that are our predominant interest in this book, the 'ordinary' Laplacian works perfectly well.

7.2.1 The Fiedler vector


A network is connected if and only if the second smallest eigenvalue of its Laplacian is nonzero. On the other hand, if G has k components then the Laplacian has k zero eigenvalues, and an orthogonal basis for the eigenvectors associated with the zero eigenvalue can be constructed as follows: for each component form a vector whose ith element is one if and only if node i belongs to that component. Otherwise the ith element is left as zero.
These results are trivial to establish but an important observation is that the
eigenvectors of the Laplacian contain useful information for establishing key
structures in a network. We have previously considered the notion of edge and
vertex connectivity. The spectrum of the graph Laplacian motivates the defin-
ition of the algebraic connectivity of a network, a(G), which is assigned the value
λn – 1 . The eigenvector associated with λn – 1 has a very important property.

Theorem 7.1 (Fiedler, 1975) Suppose G = (V, E) is a connected network with graph Laplacian L whose second smallest eigenvalue is λn−1 > 0. Let x be the eigenvector associated with λn−1. Let r ∈ ℝ and partition the nodes in V into two sets

$$V_1 = \{i \in V \mid x_i \ge r\}, \qquad V_2 = \{i \in V \mid x_i < r\}.$$

Then the subgraphs of G induced by the sets V1 and V2 are connected.

The significance of this result is that it gives us a method of partitioning a network in a systematic way and ensuring the partitions are still connected. Each of these partitions can be further divided and we may be able to identify key
clusters of the network. Choosing r to be the median value of x ensures that the
clusters are evenly sized. Another popular choice of r is 0. Spectral clustering is
one of many ways to divide a network into pieces. The vector x is known as the
Fiedler vector.
We will consider a variety of methods for partitioning a network in Chapter 21.
Some of these will exploit spectral information but there are a variety of other
techniques, too.
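To make the procedure concrete, here is a minimal sketch (ours, not from the text) of spectral bisection with NumPy; the helper name fiedler_partition and the example edge list are our own.

import numpy as np

def fiedler_partition(A, r=0.0):
    """Split the nodes of a connected network using the Fiedler vector."""
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    x = vecs[:, 1]                       # eigenvector of the second smallest eigenvalue
    V1 = np.where(x >= r)[0]             # nodes with x_i >= r
    V2 = np.where(x < r)[0]              # nodes with x_i < r
    return x, V1, V2

# Two triangles joined by a single edge: the partition recovers them
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
x, V1, V2 = fiedler_partition(A)
print(V1, V2)                            # e.g. [0 1 2] [3 4 5]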

Example 7.3

Figure 7.3 A network with some clusters. The Fiedler vector can reveal them

Figure 7.3 shows a network we saw in Chapter 2. The nodes have been
labelled to highlight the clustering given by the Fiedler vector.
The first five elements of the Fiedler vector have the opposite sign to the
others, suggesting one particular partition. Assuming x1 > 0, we find that

x1 = x2 > x3 > x4 > x5 > x6 > x7 > x10 > x8 > x13 > x9 > x12 > x11 .

Grouping nodes according to this ordering highlights several obvious


clusters.

..................................................................................................

FURTHER READING

Chung, F., Spectral Graph Theory, American Mathematical Society, 1997.


Fiedler, M., A property of eigenvectors of nonnegative symmetric matrices and its
application to graph theory, Czechoslovak Mathematical Journal, 25:619–633,
1975.
Godsil, C. and Royle, G., Algebraic Graph Theory, Springer, 2001. Chapter 13.
Classical Physics Analogies
8

In this chapter

8.1 Motivation
8.2 Classical mechanical analogies
8.3 Networks as electrical circuits
Further reading

We introduce basic concepts from classical mechanics, such as the Lagrangian and the Hamiltonian of a system, to describe the motion of a particle in space. We use the analogy of a network as a classical mechanics mass–spring system, which allows us to interpret the Laplacian matrix of a network (and its spectrum) physically. We also picture networks as electrical circuits. In this case, we show the effective resistance between a pair of nodes in an electrical circuit is a Euclidean distance between the corresponding nodes in a network. This chapter assumes a basic working knowledge of classical physics.

8.1 Motivation
Analogies and metaphors are very useful in any branch of science. Complex
networks are already an abstraction of the connectivity patterns observed in real-
world complex systems in which we reduce complex entities to single nodes and
their complex relationships to the links of the network. It is very natural, therefore,
to use some physical analogies to study these networks so that we can use familiar
physical and mathematical concepts to understand the structure and dynamics
of networks; and we can use physical intuition about the abstract mathematical
concepts to study the structure/dynamics of networks.
In this chapter we focus on analogies based on classical physics. The first cor-
responds to the use of mass–spring systems in classical mechanics and the second
uses electrical circuits. In both cases we show how to gain intuition into the ana-
lysis of networks as well as how to import techniques from physics, such as the
resistance distance, which can be useful for the study of networks.
For instance, we understand mathematically the Fiedler vector of the Laplacian
matrix. Now if we observe that it is analogous to a vibrational mode of the nodes of
a mass–spring network, we can develop a physical picture of how this eigenvector
splits the nodes of the network into two clusters: one corresponding to the nodes
vibrating in one direction and the other to the nodes vibrating in the opposite
direction for a certain natural frequency. We fill in the details in Section 8.2.

8.2 Classical mechanical analogies


A principal goal of classical mechanics is to determine the position of a particle
as a function of time i.e. the particle trajectory. We will consider finite systems;
the simplest of these is a single point with mass m moving in space. In order to
compute the trajectory of this particle, x(t), we need to determine the forces F(x)
acting on it using Newton’s equation

F(x) = mẍ(t), (8.1)

where ẍ(t) is the second derivative of the position with respect to time i.e. the
acceleration of the particle.
The state of a system with n degrees of freedom is fully determined by n
coordinates xi (t) and n velocities ẋi (t) for i = 1, . . . , n, and the system is described
by the Lagrangian L(x, ẋ, t). The Lagrangian function for a dynamical system is
its kinetic energy minus the potential function from which the generalized force
components are determined. That is, L = T – V , where T is the kinetic and V the
potential energies of the system. For a system with n particles

T = (1/2) Σ_i m_i ẋ_i², (8.2)

and

V = (1/2) Σ_{j,i} k_{ij}(x_j – x_i)², (8.3)

where the last summation is over all pairs of particles interacting with each other.
Thus,

L = (1/2)[Σ_i m_i ẋ_i² – Σ_{j,i} k_{ij}(x_j – x_i)²] (8.4)

and the momentum of particle i at time t is

p_i(t) = ∂L(x, ẋ, t)/∂ẋ_i. (8.5)

The Hamiltonian transformation of the Lagrangian is given by the function

H = Σ_{i=1}^{n} ẋ_i ∂L(x, ẋ, t)/∂ẋ_i – L(x, ẋ, t) = Σ_{i=1}^{n} ẋ_i p_i – L(x, ẋ, t). (8.6)

The Hamiltonian function (which we will simply refer to as the Hamiltonian)


for a given system is written as

H = T + V. (8.7)

The so-called phase space of the system {(x, p)} is formed by the 2n-tuples
(x, p) = (x1 , x2 , . . . , xn , p1 , p2 , . . . , pn ) in which a path of the particle system is
determined by the Hamilton equations

ẋ_i(t) = ∂H(x, p, t)/∂p_i = {x_i, H},   ṗ_i(t) = –∂H(x, p, t)/∂x_i = {p_i, H},

where {A, B} is the so-called Poisson bracket in a d-dimensional space, namely,

{A, B} = Σ_{i=1}^{d} (∂A/∂x_i · ∂B/∂p_i – ∂B/∂x_i · ∂A/∂p_i). (8.8)

We are interested in systems with several interconnected particles. So, for the purpose of illustrating the connections between classical mechanics and network theory, consider a mass–spring system like the system of three masses and two springs illustrated in Figure 8.1.

Figure 8.1 A mass–spring system with masses m_i, spring constants k_i, and positions x_i

The kinetic and potential energy for this system can be written as

T = (1/2)(m_1ẋ_1² + m_2ẋ_2² + m_3ẋ_3²),   V = (1/2)k_1(x_1 – x_2)² + (1/2)k_2(x_3 – x_2)²,

and the Lagrangian of the system is given by

L = (1/2)[m_1ẋ_1² + m_2ẋ_2² + m_3ẋ_3² – k_1(x_1 – x_2)² – k_2(x_3 – x_2)²]. (8.9)

Now if we use the Euler–Lagrange equations

d/dt(∂L/∂ẋ_i) – ∂L/∂x_i = 0, (8.10)

for the system under study we obtain

∂L/∂ẋ_1 = m_1ẋ_1,   ∂L/∂ẋ_2 = m_2ẋ_2,   ∂L/∂ẋ_3 = m_3ẋ_3, (8.11)

and

∂L/∂x_1 = –k_1(x_1 – x_2),   ∂L/∂x_2 = k_1(x_1 – x_2) + k_2(x_3 – x_2),   ∂L/∂x_3 = –k_2(x_3 – x_2). (8.12)

It is evident that d/dt(∂L/∂ẋ_i) = m_iẍ_i, which is just F_i according to Newton's equation. Hence, for the system of three masses and two springs,

m_1ẍ_1 = –k_1x_1 + k_1x_2,
m_2ẍ_2 = k_1x_1 + (–k_1 – k_2)x_2 + k_2x_3, (8.13)
m_3ẍ_3 = k_2x_2 – k_2x_3,

which can be written in the matrix-vector form

⎡m_1  0    0 ⎤ ⎡ẍ_1⎤   ⎡–k_1     k_1          0 ⎤ ⎡x_1⎤
⎢ 0   m_2  0 ⎥ ⎢ẍ_2⎥ = ⎢ k_1  –(k_1 + k_2)  k_2⎥ ⎢x_2⎥ , (8.14)
⎣ 0   0   m_3⎦ ⎣ẍ_3⎦   ⎣  0      k_2       –k_2⎦ ⎣x_3⎦

or more concisely as

ẍ(t) = –M⁻¹Lx. (8.15)

Notice that the matrix on the right-hand side of (8.14) is just the negative of the Laplacian matrix L for a weighted network having three nodes and two edges, i.e. a path on three nodes. If we consider a system with m_1 = m_2 = m_3 = m and k_1 = k_2 = k, then

⎡ẍ_1⎤   ⎡–k/m    k/m    0  ⎤ ⎡x_1⎤
⎢ẍ_2⎥ = ⎢ k/m  –2k/m   k/m ⎥ ⎢x_2⎥ . (8.16)
⎣ẍ_3⎦   ⎣  0     k/m  –k/m ⎦ ⎣x_3⎦

If we now take m = k = 1, the characteristic equation

⎢–1–μ    1      0  ⎥
⎢  1   –2–μ     1  ⎥ = 0 (8.17)
⎢  0     1    –1–μ ⎥

has solutions μ_1 = 0, μ_2 = –1, μ_3 = –3: the negatives of the eigenvalues of the


Laplacian matrix of the corresponding network. In general, (8.16) has solution
x = (α_1 + γ_1 t)[1, 1, 1]ᵀ + (α_2 cos t + γ_2 sin t)[1, 0, –1]ᵀ + (α_3 cos(√3 t) + γ_3 sin(√3 t))[1, –2, 1]ᵀ. (8.18)

This solution is expressed in terms of the eigenvectors associated with the


eigenvalues μj of the Laplacian matrix.
Our conclusion is that we can interpret the eigenvalues and eigenvectors of the
Laplacian matrix of a network in terms of the vibrations of a mass-spring network.
The eigenvalues of the Laplacian are the squares of the natural frequencies of the
system. The natural frequencies are the frequencies at which a system tends to
oscillate in the absence of any external force. On the other hand, the eigenvectors
of the Laplacian matrix represent the displacements of each node due to harmonic
oscillations. For instance, for the three-node path we have a mode corresponding
to the translational movement of the network, i.e. the eigenvector corresponding
to μ1 = 0, plus two additional modes illustrated in Figure 8.2.
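This interpretation is easy to verify numerically. A small sketch (ours, not from the text), assuming unit masses and spring constants, recovers the frequencies 0, 1, and √3 found above for the three-node path:

import numpy as np

# Laplacian of the three-node path with m = k = 1
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)
mu, Q = np.linalg.eigh(L)          # mu = [0, 1, 3]
freqs = np.sqrt(mu)                # natural frequencies 0, 1, sqrt(3)
print(freqs)
print(Q)                           # columns: translational mode plus the two vibrational modes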

Figure 8.2 Schematic representation of the vibrational modes in P2. The size of the arrows is proportional to the magnitude of the mode

This analogy can be very helpful, not least when we use the eigenvectors of the Laplacian to make partitions of the nodes in a network. Recall that the Fiedler vector partitions a network into two clusters, and we can think of this partition as the grouping of the nodes according to the vibrational mode of the nodes in the network corresponding to the slowest or fundamental natural frequency of the system, μ_2^{1/2}.

Problem 8.1

Write down the Hamiltonian for the mass-spring network illustrated in Figure 8.3.

Figure 8.3 A network with edges labelled by force constants k_i

Clearly,

T = (1/2)(p_1²/m_1 + p_2²/m_2 + p_3²/m_3 + p_4²/m_4 + p_5²/m_5)

and

V = (1/2)[k_1(x_2 – x_1)² + k_2(x_3 – x_2)² + k_3(x_4 – x_3)² + k_4(x_5 – x_4)² + k_5(x_5 – x_1)² + k_6(x_4 – x_2)²].

Letting

    ⎡ 2 –1  0  0 –1⎤
    ⎢–1  3 –1 –1  0⎥
L = ⎢ 0 –1  2 –1  0⎥ ,   K = diag(k_1, k_2, k_3, k_4, k_5),   x = [x_1, x_2, x_3, x_4, x_5]ᵀ
    ⎢ 0 –1 –1  3 –1⎥
    ⎣–1  0  0 –1  2⎦

gives H(x, p) = (1/2)[T + xᵀKLx].

8.3 Networks as electrical circuits


An electrical circuit is an arrangement of devices—voltage and current sources,
resistors, inductors, and capacitors—which are interconnected in specific ways,
forming a network. The connectivity of these devices determines the relation be-
tween the physical variables of the system, such as voltages and currents. The
circuit has nodes, which represent the junction between two or more wires, and
edges which represent the wires. The wires are characterized by having some re-
sistance to the current flow and the nodes by the existence of voltage, which is
Networks as electrical circuits 91

created by the differences in the currents flowing through the edges of the circuit
and their respective resistance. According to Ohm’s law the voltage is related to
the current I and the resistance R according to the relation V = IR. The inverse
of the resistance is known as the conductance of the corresponding edge.
For a circuit we can represent the voltages, currents, and resistances of all the
edges by means of vectors. That is, let v be the vector representing the voltages
at each node of a circuit. Then the current at each edge is given by the vector
representation of Ohm’s law

i = R⁻¹Bᵀv, (8.19)

where B is the node-to-edge incidence matrix and R is a diagonal matrix of edge


resistances. Using the conductance matrix C = R⁻¹, Ohm's law is

i = CBᵀv. (8.20)

In general, a circuit contains external sources of voltage and current used


to provide fixed sources of energy to drive the circuit. There is a relationship
between the voltage and current sources known as Norton equivalence. Conse-
quently, we only need to consider the external current sources as any voltage
external source can be transformed into its equivalent current one. Furthermore
by using Kirchhoff's current law, which states that the algebraic sum of the currents for all edges that meet in a common node is zero, we can write

i_ext = Bi, (8.21)

and from (8.20),

i_ext = BCBᵀv = Lv, (8.22)

where L is the Laplacian of a weighted network.


If our aim is to find the voltages at the nodes generated by the external sources
of current we need to solve (8.22). That is, we would like to find

v = L⁻¹i_ext. (8.23)

However, Laplacians have zero eigenvalues, so L⁻¹ does not exist. A general trick used to overcome the singularity in (8.22) is to ground a node, which in practice means that we remove the corresponding row and column of L, and we solve the reduced system v₀ = L₀⁻¹i_ext,0. Another alternative is to let v = L⁺i_ext, where

L⁺ = (L + (1/n)eeᵀ)⁻¹ – (1/n)eeᵀ (8.24)

is the Moore–Penrose pseudoinverse of the Laplacian matrix.



Let us now suppose that we want to calculate the voltage at each node induced
when a current of 1 amp enters at node p and a current of –1 amp leaves node q.
Using (8.24) we have

v = [L⁺_{p1} – L⁺_{q1}, L⁺_{p2} – L⁺_{q2}, . . . , L⁺_{pn} – L⁺_{qn}]ᵀ. (8.25)

If we want the difference in voltage created at the nodes p and q, we only need

Ω_pq = v_p – v_q = (L⁺_{pp} – L⁺_{qp}) – (L⁺_{pq} – L⁺_{qq}) = L⁺_{pp} + L⁺_{qq} – 2L⁺_{pq}. (8.26)

Notice that L⁺_{pq} = L⁺_{qp} because the network is undirected.
The voltage difference Ω_pq represents the effective resistance between the two nodes p and q, which indicates the potential drop measured when a unit current is injected at node p and extracted at node q. Hence

Ω_pq = (e_p – e_q)ᵀ L⁺ (e_p – e_q). (8.27)
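Equation (8.27) translates directly into a few lines of NumPy; the following sketch (ours, not from the text) uses the built-in Moore–Penrose pseudoinverse.

import numpy as np

def resistance_distance(A, p, q):
    """Effective resistance between nodes p and q via the Laplacian pseudoinverse."""
    L = np.diag(A.sum(axis=1)) - A
    Lplus = np.linalg.pinv(L)              # Moore–Penrose pseudoinverse of L
    e = np.zeros(A.shape[0])
    e[p], e[q] = 1.0, -1.0                 # unit current in at p, out at q
    return e @ Lplus @ e                   # (e_p - e_q)^T L^+ (e_p - e_q)

# Two nodes joined by a single unit resistor have effective resistance 1
A = np.array([[0.0, 1.0], [1.0, 0.0]])
print(resistance_distance(A, 0, 1))        # 1.0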

Problem 8.2
Show that the effective resistance is a Euclidean distance between a pair of nodes.
By definition, the effective resistance is given by

Ω_pq = L⁺_{pp} + L⁺_{qq} – 2L⁺_{pq}.

It turns out that the spectral decomposition of the Moore–Penrose pseudoin-


verse of the Laplacian is


L⁺_{pq} = Σ_{j=2}^{n} μ_j⁻¹ q_j(p)q_j(q),

where 0 = μ1 < μ2 ≤ · · · ≤ μn are the eigenvalues of the Laplacian and qj the


eigenvector associated with μj . So

Ω_pq = Σ_{j=2}^{n} (1/μ_j)[q_j(p) – q_j(q)]²,

which can be expressed in matrix-vector form as

Ω_pq = (q_p – q_q)ᵀ M⁻¹ (q_p – q_q),

where
q_r = [q_2(r), q_3(r), . . . , q_n(r)]ᵀ
Networks as electrical circuits 93

and M is a diagonal matrix of the eigenvalues of the Laplacian with the trivial one replaced by a nonzero. Hence

Ω_pq = (M^{–1/2}(q_p – q_q))ᵀ M^{–1/2}(q_p – q_q) = (M^{–1/2}q_p – M^{–1/2}q_q)ᵀ (M^{–1/2}q_p – M^{–1/2}q_q).

Letting y_r = M^{–1/2}q_r allows us to write

Ω_pq = (y_p – y_q)ᵀ(y_p – y_q) = ‖y_p – y_q‖².

Consequently, the effective resistance between two nodes in a network is a


squared Euclidean distance. In fact, it is known in the literature as the resistance
distance between the corresponding nodes.

The resistance distance between all pairs of nodes in the network can be represented in a matrix form, namely as the resistance matrix Ω. This matrix can be written as

Ω = elᵀ + leᵀ – 2(L + (1/n)E)⁻¹, (8.28)

where l is the diagonal of the inverse of L + (1/n)E.

Example 8.1

Let us compare the shortest path distance and the resistance distance for the network illustrated in Figure 8.3. Showing only the upper triangles (both matrices are symmetric), these are

    ⎡0  1  2  2  1⎤            ⎡0  0.727  1.182  0.909  0.727⎤
    ⎢   0  1  1  2⎥            ⎢   0      0.636  0.545  0.909⎥
D = ⎢      0  1  2⎥   and  Ω = ⎢          0      0.636  1.182⎥ .
    ⎢         0  1⎥            ⎢                 0      0.727⎥
    ⎣            0⎦            ⎣                        0    ⎦

Although the pairs (1, 3) and (1, 4) are the same distance apart according
to the shortest path distance, the second pair is closer according to the resist-
ance distance. The reason is that 1 and 4 are part of a square in the network
while the smallest cycle 1 and 3 are part of is a pentagon. In fact, the short-
est resistance distance is between the nodes 2 and 4, which is the only pair
of nodes which are part of a triangle and a square at the same time. Thus,
the resistance distance appears to take into account not only the length of the
shortest path connecting two nodes, but also the cycles involving them.
..................................................................................................

FURTHER READING

Doyle, P.G. and Snell, J.L., Random Walks and Electric Networks, John Wiley and
Sons, 1985.
Estrada, E. and Hatano, N., A vibrational approach to node centrality and
vulnerability in complex networks, Physica A 389:3648–3660, 2010.
Klein, D.J. and Randić, M., Resistance distance, Journal of Mathematical Chem-
istry 12:81–95, 1993.
Susskind, L. and Hrabovsky, G., Classical Mechanics: The Theoretical Minimum,
Penguin, 2014.
Degree Distributions
9

In this chapter

9.1 Motivation
9.2 General degree distributions
9.3 Scale-free networks
Further reading

We start by introducing the concept of degree distribution. We analyse some of the most common degree distributions found in complex networks, such as the Poisson, exponential, and power-law degree distributions. We explore some of the main problems found when fitting real-world data to certain kinds of distributions.

9.1 Motivation
The study of degree distributions is particularly suited to the analysis of complex
networks. This kind of statistical analysis of networks is inappropriate for the
small graphs typically studied in graph theory. The aim is to find the best fit
for the probability distribution of the node degrees in a given network. From
a simple inspection of adjacency matrices of networks one can infer that there
are important differences in the way degrees are distributed. In this chapter we
introduce the tools which allow us to analyse these distributions in more detail.

9.2 General degree distributions


Let p(k) = n(k)/n, where n(k) is the number of nodes of degree k in a network
of size n. Thus p(k) represents the probability that a node selected uniformly at
random has degree k and we can represent the degree distribution of the network
by plotting p(k) against k.
What should we expect to find when we look at degree distributions? Consider
a network with n nodes that is generated at random so that an edge between two
nodes exists with probability p. The average degree will be np and the probability
that a particular node v has degree k will follow a binomial distribution
" #
n–1 k
Pr(deg(v) = k) = p (1 – p)n – 1 – k .
k

For large n (and np = k̄ constant) one finds in the limit that node degree follows a Poisson distribution

Pr(deg(v) = k) → k̄^k e^{–k̄}/k!.

As k̄ grows, this distribution in turn approaches a normal distribution.


Another way we could imagine a network emerging is that we start with a
single node and nodes are added one at a time and attach themselves randomly to
existing nodes. The newer the node, the lower its expected degree and one finds
that a network generated in this way will have an exponential degree distribution

Pr(deg(v) = k) = Ae^{–k/k̄}.

Many real networks can be found that have degree distributions similar to
those illustrated. But there is another distribution one frequently sees that deserves
attention. That is the power-law distribution

Pr(deg(v) = k) = Bk^{–γ}.

One can show that such a distribution emerges in a random graph where as
new nodes are added they attach preferentially to nodes with high degree. This is
a model of popularity, and is often the predicted behaviour one expects to see in
social networks.
If a large network has a power-law degree distribution then one can expect to
see many end vertices and other nodes of low degree and a handful of very well
connected nodes.
In Figure 9.1 we illustrate some common distributions found in complex
networks.

Figure 9.1 Common examples of degree distributions found in complex networks: (a) Poisson, p(k) = e^{–k̄}k̄^k/k!; (b) normal, p(k) = e^{–(k–k̄)²/(2σ_k²)}/√(2πσ_k²); (c) exponential, p(k) = Ae^{–k/k̄}; (d) power-law, p(k) = Bk^{–γ}



Figure 9.2 Degree distribution of a real-world network: (a) probability degree distribution; (b) fittings of the distribution with the best-fit exponential and best-fit power-law distributions

The degree distributions of real-world networks, however, do not look as smooth as the ones illustrated in Figure 9.1, and there are a number of difficulties in fitting the best model to describe the experimental data. For example, the relationship of p(k) and k can be erratic (see Figure 9.2a); there may not be sufficient data to fit a statistically significant model; or there may be many statistical distributions to which the same dataset can be fitted (see Figure 9.2b).
Often the data are most erratic for large k. There are two main approaches
that can be used to reduce the noisy effect in the tail of probability distributions.
The first is to build a histogram in which the bin sizes increase exponentially with
degree. The second is to consider the cumulative distribution function (CDF)
defined as

P′(k) = Σ_{k′=1}^{k} p(k′). (9.1)

This represents the probability of choosing at random a node with degree


smaller than or equal to k.
Common CDFs for degree distributions are

Poisson: P′(k) = e^{–k̄} Σ_{i=1}^{k} k̄^i/i!.

Exponential: P′(k) = 1 – e^{–k/k̄}.

Power-law: P′(k) = 1 – Bk^{–γ+1}.

9.3 Scale-free networks


Networks with power-law degree distributions are usually referred to as ‘scale-free’. If p(k) = Ak^{–γ} then scaling the degree by a constant factor c produces a proportionate scaling of the probability distribution

p(k, c) = A(ck)^{–γ} = c^{–γ}p(k).

Power-law relationships are usually represented on a logarithmic scale since the distribution then appears as the straight line

ln p(k) = –γ ln k + ln A,

where –γ is the slope and ln A the intercept of the function. Scaling by a constant factor c only alters the intercept and the slope is preserved, so

ln p(k, c) = –γ ln k + ln A – γ ln c.

The existence of a scaling law in a system means that the phenomenon under
study will reproduce itself on different time and/or space scales. That is, it has
self-similarity.
In network theory we often use the quantity


P(k) = Σ_{k′≥k} p(k′) = 1 – P′(k) + p(k) (9.2)

in place of the standard CDF. It represents the probability of choosing at random


a node with degree larger than or equal to k. For example, if p(k) = Ak^{–γ} then

P(k) = ∫_k^∞ Ak′^{–γ} dk′ = Bk^{–γ+1},

for some constant B. For a power-law or an exponential distribution (P(k) = e^{–k/k̄}) it is easier to use P(k), and in this book we will always use it as the CDF in place of P′(k) unless stated otherwise. In Figure 9.3 we illustrate the CDF for the PDF displayed in Figure 9.2a.
All sorts of power-law degree distributions have been ‘observed’ in the literature. Some of these distributions have been subsequently reviewed, and better fits with other probability distributions have been found in some cases. The more generic term of ‘fat-tail’ degree distribution has been proposed to encompass a range of distributions which includes power-law, log-normal, Burr, log-Gamma, and Pareto.

Problem 9.1
A representation of the internet at the autonomous system level gives a network with 4,885 nodes. The CDF of its degree distribution is illustrated in Figure 9.4 on a log–log plot. The slope of the best fitting straight line is –1.20716 and the intercept is –0.11811.

Figure 9.3 Cumulative degree distribution of the network whose PDF is illustrated in Figure 9.2a

Figure 9.4 The CDF of an internet's degree distribution

1. Calculate the percentage of pendant nodes in this network.


2. How many nodes are expected to have degree greater than or equal to 200?
3. Assuming that the network has a unique node of maximal degree, kmax ,
calculate the expected value of kmax .

Recalling that the cumulative distribution P(k) represents the probability of finding a node with degree at least k, we note from the fit (the coefficient is 10^{–0.11811} = 0.7619, from the intercept of the log–log fit) that

P(k) = 0.7619k^{–1.20716}. (9.3)

The number of nodes with degree at least one is nP(1) (since P(k) = n(k)/n,
where n(k) is the number of nodes with degree at least k) and

nP(1) = 4885 × 0.7619 = 3722. (9.4)



The number of nodes with degree of at least two is

nP(2) = 4885 × 0.7619 × 2^{–1.20716} = 1612. (9.5)

Consequently, the number of pendant nodes is 3722 – 1612 = 2110 or 43.2%.


The number of nodes expected to have degree of at least 200 is

nP(200) = 4885 × 0.7619 × 200^{–1.20716} = 6. (9.6)

Assuming there is only one such node with degree equal to k_max, its degree will satisfy the equation nP(k_max) = 1. So

0.7619 k_max^{–1.20716} = n^{–1}

or

log k_max = log(0.7619n)/1.20716.

Hence k_max = 908.
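The arithmetic above is easily reproduced; a short check (our own sketch, plain Python) of the numbers in this problem:

import math

n = 4885
A, gamma = 0.7619, 1.20716          # fitted coefficient and exponent

P = lambda k: A * k**(-gamma)       # cumulative distribution P(k)

pendant = n * P(1) - n * P(2)       # nodes of degree exactly 1
print(round(n * P(1)), round(n * P(2)), round(pendant))   # 3722 1612 2110
print(round(n * P(200)))                                  # 6
kmax = 10 ** (math.log10(A * n) / gamma)                  # solves n P(k) = 1
print(round(kmax))                                        # 908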

..................................................................................................

FURTHER READING

Caldarelli, G., Scale-Free Networks: Complex Webs in Nature and Technology,


Oxford University Press, 2007.
Clauset, A., Shalizi, C.R., and Newman, M.E.J., Power-law distributions in empirical data, SIAM Review, 51:661–703, 2009.
Newman, M.E.J., Networks: An Introduction, Oxford University Press, 2010,
Chapter 8.
Clustering Coefficients of Networks
10

In this chapter

10.1 Motivation
10.2 The Watts–Strogatz clustering coefficient
10.3 The Newman clustering coefficient
Further reading

We introduce the notion of the clustering coefficient. We define the Watts–Strogatz and the Newman clustering coefficients, give some of their mathematical properties, and compare them.

10.1 Motivation
Many real-world networks are characterized by the presence of a relatively large
number of triangles. This characteristic feature of a network is a general conse-
quence of high transitivity. For instance, in a social network it is highly probable
that if Bob and Phil are both friends of Joe then they will eventually be introduced
to each other by Joe, closing a transitive relation, i.e. forming a triangle. The relevant measure compares the proportion of triangles existing in a network with the potential number of triangles it can support given the degrees of its nodes. In this chapter we study two methods of quantifying this property of a network, known as its clustering coefficient.

10.2 The Watts–Strogatz clustering coefficient

The first proposal for a clustering coefficient was put forward by Watts and Strogatz in 1998. If we suppose that the clustering of a node is proportional to

C_i = (number of transitive relations of node i)/(total number of possible transitive relations of node i) (10.1)

and t_i designates the number of triangles attached to node i of degree k_i, then

C_i = t_i/(k_i(k_i – 1)/2) = 2t_i/(k_i(k_i – 1)). (10.2)

Thus the average clustering coefficient of the network is

C̄ = (1/n) Σ_i C_i. (10.3)

Example 10.1

Consider the network illustrated in Figure 10.1.

Figure 10.1 A network used to calculate a clustering coefficient

Nodes 1 and 3 are equivalent. They both take part in two triangles and their degree is 4. Thus,

C_1 = C_3 = (2 · 2)/(4 · 3) = 1/3. (10.4)

In a similar way, we obtain

C_2 = C_4 = 1/3. (10.5)

Notice that because nodes 5–8 are not involved in any triangle, we have C_i = 0 for i ≥ 5. Consequently,

C̄ = (1/8)(4/3) = 1/6. (10.6)
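Both coefficients can be computed with NetworkX. The sketch below (ours, not from the text) uses an edge list we read off Figure 10.1, so treat that list as an assumption; it also anticipates the transitivity index of Section 10.3.

import networkx as nx

# Edges of the network in Figure 10.1 (as we read them from the figure)
edges = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3),
         (1, 5), (2, 6), (3, 7), (4, 8)]
G = nx.Graph(edges)

print(nx.average_clustering(G))   # Watts–Strogatz average: 1/6 ≈ 0.1667
print(nx.transitivity(G))         # Newman transitivity: 1/3 ≈ 0.3333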

10.3 The Newman clustering coefficient


Another way of quantifying the global clustering of a network is by means of the Newman clustering coefficient, also known as the transitivity index of the network. Let t = |C_3| be the total number of triangles, and let |P_2| be the number of paths of length two in the network (representing all potential three-way relationships). Then,

C = 3t/|P_2| = 3|C_3|/|P_2|. (10.7)

Example 10.2

Consider again the network illustrated in Figure 10.1. We can obtain the number of triangles in that network by using the spectral properties of the adjacency matrix. That is,

t = (1/6) tr(A³) = 2. (10.8)

The number of paths of length two in the network can be obtained using the following formula (which we will justify in Chapter 13):

|P_2| = Σ_{i=1}^{n} (k_i choose 2) = Σ_{i=1}^{n} k_i(k_i – 1)/2 = 18. (10.9)

Thus,

C = (3 × 2)/18 = 1/3. (10.10)

Problem 10.1

Consider the network illustrated in Figure 10.2. Obtain an expression for the average clustering, C̄, and the network transitivity, C, in terms of the number of nodes n. Analyse your results as n → ∞.

Figure 10.2 A network formed by triangles joined at a central node. The dashed line indicates the existence of other triangles

To answer this problem, observe that there are two types of nodes in the network, which will be designated as i and j (see Figure 10.3). There is one node of type i and n – 1 nodes of type j.

Figure 10.3 A labelling of Figure 10.2 to indicate nodes with different properties

The average clustering coefficient is then

C̄ = (C_i + (n – 1)C_j)/n. (10.11)

Evidently, C_j = 1 and

C_i = 2t/(k_i(k_i – 1)), (10.12)

where t is the number of triangles in the network (note that node i is involved in all of them) and k_i is the degree of that node. It is easy to see that k_i = 2t = n – 1. Then,

C_i = 2t/(2t(2t – 1)) = 1/(2t – 1) = 1/(n – 2) (10.13)

and

C̄ = (C_i + (n – 1)C_j)/n = (1/(n – 2) + (n – 1))/n = 1/(n(n – 2)) + (n – 1)/n. (10.14)

Now, for the Newman transitivity coefficient we have

|P_2| = Σ_v (k_v choose 2) = ((n – 1)/2)(2 × 1) + 2t(2t – 1)/2 = 2t + t(2t – 1) = t(2t + 1). (10.15)

Thus,

C = 3t/|P_2| = 3t/(t(2t + 1)) = 3/(2t + 1) = 3/n. (10.16)

As the number of nodes tends to infinity we have:

lim_{n→∞} C̄ = lim_{n→∞} 1/(n(n – 2)) + lim_{n→∞} (n – 1)/n = 0 + 1 = 1, (10.17)

lim_{n→∞} C = lim_{n→∞} 3/n = 0. (10.18)

Figure 10.4 Correlation between Watts–Strogatz (C̄) and Newman (C) clustering coefficients for 20 real-world networks

This indicates that the indices are accounting for different structural charac-
teristics of a network.

In general, the Watts–Strogatz index quantifies how clustered a network is lo-


cally, while the Newman index indicates how clustered the network is as a whole.
Often there is a good correlation between both indices for real-world networks as
illustrated in Figure 10.4.

..................................................................................................

FURTHER READING

Estrada, E., The Structure of Complex Networks: Theory and Applications, Oxford
University Press, 2011, Chapter 4.5.1.
Newman, M.E.J., Networks: An Introduction, Oxford University Press, 2010,
Chapter 7.9.
Random Models of Networks
11

In this chapter

11.1 Motivation
11.2 The Erdös–Rényi model of random networks
11.3 The Barabási–Albert model
11.4 The Watts–Strogatz model
Further reading

We introduce simple and general models for generating random networks: the Erdös–Rényi model, the Barabási–Albert model, and the Watts–Strogatz model. We study some of the general properties of the networks generated by using these models, such as their densities, average path length, and clustering coefficient, as well as some of their spectral properties.

11.1 Motivation
Every time that we look at a real-world network and analyse its most important
topological properties it is worth considering how that network was created. In
other words, we have to figure out what are the mechanisms behind the evolution
of a group of nodes and links which give rise to the topological structure we
observe.
Intuitively we can think about a model in which pairs of nodes are connected
with some probability. That is, if we start with a collection of n nodes and for
each of the n(n – 1)/2 possible links, we connect a pair of nodes u, v with certain
probability pu,v . Then, if we consider a set of network parameters to be fixed
and allow the links to be created by a random process, we can create models
that permit us to understand the influence of these parameters on the structure
of networks. Here we study some of the better known models that employ such
mechanisms.

11.2 The Erdös–Rényi model of random networks
In this model, put forward by Erdös and Rényi in 1959, we start with n isolated
nodes. We then pick a pair of nodes and with probability p > 0 we add a link
between them. In practice we fix a parameter value p from which we generate the
network. For each pair of nodes we generate a random number, r, uniformly from
106 Random Models of Networks

[0, 1] and if p > r we add a link between them. Consequently, if we select p = 0 the network will remain fully disconnected forever, and if p = 1 we end up with a complete graph. In Figure 11.1 we illustrate some examples of Erdös–Rényi random networks with 20 nodes and different linking probabilities.

Figure 11.1 Erdös–Rényi random networks for different probabilities p: (a) p = 0; (b) p = 0.106; (c) p = 0.265; (d) p = 1
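A minimal sketch (ours, not from the text) of this generation process in Python follows; NetworkX's built-in nx.gnp_random_graph(n, p) implements the same model.

import random
import networkx as nx

def erdos_renyi(n, p, seed=None):
    """Generate an ER random network by testing each pair of nodes once."""
    rng = random.Random(seed)
    G = nx.empty_graph(n)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # add the link with probability p
                G.add_edge(u, v)
    return G

G = erdos_renyi(20, 0.265, seed=1)
print(G.number_of_edges())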
The Erdös–Rényi (ER) random network is written as either GER (n, m) or
GER (n, p). A few properties of the random networks generated by this model are
summarized below.
1. The expected number of edges is m̄ = n(n – 1)p/2.

2. The expected node degree is k̄ = (n – 1)p.

3. The degrees follow a Poisson distribution p(k) = e^{–k̄}k̄^k/k!, as illustrated in Figure 11.2 for ER random networks with 1,000 nodes and 4,000 links. The solid line is the expected distribution and the dots represent the values for the average of 100 realizations.

Figure 11.2 Empirical Erdös–Rényi degree distribution

4. The average path length for large n is

l̄(G) = (ln n – γ)/ln(pn) + 1/2, (11.1)

where γ ≈ 0.577 is the Euler–Mascheroni constant.

5. The average clustering coefficient is C̄ = p.

6. As p increases, most nodes tend to be clustered in one giant component, while the rest of the nodes are isolated in very small components. In Figure 11.3 we illustrate the change of the size of the main connected component in an ER random network with 1,000 nodes, as a function of the linking probability.

Figure 11.3 Connectivity of Erdös–Rényi random networks

7. The structure of G_ER(n, p) changes as a function of p = k̄/(n – 1), giving rise to the following three stages.
   (a) Subcritical, k̄ < 1, where all components are simple and very small. The size of the largest component is S = O(ln n).
   (b) Critical, k̄ = 1, where the size of the largest component is S = O(n^{2/3}).
   (c) Supercritical, k̄ > 1, where the probability that (f – ε)n < S < (f + ε)n is 1 when n → ∞, ε > 0, and where f = f(k̄) is the positive solution of the equation e^{–k̄f} = 1 – f. The rest of the components are very small, with the second largest having size about ln n.
In Figure 11.4 we illustrate this behaviour for an ER random network with
100 nodes and different linking probabilities. The nodes in the largest
connected component are drawn in a darker shade.
8. The largest eigenvalue of the adjacency matrix in an ER network grows proportionally to n, so that lim_{n→∞} λ_1(A)/n = p.

9. The second largest eigenvalue grows more slowly than λ_1. In fact, lim_{n→∞} λ_2(A)/n^ε = 0 for every ε > 0.5.

10. The most negative eigenvalue grows in a similar way to λ_2(A). Namely, lim_{n→∞} λ_n(A)/n^ε = 0 for every ε > 0.5.

Figure 11.4 Emergence of a giant component in Erdös–Rényi random networks: (a) p = 0.0075, S = 8; (b) p = 0.01, S = 21; (c) p = 0.025, S = 91

11. The spectral density of an ER random network follows Wigner's semicircle law. That is, almost all of the eigenvalues of an ER random network lie in the range [–2r, 2r], where r = √(np(1 – p)), and within this range the density function is given by

ρ(λ) = √(4r² – λ²)/(2πr²).

This is illustrated in Figure 11.5.

Figure 11.5 Spectral density for a network generated with the ER model

11.3 The Barabási–Albert model

The ER model generates networks with Poisson degree distributions. However,
it has been empirically observed that many networks in the real-world have a fat-
tailed degree distribution of some kind, which varies greatly from the distribution
observed for ER random networks. A simple model to generate networks in which
the probability of finding a node of degree k decays as a power law of the degree
was put forward by Barabási and Albert in 1999. We initialize with a small network
with m0 nodes. At each step we add a new node u to the network and connect it
to m ≤ m0 of the existing nodes v ∈ V . The probability of attaching node u to
node v is proportional to the degree of v. That is, we are more likely to attach new
nodes to existing nodes with high degree. This process is known as preferential
attachment.
We can assume that our initial random network is connected and of ER type
with m0 nodes, GER = (V , E). In this case the Barabási–Albert (BA) algorithm
can be understood as a process in which small inhomogeneities in the degree
distribution of the ER network grow in time. A typical BA network is illustrated
in Figure 11.6.
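A compact sketch (ours, not from the text) of preferential attachment follows; NetworkX's nx.barabasi_albert_graph(n, m) is a ready-made implementation of the same idea.

import random
import networkx as nx

def barabasi_albert(n, m, seed=None):
    """Grow a BA network: each new node attaches to m existing nodes,
    chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)             # small connected seed network
    targets = [v for v, d in G.degree() for _ in range(d)]  # degree-weighted list
    for u in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))  # preferential attachment step
        for v in chosen:
            G.add_edge(u, v)
            targets.extend([u, v])           # keep the weighted list up to date
    return G

G = barabasi_albert(1000, 4, seed=1)
print(max(d for _, d in G.degree()))         # a few highly connected hubs emerge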
Networks generated by this model have several global properties.

1. The probability that a node has degree k ≥ d is given by

p(k) = 2d(d – 1)/(k(k + 1)(k + 2)) ≈ k^{–3}. (11.2)

That is, the distribution is close to a power law, as illustrated in Figure 11.7.

Figure 11.6 A Barabási–Albert network with n = 20 and m = 4

2. The cumulative degree distribution is P(k) ≈ k^{–2}, illustrated on a log–log scale in Figure 11.8.

3. The expected value for the clustering coefficient, C̄, approximates ((d – 1)/8)(log² n)/n as n → ∞.

Figure 11.7 The characteristic power-law degree distribution of a BA network: (a) linear scale; (b) log–log scale

Figure 11.8 Cumulative degree distribution for a network generated with the BA model

4. The average path length is given by

l̄ = (ln n – ln(d/2) – 1 – γ)/(ln ln n + ln(d/2)) + 3/2, (11.3)

where again γ is the Euler–Mascheroni constant. Comparing (11.3) with (11.1) we find that, for the same number of nodes and average degree, BA networks have a smaller average path length than their ER analogues. We illustrate this in Figure 11.9a, which shows the change in the average path length of random networks created with the BA and ER models as the number of nodes increases.

Figure 11.9 (a) Comparison of the small-worldness of BA and ER networks. (b) Spectral density of a model BA network

5. The density of eigenvalues follows a triangle distribution

ρ(λ) = (λ/r + 2)/4 for –2 ≤ λ/r ≤ 0;  ρ(λ) = (2 – λ/r)/4 for 0 ≤ λ/r ≤ 2;  ρ(λ) = 0 otherwise. (11.4)

This is illustrated in Figure 11.9b.

The BA model can be generalized to fit general power-law distributions where the probability of finding a node with degree k decays as a negative power of the degree: p(k) ∼ k^{–γ}.

11.4 The Watts–Strogatz model


The phrase ‘six degrees of separation’ is commonly used to express how surpris-
ingly closely connected we are to each other in terms of shared acquaintances. The
phrase originates from a famous experiment in network theory. Stanley Milgram
carried out the experiment in 1967. He asked some randomly selected people in
the US cities of Omaha (Nebraska) and Wichita (Kansas) to send a letter to a
target person who lived in Boston (Massachusetts) on the East Coast. The rules
stipulated that the letter should be sent to somebody the sender knew person-
ally. Although the senders and the target were separated by about 2,000 km and
there were 200 million inhabitants in the USA at the time, Milgram found two

characteristic effects. First, the average number of steps needed for the letters to arrive at their target was around six. And second, there was a large degree of group inbreeding, which resulted in acquaintances of one individual feeding a letter back into his/her own circle, thus usually eliminating new contacts.
Although the ER model reproduces the first characteristic very well, i.e. that
most nodes are separated by a very small average path length, it fails in reprodu-
cing the second. That is, the clustering coefficient in the ER network is very small
in comparison with those observed in real-world systems. The model put forward
by Watts and Strogatz in 1998 tries to sort out this situation.
First we form the circulant network with n nodes, each connected to its k nearest neighbours.
We then rewire some of its links: each of the original links has a probability p
(fixed beforehand) of having one of its end points moved to a new randomly
chosen node. If p is too high, meaning almost all links are random, we approach
the ER model.
The general process is illustrated in Figure 11.10. On the left is a circulant
graph and on the right is a random ER network. Somewhere in the middle are the
so-called ‘small-world’ networks.
In Figure 11.11 we illustrate the rewiring process, which is the basis of the
Watts–Strogatz (WS) model for small-world networks. Starting from a regular circulant network with n = 20 and k = 6, links are rewired with different choices of probability p.
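A sketch (ours, not from the text) of this rewiring step follows; NetworkX's nx.watts_strogatz_graph(n, k, p) provides a ready-made version.

import random
import networkx as nx

def watts_strogatz(n, k, p, seed=None):
    """Circulant network with k neighbours per node, then random rewiring."""
    rng = random.Random(seed)
    G = nx.circulant_graph(n, range(1, k // 2 + 1))   # k nearest neighbours per node
    for u, v in list(G.edges()):
        if rng.random() < p:                          # rewire this link
            w = rng.randrange(n)
            if w != u and not G.has_edge(u, w):       # avoid self-loops and multi-edges
                G.remove_edge(u, v)
                G.add_edge(u, w)
    return G

G = watts_strogatz(20, 6, 0.1, seed=1)
print(nx.average_clustering(G))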
Networks generated by the WS model have several general properties, listed
below.

1. The average clustering coefficient is given by C̄ = 3(k – 2)/(4(k – 1)). For large values of k, C̄ approaches 0.75.

2. The average path length decays very fast from that of a circulant graph,

l̄ = (n – 1)(n + k – 1)/(2kn), (11.5)
to approach that of a random network. In Figure 11.12 we illustrate the effect of changing the rewiring probability on both the average path length and the clustering coefficient of random graphs generated using the WS model with 100 nodes and 5,250 links.

Figure 11.10 Schematic representation of the Watts–Strogatz rewiring process

Figure 11.11 Watts–Strogatz random networks for different rewiring probabilities p: (a) p = 0.00; (b) p = 0.10; (c) p = 0.50; (d) p = 0.95
Problem 11.1

Let G_ER(n, p) be an Erdös–Rényi random network with n nodes and probability p. Use known facts about the spectra of ER random networks to show that if k̄ is the average degree of this network then the expected number of triangles tends to k̄³/6 as n → ∞.

Figure 11.12 Changes in network statistics for Watts–Strogatz graphs

For any graph the number of triangles is given by

t = (1/6) tr(A³) = (1/6) Σ_{j=1}^{n} λ_j³. (11.6)

In an ER graph we know that

lim_{n→∞} λ_1 = np,   lim_{n→∞} λ_2/n^ε = 0,   lim_{n→∞} λ_n/n^ε = 0, (11.7)
and, since |λ_i| ≤ max{|λ_2|, |λ_n|} for i ≥ 2,

lim_{n→∞} t = (1/6)λ_1³ = (1/6)(np)³. (11.8)

Since k̄ = p(n – 1), as n → ∞,

t → k̄³/6. (11.9)
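This limit is easy to probe numerically; a sketch (ours, not from the text) comparing the triangle count of one sampled ER network with k̄³/6:

import networkx as nx

n, p = 1000, 0.01                      # average degree kbar = p(n-1) ≈ 10
G = nx.gnp_random_graph(n, p, seed=1)
t = sum(nx.triangles(G).values()) / 3  # each triangle is counted once per vertex
kbar = p * (n - 1)
print(t, kbar**3 / 6)                  # both close to 165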

Problem 11.2

The data shown in Table 11.1 belong to a network having n = 1,000 nodes and m = 4,000 links. The network does not have any node with k ≤ 3. Let n(k) be the number of nodes with degree k. Determine whether this network was generated by the BA model.

Table 11.1 Degree frequencies in an example network.

k      n(k)
4      343
5      196
10     23
20     3

The probability that a node chosen at random has a given degree is shown in Table 11.2.

Table 11.2 Probability distribution of degrees in an example network.

k      p(k)
4      0.343
5      0.196
10     0.023
20     0.003

A sketch of the plot of k against p(k) in Figure 11.13 indicates that there is a fast decay of the probability with the degree, which is indicative of fat-tailed degree distributions, like the one produced by the BA model.
If the network was generated with the BA model it has to have a PDF of the form p(k) ∼ k^{–3}, which means that ln p(k) ∼ –3 ln k + b.
Given two degree values k_1 and k_2, the slope of a log–log plot is given by

m = [ln p(k_2) – ln p(k_1)] / [ln k_2 – ln k_1]. (11.10)

Using data from the table,

m = [ln(0.003) – ln(0.196)] / [ln 20 – ln 5] = –3.0149 ≈ –3, (11.11)

indicative of a network generated by the BA model.

Figure 11.13 Degree distribution in an illustrative network
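The slope estimate can be reproduced in a few lines (our own sketch); with all four data points one can also fit the slope by least squares rather than using just two of them.

import math

k = [4, 5, 10, 20]
p = [0.343, 0.196, 0.023, 0.003]

# Two-point slope, as in (11.11)
m = (math.log(0.003) - math.log(0.196)) / (math.log(20) - math.log(5))
print(m)                                   # -3.0149...

# Least-squares slope over all points in log–log coordinates
x = [math.log(v) for v in k]
y = [math.log(v) for v in p]
xbar, ybar = sum(x) / 4, sum(y) / 4
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
print(slope)                               # also close to -3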
..................................................................................................

FURTHER READING

Barabási, A.-L. and Albert, R., Emergence of scaling in random networks, Science 286:509–512, 1999.
Bollobás, B., Mathematical results on scale-free random graphs, in Bornholdt, S. and Schuster, H.G. (eds.), Handbook of Graphs and Networks: From the Genome to the Internet, Wiley-VCH, 1–32, 2003.
Bollobás, B., Random Graphs, Cambridge University Press, 2001.
Watts, D.J. and Strogatz, S.H., Collective dynamics of ‘small-world’ networks, Nature 393:440–442, 1998.
Matrix Functions
12

In this chapter

12.1 Motivation
12.2 Matrix powers
12.3 The matrix exponential
12.4 Extending matrix powers
12.5 General matrix functions
Further reading

As we have seen, we can analyse networks by understanding properties of their adjacency matrices. We now introduce some tools for manipulating matrices which will assist in a more detailed analysis, namely functions of matrices. The three most significant in terms of networks will be matrix polynomials, the resolvent, and the matrix exponential, and we give a brief introduction to each. We then present elements of a unifying theory for functions of matrices and give examples of some familiar scalar functions in an n × n dimensional setting.

12.1 Motivation
Amongst one’s earliest experiences of mathematics is the application of the basic
operations of arithmetic to whole numbers. As we mature mathematically, we see
that it is natural to extend the domain of these operations. In particular, we have
already exploited the analogues of these operations to develop matrix algebra.
Now we look at some familiar functions as applied to matrices. We will exploit
these ideas later to better understand networks, for example to develop measures
of centrality in Chapter 15 and to measure global properties of networks in Chap-
ters 17 and 18. We start with some familiar ideas that will prove vital in defining
a comprehensive generalization.
Note that we will only consider functions of square matrices. Throughout this chapter, assume that A ∈ ℝ^{n×n} unless otherwise stated.¹

¹ Almost all results generalize immediately to complex matrices.

12.2 Matrix powers

We saw matrix powers, A^p for p ∈ ℕ, when we looked at walks in a network. In this case, A^p is defined simply as the product of p copies of A. To extend the definition, we first let A⁰ = I if A ≠ O.

Example 12.1

If D = diag(λ_1, λ_2, . . . , λ_n) then D^p = diag(λ_1^p, λ_2^p, . . . , λ_n^p).

In many applications we need to form A^p for a whole series of values of p. Theoretically, it may be desirable to consider an infinite sequence of powers. The asymptotic behaviour of matrix powers is best understood by looking at eigenvalues. Suppose that A = XJX⁻¹ where J is the Jordan canonical form of A. Then

A^p = (XJX⁻¹)^p = XJX⁻¹XJX⁻¹XJ · · · JX⁻¹ = XJ^pX⁻¹

and the asymptotic behaviour of powers can be understood by considering the powers of J. Assuming that A is simple (which is certainly true for matrices arising from simple networks),

A^p = X diag(λ_1^p, λ_2^p, . . . , λ_n^p) X⁻¹.

Clearly, lim_{p→∞} A^p = O if and only if ρ(A) < 1, and if ρ(A) > 1 then the powers of A grow unboundedly.

If ρ(A) = 1 and A is simple then the limiting behaviour is trickier to pin down. While the powers do not grow explosively, lim_{p→∞} A^p exists if and only if the only eigenvalue of size one is one itself.

As implied by the following example, things are more complicated for defective matrices.

Examples 12.2
 
(i) Let J = [λ 1; 0 λ]. Then

J² = [λ² 2λ; 0 λ²],   J³ = [λ³ 3λ²; 0 λ³],   J^p = [λ^p pλ^{p–1}; 0 λ^p].

Note that if |λ| < 1 then lim_{p→∞} pλ^{p–1} = 0 and the powers converge. If |λ| ≥ 1 they diverge.


(ii) Let J = [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0]. Then

J² = [0 0 1 0; 0 0 0 1; 0 0 0 0; 0 0 0 0],   J³ = [0 0 0 1; 0 0 0 0; 0 0 0 0; 0 0 0 0],   J⁴ = O.

Notice that ρ(J) = 0. If all the eigenvalues of a matrix A are zero then it is said to be nilpotent, and we can show that A^p = O for p ≥ n.

Once we have defined powers, we can extend the definition of polynomial functions to take matrix arguments. If p_m(x) = a_0 + a_1x + · · · + a_mx^m is a degree m polynomial then we define

p_m(A) = a_0I + a_1A + · · · + a_mA^m.

p_m(A) is defined for all square matrices.

Example 12.3
     
1 1 1 3 0 –3
(i) Let p(x) = 1 – x2 and A = . Then p(A) = I – = .
0 2 0 4 0 –3
(ii) Recall that the characteristic polynomial of A is given by p(z) = det(A – zI ). This is a degree n polynomial with the
property that p(z) = 0 ⇐⇒ z ∈ σ (A). The Cayley–Hamilton theorem tells us that p(A) = O.

If p(x) and q(x) are polynomials then we can define the rational function r(x) = p(x)/q(x) so long as q(x) ≠ 0. It is natural to write

r(A) = p(A)q(A)⁻¹ = q(A)⁻¹p(A),

where q(A)⁻¹ is the inverse matrix of q(A). This is well defined so long as q(A) is nonsingular.

12.2.1 Resolvent matrix


We are going to need to consider infinite series of matrix powers to get a general
understanding of matrix functions. A good place to start is with geometric series.

Examples 12.4

(i) If z ∈ ℂ and z ≠ 1 then

1 + z + z² + · · · + z^p = (1 – z^{p+1})/(1 – z).

If |z| < 1 then

Σ_{i=0}^{∞} z^i = (1 – z)⁻¹.

The function f(z) = (1 – z)⁻¹ is an analytic continuation of Σ_{i=0}^{∞} z^i to the punctured disc ℂ – {1}.

(ii) (I + A + A² + · · · + A^p)(I – A) = I – A^{p+1}. If (I – A)⁻¹ exists then we can write

I + A + A² + · · · + A^p = (I – A^{p+1})(I – A)⁻¹ = (I – A)⁻¹(I – A^{p+1}).

(I – A) is invertible so long as 1 is not an eigenvalue of A.


(iii) If ρ(A) < 1 then, since lim_{p→∞} A^p = O,

Σ_{i=0}^{∞} A^i = (I – A)⁻¹.

If |s| < 1/ρ(A) then ρ(sA) < 1 and hence

s Σ_{i=0}^{∞} (sA)^i = s(I – sA)⁻¹ = (zI – A)⁻¹,

where z = 1/s.

The matrix (zI – A)⁻¹ is defined so long as (zI – A) has no zero eigenvalues, which is the case so long as z ∉ σ(A). Given a matrix A we call

f(z) = (zI – A)⁻¹

its resolvent function. The resolvent function is an analytic continuation of an infin-


ite series of matrix powers to (almost) the whole complex plane. It has a number
of applications in network theory but is also useful in extending the notion of
matrix functions (even though it is not itself a matrix function).

12.3 The matrix exponential


The matrix exponential arises naturally in the solution of systems of ordinary differential equations: just as the solution to the scalar differential equation

dx/dt = kx,   x(0) = x₀

is x(t) = x₀e^{kt}, so the solution of

dx/dt = Ax,   x(0) = x₀,

where A is an n × n matrix and x(t) is a vector of length n, is x(t) = e^{tA}x₀ where e^{tA} is a matrix exponential. This can be defined in a number of ways by generalizing the usual exponential function. Probably the most natural way to write the exponential of a matrix is as the limit of the infinite series

e^A = Σ_{k=0}^{∞} A^k/k!. (12.1)

The exponential function of A can be defined as

e^{zA} = Σ_{k=0}^{∞} (zA)^k/k!,

for all z ∈ ℂ.
By bounding the size of the terms in this series we can use the comparison test (extended to matrix series) to show that for any square matrix A and any finite z ∈ ℂ, the series Σ_{k=0}^{∞} (zA)^k/k! is convergent.

Examples 12.5

(i) e^{zI} = Σ_{k=0}^{∞} (zI)^k/k! = (Σ_{k=0}^{∞} z^k/k!) I = e^z I.

(ii) Let D = diag(λ_1, λ_2, . . . , λ_n). Then

e^D = Σ_{k=0}^{∞} D^k/k! = diag(e^{λ_1}, e^{λ_2}, . . . , e^{λ_n}).

(iii) If A is symmetric then so is A^k, and from (12.1) one can see that e^A is symmetric, too. Again, from (12.1), e^{Aᵀ} = (e^A)ᵀ.

Many, but by no means all, of the familiar properties of exponentials ex-


tend to the matrix exponential. The following list is not exhaustive. Most can
be established by simply inserting the appropriate argument into (12.1).

Theorem 12.1 Let A and B be square matrices of the same size and let s, t ∈ ℂ. The following properties hold for matrix exponentials.

1. e^O = I.
2. d(e^{tA})/dt = Ae^{tA}.
3. e^{(s+t)A} = e^{sA}e^{tA}.
4. e^{A+B} = e^Ae^B if and only if AB = BA.
5. e^{tA} is nonsingular and (e^{tA})⁻¹ = e^{–tA}.
6. If B is nonsingular then Be^{tA}B⁻¹ = e^{tBAB⁻¹}.
7. e^A = lim_{k→∞} (I + A/k)^k.
8. e^{tA} = (e^A)^t.

Problem 12.1

Show that if AB = BA then e^{A+B} = e^Ae^B.

By a simple induction, AB^k = B^kA for any k, and hence from (12.1) Ae^{tB} = e^{tB}A. Similarly, Xe^{tY} = e^{tY}X for any combination of X and Y chosen from A, B, and A + B.
Now let F(t) = e^{t(A+B)}e^{–Bt}e^{–At}. By the product rule,

F′(t) = (A + B)e^{(A+B)t}e^{–Bt}e^{–At} + e^{(A+B)t}(–B)e^{–Bt}e^{–At} + e^{(A+B)t}e^{–Bt}(–A)e^{–At},

and by commutativity, the right-hand side of this expression is zero. Thus F(t) is a constant matrix, and since F(0) = e^Oe^Oe^O = I we know that

e^{t(A+B)} = e^{tA}e^{tB}

for all t (in particular when t = 1).

Examples 12.6
(i) Let J = [0 1 0; 0 0 1; 0 0 0]. Then J² = [0 0 1; 0 0 0; 0 0 0] and J³ = O, so

e^J = [1 1 1/2; 0 1 1; 0 0 1].

(ii) If A = [λ 1 0; 0 λ 1; 0 0 λ] then A = λI + J (where J is taken from (i)). So

e^A = e^{λI+J} = e^{λI}e^J = e^λ [1 1 1/2; 0 1 1; 0 0 1].

 
(iii) Let A = [0 1; 0 1]. It is easy to show that A^k = A for all k. So

e^A = I + Σ_{k=1}^{∞} A/k! = [1, Σ_{k=1}^{∞} 1/k!; 0, Σ_{k=0}^{∞} 1/k!] = [1, e – 1; 0, e].

Write A = D + F where D = [0 0; 0 1] and F = [0 1; 0 0]. Since D is diagonal and F² = O, e^D = [1 0; 0 e] and e^F = [1 1; 0 1].
So e^De^F = [1 1; 0 e] and e^Fe^D = [1 e; 0 e]. Neither of these equal e^A since DF ≠ FD.

Suppose that we have computed the Jordan decomposition of a matrix: A = XJX⁻¹. Using Theorem 12.1 we have

e^{tA} = e^{tXJX⁻¹} = Xe^{tJ}X⁻¹,

and if we know e^{tJ} we can use it to compute e^{tA}. In particular, if A is simple we can take advantage of the fact that computation of the exponential of a diagonal matrix is particularly straightforward.
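For simple matrices this recipe is a few lines of NumPy; the sketch below (ours, not from the text) checks the eigendecomposition route against SciPy's general-purpose expm for a small test matrix (the same one used in the examples that follow).

import numpy as np
from scipy.linalg import expm

A = np.array([[5.0, 3.0], [-6.0, -4.0]])   # simple matrix with eigenvalues 2 and -1
vals, X = np.linalg.eig(A)                 # A = X diag(vals) X^{-1}
eA = X @ np.diag(np.exp(vals)) @ np.linalg.inv(X)
print(eA)
print(expm(A))                             # agrees with the eigendecomposition route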

Examples 12.7

     –1
5 3 1 1 2 0 1 1
(i) A = = hence
–6 –4 –1 –2 0 –1 –1 –2
     
1 1 e2 0 2 1 2e2 – e–1 e2 – e–1
eA = =
–1 –2 0 e–1 –1 –1 2e–1 – 2e2 2e–1 – e2
 
tA 2e2t – e–t e2t – e–t
and e = .
2e–t – 2e2t 2e–t – e2t
     –1
–5 2 1 2 1 0 1 2
(ii) B = = hence
–15 6 3 5 0 0 3 5
     
B 1 2 e 0 –5 2 6 – 5e 2e – 2
e = = .
3 5 0 1 3 –1 15 – 15e 6e – 5

    
(iii) e^Ae^B = [2e² – e⁻¹, e² – e⁻¹; 2e⁻¹ – 2e², 2e⁻¹ – e²] [6 – 5e, 2e – 2; 15 – 15e, 6e – 5] = [–290.36, 128.93; 278.08, –123.50]. We can show that

e^{A+B} = [–1.759, –0.931; 3.910, –2.131].

As is the case for matrix powers, the asymptotic behaviour of e^{tA} is linked directly to the spectrum of A. If A is simple then

e^{tA} = X diag(e^{λ_1 t}, e^{λ_2 t}, . . . , e^{λ_n t}) X⁻¹, (12.2)
where the λi are the eigenvalues of A (and X contains the eigenvectors). We can
infer that if A is simple then lim et A = O if and only if all the eigenvalues of A lie
t→∞
to the left of the imaginary axis in the complex plane; that the limit diverges if any
of the eigenvalues lie strictly to the right, and that things are more complicated
when eigenvalues lie on (but not to the right of) the imaginary axis.
The matrix exponential has a number of roles in network theory. As an
example, consider the following.

Problem 12.2
Show that a network with adjacency matrix A is connected if and only if e^A > 0.

If there is a walk of length p between nodes i and j then there is a nonzero in the (i, j)th element of A^p. If a network is connected then a nonzero must eventually appear in the sequence

a_{ij}, (A²)_{ij}, (A³)_{ij}, . . .

for every i and j. Recall that e^A = Σ_{k=0}^{∞} A^k/k!, and a nonzero must therefore appear in every element of the series at some point. Since A is nonnegative, every nonzero entry gives a positive contribution to the final value of e^A.
Conversely, if e^A > 0 then we can trace every nonzero entry of the exponential to a nonzero in the series and hence to a walk in the network.
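This test is immediate to run; a sketch (ours, not from the text) using SciPy's expm:

import numpy as np
from scipy.linalg import expm

def is_connected(A, tol=1e-12):
    """A network is connected iff every entry of e^A is positive."""
    return np.all(expm(np.asarray(A, dtype=float)) > tol)

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])        # connected three-node path
split = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])       # third node is isolated
print(is_connected(path), is_connected(split))            # True False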

12.4 Extending matrix powers


Just as for numbers, the power of a matrix, A^p, can be extended to have meaning for any p ∈ ℝ (and even for complex powers, too). For negative p we simply use the matrix inverse: for p ∈ ℕ, A^{–p} = (A⁻¹)^p. This is well defined so long as A is nonsingular. For rational powers we need to extend the idea of pth roots to matrices.

12.4.1 The matrix square root


If z ∈ C is nonzero then there are two distinct numbers, ω0 and ω1 such that
ω_i² = z. These two numbers are the square roots of z and ω_0 = –ω_1. A naïve
approach to extending the idea of the square root to a matrix A is to look for so-
lutions to the equation X 2 = A. A brief consideration of this problem shows some
of the complications of extending the definition of functions to accept matrix
arguments.

Examples 12.8
 
(i) Let A = [0 1; 0 0] and suppose X² = A. Since A is nilpotent then so is X. But then X² = O, a contradiction, so A has no square roots.
Notice that A is one of the (infinite) solutions to the equation X² = O.

(ii) Suppose A = SDS⁻¹ where D = diag(λ_1, λ_2, . . . , λ_n), and let E = diag(μ_1, μ_2, . . . , μ_n) where μ_i is one of the square roots of λ_i. Then

(SES⁻¹)² = SE²S⁻¹ = A.

If λ_i ≠ 0 then there are two choices for μ_i, and by going through the various permutations we get (up to) 2ⁿ different solutions to X² = A.

(iii) If X² = A then (–X)² = A, too.

(iv) The only two solutions to X² = [1 1; 0 1] are X = ±[1 1/2; 0 1].

From the examples we see that the matrix equation $X^2 = A$ exhibits an unexpectedly complicated behaviour compared to the scalar analogue, particularly for defective matrices. To define a matrix square root function we need to give a unique value to $\sqrt{A}$. Recall that the principal square root of $z \in \mathbb{C}$, written $\sqrt{z}$, is the solution of $\omega^2 = z$ with smallest principal argument. If $z$ is real and positive then so is $\sqrt{z}$.
If $A$ is simple and has the factorization $A = XDX^{-1}$ where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ then the principal square root of $A$ is the unique matrix
\[
\sqrt{A} = X \,\mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\, X^{-1}. \tag{12.3}
\]
If $A$ is defective then the square root function, $\sqrt{A}$, is not defined. If all the eigenvalues of a simple matrix are real and positive then the same is true of its principal square root. For most applications, the principal square root is the right one to take. Having defined the principal square root, there is an obvious extension to $p$th roots by replacing $\sqrt{\lambda_i}$ in (12.3) with $\lambda_i^{1/p}$.
Combining pth powers, qth roots, and the inverse, one can define a principal
rth power of a matrix for any r ∈ Q. For irrational values of r we need to make use
of the matrix exponential in a similar way to the extension of irrational powers of
scalars.
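As a sketch (the helper name is ours, and numpy's principal branch of the scalar power is assumed to match the principal values intended here), the principal $r$th power of a simple matrix can be computed directly from (12.3) and its extension:

```python
import numpy as np

def principal_power(A, r):
    """Principal r-th power of a simple matrix via its eigendecomposition."""
    lam, X = np.linalg.eig(A)
    # np.power on complex arguments uses the principal branch
    return (X * np.power(lam.astype(complex), r)) @ np.linalg.inv(X)

A = np.array([[5.0, 3.0], [-6.0, -4.0]])   # simple, eigenvalues 2 and -1
S = principal_power(A, 0.5)                # principal square root
print(np.allclose(S @ S, A))               # True
```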

Example 12.9

If $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ then $e^D = \mathrm{diag}(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n})$. If $p \in \mathbb{Q}$ then
\[
(e^D)^p = \mathrm{diag}(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n})^p = \mathrm{diag}(e^{p\lambda_1}, e^{p\lambda_2}, \ldots, e^{p\lambda_n}) = e^{pD},
\]
where we take the principal value of $(e^{\lambda_i})^p$ in each case.

If $A$ is simple we can extend the last example using the Jordan decomposition to show that $e^{pA} = (e^A)^p$ for rational values of $p$. We can extend the definition of matrix powers to irrational indices by saying $A^t = e^{tX}$ where $e^X = A$ (but how do we find $X$?).

12.5 General matrix functions


Equations (12.1) and (12.2) suggest two ways of extending the definition of any analytic function to take matrix arguments. Suppose $f: \mathbb{C} \to \mathbb{C}$ is analytic on some domain with Maclaurin series
\[
f(z) = \sum_{k=0}^{\infty} a_k z^k
\]
then define $f: \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ by
\[
f(A) = \sum_{k=0}^{\infty} a_k A^k. \tag{12.4}
\]
Alternatively, if $A$ is simple with Jordan decomposition $XDX^{-1}$ where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ then define
\[
f(A) = X \,\mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n))\, X^{-1}. \tag{12.5}
\]

There are other ways of extending the definition of scalar functions to matrices, too. We will give just one more. If you have studied complex analysis you may recall that Cauchy's integral formula tells us that if $f$ is analytic inside some region $R$ and $a \in R$ then
\[
f(a) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z - a} \, dz
\]
where $C$ is a circle in $R$ containing $a$. For matrix functions this becomes
\[
f(A) = \frac{1}{2\pi i} \oint_C f(z)(zI - A)^{-1} \, dz \tag{12.6}
\]
for an appropriate contour $C$: notice the role of the resolvent function. We interpret the integral as being over each of the $n^2$ elements of the function $f(z)(zI - A)^{-1}$. But what contour should we use? We need to enclose all the poles of the resolvent, so $C$ needs to contain all the eigenvalues of $A$. The contour integral formula can be useful for reducing the length of proofs of certain theoretical results and allows matrix functions to be generalized even further to operators. Recently, computational techniques for computing matrix functions that exploit the integral formula have been developed. These make use of quadrature rules to perform the integrations.
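To make the definitions concrete, here is a sketch (helper names are ours; this is an illustration under the assumption that $A$ is simple and $f$ is entire, not a production implementation) that evaluates $f(A)$ by the spectral formula (12.5) and by trapezoidal quadrature of the contour integral (12.6) on a circle enclosing the spectrum:

```python
import numpy as np

def f_spectral(A, f):
    """f(A) via (12.5); assumes A is simple."""
    lam, X = np.linalg.eig(A)
    return (X * f(lam.astype(complex))) @ np.linalg.inv(X)

def f_contour(A, f, m=200):
    """f(A) via (12.6), trapezoidal rule on a circle enclosing sigma(A)."""
    n = A.shape[0]
    lam = np.linalg.eigvals(A)
    c = lam.mean()
    rad = 1.5 * np.max(np.abs(lam - c)) + 1.0   # circle containing all poles
    F = np.zeros((n, n), dtype=complex)
    for k in range(m):
        w = np.exp(2j * np.pi * k / m)
        z = c + rad * w
        F += f(z) * np.linalg.solve(z * np.eye(n) - A, np.eye(n)) * rad * w
    return F / m                                 # the 2*pi*i factors cancel

A = np.array([[5.0, 3.0], [-6.0, -4.0]])
print(np.allclose(f_spectral(A, np.exp), f_contour(A, np.exp)))   # True
```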
If $A$ is a simple matrix then
\[
X \,\mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n))\, X^{-1}
= X \,\mathrm{diag}\!\left( \sum_{k=0}^{\infty} a_k \lambda_1^k, \ldots, \sum_{k=0}^{\infty} a_k \lambda_n^k \right) X^{-1}
= X \left( \sum_{k=0}^{\infty} a_k \,\mathrm{diag}(\lambda_1^k, \ldots, \lambda_n^k) \right) X^{-1}
= \sum_{k=0}^{\infty} a_k X D^k X^{-1} = \sum_{k=0}^{\infty} a_k A^k,
\]
so (12.4) and (12.5) are equivalent.

Problem 12.3
Show that if $f(z)$ is analytic in a region $R$ containing the spectrum of $A$ then (12.5) and (12.6) are equivalent.
If $z \notin \sigma(A)$ then $(zI - A)^{-1}$ is well defined and $(zI - A)^{-1} = X(zI - D)^{-1}X^{-1}$. Note that $(zI - D)^{-1}$ is diagonal with entries of the form $1/(z - \lambda_i)$. So, since $\sigma(A) \cap C = \emptyset$,
\[
\frac{1}{2\pi i} \oint_C f(z)(zI - A)^{-1} \, dz
= X \left( \frac{1}{2\pi i} \oint_C f(z)(zI - D)^{-1} \, dz \right) X^{-1}
= X \,\mathrm{diag}\!\left( \frac{1}{2\pi i} \oint_C \frac{f(z)}{z - \lambda_1} \, dz, \ldots, \frac{1}{2\pi i} \oint_C \frac{f(z)}{z - \lambda_n} \, dz \right) X^{-1}
= X \,\mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n))\, X^{-1}.
\]
It can be shown using some basic tools of analysis that (12.4) and (12.6) are also equivalent for defective matrices (one uses the fact that simple matrices form a dense set amongst all square matrices). If $J$ is a $(p+1) \times (p+1)$ defective Jordan block of the eigenvalue $\lambda$ then one can show that if we define
\[
f(J) = \begin{pmatrix}
f(\lambda) & f'(\lambda) & \cdots & \dfrac{f^{(p)}(\lambda)}{p!} \\
 & f(\lambda) & \ddots & \vdots \\
 & & \ddots & f'(\lambda) \\
 & & & f(\lambda)
\end{pmatrix} \tag{12.7}
\]
then the consequent extension of (12.5) is equivalent to the other definitions, too.

Examples 12.10

(i) If $f(z) = \sqrt{z}$ then $f'(z)$ is undefined at $z = 0$. If $J = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ then (12.7) is undefined, there is no Maclaurin series for $f(z)$, and the conditions for the Cauchy integral formula are not met because any contour enclosing $\sigma(J)$ includes zero. Recall that $J$ has no square root.

(ii) If $A = O$ then we still cannot use (12.4) or (12.6) to define $\sqrt{A}$, but (12.5) works fine.

If $f(z)$ is analytic at each $z \in \sigma(A)$ then the value of $f(A)$ given by our equivalent definitions is called the primary value of $f(A)$. Applications for values other than the primary are limited and we will not consider them further.
Using (12.4) we can establish a number of generic properties that any matrix function satisfies. If $f$ is analytic on the spectrum of $A$ then, for example, $f(A^T) = f(A)^T$ and $f(XAX^{-1}) = X f(A) X^{-1}$.

Problem 12.4
Show that if $AB = BA$ and $f$ is analytic on the spectrum of $A$ and $B$ then $f(A)A = Af(A)$, $f(A)B = Bf(A)$, and $f(A)f(B) = f(B)f(A)$.
\[
f(A)A = \left( \sum_{m=0}^{\infty} a_m A^m \right) A = \sum_{m=0}^{\infty} a_m A^{m+1} = A \left( \sum_{m=0}^{\infty} a_m A^m \right) = Af(A).
\]
Also, since $A^m B = A^{m-1}AB = A^{m-1}BA = A^{m-2}ABA = \cdots = BA^m$,
\[
f(A)B = \left( \sum_{m=0}^{\infty} a_m A^m \right) B = \sum_{m=0}^{\infty} a_m (A^m B) = \sum_{m=0}^{\infty} a_m (BA^m) = B \left( \sum_{m=0}^{\infty} a_m A^m \right) = Bf(A).
\]
Finally, write $C = f(B)$. We have just shown that $AC = CA$ hence
\[
f(A)f(B) = f(A)C = Cf(A) = f(B)f(A).
\]

We finish the chapter by defining matrix versions of some familiar functions.

12.5.1 log A
X is a logarithm of A if eX = A. Suppose A = XDX –1 where D = diag(λ1 , . . . , λn ).
Since A is nonsingular, all the λj are nonzero and if

ξj = log |λj | + iArg(λj ) + 2π kj i,

for any kj ∈ Z then eξj = λj . For any choice of kj let L = diag(ξ1 , . . . , ξn ).


–1
Using (12.5) we can show that eXLX = A. Hence every nonsingular simple ma-
trix has an infinite number of logarithms. We can show that if A is singular then
there are no solutions to eX = A.
Of the infinite number of possibilities for nonsingular matrices, it is very un-
usual to use anything other than the primary logarithm defined by (12.5). To
ensure continuity, one often restricts the definition of the logarithm to matrices
without eigenvalues on the negative real axis. The matrix logarithm has a number
of applications when coupled with the matrix exponential. As log z is not defined
at zero, we cannot use (12.4) but we can derive power series representations by
using formulae for the scalar logarithm.

Examples 12.11

(i) If $|z| < 1$ then
\[
\log(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots.
\]
If $\rho(A) < 1$ and $A$ is nonsingular then one can show that
\[
\log(I + A) = A - \frac{1}{2}A^2 + \frac{1}{3}A^3 - \cdots.
\]

(ii) Suppose $X = \log A$, $Y = \log B$, and $AB = BA$. By Theorem 12.1, $XY = YX$ and so, again by Theorem 12.1, $e^X e^Y = e^{X+Y}$. Therefore if $AB = BA$, $AB = e^{\log A + \log B}$. Taking logs of each side gives
\[
\log(AB) = \log A + \log B.
\]
Strictly speaking, we have not shown that each of the logs in this expression is the primary value, but there is a large class of matrices for which this is true.

(iii) Let $X = \log A$. By Theorem 12.1, $A^p = (e^X)^p = e^{pX}$. Taking logs of each side gives
\[
\log A^p = p \log A.
\]
In particular, $\log A^{-1} = -\log A$.
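These identities are easy to test numerically. A sketch using scipy's primary logarithm (the matrices are our illustrative choices, with eigenvalues away from the negative real axis):

```python
import numpy as np
from scipy.linalg import expm, logm

A = np.array([[1.0, 0.3], [0.2, 1.5]])     # eigenvalues 0.9 and 1.6
B = 2.0 * np.eye(2)                        # commutes with A
print(np.allclose(expm(logm(A)), A))                                 # e^{log A} = A
print(np.allclose(logm(np.linalg.matrix_power(A, 3)), 3 * logm(A)))  # log A^p = p log A
print(np.allclose(logm(A @ B), logm(A) + logm(B)))                   # log(AB) = log A + log B
```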

12.5.2 cos A and sin A


We can define $\cos A$ and $\sin A$ for any matrix using the Maclaurin series for the scalar functions. Alternatively, we can use Euler's formula
\[
e^{i\theta} = \cos\theta + i\sin\theta
\]
and write
\[
\cos A = \frac{1}{2}(e^{iA} + e^{-iA}), \qquad \sin A = \frac{1}{2i}(e^{iA} - e^{-iA}).
\]
We can use these identities to show that many trigonometric identities still hold when we use matrix arguments. For example, the addition formulae for cosine and sine are true for matrices so long as the matrices involved commute.

Example 12.12

$\cos^2 A + \sin^2 A = (\cos A + i\sin A)(\cos A - i\sin A) = e^{iA}e^{-iA} = I$.

As with their scalar equivalents, $\cos A$ and $\sin A$ arise naturally in the solution of (systems of) second order ODEs. They are far removed from their original role in trigonometry!

We can define hyperbolic functions of matrices, too:
\[
\cosh A = \frac{1}{2}(e^A + e^{-A}), \qquad \sinh A = \frac{1}{2}(e^A - e^{-A}).
\]

Examples 12.13

(i) Using the power series definition of $e^A$ we can write
\[
\sinh A = \frac{e^A - e^{-A}}{2} = \sum_{k=0}^{\infty} \frac{A^{2k+1}}{(2k+1)!}.
\]
Recall that in a bipartite network there are no odd cycles. This manifests itself as a zero diagonal in $\sinh A$ where $A$ is the adjacency matrix.

(ii) Similarly,
\[
\cosh A = \sum_{k=0}^{\infty} \frac{A^{2k}}{(2k)!}.
\]
A comparison between cosh and sinh of adjacency matrices gives insight into the weighting of odd and even walks.
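A quick numerical check of Example 12.13(i), sketched with scipy and the 4-cycle $C_4$ as an assumed bipartite example: the diagonal of $\sinh A$ vanishes while that of $\cosh A$ does not.

```python
import numpy as np
from scipy.linalg import sinhm, coshm

# Adjacency matrix of the 4-cycle C4, a bipartite network
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(np.round(np.diag(sinhm(A)), 12))   # [0. 0. 0. 0.]: no odd closed walks
print(np.round(np.diag(coshm(A)), 4))    # positive: even closed walks abound
```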

..................................................................................................

FURTHER READING

Benzi, M. and Boito, P. Quadrature rule-based bounds for functions of adjacency


matrices, Lin. Alg. Appl. 433:637–652, 2010.
Estrada, E. and Higham, D.J., Network properties revealed through matrix
functions, SIAM Rev., 52:696–714, 2010.
Higham, N.J., Functions of Matrices, SIAM, 2008.
Moler, C. and Van Loan, C., Nineteen Dubious Ways to Compute the Exponen-
tial of a Matrix, Twenty-Five Years Later, SIAM Rev., 45:3–49, 2003.
13 Fragment-based Measures

In this chapter
We start with the definition of a fragment, or subgraph, in a network. We then introduce the concept of network motif and analyse how to quantify its significance. We illustrate the concept by studying motifs in some real-world networks. We then outline mathematical methods to quantify the number of small subgraphs in networks analytically. We develop some general techniques that can be adapted to search for other fragments.

13.1 Motivation
13.2 Counting subgraphs in networks
13.3 Network motifs
Further reading

13.1 Motivation
In many real-life situations, we are able to identify small structural pieces of a
system which are responsible for certain functional properties of the whole sys-
tem. Biologists, chemists, and engineers usually isolate these small fragments of
the system to understand how they work and gain understanding of their roles in
the whole system. These kinds of structural fragments exist in complex networks.
In Chapter 11 we saw that triangles can indicate transitive relations in social net-
works. They also play a role in interactions between other entities in complex
systems. In this chapter we develop techniques to quantify some of the simplest
but most important fragments or subgraphs in networks. We also show how to determine whether the presence of these fragments in a real-world network is just a manifestation of a random underlying process, or whether it signifies something more significant.
In network theory fragments are synonymous with subgraphs. Typical sub-
graphs are illustrated in Figure 13.1. In general, a subgraph can be formed
by one (connected subgraph) or more (disconnected subgraphs) connected
components, and they may be cyclic or acyclic.

13.2 Counting subgraphs in networks


In order to count subgraphs in a network we need to use a combination of al-
gebraic and combinatorial techniques. First, we are going to develop some basic
techniques which can be combined to count many different types of subgraph.
130 Fragment-based Measures

Figure 13.1 Three subgraphs (bottom line) in an undirected network (top)

13.2.1 Counting stars


We know from previous chapters that the number of edges in a network can be obtained from the degrees of the nodes. An edge is a star subgraph of the type $S_{1,1}$. Thus
\[
|S_{1,1}| = m = \frac{1}{2}\sum_{i=1}^{n} k_i. \tag{13.1}
\]
The next star subgraph is $S_{1,2}$. Copies of $S_{1,2}$ in a network can be enumerated by noting that they are formed from any two edges incident to a common node. That is, $|S_{1,2}|$ is equal to the number of times that the nodes attached to a particular node can be combined in pairs. This is simply
\[
|S_{1,2}| = \sum_{i=1}^{n} \binom{k_i}{2} = \frac{1}{2}\sum_{i=1}^{n} k_i(k_i - 1). \tag{13.2}
\]
Since $S_{1,2}$ is the same as $P_2$, we have generated the formula we used in calculating transitivity in Chapter 10.
Similarly, the number of $S_{1,3}$ star subgraphs equals the number of times that the nodes attached to a given node can be combined in triples, namely,
\[
|S_{1,3}| = \sum_{i=1}^{n} \binom{k_i}{3} = \frac{1}{6}\sum_{i=1}^{n} k_i(k_i - 1)(k_i - 2). \tag{13.3}
\]
In general, the number of star subgraphs of the type $S_{1,s}$ is given by
\[
|S_{1,s}| = \sum_{i=1}^{n} \binom{k_i}{s}. \tag{13.4}
\]
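In code, (13.2)-(13.4) amount to a sum of binomial coefficients over the degree sequence. A sketch (the helper name is ours) applied to the star $S_{1,3}$ itself:

```python
import numpy as np
from math import comb

def count_stars(A, s):
    """Number of S_{1,s} subgraphs, by (13.4); use s >= 2."""
    return sum(comb(int(k), s) for k in A.sum(axis=1))

# S_{1,3}: hub node 0 joined to three leaves
A = np.zeros((4, 4), dtype=int)
A[0, 1:] = A[1:, 0] = 1
print(count_stars(A, 2), count_stars(A, 3))   # 3 1: three S_{1,2}'s, one S_{1,3}
```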

13.2.2 Using closed walks


The idea of using closed walks (CWs) to count subgraphs is very intuitive and
simple. Every time that we complete a CW in a network we have visited a se-
quence of nodes and edges which together form a subgraph. For instance, a CW
that goes from a node to any of its neighbours and back again describes an edge,
while every CW of length three necessarily visits all the nodes of a triangle. Note
that CWs of length l = 2d, d = 1, 2, . . ., that go back and forth between adjacent
nodes also describe edges. Similarly, a CW of length l = 2d + 1, d = 1, 2, . . ., visit-
ing only three nodes of the network describes a triangle. Keeping this in mind,
we can design a technique to express the number of CWs as a sum of fragment
contributions. We start by designating by μl the number of CWs of length l in a
network. Then μ0 = n and, in a simple network, μ1 = 0. In general, we know that


n
μk = tr(Ak ) = λki
i=1

but we can rewrite the right-hand side of this expression in terms of particular
subgraphs. Before continuing, visualize what happens with a CW of length two.
Each such walk represents an edge. But in an undirected network there are two
closed walks along each edge (i, j), namely i → j → i and j → i → j. Thus,

μ2 = 2|S1,1 | = 2|P1 |.

Similarly, there are six CWs of length three around every triangle i, j,k since
we can start from any one of its three nodes and move either clockwise or anti-
clockwise: i → j → k → i; i → k → j → i; j → k → i → j; j → i → k → j;
k → i → j → k; k → j → i → k. So,

μ3 = 6|C3 |.

Things begin to get messy for longer CWs as there are several subgraphs as-
sociated with such walks. For example, a CW of length four can be generated by
moving along the same edge four times. This can be done in two ways

i → j → i → j → i and j → i → j → i → j.

We could also walk along two edges and then return to the origin in two ways,

i → j → k → j → i and k → j → i → j → k.

There are two ways of visiting two nearest neighbours before returning to the
origin,

j → i → j → k → j and j → k → j → i → j.
132 Fragment-based Measures

And finally, there are eight ways of completing a cycle of length four in a square
i, j, k, l since we can start from any node and go clockwise or anticlockwise. For
example, starting from node i gives

i → j → k → l → i and i → l → k → j → i.

Consequently,

μ4 = 2|P1 | + 4|P2 | + 8|C4 |.

Problem 13.1
Let $G$ be a regular network with $n = 2r$ nodes of degree $k$ and spectrum
\[
\sigma(G) = \{[k]^1, [1]^{r-1}, [-1]^{r-1}, [-k]^1\}. \tag{13.5}
\]
Find expressions for the number of triangles and squares in $G$.

The number of triangles in a network is given by
\[
t = \frac{1}{6}\mathrm{tr}(A^3) = \frac{1}{6}\sum_{i=1}^{n} \lambda_i^3 = \frac{1}{6}\left( k^3 + (-k)^3 + (r-1)\left(1^3 + (-1)^3\right) \right) = 0.
\]
The number of squares is given by $|C_4| = \mu_4/8 - |P_1|/4 - |P_2|/2$. Since each node has degree $k$,
\[
|P_1| = \frac{kn}{2} = kr
\]
and
\[
|P_2| = \sum_{i=1}^{n} \binom{k}{2} = \frac{nk(k-1)}{2} = rk(k-1). \tag{13.6}
\]
Given that
\[
\mu_4 = \mathrm{tr}(A^4) = k^4 + (-k)^4 + (r-1) + (r-1)(-1)^4 = 2k^4 + 2(r-1), \tag{13.7}
\]
we conclude that
\[
|C_4| = \frac{k^4 + (r-1)}{4} - \frac{rk(k-1)}{2} - \frac{rk}{4} = \frac{k^4 + r - 1 - 2rk(k-1) - rk}{4} = \frac{k^4 + r - rk(2k-1) - 1}{4}. \tag{13.8}
\]
Counting subgraphs in networks 133

Example 13.1

A network of the type described in Problem 13.1 is the cube $Q_3$, which has the spectrum
\[
\sigma(G) = \{[3]^1, [1]^3, [-1]^3, [-3]^1\}.
\]
Applying the formula obtained for the number of squares gives
\[
|C_4| = \frac{3^4 + 4 - 60 - 1}{4} = 6,
\]
which is the number of faces on a cube.
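The same counts can be reproduced from traces of matrix powers. A sketch (using networkx, an assumed dependency, to build $Q_3$):

```python
import numpy as np
import networkx as nx

A = nx.to_numpy_array(nx.hypercube_graph(3))           # the cube Q3
mu = lambda p: np.trace(np.linalg.matrix_power(A, p))  # closed walks of length p
m = A.sum() / 2                                        # |P1|, the edges
P2 = sum(int(k) * (int(k) - 1) // 2 for k in A.sum(axis=1))
print(mu(3) / 6)                        # 0.0 triangles
print((mu(4) - 2 * m - 4 * P2) / 8)     # 6.0 squares: the faces of the cube
```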

13.2.3 Combined techniques


We start by considering a practical example. In this case, we will be interested in computing the number of fragments of the type illustrated in Figure 13.2. This fragment is known as a tadpole $T_{3,1}$ subgraph.

Figure 13.2 The tadpole graph $T_{3,1}$

The fragment $T_{3,1}$ is characterized by having a node which is simultaneously part of a triangle and of a path of length one. We can combine the idea of calculating the number of triangles in which the node is involved with the number of nodes attached to it. Let $t_i$ be the number of triangles attached to the node $i$ and let $k_i > 2$ be the degree of this node. The number of nodes not in the triangle which are attached to $i$ is just the remaining degree of the node, i.e. $k_i - 2$. Thus, the number of tadpole subgraphs in which the node $i$ is involved is $k_i - 2$ times its number of triangles. Consequently,
\[
|T_{3,1}| = \sum_{k_i > 2} t_i(k_i - 2). \tag{13.9}
\]

Example 13.2

Find the number of fragments $T_{3,1}$ in Figure 13.3.

Figure 13.3 A network with many tadpole fragments

Using (13.9) and concentrating only on those nodes with degree larger than two we have
\[
|T_{3,1}| = 2 \times (4 - 2) + 1 \times (3 - 2) + 2 \times (4 - 2) + 1 \times (3 - 2) = 10.
\]
Can you see all of them?

Let us add another edge and consider the fragment illustrated in Figure 13.4. This subgraph is known as the cricket graph, which we designate by $Cr$. Here again, we can use a technique that combines the calculation of the two subgraphs forming this fragment. That is, this fragment is characterized by a node $i$ that is simultaneously part of a triangle and a star $S_{1,2}$. Using $t_i$ as before, we consider nodes for which $k_i > 3$. If $t_i > 0$ then node $i$ has $k_i - 2$ additional nodes which are attached to it. These $k_i - 2$ nodes can be combined in pairs to form all the $S_{1,2}$ subgraphs in which node $i$ is involved.

Figure 13.4 Illustration of the cricket graph

The number of crickets involving node $i$ is then
\[
|Cr_i| = t_i \binom{k_i - 2}{2} \tag{13.10}
\]
and hence
\[
|Cr| = \sum_{k_i \geq 4} t_i \binom{k_i - 2}{2} = \frac{1}{2}\sum_{k_i \geq 4} t_i (k_i - 2)(k_i - 3). \tag{13.11}
\]

Example 13.3

Find the number of $Cr$ fragments in the network illustrated in Figure 13.3.

Only the nodes labelled one and three have degree larger than or equal to four. We have
\[
|Cr| = \frac{1}{2}\left( 2(4-2)(4-3) + 2(4-2)(4-3) \right) = 4. \tag{13.12}
\]
The four crickets are illustrated in Figure 13.5.

Figure 13.5 Illustration of the four cricket subgraphs within the network in Figure 13.3
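Both counts are straightforward to automate once we note that $t_i = (A^3)_{ii}/2$. A sketch (the helper name is ours), tested on the tadpole $T_{3,1}$ itself:

```python
import numpy as np

def tadpoles_and_crickets(A):
    """|T_{3,1}| from (13.9) and |Cr| from (13.11)."""
    k = A.sum(axis=1)
    t = np.diag(np.linalg.matrix_power(A, 3)) / 2   # triangles at each node
    tad = sum(t[i] * (k[i] - 2) for i in range(len(k)) if k[i] > 2)
    cri = sum(t[i] * (k[i] - 2) * (k[i] - 3) / 2 for i in range(len(k)) if k[i] >= 4)
    return tad, cri

# The tadpole itself: triangle 0-1-2 with a pendant node 3 attached to 0
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
print(tadpoles_and_crickets(A))   # (1.0, 0.0)
```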

13.2.4 Other techniques

The diamond graph ($D$) is characterized by the existence of two connected nodes (1 and 3) which are also connected by two paths of length two (1-2-3 and 1-4-3). It is illustrated in Figure 13.6.

Figure 13.6 The diamond graph

To calculate the number of diamonds in a network we note that the number of walks of length two between two connected nodes is given by $(A^2)_{ij}A_{ij}$ and hence that the number of pairs of paths of length two among two connected nodes $i, j$ is given by
\[
\binom{(A^2)_{ij}A_{ij}}{2}.
\]
Consequently, the number of diamond subgraphs in a network is given by
\[
|D| = \frac{1}{2}\sum_{i,j} \binom{(A^2)_{ij}A_{ij}}{2} = \frac{1}{4}\sum_{i,j} (A^2)_{ij}A_{ij}\left( (A^2)_{ij}A_{ij} - 1 \right). \tag{13.13}
\]

Problem 13.2
Find an expression for $|C_5|$, the number of pentagons in a network.
A CW of length $l = 2d+1$ necessarily visits only nodes in subgraphs containing at least one odd cycle. So a CW of length five can visit only the nodes of a triangle, $C_3$; a tadpole, $T_{3,1}$; or a pentagon, $C_5$. Hence
\[
\mu_5 = a|C_3| + b|T_{3,1}| + c|C_5| \tag{13.14}
\]
and
\[
|C_5| = \frac{1}{c}\left( \mu_5 - a|C_3| - b|T_{3,1}| \right). \tag{13.15}
\]
We have seen how to calculate $|C_3|$ and $|T_{3,1}|$, hence our task is to determine the coefficients $a$, $b$, and $c$.
To find $a$ we must enumerate all the CWs of length five in a triangle. This can be done by calculating $\mathrm{tr}(A_C^5)$ where $A_C$ is the adjacency matrix of $C_3$. From Chapter 6 we know that the eigenvalues of $C_3$ are 2, $-1$, and $-1$, hence
\[
a = \sum_j \lambda_j^5 = 2^5 + 2(-1)^5 = 30. \tag{13.16}
\]
To find $b$ we can enumerate all the CWs of length five involving all the nodes of $T_{3,1}$. This is done in Figure 13.7 and we see that $b = 10$.

Figure 13.7 CWs of length five in $T_{3,1}$:
i → j → k → l → j → i;  j → i → j → l → k → j;
i → j → l → k → j → i;  k → l → j → i → j → k;
j → k → l → j → i → j;  k → j → i → j → l → k;
j → l → k → j → i → j;  l → j → i → j → k → l;
j → i → j → k → l → j;  l → k → j → i → j → l

We could also proceed in a similar way as for the triangle, but in the tadpole $T_{3,1}$ not every CW of length five visits all the nodes of the fragment. That is, there are CWs of length five which only visit the nodes of the triangle in $T_{3,1}$. Thus,
\[
b = \mathrm{tr}(A_T^5) - a \tag{13.17}
\]
where $A_T$ is the adjacency matrix of $T_{3,1}$. Computing $A_T^5$ explicitly we find that $\mathrm{tr}(A_T^5) = 40$ and, again, $b = 40 - 30 = 10$.
Finally, to find $c$, note that for every node in $C_5$ there is one CW of length five in a clockwise direction and another anticlockwise, e.g. $i \to j \to k \to l \to m \to i$ and $i \to m \to l \to k \to j \to i$. Thus, $c = 10$. Finally,
\[
|C_5| = \frac{1}{10}\left( \mu_5 - 30|C_3| - 10|T_{3,1}| \right). \tag{13.18}
\]

13.2.5 Formulae for counting small subgraphs

Formulae for a number of simple subgraphs can be derived using very similar techniques to the ones we have encountered so far. The results are summarized below.

F1: $|F_1| = \frac{1}{2}\sum_i k_i(k_i - 1)$

F2: $|F_2| = \frac{1}{6}\mathrm{tr}(A^3)$

F3: $|F_3| = \sum_{(i,j) \in E} (k_i - 1)(k_j - 1) - 3|F_2|$

F4: $|F_4| = \frac{1}{6}\sum_i k_i(k_i - 1)(k_i - 2)$

F5: $|F_5| = \frac{1}{8}\left( \mathrm{tr}(A^4) - 4|F_1| - 2m \right)$

F6: $|F_6| = \sum_{k_i > 2} t_i(k_i - 2)$

F7: $|F_7| = \frac{1}{4}\sum_{i,j} (A^2)_{ij}A_{ij}\left( (A^2)_{ij}A_{ij} - 1 \right)$

F8: $|F_8| = \frac{1}{10}\left( \mathrm{tr}(A^5) - 30|F_2| - 10|F_6| \right)$

F9: $|F_9| = \frac{1}{2}\sum_{k_i \geq 4} t_i(k_i - 2)(k_i - 3)$

F10: $|F_{10}| = \frac{1}{2}\sum_{i} (k_i - 2) \sum_{j \neq i} \binom{(A^2)_{ij}}{2} - 2|F_7|$

F11: $|F_{11}| = \sum_{(i,j) \in E} (A^2)_{ij}(k_i - 2)(k_j - 2) - 2|F_7|$

F12: $|F_{12}| = \sum_i t_i \left( \sum_{j \neq i} (A^2)_{ij} \right) - 6|F_2| - 2|F_6| - 4|F_7|$

F13: $|F_{13}| = \frac{1}{2}\sum_i t_i(t_i - 1) - 2|F_7|$

F14: $|F_{14}| = \sum_{(i,j) \in E} (A^3)_{ij}(A^2)_{ij} - 9|F_2| - 2|F_6| - 4|F_7|$

F15: $|F_{15}| = \frac{1}{12}\left( \mathrm{tr}(A^6) - 2m - 12|F_1| - 24|F_2| - 6|F_3| - 12|F_4| - 48|F_5| - 36|F_7| - 12|F_{10}| - 24|F_{13}| \right)$

F16: $|F_{16}| = \frac{1}{2}\sum_i (k_i - 2)B_i - 2|F_{14}|$, where
$B_i = (A^5)_{ii} - 20t_i - 8t_i(k_i - 2) - 2\sum_{(i,j) \in E} (A^2)_{ij}(k_j - 2) - 2\sum_{(i,j) \in E} \left( t_j - (A^2)_{ij} \right)$

F17: $|F_{17}| = \sum_{(i,j) \in E} \binom{(A^2)_{ij}}{3}$

F18: $|F_{18}| = \sum_i t_i \sum_{j \neq i} \binom{(A^2)_{ij}}{2} - 6|F_7| - 2|F_{14}| - 6|F_{17}|$
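The first few rows of the table translate directly into a few lines of linear algebra. The sketch below (the helper name is ours, not the book's code) computes $|F_1|$ to $|F_8|$ and checks them on the complete graph $K_5$:

```python
import numpy as np

def fragment_counts(A):
    """|F1|..|F8| from the table, via degrees and traces of powers of A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    m = A.sum() / 2
    A2 = A @ A
    A3 = A2 @ A
    t = np.diag(A3) / 2                        # triangles at each node
    F = {}
    F[1] = (k * (k - 1)).sum() / 2
    F[2] = np.trace(A3) / 6
    F[3] = sum((k[i] - 1) * (k[j] - 1)
               for i, j in zip(*np.nonzero(np.triu(A)))) - 3 * F[2]
    F[4] = (k * (k - 1) * (k - 2)).sum() / 6
    F[5] = (np.trace(A2 @ A2) - 4 * F[1] - 2 * m) / 8
    F[6] = (t * (k - 2))[k > 2].sum()
    P = A2 * A                                 # (A^2)_{ij} A_{ij}
    F[7] = (P * (P - 1)).sum() / 4
    F[8] = (np.trace(A3 @ A2) - 30 * F[2] - 10 * F[6]) / 10
    return F

K5 = np.ones((5, 5)) - np.eye(5)               # complete graph on 5 nodes
print({i: round(v) for i, v in fragment_counts(K5).items()})
# {1: 30, 2: 10, 3: 60, 4: 20, 5: 15, 6: 60, 7: 30, 8: 12}
```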

13.3 Network motifs


We can use the techniques we have developed in this chapter to count the number of occurrences of a given fragment in a real-world network. Certain fragments arise inevitably through network connectivity, and it is possible that the frequency with which they appear is similar to that in equivalent random networks. In this case, we cannot use the abundance of a fragment to explain any evolutionary mechanism giving rise to the structure of that network. However, if a given fragment appears more frequently than expected we can infer that there is some structural or functional reason for the over-expression. This is precisely the concept of a network motif. A subgraph is considered a network motif if the probability $P$ of it appearing in a random network an equal or greater number of times than in the real-world network is lower than a certain cut-off value, which is generally taken to be $P_c = 0.01$.
In order to quantify the statistical significance of a given motif we use the Z-score which, for a given subgraph $i$, is defined as
\[
Z_i = \frac{N_i^{\mathrm{real}} - \langle N_i^{\mathrm{random}} \rangle}{\sigma_i^{\mathrm{random}}}, \tag{13.19}
\]
where $N_i^{\mathrm{real}}$ is the number of times the subgraph $i$ appears in the real network, and $\langle N_i^{\mathrm{random}} \rangle$ and $\sigma_i^{\mathrm{random}}$ are the average and standard deviation of the number of times that $i$ appears in an ensemble of random networks, respectively. Similarly, the relative abundance of a given fragment can be estimated using the statistic
\[
\alpha_i = \frac{N_i^{\mathrm{real}} - \langle N_i^{\mathrm{random}} \rangle}{N_i^{\mathrm{real}} + \langle N_i^{\mathrm{random}} \rangle}. \tag{13.20}
\]
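In practice these statistics are a few lines of code. A sketch with helper names of our choosing, fed with illustrative ensemble numbers rather than a genuine random-network ensemble:

```python
import numpy as np

def z_score(n_real, random_counts):             # (13.19)
    return (n_real - np.mean(random_counts)) / np.std(random_counts)

def relative_abundance(n_real, random_counts):  # (13.20)
    mu = np.mean(random_counts)
    return (n_real - mu) / (n_real + mu)

def significance_profile(z):                    # (13.21), applied to a Z vector
    z = np.asarray(z, dtype=float)
    return z / np.sqrt((z ** 2).sum())

rng = np.random.default_rng(1)
ensemble = rng.normal(35.4, 6.1, size=100)      # stand-in for ensemble counts
print(round(z_score(3530, ensemble), 1), round(relative_abundance(3530, ensemble), 2))
print(np.round(significance_profile([2.0, -1.0, 2.0]), 3))
```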

13.3.1 Motifs in directed networks


Motifs in directed networks are simply directed subgraphs which appear more
frequently in the real-world network than in its random counterpart. The situ-
ation is more complex because the number of motifs with the same number of
nodes is significantly larger than for the undirected networks (for instance, there
are seven directed triangles versus only one undirected); and in general there are
no analytical tools for counting such directed subgraphs. However, there are sev-
eral computational approaches that allow the calculation of the number of small
directed subgraphs and directed motifs in networks. In Figure 13.8 we illustrate
some examples of directed triangles found in real-world networks as motifs.
A characteristic feature of network motifs is that they are network-specific.
That is, what is a motif for one is not necessarily a motif for another. However, a
family of networks can be identified if they share the same series of motifs. One
can characterize this by generating a vector whose ith entry gives the importance

Figure 13.8 Motifs in directed networks: (a) feed-forward loop (neurons), (b) three-node feedback loop (electronic circuits), (c) up-linked mutual dyad, (d) feedback with mutual dyads, (e) fully connected triad. Dyads are typical in the WWW.

of the $i$th motif with respect to the other motifs in the network. The resulting component of the significance profile vector is given by
\[
SP_i = \frac{Z_i}{\sqrt{\sum_j Z_j^2}}. \tag{13.21}
\]

13.3.2 Motifs in undirected networks


In Figure 13.9 we illustrate the relative abundance of 17 of the small subgraphs
we have discussed for six complex networks representing different systems in
the real-world. The average is taken over random networks whose nodes have
the same degrees as the real ones. It can be seen that there are a few fragments
which are over-represented in some networks while other fragments are under-
represented. Fragments which appear less frequently in a real-world network than
is expected in an analogous random one are called anti-motifs.

Problem 13.3
The connected component of the protein–protein interaction network of yeast has
2,224 nodes and 6,609 links. It has been found computationally that the number
of triangles in that network is 3,530. Determine the relative abundance of this
fragment in order to see whether it is a motif in this network.
We use the formula
\[
\alpha_i = \frac{t_i^{\mathrm{real}} - \langle t_i^{\mathrm{random}} \rangle}{t_i^{\mathrm{real}} + \langle t_i^{\mathrm{random}} \rangle},
\]
where we know that $t_i^{\mathrm{real}} = 3{,}530$. We have to estimate $\langle t_i^{\mathrm{random}} \rangle$. Let us consider Erdös–Rényi random networks with 2,224 nodes and 6,609 links for which
\[
p = \frac{2m}{n(n-1)} = 0.00267.
\]

Figure 13.9 Motifs and anti-motifs in undirected networks: relative abundance of fragments 1-17 for the Internet, airports, thesaurus, drug users, yeast PPI, and prison inmates networks.

We also know that for large $n$, $\lambda_1 \to pn$ and all the other eigenvalues are negligible, so we use the approximation
\[
\langle t_i^{\mathrm{random}} \rangle = \frac{1}{6}\sum_{j=1}^{n} \lambda_j^3 \approx \frac{\lambda_1^3}{6} = \frac{(np)^3}{6}.
\]
Thus, $\langle t_i^{\mathrm{random}} \rangle \approx 35$. This estimate is very good indeed. For instance, the average number of triangles in 100 realizations of an ER network is $\langle t_i^{\mathrm{random}} \rangle = 35.4 \pm 6.1$. Using the value of $\langle t_i^{\mathrm{random}} \rangle \approx 35$ we obtain $\alpha_i = 0.98$, which is very close to one. We conclude that the number of triangles in the yeast PPI is significantly larger than expected by chance, and we can consider it as a network motif.

..................................................................................................

FURTHER READING

Alon, N., Yuster, R., and Zwick, U., Finding and counting given length cycles,
Algorithmica 17:209–223, 1997.
Milo, R. et al., Network motifs: Simple building blocks of complex networks,
Science 298:824–827, 2002.
Milo, R. et al., Superfamilies of evolved and designed networks, Science
303:1538–1542, 2004.
14 Classical Node Centrality

In this chapter
The concept of node centrality is motivated and introduced. Some properties of the degree of a node are analysed along with extensions to consider non-nearest neighbours. Two centralities based on shortest paths on the network are defined—the closeness and betweenness centrality—and differences between them are described. We finish this chapter with a few problems to illustrate how to find analytical expressions for these centralities in certain classes of networks.

14.1 Motivation
14.2 Degree centrality
14.3 Closeness centrality
14.4 Betweenness centrality
Further reading

14.1 Motivation
The notion of centrality of a node first arose in the context of social sciences
and is used in the determination of the most ‘important’ nodes in a network.
There are a number of characteristics, not necessarily correlated, which can be
used in determining the importance of a node. These include its ability to com-
municate directly with other nodes, its closeness to many other nodes, and its
indispensability to act as a communicator between different parts of a network.
Considering each of these characteristics in turn leads to different central-
ity measures. In this chapter we study such measures and illustrate the different
qualities of a network that they can highlight.

14.2 Degree centrality


The degree centrality simply corresponds to degree, and clearly measures the
ability of a node to communicate directly with others. As we have seen, the degree
of node i in a simple network G is defined using its adjacency matrix, A, as


\[
k_i = \sum_{j=1}^{n} a_{ij} = (e^T A)_i = (Ae)_i. \tag{14.1}
\]

So with degree centrality, i is more central than j if ki > kj . In a directed network,


where in-degree and out-degree can be different, we can utilize degree to get two
centrality measures, namely,

n 
n
kin
i = aji = (eT A)i , kout
i = aij = (Ae)i . (14.2)
i=1 j=1

The following are some elementary facts about the degree centrality. You are invited to prove these yourself.

1. $k_i = (A^2)_{ii}$.
2. $\sum_{i=1}^{n} k_i = 2m$, where $m$ is the number of links.
3. $\sum_{i=1}^{n} k_i^{\mathrm{in}} = \sum_{i=1}^{n} k_i^{\mathrm{out}} = m$, where $m$ is the number of links.

Example 14.1

Let us consider the network illustrated in Figure 14.1.

Figure 14.1 A simple labelled network

Since the adjacency matrix of the network is
\[
A = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}
\]
the node degree vector is
\[
k = Ae = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 4 \\ 2 \\ 1 \\ 1 \end{pmatrix}.
\]
That is, the degrees of the nodes are $k(1) = k(3) = 2$, $k(2) = 4$, $k(4) = k(5) = 1$, indicating that the most central node is 2.

Example 14.2

Let us consider the network displayed in Figure 14.2 together with its adjacency matrix
\[
A = \begin{pmatrix}
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

Figure 14.2 A labelled directed network

The in- and out-degree vectors are then obtained as follows:
\[
k^{\mathrm{in}} = (e^T A)^T = A^T e = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 3 \\ 1 \\ 0 \end{pmatrix}, \qquad
k^{\mathrm{out}} = Ae = \begin{pmatrix} 3 \\ 1 \\ 2 \\ 0 \\ 1 \\ 1 \end{pmatrix}.
\]
Nodes 3 and 6 are known as sources because their in-degrees are equal to zero but not their out-degrees. Node 4 is a sink because its out-degree is zero but not its in-degree. If both the in- and out-degree are zero for a node, the node is isolated.
The most central node in sending information to its nearest neighbours is node 1 and in receiving information is node 4.

Example 14.3

Let us now consider a real-world network. It corresponds to the food web of St Martin island in the Caribbean, in
which nodes represent species and food sources and the directed links indicate what eats what in the ecosystem. Here
we represent the networks in Figure 14.3 by drawing the nodes as circles with radius proportional to the corresponding
in-degree in (a) and out-degree in (b).

Figure 14.3 Food webs in St Martin with nodes drawn as circles of radii proportional to (a) in-degree, (b) out-degree

The in- and out-degree vectors are calculated in exactly the same way as in the last example. In this case, every
node has a label which corresponds to the identity of the species in question. In analysing this network according to
the in- and out-degree, we can point out the following observations which are of relevance for the functioning of this
ecosystem.

• Nodes with high out-degree are predators with a large variety of prey. Examples include the lizards Anolis
gingivinus (the Anguilla Bank Anole) and Anolis pogus; and the birds the pearly-eyed thrasher and the yellow
warbler.
• High in-degree nodes represent species and organic matter which are eaten by many others in this ecosystem,
such as leaves, detritus, and insects such as aphids.
• In general, top predators are not predated by other species, thus having significantly higher out-degree than
in-degree.
• The sources with zero in-degree are all birds: the pearly-eyed thrasher, yellow warbler, kestrel, and grey kingbird.
• Highly predated species are not usually prolific predators, thus they have high in-degree but low out-degree.
• The sinks are all associated with plants or detritus.

14.3 Closeness centrality


The closeness centrality of a node characterizes how close this node is to the rest of the nodes. This closeness is measured in terms of the shortest path distance. The closeness of the node $i$ in an undirected network $G$ is defined as
\[
CC(i) = \frac{n-1}{s(i)}, \tag{14.3}
\]
where the distance-sum $s(i)$ is calculated from the shortest path distances $d(i,j)$ as
\[
s(i) = \sum_{j \in V(G)} d(i,j). \tag{14.4}
\]

In a directed network a node has in- and out-closeness centrality. The first cor-
responds to how close this node is to nodes it is receiving information from. The
out-closeness centrality indicates how close the node is from those it is sending
information to. In directed networks the shortest path is a pseudo-distance due to
a possible lack of symmetry.

Example 14.4

Consider the network illustrated in Figure 14.4.

Figure 14.4 A network where closeness centrality does not match degree centrality

We start by constructing the distance matrix of this network, which is given by
\[
D = \begin{pmatrix}
0 & 1 & 1 & 1 & 2 & 1 & 1 & 2 & 3 & 3 \\
1 & 0 & 1 & 2 & 3 & 2 & 2 & 3 & 4 & 4 \\
1 & 1 & 0 & 1 & 2 & 2 & 2 & 2 & 3 & 3 \\
1 & 2 & 1 & 0 & 1 & 2 & 2 & 1 & 2 & 2 \\
2 & 3 & 2 & 1 & 0 & 1 & 2 & 2 & 3 & 3 \\
1 & 2 & 2 & 2 & 1 & 0 & 1 & 3 & 4 & 4 \\
1 & 2 & 2 & 2 & 2 & 1 & 0 & 3 & 4 & 4 \\
2 & 3 & 2 & 1 & 2 & 3 & 3 & 0 & 1 & 1 \\
3 & 4 & 3 & 2 & 3 & 4 & 4 & 1 & 0 & 2 \\
3 & 4 & 3 & 2 & 3 & 4 & 4 & 1 & 2 & 0
\end{pmatrix}.
\]
The vector of distance-sums of the nodes is then
\[
s = De = (e^T D)^T = [\,15\ 22\ 17\ 14\ 19\ 20\ 21\ 18\ 26\ 26\,]^T
\]
and we use (14.3) to measure the closeness centrality of each node. For instance, for node 1,
\[
CC(1) = \frac{9}{15} = 0.6.
\]
The full vector of closeness centralities is
\[
CC = [\,0.600\ 0.409\ 0.529\ 0.643\ 0.474\ 0.450\ 0.428\ 0.500\ 0.346\ 0.346\,]^T,
\]
indicating that the most central node is node 4. Notice that in this case the degree centrality identifies another node (namely 1) as the most important, whereas in Figure 14.1 node 2 has both the highest degree and closeness centralities.
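Closeness is easily computed by breadth-first search from every node. A sketch (the helper name is ours) on a 3-node path:

```python
import numpy as np
from collections import deque

def closeness(A):
    """Closeness centrality (14.3) of every node via breadth-first search."""
    n = len(A)
    cc = np.zeros(n)
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(A[u])[0]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        cc[s] = (n - 1) / sum(dist)
    return cc

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path on three nodes
print(np.round(closeness(A), 3))                   # [0.667 1.    0.667]
```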

Example 14.5

We consider here the air transportation network of the USA, where the nodes represent the airports in the USA and the
links represent the existence of at least one flight connecting the two airports. In Figure 14.5 we illustrate this network
in which the nodes are represented as circles with radii proportional to the closeness centrality.

Figure 14.5 A representation of the USA air transportation network in 1997



The most central airports according to the closeness centrality are given in Table 14.1.

Table 14.1 Central airports according to the closeness centrality.

Airport Closeness centrality × 100

Chicago O’Hare Intl 60.734


Dallas/Fort Worth Intl 55.444
Minneapolis-St Paul Intl 53.997
William B Hartsfield, Atlanta 53.560
San Francisco Intl 53.301
Lambert-St Louis Intl 52.875
Seattle-Tacoma Intl 52.623
Los Angeles Intl 52.456

The first four airports in this list (and the sixth) correspond to airports in the geographic centre area of continental
USA. The other three are airports located on the west coast. The first group are important airports in connecting
the East and West of the USA with an important traffic also between north and south of the continental USA. The
second group represents airports with important connections between the main USA and Alaska, as well as overseas
territories like Hawaii and other Pacific islands. The most highly ranked airports according to degree centrality are
given in Table 14.2. Notice that the group of west coast airports is absent.

Table 14.2 Highly ranked airports according to degree centrality.

Airport Degree centrality

Chicago O’Hare Intl 139


Dallas/Fort Worth Intl 118
William B Hartsfield, Atlanta 101
Pittsburgh Intl 94
Lambert-St Louis Intl 94
Charlotte/Douglas Intl 87
Stapleton Intl 85
Minneapolis-St Paul Intl 78

Problem 14.1
Let CC(i) be the closeness centrality of the ith node in the path network Pn–1
labelled 1 – 2 – 3 – 4 – · · · – (n – 1) – n.

(a) Find a general expression for the closeness centrality of the ith node in
Pn–1 in terms of i and n only.
(b) Simplify the expressions found in (a) for the node(s) at the centre of the
path Pn–1 (for both odd and even values of n).
(c) Show that the closeness centrality of these central nodes is the largest in a
path Pn–1 .

The solution can be arrived at as follows.

(a) We start by considering the sum of all the distances from one node to the rest of the nodes in the path.

Table 14.3 The sum of all the distances from one node.

Node    $\sum_{j \neq i} d_{ij}$
1       $1 + 2 + \cdots + (n-1)$
2       $1 + 1 + 2 + \cdots + (n-2)$
3       $2 + 1 + 1 + 2 + \cdots + (n-3)$
...     ...
$i$     $(i-1) + (i-2) + \cdots + 2 + 1 + 1 + 2 + \cdots + (n-i)$

It is important to notice here that for each node the sum of the distances corresponds to a 'right' sum, i.e. the sum of the distances of all nodes to the right of the node $i$, and a 'left' sum, i.e. the sum of the distances of all nodes located to the left of $i$, $(i-1) + (i-2) + \cdots + 2 + 1$. These two sums are given, respectively, by
\[
1 + 2 + \cdots + (n-i) = \frac{(n-i)(n-i+1)}{2}, \tag{14.5}
\]
\[
(i-1) + (i-2) + \cdots + 2 + 1 = \frac{(i-1)i}{2}. \tag{14.6}
\]
Then, by substituting into the formula (14.3) we obtain
\[
CC(i) = \frac{n-1}{\dfrac{(i-1)i}{2} + \dfrac{(n-i)(n-i+1)}{2}}, \tag{14.7}
\]
which can be written as
\[
CC(i) = \frac{2(n-1)}{(i-1)i + (n-i)(n-i+1)}. \tag{14.8}
\]

(b) For a path with an odd number of nodes the central node is $i = \frac{n+1}{2}$. By substitution into (14.8) we obtain
\[
CC\!\left( \frac{n+1}{2} \right) = \frac{2(n-1)}{\left( \frac{n+1}{2} - 1 \right)\frac{n+1}{2} + \left( n - \frac{n+1}{2} \right)\left( n - \frac{n+1}{2} + 1 \right)}, \tag{14.9}
\]
which reduces to
\[
CC\!\left( \frac{n+1}{2} \right) = \frac{4}{n+1}. \tag{14.10}
\]
For a path with an even number of nodes the central nodes are $i = \frac{n}{2}$ and $i = \frac{n}{2} + 1$. Now,
\[
CC\!\left( \frac{n}{2} \right) = \frac{2(n-1)}{\left( \frac{n}{2} - 1 \right)\frac{n}{2} + \left( n - \frac{n}{2} \right)\left( n - \frac{n}{2} + 1 \right)}, \tag{14.11}
\]
and in this case
\[
CC\!\left( \frac{n}{2} \right) = \frac{4(n-1)}{n^2}. \tag{14.12}
\]
We recover the same value when $i = \frac{n}{2} + 1$.
(c) Simply consider
\[
CC(i+1) - CC(i) = \frac{2(n-1)}{(i+1)i + (n-i-1)(n-i)} - \frac{2(n-1)}{i(i-1) + (n-i)(n-i+1)}. \tag{14.13}
\]
Putting the right-hand side over a common denominator gives the numerator
\[
4(n-1)(n-2i), \tag{14.14}
\]
which is positive if $i < n/2$ and negative if $i > n/2$, so $CC(i)$ reaches its maximum value in the centre of the path.
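A short check of (14.8) and (14.10), with an illustrative value of $n$:

```python
def cc_path(i, n):
    """Closeness of node i in the path on n nodes, by (14.8)."""
    return 2 * (n - 1) / ((i - 1) * i + (n - i) * (n - i + 1))

n = 7                                      # odd, centre i = (n + 1) / 2 = 4
print(cc_path(4, n), 4 / (n + 1))          # 0.5 0.5, matching (14.10)
print(max(range(1, n + 1), key=lambda i: cc_path(i, n)))   # 4: the centre wins
```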

14.4 Betweenness centrality


The betweenness centrality characterizes how important a node is in the com-
munication between other pairs of nodes. That is, the betweenness of a node
accounts for the proportion of information that passes through a given node in
communications between other pairs of nodes in the network.
As for the closeness centrality, betweenness assumes that the information trav-
els from one node to another through the shortest paths connecting those nodes.
The betweenness of the node $i$ in an undirected network $G$ is defined as
\[
BC(i) = \sum_{j} \sum_{k} \frac{\rho(j,i,k)}{\rho(j,k)}, \qquad i \neq j \neq k, \tag{14.15}
\]

where ρ( j, k) is the number of shortest paths connecting the node j to the node
k, and ρ( j, i, k) is the number of these shortest paths that pass through node i in
the network.
If the network is directed, the term ρ( j, i, k) refers to the number of directed
paths from the node j to the node k that pass through the node i, and ρ( j, k) to
the total number of directed paths from the node j to the node k.

Example 14.6

We consider again the network used in Figure 14.4 and we explain how to
obtain the betweenness centrality for the node labelled as one. For this, we
construct Table 14.4 in which we give the number of shortest paths from any
pair of nodes that pass through the node 1, ρ( j, 1, k). We also report the total
number of shortest paths from these pairs of nodes ρ( j, k).
The betweenness centrality of the node 1 is simply the total sum of the terms in the last column of Table 14.4,
\[
BC(1) = \sum_{j} \sum_{k} \frac{\rho(j,1,k)}{\rho(j,k)} = 12.667.
\]

Using a similar procedure, we obtain the betweenness centrality of each node:
\[
BC = [\,12.667\ 0.000\ 2.333\ 20.167\ 2.000\ 1.833\ 0.000\ 15.000\ 0.000\ 0.000\,]^T,
\]

which indicates that the node 4 is the most central one, i.e. it is the most
important in allowing communication between other pairs of nodes.

Table 14.4 The number of shortest paths from any pair of nodes passing through node 1, $\rho(j,1,k)$.

$(j,k)$    $\rho(j,1,k)$    $\rho(j,k)$    $\rho(j,1,k)/\rho(j,k)$
2,4 1 2 1/2
2,5 2 3 2/3
2,6 1 1 1
2,7 1 1 1
2,8 1 2 1/2
2,9 1 2 1/2
2,10 1 2 1/2
3,6 1 1 1
3,7 1 1 1
4,6 1 2 1/2
4,7 1 1 1
6,8 1 2 1/2
6,9 1 2 1/2
6,10 1 2 1/2
7,8 1 1 1
7,9 1 1 1
7,10 1 1 1
12.667

Example 14.7

In Figure 14.6 we illustrate the urban street network of the central part of
Cordoba, Spain. The most central nodes according to the betweenness cor-
respond to those street intersections which surround the central part of the
city and connect it with the periphery.
Figure 14.6 The street network of Cordoba with nodes of


radii proportional to their betweenness centrality
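For larger networks one normally relies on library routines. A sketch using networkx (an assumed dependency, not the book's code); the edge list below is read off the distance matrix of Example 14.4, so the output should reproduce the betweenness values above:

```python
import networkx as nx

edges = [(1, 2), (1, 3), (1, 4), (1, 6), (1, 7), (2, 3), (3, 4),
         (4, 5), (4, 8), (5, 6), (6, 7), (8, 9), (8, 10)]
G = nx.Graph(edges)
bc = nx.betweenness_centrality(G, normalized=False)
print({v: round(b, 3) for v, b in sorted(bc.items())})
# Expected from Example 14.6: node 1 -> 12.667, node 4 -> 20.167, node 8 -> 15.0
```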

Problem 14.2
Let G be a tree with n = n1 + n2 + 1 and the structure displayed in Figure 14.7.
State conditions for the nodes labelled a, b, and c to have the largest value of
betweenness centrality.
We start by considering the betweenness centrality of node $a$. Let us designate by $V_1$ and $V_2$ the two branches of the graph, the first containing $n_1$ and the second $n_2$ nodes.

Figure 14.7 A network formed by joining two star networks together. Dashed lines indicate the existence of other equivalent nodes

Fact 1 Because the network is a tree, the number of shortest paths from $p$ to $q$ that pass through node $k$, $\rho(p,k,q)$, is the same as the number of shortest paths from $p$ to $q$, $\rho(p,q)$. That is, $\rho(p,k,q) = \rho(p,q)$.

Fact 2 There are $n_1$ nodes in the branch $V_1$. Let us denote by $i$ any node in this branch which is not $a$ and by $j$ any node in $V_2$ which is not $c$. There are $n_1 - 1$ shortest paths from nodes $i$ to node $b$. That is,
\[
\rho(i,a,b) = n_1 - 1. \tag{14.16}
\]

Fact 3 We can easily calculate the number of paths from a node $i$ to any node in the branch $V_2$ which go through node $a$. Because there are $n_1 - 1$ nodes of type $i$ and $n_2$ nodes in the branch $V_2$ we have
\[
\rho(i,a,V_2) = (n_1 - 1)n_2.
\]
Fact 4 Any path going from a node denoted by $i$ to another such node passes through the node $a$. Because there are $n_1 - 1$ nodes of the type $i$, the number of these paths is given by
\[
\rho(i,a,i') = \binom{n_1 - 1}{2} = \frac{(n_1 - 1)(n_1 - 2)}{2}. \tag{14.17}
\]
Therefore the total number of paths containing the node $a$, and consequently its betweenness centrality, is
\[
BC(a) = 2(n_1 - 1) + (n_1 - 1)(n_2 - 1) + \frac{(n_1 - 1)(n_1 - 2)}{2} = \frac{(n_1 - 1)(2n_2 + n_1)}{2}.
\]
In a similar way we obtain
\[
BC(c) = \frac{(n_2 - 1)(2n_1 + n_2)}{2}. \tag{14.18}
\]
To calculate the betweenness centrality of node $b$ we observe that every path from the $n_1$ nodes in branch $V_1$ to the $n_2$ nodes in branch $V_2$ passes through node $b$. Consequently,
\[
BC(b) = n_1 n_2. \tag{14.19}
\]

Obviously, all nodes apart from $a$, $b$, and $c$ have zero betweenness centrality. We consider in turn the conditions for the three remaining nodes to be central. In order for node $a$ to have the maximum $BC$, the following conditions are necessary:
\[
BC(a) > BC(c) \quad \text{and} \quad BC(a) > BC(b).
\]
First,
\[
BC(a) > BC(c) \Rightarrow \frac{(n_1 - 1)(2n_2 + n_1)}{2} > \frac{(n_2 - 1)(2n_1 + n_2)}{2} \Rightarrow (n_1^2 + n_1) > (n_2^2 + n_2) \Rightarrow n_1 > n_2,
\]
and
\[
BC(a) > BC(b) \Rightarrow \frac{(n_1 - 1)(2n_2 + n_1)}{2} > n_1 n_2 \Rightarrow \frac{n_1(n_1 - 1)}{2} > n_2.
\]
The second condition is fulfilled only if $n_1 \geq n_2$ and $n_1 > 3$. By combining both conditions we conclude that $BC(a)$ is the absolute maximum in the graph only in the cases when $n_1 > n_2$ and $n_1 > 3$.
By symmetry, BC(c) is the absolute maximum if n2 > n1 and n2 > 3.
156 Classical Node Centrality

Finally, for $BC(b)$ to be the absolute maximum we need $BC(b) > BC(a)$ and $BC(b) > BC(c)$, that is,
\[
n_1 n_2 > \frac{(n_1 - 1)(2n_2 + n_1)}{2} \quad \text{and} \quad n_1 n_2 > \frac{(n_2 - 1)(2n_1 + n_2)}{2},
\]
which reduce to
\[
n_2 > \frac{n_1(n_1 - 1)}{2} \quad \text{and} \quad n_1 > \frac{n_2(n_2 - 1)}{2}.
\]
The two conditions are fulfilled simultaneously only if $n_1 = n_2 < 3$. That is, $BC(b)$ is the absolute maximum only when $n_1 = n_2 = 1$ or when $n_1 = n_2 = 2$, which correspond to paths of lengths 3 and 5, respectively (check this by yourself).

..................................................................................................

FURTHER READING

Borgatti, S.P., Centrality and network flow, Social Networks 27:55–71, 2005.
Borgatti, S.P. and Everett, M.G., A graph-theoretic perspective on centrality,
Social Networks 28:466–484, 2006.
Brandes, U. and Erlebach, T. (Eds.), Network Analysis: Methodological Founda-
tions, Springer, 2005, Chapters 3–5.
Estrada, E., The Structure of Complex Networks: Theory and Applications, Oxford
University Press, 2011, Chapter 7.
Wasserman, S. and Faust, K., Social Network Analysis: Methods and Applications,
Cambridge University Press, 1994, Chapter 5.
15 Spectral Node Centrality

In this chapter
The necessity of considering the influence of a node beyond its nearest neighbours is motivated. We introduce centrality measures that account for long-range effects of a node, such as the Katz index, eigenvector centrality, the PageRank index, and subgraph centrality. A common characteristic of these centrality measures is that they can be expressed in terms of spectral properties of the networks.

15.1 Motivation
15.2 Katz centrality
15.3 Eigenvector centrality
15.4 Subgraph centrality
Further reading
15.1 Motivation
Suppose we use a network to model a contagious disease amongst a population.
Nodes represent individuals and edges represent potential routes of infection be-
tween these individuals. We illustrate a simple example in Figure 15.1. We focus
on the nodes labelled 1 and 4 and ask which of them has the higher risk of con-
tagion. Node 1 can be infected from nodes 2 and 3, while node 4 can be infected
from 5 and 6. From this point of view it looks like both nodes are at the same level
of risk. However, while 2 and 3 cannot be infected by any other node, nodes 5
and 6 can be infected from nodes 7 and 8, respectively. Thus, we can intuitively
think that 4 is at a greater risk than 1 as a consequence of the chain of transmis-
sion of the disease. Local centrality measures like node degree do not account for
a centrality that goes beyond the first nearest neighbours, so we need other kinds
of measures to account for such effects. In this chapter we study these measures
and illustrate the different qualities of a network that they can highlight.

Figure 15.1 A simple network. Nodes 1 and 4 are rivals for the title of most central

15.2 Katz centrality


The degree of the node i counts the number of walks of length one from i to
every other node of the network. That is, ki = (Ae)i . In 1953, Katz extended this
idea to count not only the walks of length one, but those of any length starting at
node i. Intuitively, we can reason that the closest neighbours have more influence
over node i than more distant ones. Thus when combining walks of all lengths,
one can introduce an attenuation factor so that more weight is given to shorter
walks than to longer ones. This is precisely what Katz did and the Katz index is
given by
∞ 
* 0 0 + 
Ki = (α A + αA + α A + · · · + α A + · · · )e i =
2 2 k k k k
(α A )e . (15.1)
k=0 i

–1
The series in (15.1) is related to the resolvent function (zI – A) . In particular,
we saw in Example 12.4(iii) that the series converges so long as α < ρ(A) in which
case
* +
Ki = (I – αA)–1 e i . (15.2)

The Katz index can be expressed in terms of the eigenvalues and eigen-
vectors of the adjacency matrix. From the spectral decomposition A = QDQT
(see Chapter 5),
 1
Ki = qj (i)qj (l) . (15.3)
l j
1 – αλj

When deriving his index, Katz ignored the contribution from A0 = I and
instead used
*( ) +
K i = (I – αA)–1 – I e i . (15.4)

While the values given by (15.2) and (15.4) are different, the rankings are
exactly the same. We will generally use (15.2) because of the nice mathematical
properties of the resolvent.

Example 15.1

We consider the network illustrated in Figure 15.1. The principal eigenvalue


of the adjacency matrix for this network is λ1 = 2.1010. With α = 0.3 we
obtain the vector of Katz centralities

K = [3.242 1.972 1.972 3.524 2.591 2.591 1.777 1.777]T .

Node 4 has the highest Katz index, followed by node 1 which accords with
our intuition on the level of risk of each of these nodes in the network.
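In practice one solves the linear system rather than forming the inverse. A sketch (the helper name is ours; the edge list is as we read it off Figure 15.1, with nodes relabelled from 0), which reproduces the numbers of Example 15.1:

```python
import numpy as np

def katz(A, alpha):
    """Katz centrality (15.2): solve (I - alpha*A) K = e."""
    assert alpha * max(abs(np.linalg.eigvals(A))) < 1   # convergence condition
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))

edges = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5), (4, 6), (5, 7)]
A = np.zeros((8, 8))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
print(np.round(katz(A, 0.3), 3))
# [3.242 1.972 1.972 3.524 2.591 2.591 1.777 1.777], as in Example 15.1
```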

15.2.1 Katz centrality in directed networks


In directed networks we should consider the Katz centrality of a node in terms
of the number of links going in and out from a node. This can be done by
considering the indices

\[
K_i^{\mathrm{out}} = \left[ (I - \alpha A)^{-1} e \right]_i, \qquad K_i^{\mathrm{in}} = \left[ e^T (I - \alpha A)^{-1} \right]_i.
\]

The second index can be considered as a measure of the ‘prestige’ of a node


because it accounts for the importance that a node inherits from those that point
to it.

Example 15.2

We measure the Katz indices of the nodes in the network illustrated in


Figure 15.2.

Figure 15.2 A directed network. In- and out-centrality measures vary

Using $\alpha = 0.5$ we obtain the Katz indices
\[
K^{\mathrm{in}} = [\,1.50\ 2.25\ 2.62\ 1.00\ 1.00\ 2.31\,]^T, \qquad
K^{\mathrm{out}} = [\,1.88\ 1.75\ 1.50\ 2.69\ 1.88\ 1.00\,]^T.
\]

Notice that nodes 2 and 3 are each pointed to by two nodes. However,
node 3 is more central because it is pointed to by nodes with greater centrality
than those pointing to 2. In fact, node 6 is more central than node 2 because
the only node pointing to it is the most important one in the network. On the
other hand, out-Katz identifies node 4 as the most central one. It is the only
node having out-degree of two.

15.3 Eigenvector centrality


Let us consider the following modification of the Katz index:
\[
\nu = \sum_{k=1}^{\infty} \alpha^{k-1} A^k e = \left( \sum_{k=1}^{\infty} \alpha^{k-1} \sum_{j=1}^{n} \lambda_j^k q_j q_j^T \right) e = \frac{1}{\alpha}\left( \sum_{j=1}^{n} \sum_{k=1}^{\infty} (\alpha\lambda_j)^k q_j q_j^T \right) e = \frac{1}{\alpha}\left( \sum_{j=1}^{n} \frac{\alpha\lambda_j}{1 - \alpha\lambda_j} q_j q_j^T \right) e.
\]
Now, let the parameter $\alpha$ approach the inverse of the largest eigenvalue of the adjacency matrix from below, i.e. $\alpha \to 1/\lambda_1^-$. Then
\[
\lim_{\alpha \to 1/\lambda_1^-} (1 - \alpha\lambda_1)\nu = \lim_{\alpha \to 1/\lambda_1^-} \frac{1}{\alpha}\left( \sum_{j=1}^{n} \frac{(1 - \alpha\lambda_1)\,\alpha\lambda_j}{1 - \alpha\lambda_j} q_j q_j^T \right) e = \lambda_1 \left( \sum_{i=1}^{n} q_1(i) \right) q_1 = \gamma q_1.
\]
Thus the eigenvector associated with the largest eigenvalue of the adjacency matrix is a centrality measure conceptually similar to the Katz index. Accordingly, the eigenvector centrality of the node $i$ is given by $q_1(i)$, the $i$th component of the principal eigenvector $q_1$ of $A$. Typically, we normalize $q_1$ so that its Euclidean length is one. By the Perron–Frobenius theorem we can choose $q_1$ so that all of its components are nonnegative.

Examples 15.3

(i) The eigenvector centralities for the nodes of the network in Figure 15.1 are

q1 = [0.500 0.238 0.238 0.574 0.354 0.354 0.168 0.168]T .

Here again node 4 is the one with the highest centrality, followed by node 1. Node 4 is connected to nodes which
are higher in centrality than the nodes to which node 1 is connected. High degree is not the only factor considered
by this centrality measure. The most central nodes are generally connected to other highly central nodes.
(ii) Sometimes being connected to a few very important nodes make a node more central than being connected to
many not so central ones. For instance, in Figure 15.3, node 4 is connected to only three other nodes, while
1 is connected to four. However, 4 is more central than 1 according to the eigenvector centrality because it is
connected to two nodes with relatively high centrality while 1 is mainly connected to peripheral nodes. The vector
of centralities is
 T
q1 = 0.408 0.167 0.167 0.500 0.408 0.167 0.167 0.167 0.408 0.167 0.167 0.167 .

Figure 15.3 A network highlighting the difference between degree and eigenvector centrality

Problem 15.1
Let $G$ be a simple connected network with $n$ nodes and adjacency matrix $A$ with spectral decomposition $QDQ^T$. Let $N_k(i)$ be the number of walks of length $k$ starting at node $i$ and let
\[
s_k(i) = \frac{N_k(i)}{\sum_{j=1}^{n} N_k(j)}
\]
be the $i$th element of the vector $s_k$. Show that if $G$ is not bipartite then there is a scalar $\alpha$ such that as $k \to \infty$, $s_k \to \alpha q_1$ almost surely. That is, the vector $s_k$ will tend to rank nodes identically to eigenvector centrality.
Since $A^k = QD^kQ^T$,
\[
s_k(i) = \frac{e_i^T A^k e}{e^T A^k e} = \frac{e_i^T QD^kQ^T e}{e^T QD^kQ^T e} = \frac{q_i^T D^k r}{r^T D^k r} = \frac{q_i^T \bar{D}^k r}{r^T \bar{D}^k r},
\]
where $q_i = Q^T e_i$, $r = Q^T e$, and $\bar{D} = D/\lambda_1$.
Since $G$ is connected and not bipartite, $|\lambda_1| > |\lambda_j|$ for all $j > 1$, so $\bar{D}^k \to e_1 e_1^T$ as $k \to \infty$ (note that $e_1 e_1^T$ is a matrix whose only nonzero element is a 1 in the top left-hand corner). We have established that
\[
s_k(i) \to \frac{q_i^T e_1 e_1^T r}{r^T e_1 e_1^T r} = \alpha q_1(i)
\]
where $\alpha = 1/(e_1^T r)$ and so
\[
\lim_{k \to \infty} s_k = \alpha q_1,
\]
as desired. Note that we require $e_1^T r \neq 0$ in this analysis, which is almost surely true for a network chosen at random.
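Problem 15.1 also suggests a computation: repeatedly multiplying by $A$ and normalizing drives the walk-count vector towards $q_1$. A sketch on the network of Example 14.1 (chosen because it contains a triangle, hence is not bipartite):

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)

s = A.sum(axis=1)                    # N_1(i), walks of length one
for _ in range(60):
    s = A @ s                        # N_{k+1}(i)
    s /= s.sum()                     # normalize to avoid overflow
q1 = np.abs(np.linalg.eigh(A)[1][:, -1])   # principal eigenvector (Perron choice)
print(np.round(s / np.linalg.norm(s), 3))
print(np.round(q1, 3))                      # the two vectors agree
```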

15.3.1 Eigenvector centrality in directed networks


As with our other measures, we can define eigenvector centrality for directed net-
works. In this case we use the principal right and left eigenvectors of the adjacency
matrix as the corresponding centrality vectors for the nodes in a directed network.
If Ax = λ1 x and AT y = λ1 y then the elements of x and y give the right and left
eigenvector centralities, respectively.
The right eigenvector centrality accounts for the importance of a node through
the importance of nodes to which it points. It is an extension of the out-degree
concept. On the other hand, the left eigenvector centrality accounts for the im-
portance of a node by considering those nodes pointing towards a corresponding
node and it is an extension of the in-degree centrality.

Example 15.4

Figure 15.4 A directed network highlighting the difference between left and right eigenvector centrality

The left and right eigenvector centralities of the network in Figure 15.4 are
\[
x = [\,0.592\ 0.288\ 0.366\ 0.465\ 0.465\,]^T, \qquad y = [\,0.592\ 0.465\ 0.366\ 0.288\ 0.465\,]^T.
\]
Notice the differences in the rankings of nodes 4 and 5. According to the right eigenvector, both nodes rank as the second most central. They both point to the most central node of the network according to this criterion, node 1. However, according to the left eigenvector, while node 5 is still the second most important, node 4 has been relegated to the least central one. Node 5 is pointed to by the most central node, but node 4 is pointed to only by a node with low centrality.

15.3.2 PageRank centrality


When we carry out a search for a particular term, a search engine is likely to
return thousands or millions of related web pages. A good search engine needs to
make sure that pages that are most likely to match the query are promoted to the
front of this list and centrality measures are a key tool in this process.
By viewing the World Wide Web (WWW) as a giant directed network whose
nodes are pages and whose edges are the hyperlinks between them, search engines
can attempt to rank pages according to centrality. While this is only one of the
factors involved nowadays, much of the initial success of Google has been credited
to their use of their own centrality measure, which they dubbed PageRank.
PageRank is closely related to eigenvector centrality and it explicitly measures
the importance of a web page via the importance of other web pages pointing to
it. In simple terms, the PageRank of a page is the sum of the PageRank centralities
of all pages pointing into it.
The first step in computing PageRank is to manipulate the adjacency matrix,
A. The WWW is an extremely complex network and is known to be disconnected.
In order to apply familiar analytic tools, such as the Perron–Frobenius theorem,
we need to make adjustments to A. In practice, the simplest approach is to arti-
ficially rewire nodes which have no outbound links so that they are connected to
all the other nodes in the network. That is, we replace A with a new matrix H
defined so

H_ij = { a_ij,  k_i^out > 0,
       { 1,    k_i^out = 0.        (15.5)

PageRank can then be motivated by considering what would happen if an


internet surfer moved around the WWW from page to page by picking out-links
uniformly at random. The insight that the developers of Google had was that the
surfer is more likely to visit pages that have been deemed important in that they
have in-links from other important pages. Using the theory of Markov chains, it
can be shown that the relative frequency of page visits can be measured by the
elements of the principal left eigenvector of the stochastic matrix

S = D^{–1}H,

where D = diag(H e) is a diagonal matrix containing the out-degrees of the net-


work with adjacency matrix H . Mathematically, PageRank is related to probability
distributions so the vector of centralities is usually normalized to sum to one.
Of course, computing this eigenvector with a network as big as the WWW
(which has billions of nodes) is a challenge in itself. For reasons of expediency, an
additional parameter α (not to be confused with the one previously used for the
Katz index) is introduced and rather than working with S we work with

P = αS + ((1 – α)/n) ee^T.        (15.6)
n

The parameter is motivated by the suggestion that every so often, instead of


following an out-link, our surfer teleports randomly to another page somewhere
on the internet, preventing the user from getting stuck in a corner of the WWW.
The value α = 0.85 has been shown to work well in internet applications, but
there is no reason whatsoever to use this same parameter value when PageRank is
used on general complex networks.
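The whole construction condenses into a few lines of code. The sketch below (assuming NumPy; the adjacency matrix is an illustrative hypothetical digraph, not the network of Figure 15.5) builds H and S as above and approximates the principal left eigenvector of P by power iteration, which is essentially how PageRank is computed at scale:

```python
import numpy as np

# Hypothetical directed network; the last node has no out-links (dangling).
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]], dtype=float)
n, alpha = A.shape[0], 0.85

H = A.copy()
H[A.sum(axis=1) == 0] = 1.0               # rewire dangling nodes to every node
                                          # (including itself, a simplification)
S = H / H.sum(axis=1, keepdims=True)      # S = D^{-1} H (row-stochastic)

p = np.ones(n) / n                        # left eigenvector by power iteration
for _ in range(100):
    p = alpha * (p @ S) + (1 - alpha) / n # p <- p P; stays normalized to sum 1
print(p)                                  # PageRank vector
```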

Example 15.5

[Figure 15.5 A directed network illustrating PageRank centrality]

The normalized PageRank of the network in Figure 15.5 is


PG = (0.301, 0.308, 0.030, 0.211, 0.107, 0.021, 0.021)^T

when α = 0.85. Notice that node 1 has higher PageRank than node 4 due
to its in-link from node 2. In this example, the rankings vary little as we
change α.

15.4 Subgraph centrality


Katz centrality is computed from the entries of the matrix


K = Σ_{l=0}^∞ α^l A^l.

We can easily generalize this idea and work with other weighted sums of the
powers of the adjacency matrix, namely,


f(A) = Σ_{l=0}^∞ c_l A^l.        (15.7)

The coefficients c_l must be chosen to ensure that the series converges; they should give more weight to small powers of the adjacency matrix than to the larger ones; and they should produce positive numbers for all i ∈ V.

Notice that if the first of the three requirements holds then (15.7) defines a
matrix function and we can use theory introduced in Chapter 12. The diagonal
entries, fi (A) = f (A)ii , are directly related to subgraphs in the network and the
second requirement ensures that more weight is given to the smaller than to the
bigger ones.

Example 15.6

Let us examine (15.7) when we truncate the series at l = 5 and select c_l = 1/l! to find an expression for f_i(A).
Using information collected in Chapter 13 on enumerating small sub-
graphs we know that

(A^2)_ii = |F_1(i)|,        (15.8)

(A^3)_ii = 2|F_2(i)|,        (15.9)

(A^4)_ii = |F_1(i)| + |F_3(i)| + 2|F_4(i)| + 2|F_5(i)|,        (15.10)

(A^5)_ii = 10|F_2(i)| + 2|F_6(i)| + 2|F_7(i)| + 4|F_8(i)| + 2|F_9(i)|,        (15.11)

where the rooted fragments are illustrated in Figure 15.6. So,

f_i(A) = (c_2 + c_4)|F_1(i)| + (2c_3 + 10c_5)|F_2(i)| + c_4|F_3(i)| + 2c_4|F_4(i)| + 2c_4|F_5(i)| + 2c_5|F_6(i)| + 2c_5|F_7(i)| + 4c_5|F_8(i)| + 2c_5|F_9(i)|.        (15.12)

By using c_l = 1/l! we get

f_i(A) = (13/24)|F_1(i)| + (5/12)|F_2(i)| + (1/24)|F_3(i)| + (1/12)|F_4(i)| + (1/12)|F_5(i)| + (1/60)|F_6(i)| + (1/60)|F_7(i)| + (1/30)|F_8(i)| + (1/60)|F_9(i)|.        (15.13)

Clearly, the edges (and hence node degrees) are making the largest
contribution to the centrality, followed by paths of length two, triangles,
and so on.

continued

Example 15.6 continued

[Figure 15.6 A collection of rooted fragments F_1(i)–F_9(i) (roots designated by the letter i)]

To define subgraph centrality we do not truncate (15.7) but work with the
matrix functions which arise with particular choices of coefficients cl . Some of
the most well known are
EE_i = (Σ_{l=0}^∞ A^l / l!)_ii = (e^A)_ii,        (15.14)

EE_i^odd = (Σ_{l=0}^∞ A^{2l+1} / (2l+1)!)_ii = (sinh(A))_ii,        (15.15)

EE_i^even = (Σ_{l=0}^∞ A^{2l} / (2l)!)_ii = (cosh(A))_ii,        (15.16)

EE_i^res = (Σ_{l=0}^∞ α^l A^l)_ii = ((I – αA)^{–1})_ii,   0 < α < 1/λ_1.        (15.17)

Notice that EE^odd and EE^even take into account only contributions from odd or even closed walks in the network, respectively. We will refer generically to EE as
the subgraph centrality. Using the spectral decomposition of the adjacency ma-
trix, these indices can be represented in terms of the eigenvalues and eigenvectors
of the adjacency matrix as follows:

EE_i = Σ_{l=1}^n q_l(i)² exp(λ_l),

EE_i^odd = Σ_{l=1}^n q_l(i)² sinh(λ_l),

EE_i^even = Σ_{l=1}^n q_l(i)² cosh(λ_l),

EE_i^res = Σ_{l=1}^n q_l(i)² / (1 – αλ_l),   0 < α < 1/λ_1.
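These spectral formulas translate directly into code. A minimal sketch (assuming NumPy; the 4-node graph is hypothetical):

```python
import numpy as np

# Hypothetical small undirected network.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

lam, Q = np.linalg.eigh(A)        # Q[i, l] = q_l(i)
EE      = (Q**2) @ np.exp(lam)    # EE_i   = sum_l q_l(i)^2 exp(lambda_l)
EE_odd  = (Q**2) @ np.sinh(lam)   # odd closed walks only
EE_even = (Q**2) @ np.cosh(lam)   # even closed walks only
print(EE)
print(EE_odd + EE_even)           # identical: e^A = sinh(A) + cosh(A)
```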

Example 15.7

We compare some centrality measures for the network illustrated in Figure 15.7.
The regularity means that most centrality measures are unable to distin-
guish between nodes. The degree of each node is equal to six. Also, because
the network is regular, q1 = e/3 and the closeness and betweenness centrali-
ties are uniform with CC(i) = 0.8 and BC(i) = 2 for all i ∈ V . Observe also
that each node is involved in ten triangles of the network.
The subgraph centrality, however, differentiates two groups of nodes
{1, 3, 4, 6, 8} with EE_i = 45.65 and {2, 5, 7, 9} with EE_i = 45.70. This indi-
cates that the nodes in the second set participate in a larger number of small
subgraphs than those in the first group. For instance, each node in the second
group takes part in 45 squares versus 44 for the nodes in the first group.

[Figure 15.7 A regular graph with common degree 6]

15.4.1 Subgraph centrality in directed networks


Subgraph centrality can be calculated for both directed and undirected networks
using (15.14). Recall that a directed closed walk is a succession of directed links of
the form uv, vw, wx, . . . , yz, zu. This means that the subgraph centrality of a node
in a directed network is EE(i) > 1 only if there is at least one closed walk that starts at and returns to this node. Otherwise EE(i) = 1. The subgraph centrality of a
node in a directed network indicates the ‘returnability’ of information to this node.

Example 15.8

[Figure 15.8 A network representing the observed flow of votes from 2000–2013 between groups of countries in the Eurovision song contest]

In the Eurovision song contest, countries vote for their favourite songs from
other countries. We can represent these countries as nodes and the votes as
directed edges. The aggregate voting over the 2000–2013 contests has been
measured with links weighted according to the sum of votes between coun-
tries over the 14 years.2 The countries can be grouped together according
to their pattern of votes. Groups which vote in a similar way are represented
by the directed network illustrated in Figure 15.8. The labels correspond to
countries as follows.

1. Azerbaijan, Ukraine, Georgia, Russia, Armenia, Belarus, Poland,


Bulgaria, Czech Republic.
2. Netherlands, Belgium.
3. Moldova, Romania, Italy, Israel.
4. Turkey, Bosnia-Herzegovina, Macedonia, Albania, Serbia, Croatia,
Slovenia, Austria, France, Montenegro.
2 Details at tinyurl.com/oq9kpj7.

5. Ireland, United Kingdom, Malta.


6. Estonia, Lithuania, Latvia, Slovakia.
7. Iceland, Denmark, Sweden, Norway, Finland, Hungary.
8. Spain, Portugal, Germany, Andorra, Monaco, Switzerland.
9. Greece, Cyprus, San Marino.

The directed subgraph centralities for the groups of countries are

EE = (7.630, 2.372, 8.579, 12.044, 1.000, 2.950, 3.553, 5.431, 5.431)^T.

The highest ‘returnabilities’ of votes are obtained for groups 4, 3, and 1.


The lowest returnability of votes is observed for group 5, which has no re-
turnable votes at all, followed by group 2. Curiously, no countries from these
two groups have won the contest in the last 14 years, while all the other groups
(except group 3, which last won in 1998) have won the contest at least once
in that time.

..................................................................................................

FURTHER READING

Langville, A.N. and Meyer, C.D., Google’s PageRank and Beyond: The Science of
Search Engine Rankings, Princeton University Press, 2006.
Estrada, E., The Structure of Complex Networks: Theory and Applications, Oxford
University Press, 2011, Chapter 7.2.
Newman, M.E.J., Networks: An Introduction, Oxford University Press, 2010,
Chapter 7.
16 Quantum Physics Analogies

In this chapter

We introduce the basic principles and formalism of quantum mechanics. We study the quantum harmonic oscillator and introduce ladder operators. Then, we introduce the simplest model to deal with quantum (electronic) systems, the tight-binding model. We show that the Hamiltonian of the tight-binding model of a system represented by a network corresponds to the adjacency matrix of that network and its eigenvalues correspond to the energy levels of the system. We briefly introduce the Hubbard and Ising models.

16.1 Motivation
In Chapter 8 we used classical mechanics analogies to study networks. In a simi-
lar way, we can use quantum mechanics analogies. Quantum mechanics is the
mechanics of the microworld. That is, the study of the mechanical properties of
particles which are beyond the limits of our perception, such as electrons and
photons. We remark here again that our aim is not simply to consider complex
networks in which the entities represented by nodes behave quantum mechanic-
ally but to use quantum mechanics as a metaphor that allows us to interpret some
of the mathematical concepts we use for studying networks in an amenable phys-
ical way. At the same time we aim to use elements of the arsenal of techniques
and methods developed for studying quantum systems in the analysis of complex
networks. Analogies are just that: analogies.
As we will see in this chapter, through the lens of quantum mechanics we can
interpret the spectrum of the adjacency matrix of a network as the energy levels
of a quantum system in which an electron is housed at each node of the network.
This will prepare the terrain for more sophisticated analysis of networks in terms
of how information is diffused through their nodes and links. And we will inves-
tigate other theoretical tools, such as the Ising model, which have applications in
the analysis of social networks. Thus, when we apply these models to networks
we will be equipped with a better understanding of the physical principles used
in them.

16.2 Quantum mechanical analogies


We start this chapter with the postulates of quantum mechanics.

1. A quantum mechanical system is defined by a Hilbert space H con-


taining complex-valued vectors ψ(x) known as wave functions which
describe the quantum state of moving particles. Wave functions are often
normalized so that
∫_{–∞}^{∞} |ψ(x)|² dx = 1.        (16.1)

2. Every physical observable is represented by a linear differential operator


which acts on ψ(x) and is self-adjoint or Hermitian, i.e. for the operator
Ô, Ô = Ô∗ . That is,
∫_{–∞}^{∞} ψ*(x)(Ôφ)(x) dx = ∫_{–∞}^{∞} (Ôψ)*(x)φ(x) dx,        (16.2)

for all square-integrable1 wave functions φ and ψ.


3. In any measurement of the observable associated with the operator Ô, the
only values that will ever be observed are the eigenvalues λ, which satisfy
the eigenvalue equation

Ôψ = λψ. (16.3)

4. A complete description of the system is obtained from the normalized


state vector ψ, such as through
⟨Ô⟩_ψ = ∫_{–∞}^{∞} ψ*(x) Ô ψ(x) dx,        (16.4)

which represents the expected result of measuring the observable Ô.


5. The time evolution of a state of a system is determined by the time-
dependent Schrödinger equation

i h̄ dψ/dt = Ĥψ,        (16.5)

where Ĥ is the Hamiltonian or total energy operator. If the Hamiltonian


is independent of time, the energy levels of the system are obtained from
the eigenvalue equation

Ĥψ(r, t) = Eψ(r, t), (16.6)

which then evolves in time according to


ψ(r, t) = e^{–itE/h̄} ψ(r, 0).        (16.7)

¹ That is, ∫_{–∞}^{∞} |ψ(x)|² dx < ∞.

It is common in quantum mechanics and its applications to network theory to


use the so-called Dirac notation. Using this notation we can make the following
substitutions:

(i) ψ(x) → |ψ⟩.
(ii) ψ*(x) → ⟨ψ|.
(iii) ψ(x) = c_1ψ_1(x) + c_2ψ_2(x) → |ψ⟩ = c_1|ψ_1⟩ + c_2|ψ_2⟩.
(iv) ∫ φ*(r)ψ(r) dr → ⟨φ|ψ⟩.
(v) Ôψ(r) → Ô|ψ⟩ = |Ôψ⟩.
(vi) ∫ φ*(r) Ô ψ(r) dr → ⟨φ|Ôψ⟩.

To warm up, we will start by considering a simple harmonic oscillator (SHO).


Recall from Chapter 8 that the Hamiltonian for this system can be written

H(x, p) = p²/2m + (1/2)kx² = p²/2m + (1/2)mω²x².        (16.8)

In our new notation, quantizing the system gives

Ĥ = (1/2m) p̂_x² + (1/2)mω²x̂².        (16.9)

For the momentum we have


p̂_x = –i h̄ ∂/∂x.        (16.10)

Let us first see what happens if we apply these two operators in a different
order to a given function φ(x). That is,

x̂ p̂_x φ(x) = –i h̄ x ∂φ(x)/∂x,    p̂_x x̂ φ(x) = –i h̄ ∂/∂x [xφ(x)].

We obtain one of the fundamental results of quantum mechanics


 
(x̂ p̂_x – p̂_x x̂) φ(x) = –i h̄ ( x ∂φ(x)/∂x – φ(x) – x ∂φ(x)/∂x ) = i h̄ φ(x).        (16.11)

That is, the momentum and the coordinates do not commute. A common
alternative representation of (16.11) is [ x̂, p̂x ] = i h̄.
Since

p̂_x² = (–i h̄ ∂/∂x)(–i h̄ ∂/∂x) = –h̄² ∂²/∂x²,        (16.12)

we can rewrite (16.9) to express the Hamiltonian operator for the SHO as

Ĥ = –(h̄²/2m) d²/dx² + (1/2)mω²x̂².        (16.13)
From (16.6),

–(h̄²/2m) d²ψ/dx² + (1/2)mω²x²ψ = Eψ.        (16.14)
Letting u = √(mω/h̄) x and ε = 2E/(h̄ω), (16.14) becomes

d²ψ/du² + (ε – u²)ψ = 0.        (16.15)
Solutions of this second order differential equation can be written as
ψ_j(u) = H_j(u) e^{–u²/2},        (16.16)

where Hj (z) is the Hermite polynomial (see Further reading)

H_j(z) = (–1)^j e^{z²} (d^j/dz^j) e^{–z²}.        (16.17)
Thus

ψ_j(x) = (1/√(2^j j!)) (mω/(π h̄))^{1/4} H_j(√(mω/h̄) x) e^{–mωx²/2h̄}.        (16.18)

Applying the series solution to the Schrödinger equation we obtain the values of the energy of the quantum SHO

E_j = h̄ω (j + 1/2),   j = 0, 1, 2, . . . .        (16.19)

This is notably different from the classical SHO because now the energy can
take only certain discrete values, i.e. it is quantized. Indeed the first energy levels
of the SHO are:
E_0 = (1/2) h̄ω,   E_1 = (3/2) h̄ω,   E_2 = (5/2) h̄ω, . . . .
A useful technique in solving the quantum SHO is to use the so-called lad-
der operators. The so-called annihilation (lowering) operator ĉ and the creation
(raising) operator ĉ† are defined as
ĉ = √(mω/2h̄) ( x̂ + i p̂/(mω) ),        (16.20)

ĉ† = √(mω/2h̄) ( x̂ – i p̂/(mω) ).        (16.21)

These satisfy the commutation relation [ĉ, ĉ†] = 1. Additionally

[ĉ, Ĥ] = h̄ωĉ,   [ĉ†, Ĥ] = –h̄ωĉ†.        (16.22)

The Hamiltonian of the quantum SHO can be written in terms of the ladder operators as

Ĥ = h̄ω ( ĉ†ĉ + 1/2 ) = h̄ω ( N̂ + 1/2 ),        (16.23)

where N̂ is the so-called number operator.

Problem 16.1
Show that ĉ lowers the energy of a state by an amount h̄ω and that ĉ † raises the
energy by the same amount.
From (16.22), Ĥĉ = ĉĤ – h̄ωĉ. Now consider the effect of Ĥ on the action of applying the annihilation operator to a state of the system, ĉ|j⟩. That is,

Ĥ (ĉ|j⟩) = (ĉĤ – h̄ωĉ)|j⟩ = (E_j – h̄ω)(ĉ|j⟩).

Then, the operator ĉ has lowered the energy E_j of |j⟩ by h̄ω.
Similarly Ĥĉ† = ĉ†Ĥ + h̄ωĉ† and so

Ĥ (ĉ†|j⟩) = (ĉ†Ĥ + h̄ωĉ†)|j⟩ = (E_j + h̄ω)(ĉ†|j⟩),

which indicates that the operator ĉ† has raised the energy E_j of |j⟩ by h̄ω.

16.3 Tight-binding models


To use our quantum models on networks we assume that we place an electron
at each node of a network. Because the nodes are so heavy in comparison with
the electrons we can assume that the properties of the system are mainly deter-
mined by the dynamics of these electrons. Here, the electrons can play the role
of information which can be transferred between the nodes. This analogy allows
us to use the techniques of condensed matter physics in which the properties of
the solid state and of certain molecular systems are determined by considering a
Hamiltonian of the form
Ĥ = Σ_{j=1}^n [ p̂_j²/(2m) + Σ_{k≠j} V(r_j – r_k) + U(r_j) ],        (16.24)

where V (rj – rk ) is the potential describing the interactions between electrons and
U (rj ) is an external potential which we will assume is zero.

The electron has a property which is unknown in classical physics, called the
spin. It is an intrinsic form of angular momentum and mathematically it can be
described by a state in the Hilbert space

α|+⟩ + β|–⟩,        (16.25)

which is spanned by the basis vectors |±⟩. Using the ladder operators previously
introduced, the Hamiltonian (16.24) can be written as

Ĥ = –Σ_{ij} t_ij ĉ_i† ĉ_j + (1/2) Σ_{ijkl} V_ijkl ĉ_i† ĉ_k† ĉ_l ĉ_j,        (16.26)

where tij and Vijkl are integrals which control the hopping of an electron from
one site to another and the interaction between electrons, respectively. We can
further simplify our Hamiltonian if we suppose that the electrons do not interact
with each other, so all the Vijkl equal zero. This method, which is known as the
tight-binding approach or the Hückel molecular orbital method is very useful to
calculate the properties of solids and molecules, like graphene. The Hamiltonian
of the system becomes

Ĥ_tb = –Σ_{ijρ} t_ij ĉ_iρ† ĉ_jρ,        (16.27)

where ĉ_iρ† creates (and ĉ_iρ annihilates) an electron with spin ρ at the node i. We can now separate the in-site energy α_i from the transfer energy β_ij and write the Hamiltonian as

Ĥ_tb = Σ_{iρ} α_i ĉ_iρ† ĉ_iρ + Σ_{ijρ} β_ij ĉ_iρ† ĉ_jρ,        (16.28)

where the second sum is carried out over all pairs of nearest-neighbours. Con-
sequently, in a network with n nodes, the Hamiltonian (16.28) is reduced to an
n × n matrix,


Ĥ_ij = { α_i,   i = j,
       { β_ij,  i is connected to j,        (16.29)
       { 0,    otherwise.

Assuming a homogeneous geometrical and electronic configuration, it is ap-


propriate to give fixed values to the αi and the βij . Typically, αi = α is a Fermi
energy and βij = β is fixed at –2.70eV for all pairs of connected nodes. Thus,

Ĥ = αI + βA, (16.30)

where I is the identity matrix, and A is the adjacency matrix of the graph repre-
senting the electronic system. The energy levels of the system are simply given by
the eigenvalues of the adjacency matrix of the network:

Ej = α + βλj . (16.31)

Notice that since β < 0 we can interpret the eigenvalues of the adjacency
matrix of a network as the negative of the energy levels of a tight-binding system,
as described in this section. This will be very useful when we introduce statistical
mechanics concepts for networks. For each energy level the molecular orbitals are
constructed as linear combinations of the corresponding atomic orbitals for all the
atoms in the system. That is,


ψ_j = Σ_i c_j(i) q_j(i),        (16.32)

where qj (i) is the ith entry of the jth eigenvector of the adjacency matrix and cj (i)
are coefficients of a linear combination.
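As a quick illustration (assuming NumPy; the 4-cycle and the parameter choices α = 0 and β = –2.70 eV are illustrative, consistent with the conventions quoted above):

```python
import numpy as np

# Tight-binding energy levels E_j = alpha + beta * lambda_j for a network.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)    # a 4-cycle, for illustration

alpha, beta = 0.0, -2.70        # Fermi energy set to zero; beta in eV
lam = np.linalg.eigvalsh(A)     # eigenvalues of the adjacency matrix
E = alpha + beta * lam
print(np.sort(E))               # lowest level comes from lambda_1 = 2
```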

Example 16.1

[Figure 16.1 A representation of eigenvalues and eigenvectors as energy levels and orbitals]

In Figure 16.1 we represent the eigenvalues of the adjacency matrix of


a network as the negative energy levels. The corresponding entries of the
eigenvectors are then represented as the ‘network orbitals’ (the equivalent of
molecular orbitals in tight-binding theory). The node contributions (atomic
orbitals) to the network orbital for the lowest energy level (principal eigen-
value) can be used as a measure of the centrality of the nodes in a network.

16.4 Some specific quantum-mechanical systems
16.4.1 The Hubbard model
The Hubbard model is an extension of the tight-binding Hamiltonian in which
we introduce electron–electron interactions. To keep things simple, we allow on-
site interactions only. That is, we consider one electron per site and V_ijkl ≠ 0
in (16.26) if and only if i, j, k, and l all refer to the same node. In this case the
Hamiltonian is written as
 
Ĥ = –t Σ_{i,j,σ} A_ij ĉ_iσ† ĉ_jσ + U Σ_i ĉ_i↑† ĉ_i↑ ĉ_i↓† ĉ_i↓,        (16.33)

where t is the hopping parameter and U > 0 indicates that the electrons repel
each other.
Notice that if there is no electron–electron repulsion (U = 0 ), we recover the
tight-binding Hamiltonian studied in the previous section.

16.4.2 The Ising model


Again, we consider a network in which we place an electron per node. Each
electron can have either spin up (+) or down (–), as illustrated in Figure 16.2.
In the Ising model, we make the following assumptions.

(i) Two spins interact only if they are located in nearest neighbour nodes.
(ii) The interaction between every pair of spins has the same strength.
(iii) The energy of the system decreases with the interaction between two identical spins and increases otherwise.
(iv) Each spin can interact with an external magnetic field H.

[Figure 16.2 A network with nodes signed according to spin up (+) or down (–)]

By combining these four assumptions we end up with a simplified Hamiltonian


for the spin system, namely,
 
Ĥ = –J Σ_{(i,j)∈E} σ_i σ_j – H Σ_i σ_i,        (16.34)

where σi = ±1 represents the direction of the spin. If J > 0, the interaction is


called ferromagnetic and if J < 0 it is antiferromagnetic. Usually the system is
simplified by assuming no external magnetic field, hence,

Ĥ = –J Σ_{(i,j)∈E} σ_i σ_j.        (16.35)

Often we let J = β = (k_B T)^{–1} be the inverse temperature of the system, where
kB is the Boltzmann constant (more details in Chapter 20). Then, at low tempera-
ture, configurations in which most spins are aligned have lower energy. It is easy
to imagine some potential applications of the Ising model in studying complex
networks. If we consider a social network in which the nodes represent people
and the links their social interactions, the spin can represent a vote in favour or
against a certain statement. One can use the model to investigate whether the local
alignment of voting or opinions among the nodes can generate a global state of
consensus in the whole network.
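A minimal sketch of evaluating the Hamiltonian (16.35) for a given spin configuration (plain Python with NumPy; the network and the spin assignment are hypothetical):

```python
import numpy as np

# Energy of a spin configuration under the zero-field Ising Hamiltonian
# H = -J * sum over edges (i,j) of sigma_i * sigma_j.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]     # hypothetical network
sigma = np.array([+1, +1, -1, -1])           # one spin per node
J = 1.0                                      # ferromagnetic coupling (J > 0)

energy = -J * sum(sigma[i] * sigma[j] for i, j in edges)
print(energy)   # aligned neighbours lower the energy when J > 0
```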

..................................................................................................

FURTHER READING

McMahon, D., Quantum Mechanics Demystified, McGraw Hill Education, 2013.


Holzner, S., Quantum Physics for Dummies, John Wiley and Sons, 2013.
Cipra B.A., An introduction to the Ising model, American Mathematical Monthly
94:937–959, 1987.
17 Global Properties of Networks I

In this chapter

We study the correlation between the degrees of the nodes connected by links in a network. Using these correlations, we classify whether a network is assortative or disassortative, indicating the tendency of high-degree nodes to be connected to each other or to low-degree nodes, respectively. We show how to represent this statistical index in a combinatorial expression. We also study other global properties of networks, such as the reciprocity and returnability indices in directed networks.

17.1 Motivation
Characterizing complex networks at a global scale is necessary for many reasons.
For example, we can learn about the global topological organization of a given
network; it also allows us to compare networks with each other and to obtain
information about potential universal mechanisms that give rise to networks with
similar structural properties. First we will uncover global topological properties by
analysing how frequently the high-degree nodes, or hubs, in a network are con-
nected to each other. We will check the average reciprocity of links in a directed
network and the degree to which information departing from a node of a network
can return to its source after wandering around the nodes and links. In Chapter 18
we will look at other important global topological properties of networks, too.

17.2 Degree–degree correlation


To measure degree–degree correlation, we record the degrees, ki and kj , of the
nodes incident to every edge (i, j) ∈ E in the network. We can then calculate
statistics on this set of ordered pairs. If we quantify the linear dependence be-
tween ki and kj by means of the Pearson correlation coefficient, we will obtain a
value –1 ≤ r ≤ 1. The networks where r > 0 (positive degree–degree correlation)
are known as assortative and those for which r < 0 (negative degree–degree cor-
relation) are known as disassortative ones. Those networks for which r = 0 are
simply known as neutral.

Example 17.1

In Figure 17.1 we have marked each ordered pair of degrees for two real-world networks as a dot. The social network illustrated is
an example of an assortative network, while the mini internet illustrated in the same figure is disassortative.

[Figure 17.1 (a) Social network of the American corporate elite (b) the internet at the autonomous system level]

Let e(ki , kj ) be the fraction of links that connect a node of degree ki to a node of
degree kj . For mathematical convenience, we will consider ‘excess degrees’, which
are simply one less than the degree of the corresponding nodes. Let p(kj ) be the
probability that a node selected at random in the network has degree kj . Then,
the Pearson correlation coefficient for the degree–degree correlation is given by

r = (1/σ_q²) Σ_{k_i} Σ_{k_j} k_i k_j [ e(k_i, k_j) – q(k_i)q(k_j) ],        (17.1)

where

q(k_j) = (k_j + 1) p(k_j + 1) / Σ_i k_i p(k_i),        (17.2)

represents the distribution of the excess degree of a node at the end of a randomly chosen link and σ_q² is the variance of the distribution q(k_j). We call

this index the assortativity coefficient of a network for obvious reasons. It can be
rewritten as
r = [ (1/m) Σ_{(i,j)∈E} k_i k_j – ( (1/2m) Σ_{(i,j)∈E} (k_i + k_j) )² ] / [ (1/2m) Σ_{(i,j)∈E} (k_i² + k_j²) – ( (1/2m) Σ_{(i,j)∈E} (k_i + k_j) )² ],        (17.3)

where m = |E|.
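Equation (17.3) is straightforward to evaluate from an edge list. A minimal sketch (assuming NumPy; the 5-node graph is hypothetical):

```python
import numpy as np

# Hypothetical undirected graph as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n, m = 5, len(edges)
k = np.zeros(n)
for i, j in edges:
    k[i] += 1
    k[j] += 1

# Degrees at the two endpoints of every edge.
ki = np.array([k[i] for i, _ in edges])
kj = np.array([k[j] for _, j in edges])

# Numerator and denominator of (17.3), written with per-edge averages.
num = (ki * kj).mean() - ((ki + kj) / 2).mean() ** 2
den = ((ki**2 + kj**2) / 2).mean() - ((ki + kj) / 2).mean() ** 2
print(num / den)    # the assortativity coefficient r
```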
A revealing property of assortativity can be found by showing that the
denominator of (17.3) is nonnegative. We can confirm that
   
Σ_{(i,j)∈E} (k_i² + k_j²) = Σ_i k_i³   and   Σ_{(i,j)∈E} (k_i + k_j) = Σ_i k_i²,

thus we can write the denominator in (17.3) as


⎡ . /2 ⎤ ⎡ . /2 ⎤
1  3 1  2 1   
⎣ ki – ki ⎦ = ⎣ ki k3i – k2i ⎦
2m i
2m i
4m2 i i i
⎡ ⎤
1 ⎣ 
= ki kj (k2i + k2j ) – 2 (ki kj )2 ⎦
8m2 i, j i, j
⎡ ⎤
1 
= ⎣ ki kj (ki – kj )2 ⎦ ≥ 0.
8m2 i, j

Equality is reached if and only if ki = kj for all i ∈ V , j ∈ V , i.e. for a regular


graph. That is, in a regular network the assortativity coefficient is undefined.
In Table 17.1 we give some values of the assortativity coefficient for a few
real-world networks.

Table 17.1 Assortativity coefficients of real-world networks.

Network r Network r

Drug users –0.118 Roget thesaurus 0.174


Inter-club friendship –0.476 Protein structure 0.412
Students dating –0.119 St. Marks food web 0.118

Example 17.2

The two networks illustrated in Figure 17.2 correspond to food webs. The first represents mostly macroinvertebrates,
fishes, and birds associated with an estuarine sea-grass community, Halodule wrightii, at St Marks National Wildlife
Refuge in Florida. The second represents trophic interactions between birds and predators and arthropod prey of
Anolis lizards on the island of St Martin in the Lesser Antilles. The first network has 48 nodes and 218 links and the
second has 44 nodes and 218 links.
The assortativity coefficients for these two networks are St Marks r = 0.118 and St
Marks, low-degree species prefer to join to other low-degree ones, while high-degree species are preferentially linked
to other high-degree ones. On the other hand, in the food web of St Martin, the species with a large number of trophic
interactions are preferentially linked to low-degree ones.

[Figure 17.2 Illustration of two food webs with different assortativity coefficient values: (a) St Marks (b) St Martin]

17.2.1 Structural interpretation of the assortativity coefficient
Although the idea behind assortativity is very straightforward to describe, there
are many cases, even for small networks, in which the measure is hard to
rationalize in simple terms.

Example 17.3

Consider the two small networks illustrated in Figure 17.3. The networks are
almost identical, except for the fact that the one on the right has a path of length 2
instead of one of length 1 attached to the cycle. Despite their close structural
similarity, the two have very different degree assortativity.

[Figure 17.3 Two similar networks with very different assortativity coefficients: (a) r = −0.538 (b) r = 0.200]

If we analyse what has been written in the literature about the meaning of
assortativity, it is evident that a clear structural interpretation is necessary. For in-
stance, it has been said that in assortatively mixed networks ‘the high-degree vertices
will tend to stick together in a subnetwork or core group of higher mean degree than the
network as a whole’.¹ However, this may not be visible to the naked eye.

Example 17.4

In the network illustrated in Figure 17.4, the nodes enclosed in the circle indicated with a broken line are among the ones with the highest degree in the network and they are apparently clumped together. Thus we might expect that this network is assortative. However, the assortativity coefficient for this network is r = –0.304, showing that it is highly disassortative.

[Figure 17.4 Apparent assortativity within a network]

¹ Newman, M.E.J., ‘Assortative Mixing in Networks’, Physical Review Letters, 89:208701, (2002).

So what can we learn about the structure of complex networks by analysing


the assortative coefficient? In other words, what kinds of structural characteristics
make some networks assortative or disassortative?
To answer these questions we look for structural interpretations of the different
components of (17.3).
The terms common to the numerator and denominator can be rewritten
    
Σ_{(i,j)∈E} (k_i + k_j) = Σ_i k_i² = Σ_i (k_i² – k_i + k_i) = Σ_i k_i(k_i – 1) + Σ_i k_i = 2|P_2| + 2m.        (17.4)
To interpret the remaining term in the numerator we first rearrange the
formula for the number of paths of length 3 to
  
|P_3| = Σ_{(i,j)∈E} (k_i – 1)(k_j – 1) – 3|C_3| = Σ_{(i,j)∈E} k_i k_j – Σ_{(i,j)∈E} (k_i + k_j) + m – 3|C_3|.        (17.5)

Then
 
Σ_{(i,j)∈E} k_i k_j = Σ_{(i,j)∈E} (k_i + k_j) – m + 3|C_3| + |P_3| = m + 2|P_2| + |P_3| + 3|C_3|.        (17.6)

To deal with the final term we first rewrite the expression for the number of
star subgraphs S_{1,3} as

|S_{1,3}| = Σ_i C(k_i, 3) = (1/6) Σ_i k_i(k_i – 1)(k_i – 2) = (1/6) Σ_i k_i³ – (1/2) Σ_i k_i² + (1/3) Σ_i k_i.        (17.7)

Then
   
Σ_{(i,j)∈E} (k_i² + k_j²) = Σ_i k_i³ = 6|S_{1,3}| + 3 Σ_i k_i² – 2 Σ_i k_i = 6|S_{1,3}| + 6|P_2| + 2m.        (17.8)
Substituting all these terms into (17.3) gives

r = [ (1/m)(m + 2|P_2| + |P_3| + 3|C_3|) – (1/4m²)(2|P_2| + 2m)² ] / [ (1/2m)(6|S_{1,3}| + 6|P_2| + 2m) – (1/4m²)(2|P_2| + 2m)² ],        (17.9)

which simplifies to

r = ( |P_3| + 3|C_3| – |P_2|²/m ) / ( 3|S_{1,3}| + |P_2| – |P_2|²/m ).        (17.10)

Alternatively, let |P_{r/s}| = |P_r|/|P_s| and |P_1| = m. Multiplying and dividing the numerator by |P_2| we obtain

r = |P_2| ( |P_{3/2}| + 3|C_3|/|P_2| – |P_{2/1}| ) / ( 3|S_{1,3}| + |P_2|(1 – |P_{2/1}|) ).        (17.11)

Since C = 3|C_3|/|P_2| is the Newman clustering coefficient,

r = |P_2| ( |P_{3/2}| + C – |P_{2/1}| ) / ( 3|S_{1,3}| + |P_2|(1 – |P_{2/1}|) ).        (17.12)

Both (17.3) and therefore (17.12) have nonnegative denominators so we need


only consider the sign of the numerator of (17.12) to determine the nature of as-
sortativity. The conditions for assortative and disassortative mixing can be written
as follows.

• A network is assortative (r > 0) if and only if |P_{2/1}| < |P_{3/2}| + C.

• A network is disassortative (r < 0) if and only if |P_{2/1}| > |P_{3/2}| + C.

Recall that the clustering coefficient is bounded in the range 0 ≤ C ≤ 1.


Examples of assortativity coefficients of real-world networks and the structural
parameters involved in the numerator of (17.12) are given in Table 17.2.
As can be seen in Table 17.2, there are some networks for which |P_{3/2}| > |P_{2/1}| and they are assortative independently of the value of their clustering coefficient. In other cases assortativity arises because |P_{3/2}| is slightly smaller than |P_{2/1}| and the clustering coefficient makes |P_{3/2}| + C > |P_{2/1}|. It is clear that the role played by the clustering coefficient can be secondary in some cases and decisive in others for the assortativity of a network.

Problem 17.1
Use the combinatorial expression for the assortativity coefficient to show that the path on n nodes becomes neutral as it grows in length, i.e. r(G) → 0 as n → ∞.
In a path, C = 0 and |S_{1,3}| = 0, so the assortativity formula simplifies to

r(P_{n–1}) = |P_2| ( |P_{3/2}| – |P_{2/1}| ) / ( |P_2| (1 – |P_{2/1}|) ).

Since |P_1| = n – 1, |P_2| = n – 2, and |P_3| = n – 3,

r(P_{n–1}) = [ (n – 1)(n – 3) – (n – 2)² ] / [ (n – 1)(n – 2) – (n – 2)² ] = –1/(n – 2).

So clearly, lim_{n→∞} r(P_{n–1}) = – lim_{n→∞} 1/(n – 2) = 0.

Table 17.2 Assortativity coefficient in terms of path ratios and clustering coefficient in
complex networks

Network |P2/1 | |P3/2 | C r

Prison 4.253 4.089 0.288 0.103


Geom 17.416 22.089 0.224 0.168
Corporate 19.42 20.60 0.498 0.268
Roget 9.551 10.081 0.134 0.174
St Marks 10.537 10.464 0.291 0.118
Protein3 4.406 4.45 0.417 0.412
Zackary 6.769 4.49 0.256 –0.476
Drugs 14.576 12.843 0.368 –0.118
Transc Yeast 12.509 3.007 0.016 –0.410
Bridge Brook 22.419 17.31 0.191 –0.664
PIN Yeast 15.66 14.08 0.102 –0.082
Internet 91.00 11.53 0.015 –0.229

Problem 17.2
Use the combinatorial expression for the assortativity coefficient to show that the
star of n nodes has the maximum possible disassortativity, i.e. r(G) = –1.
In the star with n nodes, C = 0 and |P_3| = 0. Also,

|P_1| = n – 1,   |P_2| = (n – 1)(n – 2)/2,   |S_{1,3}| = (n – 1)(n – 2)(n – 3)/6.

Thus,

r(S_{1,n–1}) = (1/2)(n – 1)(n – 2) [ –(1/2)(n – 2) ] / [ (1/2)(n – 1)(n – 2)(n – 3) + (1/2)(n – 1)(n – 2) – (1/4)(n – 1)(n – 2)² ].

This simplifies to

r(S_{1,n–1}) = –(1/4)(n – 1)(n – 2)² / [ (1/4)(n – 1)(n – 2)² ] = –1.
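Both results are easy to verify numerically from the combinatorial form (17.10). A short sketch (plain Python; the choice n = 50 is arbitrary):

```python
from math import comb

# r = (|P3| + 3|C3| - |P2|^2/m) / (3|S13| + |P2| - |P2|^2/m), with m = |P1|.
def r_from_counts(m, P2, P3, C3, S13):
    return (P3 + 3 * C3 - P2**2 / m) / (3 * S13 + P2 - P2**2 / m)

n = 50
# Path on n nodes: m = n-1 edges, |P2| = n-2, |P3| = n-3, no triangles or stars.
print(r_from_counts(n - 1, n - 2, n - 3, 0, 0), -1 / (n - 2))   # both -1/(n-2)
# Star S_{1,n-1}: every P2 and every S_{1,3} is centred at the hub.
print(r_from_counts(n - 1, comb(n - 1, 2), 0, 0, comb(n - 1, 3)))  # -1.0
```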

17.3 Network reciprocity


Another index which is based on the ratio of the number of certain subgraphs in
a network is reciprocity. Network reciprocity is defined as
r = L^↔ / L,        (17.13)
where L^↔ is the number of links for which a reciprocal exists, and L is the to-
tal number of directed links. Thus, if there is a link pointing from A to B, the
reciprocity measures the probability that there is also a link pointing from B to A.
A normalized index allows a better characterization of the link reciprocity in a
network
ρ = (r – a)/(1 – a),        (17.14)
where a = L/[n(n – 1)]. In this case, self-loops are not counted when calculating L, so L = Σ_{i=1}^n Σ_{j=1}^n A_ij – Σ_{i=1}^n A_ii, where A_ij is the corresponding entry of the adjacency matrix A of the directed network. It is also clear that L^↔ = tr(A²).
According to this index, a directed network can be reciprocal (ρ > 0), areciprocal
(ρ = 0), or antireciprocal (ρ < 0). The normalized reciprocity index can be
condensed to the formula
ρ = [ n(n – 1)L^↔ – L² ] / [ n(n – 1)L – L² ].        (17.15)
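A minimal sketch of both reciprocity indices computed from the adjacency matrix (assuming NumPy; the digraph is hypothetical):

```python
import numpy as np

# Hypothetical directed network; A[i, j] = 1 for a link i -> j.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = A.shape[0]
np.fill_diagonal(A, 0)            # self-loops are excluded by definition

L = A.sum()                       # total number of directed links
L_recip = np.trace(A @ A)         # L^{<->} = tr(A^2)
r = L_recip / L                   # (17.13)
a = L / (n * (n - 1))
rho = (r - a) / (1 - a)           # (17.14), equivalently (17.15)
print(r, rho)
```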

Example 17.5

[Figure 17.5 A network with two pairs of reciprocal links]

In the network illustrated in Figure 17.5 there are four reciprocal links (1 → 2, 2 → 1, 3 → 4, and 4 → 3), thus L^↔ = 4, and the total number of links is L = 9. The probability that a link picked at random in this network is reciprocal is r = 4/9.
The normalized reciprocity index is then

continued
188 Global Properties of Networks I

Example 17.5 continued

ρ = (30 · 4 – 81)/(30 · 9 – 81) = 0.206.

In Table 17.3 we illustrate some values of the reciprocity index for real-world networks.

Table 17.3 Some reciprocal and antireciprocal networks

Reciprocal ρ Areciprocal ρ Antireciprocal ρ

Metab. E. coli 0.764 Transc. E. coli –0.0003 Skipwith –0.280


Thesaurus 0.559 Transc. yeast –5.5 · 10–4 St Martin –0.130
ODLIS 0.203 Internet –5.6 · 10–4 Chesapeake –0.057
Neurons 0.158 Little Rock –0.025

Problem 17.3
A network with n nodes has reciprocity equal to –0.25. How many links should
become bidirectional for the network to show reciprocity equal to 0.1?
The reciprocity for this network in its current state is given by

[ n(n – 1)L_1^↔ – L² ] / [ n(n – 1)L – L² ] = –1/4,

where L_1^↔ is the number of bidirectional links in the network. So,

L_1^↔ = [ 5L² – n(n – 1)L ] / [ 4n(n – 1) ].

Notice that because L_1^↔ > 0, the number of directed links is bounded by L > n(n – 1)/5.
Now, let L_2^↔ be the number of bidirectional links when the reciprocity is 0.1. Then,

L_2^↔ = [ 9L² + n(n – 1)L ] / [ 10n(n – 1) ].

Let ΔL^↔ = L_2^↔ – L_1^↔ be the increase in the number of reciprocal links. Consequently, the number of reciprocal links should be increased by

ΔL^↔ = 7[ n(n – 1)L – L² ] / [ 20n(n – 1) ].

For instance, if the network has n = 22, L = 100, and L_1^↔ = 2, the number of reciprocal links should increase by ΔL^↔ ≈ 27. This means that the new network should have L_2^↔ = 29 in order to have ρ = 0.1. You can convince yourself by substituting the values into (17.15).

17.4 Network returnability


Consider a directed network with n nodes, and with adjacency matrix D. The weighted contribution of all returnable cycles in the network gives a global measure of the returnability of information departing from the nodes. Because no cycles of length smaller than 2 exist, we have

K_r = tr(D²)/2! + tr(D³)/3! + · · · + tr(D^k)/k! + · · · = tr(exp(D)) – n – S,

where S is the number of self-loops in the network.


The relative returnability can be obtained by normalizing K_r to

K_r = [ tr(exp(D)) – n – S ] / [ tr(exp(A)) – n – S ],        (17.16)

where A is the adjacency matrix of the same network when all edges are con-
sidered to be undirected. This index is bounded in the range 0 ≤ Kr ≤ 1, where
the lower bound is obtained for a network with no returnable cycle and the upper
bound is obtained for any network with a symmetric adjacency matrix.
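A sketch of the computation (assuming NumPy and SciPy are available). The directed triangle used here is the triad A of Problem 17.4 below, so the output should be approximately 0.576:

```python
import numpy as np
from scipy.linalg import expm   # assumes SciPy is installed

# Directed triad A from Problem 17.4: links 1->2, 2->3, 3->1 and 2->1... here
# encoded as D[i, j] = 1 for a link i -> j, matching its characteristic polynomial.
D = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A = ((D + D.T) > 0).astype(float)       # underlying undirected graph (K_3)
n, S = D.shape[0], np.trace(D)          # S = number of self-loops (here 0)

Kr = (np.trace(expm(D)) - n - S) / (np.trace(expm(A)) - n - S)
print(Kr)                               # approximately 0.576
```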

Example 17.6

[Figure 17.6 A network with zero returnability]

For the network in Figure 17.6, tr(exp(D)) = n so K_r = 0.

Problem 17.4
Find the returnability of the directed triads shown in Figure 17.7.
[Figure 17.7 Two directed triangles, A and B, with different returnabilities]

The returnability for these networks can be written

K_r = [ tr(exp(D)) – 3 ] / [ tr(exp(A)) – 3 ],

where tr(exp(D)) = Σ_{j=1}^3 exp(λ_j(D)) and tr(exp(A)) = exp(2) + 2 exp(–1) because the underlying undirected graph is K_3. In order to find the eigenvalues of

the adjacency matrix of the directed network A, we should find the roots of the
polynomial
| –λ  0  1 |
| 1  –λ  1 |  =  –λ³ + 2λ + 1 = 0,
| 1  1  –λ |

which has roots λ_1 = (√5 + 1)/2, λ_2 = –1, and λ_3 = (1 – √5)/2. Notice that λ_1 = ϕ, the golden ratio, and λ_3 = 1 – ϕ. Hence,

K_r(A) = ( e^ϕ + e^{–1} + e^{1–ϕ} – 3 ) / ( e² + 2e^{–1} – 3 ) = 0.576.

For the network B we have

| –λ  1  0 |
| 0  –λ  1 |  =  –λ³ + 1 = 0.
| 1  0  –λ |

Thus its eigenvalues are the cubic roots of unity

λ_j = cos(2πj/3) + i sin(2πj/3)

for j = 0, 1, 2. Substituting into (17.16) we get K_r(B) = 0.098.

Example 17.7

In Figure 17.8 we illustrate the returnability of all directed triads.

[Figure 17.8 Returnability measures of all directed triangles: (a) K_r = 1.000 (b) K_r = 0.576 (c) K_r = 0.319 (d) K_r = 0.212 (e) K_r = 0.212 (f) K_r = 0.098 (g) K_r = 0.000]


..................................................................................................

FURTHER READING

Estrada, E., The Structure of Complex Networks: Theory and Applications, Oxford
University Press, 2011, Chapters 2.3, 4.5.2, and 5.4.
Newman, M.E.J., Networks: An Introduction, Oxford University Press, 2010,
Chapter 7.
Newman, M.E.J., ‘Assortative Mixing in Networks’, Physical Review Letters,
89:208701, (2002).
18 Global Properties of Networks II

In this chapter

We introduce the concept of network expansion and the spectral scaling method. These allow us to determine the global topological structure of a complex network. We also study how close to bipartite a network structure is. That is, we characterize how much resemblance a network has to a bipartite graph, which allows us to study real-world situations in which the global bipartivity is somehow ‘frustrated’.
18.1 Motivation
Consider two networks with structures such as those displayed in Figure 18.1.
Network A displays a very regular, homogeneous type of structure, while network
B has a few regions which are more densely connected than others, and which we
term structural heterogeneities. Now suppose we use a signal θ that propagates
locally through the links of the network from a randomly selected node to other
nodes relatively close by. At the same time we emit another signal ϑ, which starts
at the same node and propagates on a longer length scale. After a certain time,
both signals return to the original node. Because of the homogeneity of network
A at both scales (close neighbourhood of a node and global network) the times
taken by θ and ϑ to return to the original node are linearly correlated. This is true
for any node of the network. However, in B the signal’s path will be influenced
by the heterogeneity in the network and as a consequence, a lack of correlation
between the return times of the two signals is observed. The level of correlation
between the two signals characterizes the type of structure that a network has at
a global scale. In this chapter, we are going to find a mathematical way to obtain
such kinds of correlations for general networks.

18.2 Network expansion properties


The first step towards our characterization of the global topological structure of
a network is to find an appropriate definition of network homogeneity. First take

[Figure 18.1 A homogeneous network (A) and a heterogeneous network (B)]

a network G = (V, E) and select a subset S ⊂ V. We designate by ∂S the set of edges that have one endpoint in S and the other outside it, i.e. in S̄. The set ∂S is called the boundary of S. Let us now define the following index, known as the expansion constant, to be

φ(G) = inf { |∂S|/|S| : S ⊂ V, 0 < |S| ≤ |V|/2 }.        (18.1)
If every time that we select a set S the ratio |∂S|/|S| is about the same,
we can conclude that the network is very homogeneous. In graph theory these
networks are known as expanders or constant expansion graphs. In contrast, those
networks for which the expansion constant changes significantly from one set S
to another are characterized by a lack of structural homogeneity, like network B
in Figure 18.1.

Problem 18.1
Show that the expansion of a cycle network tends to zero as the size of the network
tends to infinity.
The cycle Cn is an example of a connected graph which is divided into two
connected components by removing two edges. Note that the boundary of any
non-empty set S of nodes from C_n must contain at least two edges and since |S| ≤ |V|/2 = n/2,

|∂S|/|S| is bounded below by 2/(n/2) = 4/n. This bound is attained (for even n) if we take S to be a string of n/2 connected nodes. Consequently,

lim_{n→∞} φ(C_n) = lim_{n→∞} 4/n = 0.

Problem 18.2
Show that a complete network has expansion constant φ(K_n) = n/2 if n is even and φ(K_n) = (n + 1)/2 if n is odd.
If |S| = m then each node in S has (n – m) edges connected to S̄ and so |∂S|/|S| = n – m. The infimum is attained when m is as large as possible. That is, m = n/2 if n is even and (n – 1)/2 if n is odd.
A key result in the theory of expander graphs connects the expansion constant
to the eigenvalues of the adjacency matrix of the network. Let λ1 > λ2 ≥ · · · ≥ λn
be these eigenvalues. Then the expansion factor is bounded by

(λ_1 – λ_2)/2 ≤ φ(G) ≤ √( 2λ_1(λ_1 – λ_2) ).        (18.2)

Thus the larger the spectral gap, λ1 – λ2 , the larger the expansion of the graph.
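The spectral bounds are easy to evaluate. A minimal sketch (assuming NumPy) for the cycle C_6, whose true expansion constant is 4/n = 2/3 by Problem 18.1:

```python
import numpy as np

# Adjacency matrix of the cycle C_6, a poor expander.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

lam = np.sort(np.linalg.eigvalsh(A))[::-1]      # eigenvalues, descending
lower = (lam[0] - lam[1]) / 2                   # lower bound of (18.2)
upper = np.sqrt(2 * lam[0] * (lam[0] - lam[1])) # upper bound of (18.2)
print(lower, upper)    # phi(C_6) = 2/3 lies between these two values
```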

Example 18.1

The Petersen graph, illustrated in Figure 18.2, has spectrum {[3]¹, [1]⁵, [–2]⁴}.

[Figure 18.2 The Petersen graph]

Consequently, 1 ≤ φ(G) ≤ 2√3. In fact, the expansion constant of the Petersen graph is known to be φ(G) = 1.

18.3 Spectral scaling method


Our aim here is to find mathematical analogues of the signals θ and ϑ used in our
motivational example. A good candidate for the local signal θ is subgraph central-
ity. It counts the number of closed walks emanating from a node and penalizes
the longer walks more heavily than the shorter ones. It represents a signal that
propagates only to the close neighbourhood of a node before returning to it.
Here and elsewhere, we use the odd subgraph centrality instead of EE(i). The reason is simply that EE^odd(i) accounts for subgraphs which always contain a cycle.
It avoids the counting of trivial closed walks which go back and forth without
describing a cycle. However, it should be observed that using EE(i) instead of
EE odd (i) does not alter the results obtained.
Assuming that the network is neither regular nor bipartite, write

EE^odd(i) = q_1²(i) sinh(λ_1) + Σ_{j=2}^n q_j²(i) sinh(λ_j).        (18.3)

We now analyse what happens if the network displays structural homogeneity.


In this case λ_1 ≫ λ_2 and then we can use the approximation

EE^odd(i) ≈ q_1²(i) sinh(λ_1).        (18.4)

Obviously, q1 (i) is the eigenvector centrality of the node i in the network.


In Chapter 15 we established that if a network is not bipartite, the eigenvector
centrality is proportional to the number of walks of infinite length that start at
node i, lim_{k→∞} N_k(i). Thus (18.4) indicates a power-law relationship between a local characterization of the neighbourhood of the node i (namely EE^odd(i)) and
law indicates a self-similarity in a system and in this case it is indicative of the
similarities among the local and global topological environments around a given
node. We can express (18.4) as

ln q_1(i) ≈ (1/2) [ ln EE^odd(i) – ln sinh(λ_1) ].        (18.5)

For any network, the straight line (18.5) defines the ideal situation in which
the local and global environments of all the nodes are highly correlated, i.e. an
ideal topological homogeneity in the network. However, the values of ln q1 (i)
and ln EE^odd(i) can deviate from a straight line if the network is not par-
ticularly homogeneous. Such deviations can be quantified, for instance, by
measuring the deviation of the eigenvector centrality of the given node from the
relationship (18.5) using
Δ ln q_1(i) = ln [ q_1²(i) sinh(λ_1) / EE^odd(i) ]^{1/2}.        (18.6)
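A minimal sketch (assuming NumPy; the small graph is hypothetical) that evaluates the deviations Δ ln q_1(i) for every node, the quantity used to assign a network to one of the four classes described next:

```python
import numpy as np

# Hypothetical small network.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

lam, Q = np.linalg.eigh(A)
q1 = np.abs(Q[:, -1])                      # eigenvector centrality
EE_odd = (Q**2) @ np.sinh(lam)             # odd subgraph centrality
dev = 0.5 * np.log(q1**2 * np.sinh(lam[-1]) / EE_odd)
print(dev)  # all ~0: Class I; all <0: Class II; all >0: Class III; mixed: Class IV
```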

Now we can identify the following four general types of correlation between
the local and global environments of the nodes in a network. These can in turn be
shown to represent four structural classes of network topology.

18.3.1 Class I: Homogeneous networks


In this case, Δ ln q_1(i) ≈ 0 for all i ∈ V so

ln[ q_1²(i) sinh(λ_1) ] ≈ ln EE^odd(i),  ∀i ∈ V.

[Figure 18.3 Spectral scaling for networks in Class I: (a) spectral scaling (b) network structural pattern]

18.3.2 Class II: Networks with holes


In this case, Δ ln q_1(i) < 0 for all i ∈ V so

ln[ q_1²(i) sinh(λ_1) ] < ln EE^odd(i),  ∀i ∈ V.

[Figure 18.4 Spectral scaling for networks in Class II: (a) spectral scaling (b) network structural pattern]


Spectral scaling method 197

18.3.3 Class III: Core–periphery networks


In this case, Δ ln q_1(i) > 0 for all i ∈ V so

ln[ q_1²(i) sinh(λ_1) ] > ln EE^odd(i),  ∀i ∈ V.

[Figure 18.5 Spectral scaling for networks in Class III: (a) spectral scaling (b) network structural pattern]

18.3.4 Class IV: Networks with mixed topologies


In this case, Δ ln q_1(i) < 0 for some i ∈ V but Δ ln q_1(i) > 0 for other values of i.

[Figure 18.6 Spectral scaling for networks in Class IV: (a) spectral scaling (b) network structural pattern]

Examples 18.2

(i) Figure 18.7 is a pictorial representation of the food web of St Martin Island and its spectral scaling. The network
clearly belongs to Class I.
(ii) Figure 18.8 is an illustration of a protein (left) and its protein residue network (right). In the network, nodes
represent the amino acids and the links represent pairs of amino acids interacting physically. The spectral scaling
plot in Figure 18.9 shows a clear Class II type for the topology of this network. Proteins are known to fold in 3D,
leaving some holes, which in general represent physical cavities where ligands dock. These holes may represent
binding sites for potential drugs.

[Figure 18.7 Spectral scaling of the St Martin food web. Notice that the scaling perfectly corresponds to a Class I network]

[Figure 18.8 Cartoon representation of a protein (a) and its residue interaction network (b)]

[Figure 18.9 Spectral scaling of the residue network in Figure 18.8(b)]

Problem 18.3
Show that an Erdös–Rényi random network belongs to the first class of homoge-
neous networks as n → ∞.
Recall that as n → ∞, the eigenvalues of an Erdös–Rényi network satisfy

λ_1 → np,   λ_{j≥2} → 0.

Thus, as n → ∞

EE^odd(i) = q_1²(i) sinh(np) + Σ_{j=2}^n q_j²(i) sinh(0) = q_1²(i) sinh(np),

which means that

ln q_1(i) = (1/2) [ ln EE^odd(i) – ln sinh(np) ]

and Δ ln q_1(i) = 0, as required for a network of Class I.

Problem 18.4
It is known that the largest eigenvalue of the adjacency matrix of a certain protein–
protein interaction (PPI) network is 3.8705. Further analysis has shown that the
values of the eigenvector and odd subgraph centrality of one of its nodes are 0.2815 and 0.5603, respectively. Is this PPI network an example of a homogeneous network?
Or does it belong to the class of networks with holes?

If the PPI network were in Class I, then Δ ln q_1(i) ≈ 0 for all i ∈ V. Using the given data, we have

Δ ln q_1(i) = ln [ 0.2815² sinh(3.8705) / 0.5603 ]^{1/2} = 0.6105,

which is far from zero. Thus the network does not belong to the class of homo-
geneous networks. In addition, because Δ ln q_1(i) > 0 for at least one node, the
network cannot belong to the class of networks with holes. The only possibilities
that remain are that the network is in Class III or Class IV but without additional
data we cannot determine which.

18.4 Bipartivity measures


Bipartivity arises naturally in many real-world networks. Think for instance of

a network with nodes V = V_1 ∪ V_2 in which V_1 represents a set of buyers and
V2 represents a set of sellers. In general, there are only links connecting buyers
to sellers. However, in some cases there will be sellers who make purchases from
other sellers. In this case, the network is no longer bipartite but it could be possible
that it still has some resemblances to a bipartite network. In this section our aim
is to characterize the global structure of a network in such a way as to introduce a
grey scale in which at the two extremes there are the networks which are bipartite
or extremely nonbipartite, and in the middle all the networks that display some
degree of bipartivity.
The simplest and most intuitive way of defining the bipartivity of a network
is to account for the fraction of links that ‘destroy’ the bipartivity in the network.
That is, to calculate the minimum number of links we have to remove to make
the network perfectly bipartite. Let md be the number of such links in the network
and let m be the total number of links. Then, the bipartivity of a network can be
measured using

b_c = 1 – m_d/m,        (18.7)

where the subindex c is introduced to indicate that this index may have to be found
computationally. That is, we need to search for links that destroy the bipartivity in
the network in a computationally intensive way. We aim to find the best partition
of the nodes into two almost disjoint subsets. Once we have found such a ‘best
bipartition’, we can simply count the number of links that are connecting nodes in
the same set. Although the method is very simple and intuitive, to state, the com-
putation of this index is an example of an NP-complete problem.1 Computational
1 In short, this means that it is extremely approximations for finding this index have been reported in the literature but we
costly to compute for even fairly modestly are not going to explain them here. Instead we will consider other approaches that
sized networks. exploit some of the spectral properties of bipartite networks.

Recall that a bipartite network does not contain any odd-length cycle. Because
every closed walk of odd length involves at least one cycle of odd length, we also
know that a bipartite network does not contain any closed walks of odd length.
Consequently, we can identify bipartivity if tr(sinh(A)) = 0. Or, because

tr(eA ) = tr(sinh(A)) + tr(cosh(A)), (18.8)

we have bipartivity if and only if

tr(eA ) = tr(cosh(A)). (18.9)

Using these simple facts we can design an index to account for the degree of
bipartivity of a network by taking the proportion of even closed walks to the total
number of closed walks in the network giving

b_s = tr(cosh(A)) / tr(e^A) = Σ_{j=1}^n cosh(λ_j) / Σ_{j=1}^n exp(λ_j).        (18.10)

It is evident that b_s ≤ 1 with equality if and only if the network is bipartite.

Example 18.3

We consider the effect of adding a new edge, e, to a network G with spectral bipartivity index b_s(G), and calculate b_s(G + e).
Since cosh x > sinh x for all x,

EE^even = Σ_{j=1}^n cosh(λ_j) > Σ_{j=1}^n sinh(λ_j) = EE^odd.

Let a and b be the contributions of e to the even and odd closed walks in G + e, respectively and assume that b ≥ a.
Then

b EE^even ≥ a EE^odd.        (18.11)

Adding EE^even(a + EE) to each side gives

EE^even(a + b + EE) ≥ EE(EE^even + a),

which can be written as


b_s(G) = EE^even/EE ≥ (EE^even + a)/(EE + a + b) = b_s(G + e).        (18.12)
This means that adding a link that makes a larger contribution to odd closed walks than even ones will always decrease the value of b_s with respect to that of the original network.

Suppose we start from a bipartite network with b_s = 1 and add new links. How small can we make b_s? And for what network?

Problem 18.5
Show that b_s → 1/2 in K_n as n → ∞.
The eigenvalues of K_n are n – 1 with multiplicity one and –1 with multiplicity n – 1, so

b_s(K_n) = [ cosh(n – 1) + (n – 1) cosh(–1) ] / [ exp(n – 1) + (n – 1) exp(–1) ] → 1/2

as n → ∞.

Of all graphs with n nodes, the complete graph is the one with the largest number of odd cycles. So lim_{n→∞} b_s(K_n) = 1/2 is the minimum value of b_s that any network can take.
Another way of accounting for the global bipartivity of a network is to consider
the difference of the number of closed walks of even and odd length, and then to
normalize the index by the sum of closed walks. That is,

b_e = \frac{\sum_{j=1}^{n} \cosh(\lambda_j) - \sum_{j=1}^{n} \sinh(\lambda_j)}{\sum_{j=1}^{n} \cosh(\lambda_j) + \sum_{j=1}^{n} \sinh(\lambda_j)} = \frac{\mathrm{tr}(\exp(-A))}{\mathrm{tr}(\exp(A))} = \frac{\sum_{j=1}^{n} \exp(-\lambda_j)}{\sum_{j=1}^{n} \exp(\lambda_j)}.    (18.13)

Clearly, 0 ≤ be ≤ 1.

Problem 18.6
Show that be = 1 for any bipartite network and that be (Kn ) → 0 as n → ∞.
For a bipartite network the spectrum of the adjacency matrix is symmetrically distributed around zero. Since sinh is an odd function,

\sum_{j=1}^{n} \sinh(\lambda_j) = 0

and so b_e = 1.
Using the eigenvalues of the adjacency matrix of a complete graph we have

b_e(K_n) = \frac{\exp(1-n) + (n-1)\exp(1)}{\exp(n-1) + (n-1)\exp(-1)} \to 0,    (18.14)

as n → ∞.
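These two limits are easy to confirm numerically. The following is a small sketch of ours, assuming NumPy, which evaluates both spectral indices on K_n for growing n:

```python
import numpy as np

for n in (5, 20, 100):
    A = np.ones((n, n)) - np.eye(n)          # adjacency matrix of K_n
    lam = np.linalg.eigvalsh(A)
    b_s = np.cosh(lam).sum() / np.exp(lam).sum()
    b_e = np.exp(-lam).sum() / np.exp(lam).sum()
    print(n, b_s, b_e)                       # b_s -> 1/2 and b_e -> 0
```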

Problem 18.7
Suppose that we repeatedly add edges to a network G and each time we do so we add more odd closed walks than even ones. Show that be decreases monotonically with the addition of edges.

Suppose again that a and b are the contributions of the new edge e to even and
odd closed walks, respectively, and b ≥ a. Then

b_e(G + e) = \frac{(EE_{even} + a) - (EE_{odd} + b)}{EE + a + b} = \frac{EE_{even} + a}{EE + a + b} - \frac{EE_{odd} + b}{EE + a + b}.

Adding (b + EE)EE_{odd} to each side of the inequality b\,EE_{even} \ge a\,EE_{odd} gives

EE(EE_{odd} + b) \ge EE_{odd}(EE + a + b),

which means that

\frac{EE_{odd} + b}{EE + a + b} \ge \frac{EE_{odd}}{EE}.

We have previously shown that

\frac{EE_{even}}{EE} \ge \frac{EE_{even} + a}{EE + a + b}.
EE EE + a + b

Combining these last two inequalities gives

" # " #
EE even EE odd EE even + a EE odd + b
be (G) = – ≥ – = be (G + e),
EE EE EE + a + b EE + a + b

as desired.

Examples 18.4

(i) In Figure 18.10 we show how the values of the two spectral indices of bipartivity change as we add links to the
complete bipartite network K2,3 .
(ii) Figure 18.11a illustrates a food web among invertebrates in an area of grassland in England. The values of the
spectral bipartivity indices for this network are bs = 0.766 and be = 0.532, both indicating that the network
displays some bipartite-like structure due to the different trophic layers of interaction among the species. We will
see how to find such bipartitions in Chapter 21.
(iii) Figure 18.11b illustrates an electronic sequential logic circuit where nodes represent logic gates. The values of
the spectral bipartivity indices for this network are bs = 0.948 and be = 0.897, both indicating that the network
displays a high degree of bipartivity.

continued

Examples 18.4 continued

(a) b_s = 1.000, b_e = 1.000  (b) b_s = 0.829, b_e = 0.658  (c) b_s = 0.769, b_e = 0.538  (d) b_s = 0.721, b_e = 0.462
(e) b_s = 0.692, b_e = 0.383  (f) b_s = 0.645, b_e = 0.289  (g) b_s = 0.645, b_e = 0.289  (h) b_s = 0.597, b_e = 0.194

Figure 18.10 Monotonic decay of bipartivity indices as edges are added


Figure 18.11 (a) A food network in English grassland (b) An electronic circuit network

Problem 18.8
Find the conditions for which an Erdös–Rényi (ER) network will have be ≈ 1 and
those for which be ≈ 0 when the number of nodes, n, is sufficiently large.
As n → ∞ the eigenvalues of an ER network are given by λ1 = np, λj≥2 = 0.
Thus,

b_e(ER) = \frac{n - 1 + e^{-np}}{n - 1 + e^{np}} = \frac{n-1}{n - 1 + e^{np}} + \frac{e^{-np}}{n - 1 + e^{np}}.

There is a regime in which the first term on the right-hand side is dominant in the limit and another in which the second is. The first term approaches 1 if e^{np}/(n-1) → 0, while the second term approaches 0 if e^{np}/(n-1) → ∞. Consequently, b_e(ER) ≈ 1 if \exp(np) \ll n - 1. This condition is equivalent to

p \ll \frac{\ln(n-1)}{n}.    (18.15)

Sometimes p < \ln(n-1)/n is sufficient. For example, in an ER network with 1,000 nodes and 2,990 links we measure b_e(ER) ≈ 0.9. Notice that here np ≈ 6 while \ln(n - 1) ≈ 7. However, in most cases where (18.15) is not satisfied we can expect very low bipartivity in ER networks.

..................................................................................................

FURTHER READING

Estrada, E., Spectral scaling and good expansion properties in complex networks,
Europhys. Lett. 73:649–655, 2006.
Hoory, S., Linial, N. and Wigderson, A., Expander graphs and their applications.
Bulletin of the American Mathematical Society 43:439–561, 2006.
Sarnak, P., What is an Expander? Notices of the AMS, 51:761–763, 2004.
19 Communicability in Networks

19.1 Motivation
19.2 Network communicability
19.3 Communicability distance
Further reading

In this chapter
We introduce the concept of communicability between two nodes in a network. It accounts for all the possible routes (i.e. walks of a given length) that connect the corresponding pair of nodes and gives more weight to the shorter than to the longer ones. We define the communicability distance between two nodes, which accounts for how well connected the nodes are in a network, and show applications.

19.1 Motivation
The transmission of information is one of the principal functions of complex
networks. Such communication among the nodes of a network can represent the
interchange of thoughts or opinions in social networks, the transfer of information
from one molecule or cell to another by means of chemical, electrical, or other
kind of signal, or the routes of transportation of any material. It is intuitive to think that this communication mainly takes place along the shortest route connecting a pair of nodes; that is, we assume that information is transmitted through the shortest paths of a network. This is represented in Figure 19.1(a) by the blue
path between the two marked nodes in the network. However, in any network
different from a tree, there are many other routes for communication between
any pair of nodes. This abundance of alternative routes is of great relevance when
there are failures in some of the links in the shortest path, or simply if there is

Figure 19.1 (a) The shortest path between the two coloured nodes (b) Alternative routes when an edge fails

heavy traffic along some routes. Some of these alternative routes are illustrated
in Figure 19.1 (b) where the central link of the shortest path connecting the two
marked nodes has been removed.
We conclude that the nodes in a network have improved communication if
there are many relatively short alternative routes between them, other than the
shortest path. The larger the number of these alternative routes the better the
communicability between the corresponding pair of nodes. This is the topic of this
chapter.

19.2 Network communicability


In order to account for the routes of communication between two nodes in a
network we appeal again to the concept of walk. That is, the communicability of
two nodes is characterized by the number of walks starting at one node and ending
at the other and the relative importance of the walks connecting both nodes in
terms of their length. This allows us to use the following mathematical definition
for the communicability of the nodes p and q:



G_{pq} = \sum_{k=0}^{\infty} c_k (A^k)_{pq},    (19.1)

where the coefficients ck must fulfil the same requirements stated when we
introduced subgraph centrality. By selecting ck = 1/k! we obtain

G_{pq} = \sum_{k=0}^{\infty} \frac{(A^k)_{pq}}{k!} = (\exp(A))_{pq}.    (19.2)

Recall that Gpp measures the subgraph centrality of a node. Using the spec-
tral decomposition of the adjacency matrix the communicability function can be
expressed as


G_{pq} = \sum_{j=1}^{n} q_j(p)\, q_j(q) \exp(\lambda_j).    (19.3)

Other communicability functions can also be obtained by choosing the


constants ck in other ways. For example,

G^{odd}_{pq} = (\sinh(A))_{pq},    (19.4)

G^{even}_{pq} = (\cosh(A))_{pq},    (19.5)

G^{res}_{pq} = \left( (I - \alpha A)^{-1} \right)_{pq}, \quad 0 < \alpha < 1/\lambda_1.    (19.6)

However, hereafter we will refer to (19.2) as the communicability function.
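In practice the whole communicability matrix is obtained at once from the matrix exponential. The following is a minimal sketch of ours, assuming networkx and SciPy are available:

```python
import networkx as nx
from scipy.linalg import expm

G = nx.path_graph(5)                  # a path on five nodes, labelled 0,...,4
A = nx.to_numpy_array(G)
comm = expm(A)                        # (p,q) entry is the communicability G_pq
print(comm[0, 4])                     # endpoints of P_5: approximately 0.048
```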



The communicability function must reproduce some of our intuition about


the transmission of information in a network. For instance, we expect a very high
communicability between nodes in a complete graph. Also we should expect that
the communicability between the endpoints of a path decays asymptotically to
zero as the length of the path increases. This is easily established. For example,
since
\sigma(K_n) = \left\{ [n-1]^{1}, [-1]^{n-1} \right\},

we can write


G_{pq}(K_n) = q_1(p)\, q_1(q)\, e^{n-1} + e^{-1} \sum_{j=2}^{n} q_j(p)\, q_j(q).    (19.7)

And since the eigenvector matrix Q has orthonormal rows and columns with q_1 = e/\sqrt{n}, for p \ne q we have \sum_{j=2}^{n} q_j(p) q_j(q) = -1/n, so

G_{pq}(K_n) = \frac{e^{n-1}}{n} - \frac{e^{-1}}{n} \to \infty

as n → ∞.
Now we turn to the path P_n on n nodes. In this case, we know that the eigenvalues are

\lambda_j = 2\cos\left( \frac{j\pi}{n+1} \right),    (19.8)

and the entries of the eigenvectors are


q_j(p) = \sqrt{\frac{2}{n+1}} \sin\left( \frac{jp\pi}{n+1} \right).    (19.9)

So

G_{pq}(P_n) = \sum_{j=1}^{n} \sqrt{\frac{2}{n+1}} \sin\left( \frac{jp\pi}{n+1} \right) \sqrt{\frac{2}{n+1}} \sin\left( \frac{jq\pi}{n+1} \right) e^{2\cos\left( \frac{j\pi}{n+1} \right)}
= \frac{2}{n+1} \sum_{j=1}^{n} \sin\left( \frac{jp\pi}{n+1} \right) \sin\left( \frac{jq\pi}{n+1} \right) e^{2\cos\left( \frac{j\pi}{n+1} \right)}.

Using the standard trigonometric identity 2\sin\theta \sin\vartheta = \cos(\theta - \vartheta) - \cos(\theta + \vartheta),

G_{pq}(P_n) = \frac{1}{n+1} \sum_{j=1}^{n} \left[ \cos\left( \frac{j\pi(p-q)}{n+1} \right) - \cos\left( \frac{j\pi(p+q)}{n+1} \right) \right] e^{2\cos\left( \frac{j\pi}{n+1} \right)}
= \frac{1}{n+1} \sum_{j=1}^{n} \left[ \cos\left( \frac{j\pi(p-q)}{n+1} \right) e^{2\cos\left( \frac{j\pi}{n+1} \right)} - \cos\left( \frac{j\pi(p+q)}{n+1} \right) e^{2\cos\left( \frac{j\pi}{n+1} \right)} \right].

Figure 19.2 Plot of the modified Bessel functions of the first kind, I_0(x) to I_5(x). The vertical line at x = 2 helps illustrate that I_γ(2) → 0 as γ grows

For j = 1, 2, . . . , n the angles jπ/(n + 1) uniformly cover the interval [0, π ],


justifying the integral approximation
G_{pq}(P_n) \approx \frac{1}{\pi} \int_0^{\pi} \cos(\theta(p-q))\, e^{2\cos\theta}\, d\theta - \frac{1}{\pi} \int_0^{\pi} \cos(\theta(p+q))\, e^{2\cos\theta}\, d\theta,    (19.10)

where θ = jπ/(n + 1). From integral tables,


\frac{1}{\pi} \int_0^{\pi} \cos(\gamma\theta)\, e^{x\cos\theta}\, d\theta = I_\gamma(x),    (19.11)

which is a modified Bessel function of the first kind. Hence,

G_{pq}(P_n) \approx I_{p-q}(2) - I_{p+q}(2).    (19.12)

Figure 19.2 illustrates Bessel functions of the first kind. We see that as γ →
∞, Iγ (2) → 0 very quickly. Consequently, Gn1 (Pn ) ≈ (In–1 (2) – In+1 (2)) → 0 as
n → ∞.
For instance, G_{1,5}(P_5) = 0.048 and G_{1,10}(P_{10}) = 2.98 \times 10^{-6}.
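The quality of the Bessel-function approximation is easy to check against the exact matrix exponential. A short sketch of ours, assuming SciPy:

```python
import numpy as np
from scipy.linalg import expm
from scipy.special import iv                     # modified Bessel function I_v(x)

n = 10
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path on n nodes
exact = expm(A)[0, n - 1]                        # G_{1,n}(P_n)
approx = iv(n - 1, 2) - iv(n + 1, 2)             # I_{p-q}(2) - I_{p+q}(2) with
print(exact, approx)                             # p = 1, q = n, since I_{-k} = I_k
```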

Example 19.1

Consider the network of interconnections between the different regions of the visual cortex of the macaque. Figure 19.3
illustrates a planar map of the regions of the visual cortex (a) and the map of communicability between pairs of regions
(b). As can be seen, there are a few small regions which communicate intensely.

continued

Example 19.1 continued


Figure 19.3 (a) A representation of the macaque cortex (b) The communicability between regions

Problem 19.1
Let S1,n–1 be a star with n nodes in which we have labelled the central node
as 1. Given that the eigenvectors associated with the largest and the smallest
eigenvalues are

q_1 = \frac{1}{\sqrt{2(n-1)}} \left[ \sqrt{n-1} \;\; 1 \;\; \cdots \;\; 1 \right]^T

and

q_n = \frac{1}{\sqrt{2(n-1)}} \left[ -\sqrt{n-1} \;\; 1 \;\; \cdots \;\; 1 \right]^T

and that the spectrum is


\sigma(S_{1,n-1}) = \left\{ \left[ \sqrt{n-1} \right]^1, [0]^{n-2}, \left[ -\sqrt{n-1} \right]^1 \right\},

find expressions for the communicability between any pair of nodes in the
network.

In the star graph there are two nonequivalent types of pairs of nodes. One
is formed by the central node and any of the nodes of degree one, the other is
formed by any pair of nodes of degree one. Let us designate the communicability
between the first type by Gp1 (S1,n–1 ) and the second by Gpq (S1,n–1 ).
By substituting the values of the eigenvalues and eigenvectors into the
expression for the communicability we have

G_{p1}(S_{1,n-1}) = \frac{1}{\sqrt{2}} \frac{1}{\sqrt{2(n-1)}} e^{\sqrt{n-1}} - \frac{1}{\sqrt{2}} \frac{1}{\sqrt{2(n-1)}} e^{-\sqrt{n-1}} + \sum_{j=2}^{n-1} q_j(1)\, q_j(p), \quad p \ne 1.    (19.13)
From the orthonormality of the eigenvectors we deduce that


0 = \sum_{j=1}^{n} q_j(1)\, q_j(p) = \sum_{j=2}^{n-1} q_j(1)\, q_j(p) + q_1(1)\, q_1(p) + q_n(1)\, q_n(p).    (19.14)

Thus,


\sum_{j=2}^{n-1} q_j(1)\, q_j(p) = 0,    (19.15)

and if p ≠ 1,

G_{p1}(S_{1,n-1}) = \frac{1}{\sqrt{n-1}} \left( \frac{e^{\sqrt{n-1}} - e^{-\sqrt{n-1}}}{2} \right) = \frac{\sinh\left( \sqrt{n-1} \right)}{\sqrt{n-1}}.

Similarly,

G_{pq}(S_{1,n-1}) = \frac{1}{2(n-1)} e^{\sqrt{n-1}} + \frac{1}{2(n-1)} e^{-\sqrt{n-1}} + \sum_{j=2}^{n-1} q_j(p)\, q_j(q).    (19.16)

Again, from orthonormality considerations, if p and q are different and not


equal to 1,


\sum_{j=2}^{n-1} q_j(p)\, q_j(q) = -\frac{1}{n-1}.    (19.17)

Thus for p, q > 1, p ≠ q,

G_{pq}(S_{1,n-1}) = \frac{1}{n-1} \left( \cosh\left( \sqrt{n-1} \right) - 1 \right).    (19.18)

Notice that the communicability between the central node and any other node
is determined only by walks of odd length while that between any two noncentral
nodes is determined by even length walks only.
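The closed forms just derived can be confirmed numerically. A sketch of ours, with the central node labelled 0 as networkx does:

```python
import networkx as nx
import numpy as np
from scipy.linalg import expm

n = 6
G = nx.star_graph(n - 1)                   # node 0 is the centre, n nodes in total
comm = expm(nx.to_numpy_array(G))
s = np.sqrt(n - 1)
print(comm[0, 1], np.sinh(s) / s)                  # centre-leaf pair, cf. (19.13)
print(comm[1, 2], (np.cosh(s) - 1) / (n - 1))      # leaf-leaf pair, cf. (19.18)
```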

19.3 Communicability distance


We start by recalling the results obtained by Milgram in his 1967 experiment
that gave rise to the concept of ‘small-world’ networks. In that experiment it was
clearly observed that a letter which arrives at its destination travels through a very
small number of steps. However, letters can be lost due to the fact that nodes
along the route may be involved in many transitive relations.
The first of these observations describes a situation when a letter travels along
(close to) shortest paths connecting the origin and the destination, if we assume
that the average path length in the network is very small. The second observation
is accounted for by the fact that the nodes in the network display a high clustering
coefficient. As we have seen before, these are the two main ingredients of the
Watts–Strogatz model of ‘small-world’ networks.
We can devise a simple strategy in order to optimize the chances that a letter ar-
rives at its destination in a given network. The sender is supposed to know nothing
about the global structure of the network. She only knows who her acquaintances
are in that network. Instead of selecting one of these acquaintances at random, the
sender is supposed to give her letter to one of the most connected of her ties. If
the new receiver of the letter also uses the same strategy the result is that the letter
travels through highly connected nodes with high probability. Highly connected
nodes have a large number of paths crossing through them. Thus, the probabil-
ity that the shortest path connecting the origin and the destination of the letter
passes through such highly connected nodes is very high (see Figure 19.4(a)).
Consequently, the letter should arrive in a small number of steps if it is sent via
the highly connected acquaintances of each node. However, it is also expected
that those highly connected nodes are involved in a large number of triangles due
to the transitivity of the relations. Consequently, sending the letter through these
nodes also increases the chances that the letter is lost (see Figure 19.4(b)). In
closing, sending the letter through the highly connected acquaintances of a node
both increases the chance that the letter arrives at the destination because it uses
the shortest path from the origin to the destination; and increases the chance that
the letter is lost due to the high transitivity in which those nodes are involved.
This apparent paradox in the communication between two nodes when the
shortest path route is used motivates an index that accounts for communication

Figure 19.4 (a) Shortest path between source and target passes through the hub (b) Walks starting and ending at the hub

routes that maximize the communication between a pair of nodes but reduce the
disruption in the communication due to transitivity of the relationships.
In order to identify routes that maximise the communication between nodes p
and q, it is intuitive to think about the communicability function Gpq , which ac-
counts for the amount of information that departs from p and successfully arrives
at q. The ‘disruption’ in the communication is represented by the information
that departs from p and after wandering around the nodes of the network returns
again to p. A natural index to account for this disrupted information is Gpp . If
we assume that the communication between p and q is bidirectional, there is also
disruption in the information sent from q, accounted for by Gqq .
We define an index that accounts for the amount of information disrupted
minus the amount of information that arrives at its destination with

\xi_{pq}^{2} \stackrel{\mathrm{def}}{=} G_{pp} + G_{qq} - 2G_{pq}.    (19.19)


Minimizing \xi_{pq}^{2} represents the case in which we minimize loss of information and at the same time maximize the information that finally arrives at its destination.
We can show that \xi_{pq}^{2} can be viewed as the square of an appropriately defined Euclidean distance between the nodes p and q of the network.
Let A = QDQ^T be the spectral decomposition of the adjacency matrix of a network and denote the pth row of Q by u_p^T. Then we can write

\xi_{pq}^{2} = (u_p - u_q)^T e^{D} (u_p - u_q) = \left( e^{D/2}(u_p - u_q) \right)^T \left( e^{D/2}(u_p - u_q) \right)

since e^{D} = e^{D/2} e^{D/2}. Writing x_p = e^{D/2} u_p,

\xi_{pq}^{2} = (x_p - x_q)^T (x_p - x_q) = \| x_p - x_q \|^2,

hence

\xi_{pq} = \sqrt{\xi_{pq}^{2}} = \| x_p - x_q \|    (19.20)

is a Euclidean distance. For obvious reasons, we will call ξpq the communicability
distance between the nodes p and q of a graph.
We can define an analogue of the distance matrix for the communicability
distance as follows. Let s = \left[ EE_{11} \;\; EE_{22} \;\; \cdots \;\; EE_{nn} \right]^T be a column vector of the subgraph centralities of every node in the graph and let

M = s e^T + e s^T - 2\exp(A).    (19.21)

Then the communicability distance matrix of a network is given by

X(G) = {}^{\circ}\!\sqrt{M},    (19.22)

where {}^{\circ}\!\sqrt{\;} denotes the componentwise square root.
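Expressions (19.21) and (19.22) translate directly into code. A minimal sketch of ours, assuming NumPy and SciPy, with a small clamp to guard against rounding producing tiny negative entries:

```python
import numpy as np
from scipy.linalg import expm

def communicability_distance_matrix(A):
    G = expm(A)
    s = np.diag(G)                         # subgraph centralities EE_11,...,EE_nn
    M = s[:, None] + s[None, :] - 2 * G    # M = s e^T + e s^T - 2 exp(A)
    return np.sqrt(np.maximum(M, 0))       # componentwise square root

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)     # a triangle
print(communicability_distance_matrix(A))
```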

Problem 19.2
Consider the network illustrated in Figure 19.5. Find the routes connecting nodes
1 and 6 with shortest path and communicability distances.
Figure 19.5 Calculate the communicability of the coloured nodes

First, we identify the paths connecting 1 and 6. There are six, namely,

1 → 2 → 4 → 6        1 → 2 → 5 → 4 → 6
1 → 2 → 3 → 5 → 4 → 6        1 → 2 → 5 → 3 → 4 → 6
1 → 2 → 3 → 4 → 6        1 → 9 → 8 → 7 → 6.

In order to find the one with the shortest path distance, we simply sum the shortest path distances between the pairs of nodes forming the path (in this unweighted network the distance between adjacent nodes is 1). It is obvious that the shortest path distance is 3 for 1 → 2 → 4 → 6.
For the communicability distance we proceed in a similar way and calculate \xi_{12} + \xi_{24} + \xi_{46}, \xi_{12} + \xi_{23} + \xi_{34} + \xi_{46}, and so on. In order to calculate the communicability distance we need
G = e^A =
[ 2.482 2.597 1.625 1.704 1.625 0.465 0.299 0.709 1.628
  2.597 6.530 5.510 5.842 5.510 1.704 0.470 0.333 0.905
  1.625 5.510 5.558 5.510 5.190 1.625 0.416 0.167 0.416
  1.704 5.842 5.510 6.530 5.510 2.597 0.905 0.333 0.470
  1.625 5.510 5.190 5.510 5.558 1.625 0.416 0.167 0.416
  0.465 1.704 1.625 2.597 1.625 2.482 1.628 0.709 0.299
  0.299 0.470 0.416 0.905 0.416 1.628 2.285 1.594 0.704
  0.709 0.333 0.167 0.333 0.167 0.709 1.594 2.280 1.594
  1.628 0.905 0.416 0.470 0.416 0.299 0.704 1.594 2.285 ].

Note that s is the diagonal of G and from (19.21) and (19.22) we obtain
X(G) =
[ 0.000 1.954 2.188 2.367 2.188 2.008 2.042 1.829 1.230
  1.954 0.000 1.033 1.173 1.033 2.367 2.806 2.854 2.647
  2.188 1.033 0.000 1.033 0.858 2.188 2.648 2.739 2.648
  2.367 1.173 1.033 0.000 1.033 1.954 2.647 2.854 2.806
  2.188 1.033 0.858 1.033 0.000 2.188 2.648 2.739 2.648
  2.008 2.367 2.188 1.954 2.188 0.000 1.230 1.829 2.042
  2.042 2.806 2.648 2.647 2.648 1.230 0.000 1.174 1.778
  1.829 2.854 2.739 2.854 2.739 1.829 1.174 0.000 1.174
  1.230 2.647 2.648 2.806 2.648 2.042 1.778 1.174 0.000 ].

In this case, ξ12 + ξ24 + ξ46 = 5.081 but ξ19 + ξ98 + ξ87 + ξ76 = 4.808. This means
that according to the communicability distance, the route 1 → 9 → 8 → 7 → 6
is shorter than 1 → 2 → 4 → 6 and it can be confirmed that no other route has
shorter communicability distance. The two routes are marked in Figure 19.6.
In order to understand the differences between the two routes we just need

to recall the definition of communicability distance, \xi_{pq}^{2} = G_{pp} + G_{qq} - 2G_{pq}.

Figure 19.6 A contrast between distance and communicability minimization: (a) Shortest path (b) Shortest communicability route

It indicates a route that maximizes the communicability between the two nodes
and minimizes the disruption. The route 1 → 9 → 8 → 7 → 6 certainly reduces
the chances of getting lost along the way without increasing the length of the route
excessively.
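Finding a minimum communicability-distance route amounts to weighting each existing link by ξ and running any shortest path algorithm. A sketch of ours, assuming networkx:

```python
import networkx as nx
import numpy as np
from scipy.linalg import expm

def communicability_route(G, source, target):
    A = nx.to_numpy_array(G)
    E = expm(A)
    s = np.diag(E)
    X = np.sqrt(np.maximum(s[:, None] + s[None, :] - 2 * E, 0))
    idx = {v: i for i, v in enumerate(G.nodes())}
    for u, v in G.edges():                 # weight each link by xi_uv
        G[u][v]["xi"] = X[idx[u], idx[v]]
    return nx.shortest_path(G, source, target, weight="xi")
```

On the network of Figure 19.5 this procedure should recover the route 1 → 9 → 8 → 7 → 6 rather than 1 → 2 → 4 → 6.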
A good analogy for understanding the differences between the shortest path
and the communicability distance is the following. Suppose every node is rep-
resented by a ball of mass proportional to its subgraph centrality. Then a node
participating in many small subgraphs will have a very large mass. Now place the
network onto a rubber sheet which will be deformed according to the masses of
the corresponding nodes. Then to travel from one node to another we need to
follow the ‘geodesic’ paths, which include the deformations of the sheet. Conse-
quently, as illustrated in Figure 19.7, going from one node to another by using a
route that involves nodes of large masses (large subgraph centrality) will increase
the length of the trajectory greatly in comparison with those routes involving low
mass nodes only. In the picture, there are two alternative routes between nodes
1 and 3. The route 1 → 2 → 3 involves node 2 which has very low subgraph

Figure 19.7 Representation of communicability as the level of deformation of a surface

centrality and barely deforms the rubber sheet. The route 1 → 4 → 3 involves
node 4 which has a large subgraph centrality and produces a large deformation
of the sheet. So the second route can be considered to involve a longer trajectory
due to the large deformation of the ‘space’ produced by node 4.

Example 19.2

Consider the following hypothetical situation. It is necessary to transport a


dangerous substance from the Youngstown–Warren Regional Airport located
in Northeast Ohio to the Elko Mini–JC Harris Field Airport in Nevada, USA.
The transportation should be carried out using only the available commercial
routes connecting airports in the USA.
The shortest path connecting both airports is: Youngstown–Warren (1)
to Akron–Canton Regional (8) to James M. Cox Dayton International (29)
to Dallas/Fort Worth International (118) and from there finally to Elko (17).
In parentheses we have given the number of airports to which the corresponding one is connected. The Dallas/Fort Worth International airport is the second most connected in the USA airport transportation network after Chicago O'Hare International, which has 139 connections. Consequently, if an
unfortunate accident occurs during the transportation that necessitates the
closure of Dallas/Fort Worth airport, it will have catastrophic consequences
for the normal operation of the USA transportation network.
We can calculate the minimal communicability distance path connecting
the two airports targeted in this example. The route with the minimum com-
municability distance is: Youngstown (1) to Akron (8) to Dayton (29) to
Louisville (18) to Birmingham (17) to William P. Hobby (27) to Tulsa (16)
to Elko (17). As can be seen, this route does not include any major hub of the
network, e.g., none of the airports involved in this route is among the top 40
most connected ones. Consequently, an unfortunate accident in any airport
in this route will not produce such catastrophic damage as the one produced
by the shortest path route. The shortest path route involves four steps and
that for the communicability distance involves seven steps, but the first has a
sum of communicability distances equal to 3.30 × 108 , while the second has
a sum of communicability distances less than half of the first, 1.56 × 108 .

..................................................................................................

FURTHER READING

Estrada, E., Complex networks in the Euclidean space of communicability


distances, Physical Review E 85:066122, 2012.
Estrada, E., Hatano, N., and Benzi, M., The physics of communicability in
complex networks, Physics Reports 514:89–119, 2012.
20 Statistical Physics Analogies
20.1 Motivation
20.2 Thermodynamics in a nutshell
20.3 Micro-canonical ensembles
20.4 The canonical ensemble
20.5 The temperature in network theory
Further reading

In this chapter
We introduce the basic concepts of thermodynamics and of statistical mechanics. We study the micro- and macro-canonical ensembles for an isolated network and a network in a thermal reservoir, respectively. We derive the fundamental formulae for entropy, Helmholtz, and Gibbs free energies for classical and quantum systems. We make an interpretation of the concept of temperature as applied to network sciences.

20.1 Motivation
In Chapters 8 and 16 we introduced classical and quantum mechanical analo-
gies for studying complex networks. Here, we develop analogies from statistical
mechanics to use in network sciences. The term ‘statistical mechanics’ was intro-
duced by Gibbs in the nineteenth century as a means to emphasize the necessity
of using statistical tools in describing macroscopic physical systems, since it is
impossible to deduce the properties of such systems by analysing the mechanical
properties of their individual constituents. It is possible to obtain some macroscopic
measurements of the system and the goal of statistical mechanics is to use stat-
istical methods to connect these macroscopic properties of the system with the
microscopic structure and dynamics occurring in them.
A fundamental concept of both thermodynamics and statistical mechanics is
that of entropy. This concept has proved useful in areas far removed from thermal
physics and it is now ubiquitous in many fields of physical, biological, and social
sciences. In this chapter we will prepare the terrain for understanding entropy
and its implications for studying networks. We will consider briefly the follow-
ing fundamental questions: What is the physical meaning of entropy? How is it
connected to the information content of a network? What is the network tempera-
ture? What is the connection between network entropy and other thermodynamic
properties?

20.2 Thermodynamics in a nutshell


In thermodynamics, we consider a system to be a part of the physical universe
to which we direct our study. In our particular case the system is a network, or,
as we will see later in Section 20.4, a network and a thermal reservoir. If this
system is considered according to the laws of classical mechanics (see Chapter 8),
a state of the system corresponds to its complete description by means of the
positions and momenta of the particles forming the system at a given time. In
quantum mechanics, a state is defined according to the first postulate of quantum
mechanics (see Chapter 16). That is, the quantum state is completely specified
by the wave function ψ(x, t). A microstate of the system is the state in which all
the parameters of the constituent parts of the system are specified. On the other
hand, a macrostate is a state of the system in which the distribution of particles
over the energy levels is specified.
Let us start by considering a network in which we have arbitrarily divided the
set of nodes into two groups (see Figure 20.1). First, we assume that the network
has no interactions with any other system i.e. it is isolated. A system is isolated if it
is unable to exchange energy and matter with the exterior of the system. We also
consider that the state in which the network exists does not change in time. When
the probability of finding the system in a particular microstate does not change
with time we say that the system is in equilibrium.
Label the set of black nodes V1 and the white nodes V2 , and let n1 = |V1 |
and n2 = |V2 |. Denote by U1 and U2 the internal energies of the two parts of
the network. In Figure 20.1, a wall artificially divides the nodes marked in white
from those marked in black. Let us remove this wall. The total number of nodes
is now n = n1 + n2 and (assuming that there is no energy of interaction between
the two parts) the total internal energy is U = U1 + U2 . These properties which
can be obtained as the sum of the values they have in the single systems are
called extensive. The fundamental postulate of thermodynamics states that it should

Figure 20.1 A network with two re-


gions isolated

be possible to characterize the state of a thermodynamic system by specifying the


values of a series of extensive variables (X0 , . . . , Xk ).
We wish to exploit some of the fundamental laws and principles of thermo-
dynamics. We start with the so-called laws of thermodynamics.

Zeroth law: If the systems A and B are in thermal equilibrium with a third
system C, then A and B must also be in thermal equilibrium.
First law: The internal energy U of an isolated system is constant. The change of internal energy produced by a process that causes the system to absorb heat Q and do work W is given by \Delta U = Q - W.
Second law: For an isolated system, the entropy S is a state function. No process taking place in the system can decrease the entropy. That is, \Delta S \ge 0.
If the system absorbs an infinitesimal amount of heat δQ (the system is not
isolated), the entropy changes by

dS = \frac{\delta Q}{T},    (20.1)

where T is the temperature of the system. Notice that while dS is an exact


differential, δQ is not.
Third law: \lim_{T \to 0} S = S_0, where S_0 is a constant independent of all parameters of the system.
In a system in which the work done is purely mechanical, we can
combine the first and the second laws and write

dU = TdS – δW = TdS – pdV ,

and so by the product rule,

d(U + pV ) = TdS + Vdp.

The quantity H = U + pV is known as enthalpy. This gives a


thermodynamic definition of the temperature as

" #
∂H
T= . (20.2)
∂S p

An important relation between the enthalpy, entropy, and temperature is


given by the Gibbs free energy

F = H – TS. (20.3)

20.3 Micro-canonical ensembles


Consider an isolated system and imagine a new system formed from an assembly
of copies of this first system. This is the so-called micro-canonical ensemble. From
the fundamental postulate of statistical mechanics, the entropy of the system is
given by

S = kB ln N , (20.4)

where N is the number of microscopic states accessible to the system, given by


the volume of the phase space in which the observables of the system have the
specified values of the extensive variables; and kB is the Boltzmann constant. This
gives a physical interpretation of entropy as a measure of the dispersion of the
distribution of microscopic states in the system.
Recall from Chapter 8 that for the classical simple harmonic oscillator (SHO)
we can write

E = \frac{p^2}{2m} + \frac{m\omega^2 x^2}{2},    (20.5)
which can be written as the equation of an ellipse

1 = \frac{x^2}{x_{max}^2} + \frac{p^2}{p_{max}^2},    (20.6)

where

p_{max} = \sqrt{2mE},    (20.7)

x_{max} = \sqrt{\frac{2E}{m\omega^2}},    (20.8)

and the ‘surfaces’ of constant energy are described by ellipses with these axes. In
this case, the volume of the phase space is the area of the ellipse
\pi\, p_{max}\, x_{max} = \pi \sqrt{\frac{4E^2}{\omega^2}} = \frac{2\pi E}{\omega}.    (20.9)
Suppose that the energy can only be measured with a degree of uncertainty \Delta E. The area of the accessible phase space is given by

\Delta a = \frac{2\pi(E + \Delta E)}{\omega} - \frac{2\pi E}{\omega} = 2\pi \frac{\Delta E}{\omega}.    (20.10)
Thus, the number of microscopic states accessible to the system is obtained by dividing \Delta a by h_0, the size of a cell in which the phase space is partitioned,

N = 2\pi \frac{\Delta E}{h_0 \omega},    (20.11)

or, assuming the smallest possible length of the cell, we obtain

N = \frac{\Delta E}{\hbar \omega}.    (20.12)
We can now obtain the value of the entropy for the SHO by using S = k_B \ln N. However, because N uses arbitrary units, we can only define S up to an arbitrary additive constant. We write

S = k_B \ln \frac{\Delta a}{\Omega_0},    (20.13)

where \Omega_0 has dimensions (pq)^{dn} with d being the dimension of the space and n the number of particles. Finally, the entropy of the classical SHO is given by

S = k_B \left[ \ln\left( \frac{\Delta E}{\hbar\omega} \right) + 1 \right].    (20.14)

20.4 The canonical ensemble


Now suppose that a network is submerged into a thermal bath as illustrated in
Figure 20.2. A thermal bath or reservoir is a system with dimensions much larger
than the network and the energy of interaction between the network and the res-
ervoir is negligible. That is, if E_R is the energy of the reservoir and E is the energy of the network then E_R \gg E and the total energy of the combined network plus reservoir, E_{Tot} = E + E_R, is also much larger than E. The ensemble representing a system which is in thermal contact with a heat reservoir, which is characterized by a temperature T, is known as the canonical ensemble.¹
The probability distribution for the energy in the reservoir plus network system is given by

P(E) = \frac{\Omega(E)\, \Omega_R(E_R)}{\Omega_T(E_T)} = \frac{\Omega(E)\, \Omega_R(E_T - E)}{\Omega_T(E_T)},    (20.15)

where \Omega_R(E_T - E) is the volume of phase space accessible to the reservoir.

¹ In the macrocanonical ensemble (not studied here) the system can interchange heat and matter with the exterior.

Figure 20.2 A network submerged into a thermal bath

By taking the logarithm of this expression we get

\ln P(E) = \ln \Omega(E) + \ln \Omega_R(E_T - E) - \ln \Omega_T(E_T).    (20.16)

Due to the fact that E_T \gg E, it is reasonable to assume that \ln \Omega_R(E_T - E) can be expanded in a Taylor series in powers of E in which we can neglect all terms of order higher than one,

\ln \Omega_R(E_T - E) = \ln \Omega_R(E_T) - E \left. \frac{\partial \ln \Omega_R(E)}{\partial E} \right|_{E_T} + \cdots.    (20.17)

Now, because S = k_B \ln \Omega we have

\left. \frac{\partial \ln \Omega_R(E)}{\partial E} \right|_{E_T} = \frac{1}{k_B} \left. \frac{\partial S_R(E)}{\partial E} \right|_{E_T} = \frac{1}{k_B T} = \beta,    (20.18)

where \beta is the so-called inverse temperature. So we can write (20.16) as

\ln P(E) = \ln \Omega_R(E_T) - \ln \Omega_T(E_T) + \ln \Omega(E) - \beta E.    (20.19)

The first two terms in (20.19) are constant and if we define the constant

Z = \frac{\Omega_T(E_T)}{\Omega_R(E_T)}    (20.20)

then

\ln P(E) = \ln \Omega(E) - \beta E - \ln Z    (20.21)

or

P(E) = \frac{1}{Z} \Omega(E) \exp(-\beta E).    (20.22)
Z

The normalization constant Z plays a very important role in statistical mech-


anics. It is known as the partition function of the system and can be expressed as

Z = \int \frac{d\Gamma}{\Omega_0} \exp(-\beta H) = \mathrm{tr}(\exp(-\beta H)).    (20.23)

The Helmholtz free energy of the system can now be obtained from the
partition function using the expression

F = -\beta^{-1} \ln Z.    (20.24)

For instance, for the classical SHO the partition function is written as
  " 2 #
dxdp p mω2 x2
Z= exp –β + , (20.25)
h0 2m 2

which after integration gives

kB T
Z= , (20.26)
h̄ω
so

F = -k_B T \ln\left( \frac{k_B T}{\hbar\omega} \right) = k_B T \ln\left( \frac{\hbar\omega}{k_B T} \right),    (20.27)

and

S = -\left( \frac{\partial F}{\partial T} \right)_V = k_B \left[ \ln\left( \frac{k_B T}{\hbar\omega} \right) + 1 \right].    (20.28)

Problem 20.1
Find expressions for the partition function, entropy, Helmholtz, and Gibbs free
energies of the simple quantum harmonic oscillator.
For the quantum SHO we have seen that

\hat{H} = \hbar\omega \left( \hat{N} + \frac{1}{2} \right).

Thus the partition function is

Z = \mathrm{tr}(\exp(-\beta \hat{H})) = \mathrm{tr}\left( \exp\left[ -\beta\hbar\omega \left( \hat{N} + \frac{1}{2} \right) \right] \right),

which can be written as

Z = \sum_{j=0}^{\infty} \exp\left[ -\beta\hbar\omega \left( j + \frac{1}{2} \right) \right] = \exp\left( -\frac{\beta\hbar\omega}{2} \right) \sum_{j=0}^{\infty} \exp[-\beta\hbar\omega j].

Then,

Z = \frac{\exp\left( -\frac{\beta\hbar\omega}{2} \right)}{1 - \exp[-\beta\hbar\omega]} = \frac{1}{\exp\left( \frac{\beta\hbar\omega}{2} \right) - \exp\left( -\frac{\beta\hbar\omega}{2} \right)} = \frac{1}{2\sinh\left( \frac{\beta\hbar\omega}{2} \right)} = \frac{1}{2}\, \mathrm{csch}\left( \frac{\beta\hbar\omega}{2} \right).

Hence, from (20.24), the Helmholtz free energy of the system is

F = \beta^{-1} \ln\left[ 2\sinh\left( \frac{\beta\hbar\omega}{2} \right) \right],

and by (20.28) the entropy is

S = -\left( \frac{\partial F}{\partial T} \right) = \frac{\hbar\omega}{2T} \coth\left( \frac{\beta\hbar\omega}{2} \right) - k_B \ln\left[ 2\sinh\left( \frac{\beta\hbar\omega}{2} \right) \right].

Finally, from (20.3),

H = \frac{\hbar\omega}{2} \coth\left( \frac{\beta\hbar\omega}{2} \right).

There are alternative expressions for the entropy of a system in the quantum
canonical ensemble and we now derive one. The probability of finding the system
in a quantum state with energy Ej is given by

p_j = \frac{1}{Z} \exp(-\beta E_j),    (20.29)

which implies that

\ln p_j = -(\ln Z + \beta E_j),    (20.30)


with the condition that \sum_{j=1}^{n} p_j = \frac{1}{Z} \sum_{j=1}^{n} \exp(-\beta E_j) = 1.
Then from (20.24) and (20.28),

S = -\left( \frac{\partial F}{\partial T} \right)_V = k_B \ln Z + k_B T \frac{1}{k_B T^2} \left( -\frac{\partial}{\partial \beta} \ln Z \right)
= k_B \left[ \ln Z - \frac{\beta}{Z} \sum_j (-E_j) \exp(-\beta E_j) \right] = k_B \left[ \ln Z \sum_j p_j - \frac{\beta}{Z} \sum_j (-E_j) \exp(-\beta E_j) \right]
= k_B \sum_j p_j (\ln Z + \beta E_j) = -k_B \sum_j p_j \ln p_j.

This is a beautiful result in which we see that

S = -k_B \sum_j p_j \ln p_j,    (20.31)

which connects the information content of a system² with the thermodynamics of the system.

² You may be familiar with the Shannon formula S = -\sum_j p_j \ln p_j, which can be applied to any probability distribution.

20.5 The temperature in network theory


It seems a priori very artificial to consider that we submerge a real-world network
into a thermal bath. But we argue that in reality, real-world networks are already
submerged into such ‘thermal baths’. Think, for instance, of our network of social
relations. It is contained in a reservoir of the global social, economic, and political
atmospheres of the society as a whole. Changes in these atmospheres can change
the strength of our social ties. A protein–protein interaction network is contained
in the cell reservoir and subject to any variation in the physical parameters (pH,
temperature, concentration of nutrients, etc) in that cell. A network of corpor-
ations and their business relationships is embedded into the global economic
climate at the moment the network is analysed. Thus the inverse temperature
β can play the role of the external stress to which any network is submitted due
to any change in the environment in which the network is submerged.
Consider the network illustrated in Figure 20.3, which has been submerged
into a thermal bath with inverse temperature β. After equilibration, every link of
the network is weighted by β. If the network is already weighted then every link
weight is multiplied by β.
This means that as the temperature tends to infinity, β → 0 and all links
tend to have zero weights. In other words, the network is fully disconnected. This
resembles the situation of very extreme stress which makes the network totally
dysfunctional. When the temperature tends to zero, β → ∞ and all links have
infinite weights, indicating an infinite strengthening of the connections between
bonded nodes.

Problem 20.2
Consider a tight-binding model for a network with parameters α̃ = 0 and β̃ = –1.
Define the partition function, entropy, Helmholtz, and Gibbs free energies for a
network.
The Hamiltonian for the tight-binding model is H = α̃I + β̃A and the partition
function of the network with α̃ = 0 and β̃ = –1 is

Z = \mathrm{tr}(\exp(-\beta \hat{H})) = \mathrm{tr}(\exp(\beta A)).

Figure 20.3 Effects of equilibration of a network in a thermal bath with inverse temperature β

In terms of the eigenvalues of the adjacency matrix we have


Z = \sum_{j=1}^{n} \exp(\beta \lambda_j).

Notice that the partition function of the network is just the sum of the subgraph
centralities of the network. The index tr(exp(βA)) is usually called the Estrada
index of the network. Using (20.31) we can obtain the entropy of the network in
the canonical ensemble. Since E_j = -\lambda_j, we have p_j = \exp(\beta\lambda_j)/Z and so,

S = -k_B \sum_j p_j (\beta\lambda_j - \ln Z) = -\frac{1}{T} \sum_j \lambda_j p_j + k_B \ln Z \sum_j p_j = -\frac{1}{T} \sum_j \lambda_j p_j + k_B \ln Z,    (20.32)

which can be rearranged to give

-\beta^{-1} \ln Z = -\sum_j \lambda_j p_j - TS.

Using the expression F = H - TS we obtain H = -\sum_j \lambda_j p_j and F = -\beta^{-1} \ln Z.
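All of these quantities follow from the adjacency spectrum, so the network thermodynamics of this problem can be coded in a few lines. A minimal sketch of ours, taking k_B = 1 for simplicity:

```python
import numpy as np

def network_thermodynamics(A, beta=1.0):
    lam = np.linalg.eigvalsh(A)
    w = np.exp(beta * lam)
    Z = w.sum()                    # partition function: the Estrada index at beta
    p = w / Z                      # probabilities p_j = exp(beta*lambda_j)/Z
    S = -np.sum(p * np.log(p))     # entropy, in units of k_B
    F = -np.log(Z) / beta          # Helmholtz free energy F = -ln(Z)/beta
    H = -np.sum(lam * p)           # enthalpy H = -sum_j lambda_j p_j
    return Z, S, F, H              # F = H - T S holds with T = 1/beta
```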

..................................................................................................

FURTHER READING

Albert, R. and Barabási, A., Statistical mechanics of complex networks, Reviews


of Modern Physics 74:47–97, 2002.
Park, J. and Newman, M.E.J., Statistical mechanics of networks, Physical Review
E 70:066117, 2004.
Peliti, L., Statistical Mechanics in a Nutshell, Princeton University Press, 2011.
21 Communities in Networks
21.1 Motivation
21.2 Basic concepts of communities
21.3 Network partition methods
21.4 Clustering by centrality
21.5 Modularity
21.6 Communities based on communicability
21.7 Anti-communities
Further reading

In this chapter
We study the organization of nodes into communities in complex networks. Communities are groups of nodes more densely connected amongst themselves than with the rest of the nodes of the network. We study some of the methods that aim to find such structures in networks. We finish with a method to detect anti-communities (bipartitions) in networks.
21.1 Motivation
In real-world networks, nodes frequently group together forming densely con-
nected clusters which are poorly connected with other parts of the network.
Clusters, also known as communities in network theory, may form for many
reasons. For instance, we all belong to clusters formed by our friends and re-
latives. Inside these groups one can find a relatively high density of ties but in
many cases these are poorly connected to other groups in society. Clusters can
also be formed due to similarities among the nodes. For instance, groups of pro-
teins with similar functions in a protein–protein interaction network may be more
densely connected with each other than with proteins which have different func-
tions. In this chapter, we study how to find these communities of nodes based on
the information provided by the topological structure of the network.

Examples 21.1

(i) In Figure 21.1 we illustrate two well-known cases of networks with communities. The first corresponds to the
friendship ties among individuals in a karate club in a USA university. At some point in time, the members of
this social network were polarized into two different factions due to an argument between the instructor and
the president. These two factions, represented in Figure 21.1(a) in different colours, act as cohesive groups
which can be considered as independent entities in the network. Another example is provided in Figure 21.1(b).

continued

Examples 21.1 continued

The network represents 62 bottlenose dolphins living in Doubtful Sound, New Zealand. Links are drawn between
two animals if they are seen together more frequently than expected at random. These dolphins split into groups
after one particular dolphin moved away for a period of time.

(a) Karate club (b) Social contacts amongst dolphins

Figure 21.1 Two networks with known structural communities

(ii) Political polarization in the USA leads to a number of clustered networks. We illustrate this with an example
based on political literature. Figure 21.2 represents a network of books on US politics published around the time
of the 2004 presidential election and sold on Amazon.com. There is a link between two books if they share several
common purchasers. As can be seen, there is a clear congregation into two main communities. These represent
the purchases of consumers of conservative literature on one side and of liberal literature on the other.

Figure 21.2 Political books on US politics published around 2004. Nodes coloured according to the political leaning of the book

In these examples, the clusters were induced by empirical evidence. The ques-
tion that arises is whether we can find such partitions in networks without any
other information than that provided by the topological structure of the network.
When looking at the graph Laplacian we saw that such an approach is viable
via the Fiedler vector. We will return to that subject in Section 21.3, but we will
consider several other approaches, too.

21.2 Basic concepts of communities


Before attempting to detect communities, we will define some useful terms and
quantities. Given a subgraph G1 (C, E1 ) ⊆ G(V , E) with nC = |C| nodes, we
define the internal and external degrees to be the quantities
 
k_i^{int} = \sum_{j \in C} a_{ij}, \qquad k_i^{ext} = \sum_{j \in \bar{C}} a_{ij},

respectively, where \bar{C} is the complement of C and A is the adjacency matrix of G.


The number of links which connect nodes internally in the subgraph is given by

m_C = \frac{1}{2} \sum_{i \in C} k_i^{int},

and the number of links connecting C and \bar{C} is

m_{C-\bar{C}} = \sum_{i \in C} k_i^{ext}.

The number of links that connect nodes in C to \bar{C} is called the boundary of C (and is often written ∂C). Finally, we define intra-cluster density. A network with n nodes can have at most n(n-1)/2 edges and we can define the density of a network with m edges and n nodes to be

\delta(G) = \frac{2m}{n(n-1)}.

Similarly the intra-cluster density is

\delta_{int}(C) = \frac{m_C}{n_C(n_C - 1)/2} = \frac{\sum_{i \in C} k_i^{int}}{n_C(n_C - 1)}.

There can be at most n_1 n_2 edges between two groups of nodes of size n_1 and n_2. So we define the inter-cluster density to be

\delta_{ext}(C) = \frac{m_{C-\bar{C}}}{n_C(n - n_C)} = \frac{\sum_{i \in C} k_i^{ext}}{n_C(n - n_C)}.
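These densities are straightforward to compute for any candidate community. A small helper of ours, assuming networkx:

```python
import networkx as nx

def cluster_densities(G, C):
    C = set(C)
    n, n_C = G.number_of_nodes(), len(C)
    m_C = sum(1 for u, v in G.edges() if u in C and v in C)        # internal links
    boundary = sum(1 for u, v in G.edges() if (u in C) != (v in C))
    delta_int = m_C / (n_C * (n_C - 1) / 2)
    delta_ext = boundary / (n_C * (n - n_C))
    return delta_int, delta_ext
```

Applied to the two factions of the karate club network, this should reproduce the values quoted in Example 21.2 below.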

Example 21.2

We can calculate the quantities defined above for the networks in Figure 21.1.
In the karate club network, the values of the internal densities of the com-
munities C1 (followers of the instructor) and C2 (followers of the president)
are δint (C1 ) = 0.26 and δint (C2 ) = 0.24, respectively, and the inter-cluster
density is δext (C1 ) = δext (C2 ) = 0.035. The total density of this network is
δ(G) = 0.14.
For the network of bottlenose dolphins, the cluster represented in blue has
δint (C1 ) = 0.26, while the one represented in red has δint (C2 ) = 0.14 and the
external density is δext (C1 ) = δext (C2 ) = 0.007, while the total density of the
network is δ(G) = 0.08.

For these clusters found experimentally the internal density of every cluster is
significantly larger than its external density, and of the total density of the net-
work. Another observation is that there is at least one path between every pair of
nodes in a community linked together by edges and nodes in the same commu-
nity. That is, communities are internally connected. We can loosely define a cluster,
or community as follows.

Definition 21.1 A cluster (or community) of a network is an internally connected set of nodes for which the internal density is significantly larger than the external one.

This principle is the basis for the construction of networks with explicit clus-
ters, which are known as benchmark graphs. Although our two examples relate
to social networks, clustering is important in applications involving many other
types of network, too.

21.3 Network partition methods


We now consider the problem of partitioning a network G(V , E) into p disjoint
sets of nodes such that the following properties hold.

• \bigcup_{i=1}^{p} V_i = V and V_i \cap V_j = \emptyset for i \ne j.
• The number of links crossing between subsets (cut-set size or boundary) is minimized.
• |V_i| \approx n/p for 1 \le i \le p.

The process can be generalized to weighted networks where the cut-set is de-
fined as the sum of the weights of the links crossing subsets and the final condition

is rewritten as Wi ≈ W /p , where Wi is the sum of the weights of the nodes in Vi


and W is the total sum of weights in the whole network. Such a weighted partition
is called balanced.
We now analyse some of the most popular partitioning algorithms.

21.3.1 Local improvement methods


Local improvement methods are amongst the earliest proposed partitioning al-
gorithms and the archetypal method is due to Kernighan and Lin. In general,
these methods take a partition of the network as their input; the most simple
case is by taking a bisection. They then try to decrease the cut-set with a local
search approach. The generation of the initial partition can be carried out using
any popular bisection method, or can simply be generated at random. In the lat-
ter case, several random realizations are needed and the one with minimum cut
size is selected as the input. For the sake of simplicity we are going to consider
only unweighted networks. Given a bisection of G into subgraphs G1 (V1 , E1 ) and
G2 (V2 , E2 ) we define an internal and external weight for each node by splitting its
degree so Wiint is the number of edges from i to nodes in the same partition and
Wiext the number which leave. The cut size is given by
C(V_1, V_2) = \frac{1}{2} \sum_{i \in V} W_i^{ext}.
If we swap node i from one set to another we can measure the gain in the cut size by g_i = W_i^{ext} - W_i^{int}, a positive gain indicating an improvement.

Example 21.3

Let us consider the network illustrated in Figure 21.3. We will return to this network repeatedly in this chapter and we
name it Gclus . A random partition of the nodes into subsets of equal size is represented by the dotted line.

Figure 21.3 A random partition of a network


continued

Example 21.3 continued

Note that g_{v_1} = W_{v_1}^{ext} - W_{v_1}^{int} = 3 - 1 = 2. The gains for nodes v_2 and v_3 are 1 and -3, respectively. The cut size for this partition is C = 5.
We reduce the cut size by moving nodes with positive gain across to the other side and the reduction in C is equal
to the gain. Moving node v1 reduces C to 3, as seen in Figure 21.4.

Figure 21.4 An improvement to the partition in Figure 21.3

Moving v1 unbalances the number of nodes in each partition. Rather than simply analysing the gain obtained by
moving one node from one partition to another, we need to quantify the gain produced by swapping two nodes in
opposite partitions. If v1 ∈ V1 and v2 ∈ V2 then the gain of interchanging is given by

g(v_1, v_2) = \begin{cases} g(v_1) + g(v_2) - 2, & v_1 \sim v_2, \\ g(v_1) + g(v_2), & \text{otherwise}. \end{cases}

In our example, since v1 and v2 are not adjacent, g(v1 , v2 ) = 3 and the cut size is reduced from 5 to 2. The new
partition is shown in Figure 21.5.

Figure 21.5 An even better partition of Figure 21.3



This node swapping process forms the basis for the Kernighan–Lin algo-
rithm. We start with a balanced bisection {V1 , V2 } and we compute the cut size,
C0 (V1 , V2 ), of the bisection. Then for k = 1, 2, . . . , r, where r = min(|V1 |, |V2 |)
we carry out the following steps.

• Find the pair of nodes v1 ∈ V1 and v2 ∈ V2 which give the biggest value of
g(v1 , v2 ) (which may be negative).
• Label these nodes v_k^1 and v_k^2.
• For any node u adjacent to either v_k^1 or v_k^2 we update the value of g(u).
• Calculate

C_k(V_1, V_2) = C_{k-1}(V_1, V_2) - g(v_k^1, v_k^2), \quad k = 1, \ldots, r,

and note C_j(V_1, V_2), the minimum of all these values.

• Update the partitions by moving the sets

\{v_1^1, v_2^1, \ldots, v_j^1\} and \{v_1^2, v_2^2, \ldots, v_j^2\}

between V1 and V2 .
• Repeat the whole process until no further improvement is achieved for the
cut size.

The algorithm requires a time proportional to the third power of the number of
nodes in the network, O(n3 ), but improvements can be made to reduce this cost
significantly. In brief, these changes improve the process for swapping nodes;
use a fixed number of iterations; and only evaluate gain for nodes close to the
partition boundary. We introduce this algorithm here not because it is used today
for detecting communities in networks but because it helps us to understand the
intuition behind the partitioning methods for detecting communities.
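To make the gain computation concrete, here is a much simplified single-swap sketch in Python. It is ours, not Kernighan and Lin's full algorithm: it performs one greedy exchange rather than the full schedule, and it assumes V1 and V2 are sets of nodes of a networkx graph:

```python
import networkx as nx

def gain(G, part, v):
    ext = sum(1 for u in G[v] if part[u] != part[v])
    return ext - (G.degree(v) - ext)          # g_v = W_ext - W_int

def greedy_swap(G, V1, V2):
    part = {v: 1 if v in V1 else 2 for v in G}
    best, pair = 0, None
    for a in V1:
        for b in V2:
            g = gain(G, part, a) + gain(G, part, b) - (2 if G.has_edge(a, b) else 0)
            if g > best:
                best, pair = g, (a, b)
    if pair is not None:                      # perform the single best swap
        a, b = pair
        V1.remove(a); V2.remove(b)
        V1.add(b); V2.add(a)
    return V1, V2
```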

21.3.2 Spectral partitioning


We have seen that the Fiedler vector can be used to partition a connected network
into two. This technique, which stems from the use of an eigenvector, is an
example of spectral partitioning. In this section we will develop these ideas and see
that it is possible to use spectral information to partition a network into several
pieces.
Recall that the Fiedler vector is the eigenvector, x, associated with the smallest
nontrivial eigenvalue of the graph Laplacian. We associate the ith component of
this vector with the ith node and pick a cut-off value r. The nodes are then split
according to whether xi is bigger or smaller than r. The choice r = 0 is often good
in practice.
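A sketch of this bisection, ours, assuming networkx and NumPy, and assuming the network is connected so that the second smallest eigenvalue is nontrivial:

```python
import networkx as nx
import numpy as np

def fiedler_bisection(G, r=0.0):
    L = nx.laplacian_matrix(G).toarray().astype(float)
    vals, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    fiedler = vecs[:, 1]                      # eigenvector of the second smallest
    V1 = {v for v, x in zip(G.nodes(), fiedler) if x > r}
    return V1, set(G.nodes()) - V1
```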

In 1988, Powers showed that eigenvectors of the adjacency matrix can also be used to find partitions in a network. The idea behind these methods is that the eigenvector q_2^A associated with the second largest eigenvalue has both positive and negative components, allowing a partition of the network according to the sign pattern of this eigenvector.

Example 21.4

The eigenvector associated with the second smallest eigenvalue of the


Laplacian matrix of Gclus is

q_2^L = [-0.30\; -0.13\; 0.18\; 0.29\; 0.28\; 0.32\; 0.34\; 0.22\; -0.12\; -0.22\; -0.33\; -0.51]^T.

Similarly, the eigenvector associated with the second largest eigenvalue of


the adjacency matrix of this network is

q_2^A = [-0.50\; -0.34\; 0.08\; 0.13\; 0.24\; 0.23\; 0.16\; 0.16\; -0.31\; -0.42\; -0.37\; -0.20]^T.

Both eigenvectors produce the same bipartition of this network as the one
produced by the Kernighan–Lin algorithm.

Sometimes we might expect there to be more than two communities within a


network. Spectral techniques can easily be extended by using extra eigenvectors.
We describe the process with respect to the Laplacian matrix, but there is an
analogous approach for the adjacency matrix.
Suppose, then, that we split the network into two according to the signs of the
Fiedler vector. We now look at the eigenvector, y, associated with the next smallest
eigenvalue. For each of our original clusters we look at the signs of the elements in
the corresponding positions of y and split these clusters into those with positive
and those with negative components. We have now produced four clusters and
we can carry on dividing by looking at further eigenvectors. If we are convinced
a cluster should be divided no further, then we can only apply the information
provided by these eigenvectors to sections of the network. If we go too far down
this route there is less and less mathematical justification, but applied carefully it
can provide convincing clusters.

Example 21.5

In Figure 21.6 we show the clusters induced in the karate club by using two eigenvectors. In (a) we have used the
Laplacian matrix and in (b) the adjacency matrix. Notice that in picture (b) not all of the clusters are connected.

(a) Laplacian eigenvector clusters (b) Adjacency matrix eigenvector clusters

Figure 21.6 Two partitions of the Zachary karate club induced by multiple eigenvectors

21.4 Clustering by centrality


Any of our centrality measures can be used to divide a network into clusters.
The idea is to identify those links which are central for the inter-cluster commu-
nication. By removing them, we can find the best partition of the network into
communities. The first such algorithm, proposed by Girvan and Newman, used
edge betweenness centrality. This is a simple extension of the idea of node be-
tweenness centrality that we have seen previously: the betweenness centrality of
an edge a is found by dividing the number of shortest paths between two nodes
that pass through a by the total number of such shortest paths. The expectation
is that links connecting nodes which are located in different communities display
the highest edge betweenness.

Example 21.6

In Figure 21.7 we highlight the two links with highest edge betweenness
centrality for Gclus .

continued

Example 21.6 continued

Figure 21.7 Selecting links according to highest betweenness centrality

Removing these links partitions the network in the same way as by using
the Kernighan–Lin approach but does not require an initial bisection.

The Girvan–Newman algorithm for partitioning by edge betweenness can be


described as follows.

• Calculate the edge betweenness centrality for all links in the network.
• Remove all links with the largest edge betweenness.
• Repeat these steps on the new network and continue until all links have been
removed.
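This process is implemented in networkx; the following is a minimal usage sketch of ours on the karate club network of Figure 21.1(a):

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
splits = girvan_newman(G)            # iterator over successively finer partitions
first = next(splits)                 # the first division into two communities
print([sorted(c) for c in first])
```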

We can use the information provided by this algorithm to provide a hier-


archical picture for analysing the community structure of the network. We can
picture this information in a dendrogram which we build from the bottom to
the top.
In a network with n nodes we start by drawing n points to represent these
nodes. Then we find n – 1 clusters by joining one pair of nodes and leaving the
rest isolated. We add links one at a time, pairing up clusters until we have brought
the network back together. We link clusters by replacing edges in the network in
the reverse order of their removal in the Girvan–Newman algorithm.
A significant characteristic of this method is that it gives a range of possible
partitions of the network instead of a fixed one. The authors of the algorithm
state that ‘it is up to the user to decide which of the many divisions represented
is most useful for their purposes’. For complex networks arising in real-world
situations it is preferable to automate this decision making process. This requires
us to develop a quality criterion in order to select one partition over the others.
We will return to this idea shortly.

Example 21.7

The dendrogram for Gclus is illustrated in Figure 21.8. The top dotted line indicates the division of the network into
two communities, each formed by six nodes. The second dotted line indicates a division into four communities having
3, 3, 5, and 1 nodes, respectively.

Figure 21.8 Dendrogram obtained by the Girvan–Newman algorithm

21.5 Modularity
Girvan and Newman proposed modularity as a measure of the quality of clusters.
They start with the assumption that a cluster is a structural element of a network
that has been formed by a far-from-random process. If we consider the actual
density of links in a community, it should be significantly larger than the density
we would expect if the links in the network were formed by a random process.

Definition 21.2 Let G(V, E) be a network of n nodes and m edges with adjacency
matrix A and suppose we have divided the nodes into n_C clusters
V_1, V_2, . . . , V_{n_C}. Define s_ir to equal 1 if node i is in cluster r and 0 otherwise.
Then the modularity of the partitioning is given by

    Q = (1/2m) Σ_{r=1}^{n_C} Σ_{i,j=1}^{n} ( a_ij − k_i k_j / 2m ) s_ir s_jr,

where k_i is the degree of node i.

Modularity can be interpreted as the sum over all partitions of the difference
between the fraction of links inside each partition and the expected fraction,
obtained by considering a random network with the same degree for each node, giving

    Q = Σ_{k=1}^{n_C} [ |E_k|/m − (1/4m²) ( Σ_{j∈V_k} k_j )² ],        (21.1)

where |E_k| is the number of links between nodes in the kth partition of the
network. If the number of intra-cluster links is no bigger than the expected value
for a random network then Q ≤ 0. The maximum modularity is one, and the more
positive the value of Q, the more convincing is the community structure.

Example 21.8

We label the two communities in Gclus found with the Kernighan–Lin
algorithm as C1 and C2, as illustrated in Figure 21.9.

Figure 21.9 Clusters induced by the Kernighan–Lin algorithm

The number of edges and the sum of degrees for the communities are

    |E_C1| = 7,   Σ_{j∈V_C1} k_j = 16,   |E_C2| = 9,   Σ_{j∈V_C2} k_j = 20,

and the total number of edges is m = 18. From (21.1),

    Q = [ 7/18 − (16/36)² ] + [ 9/18 − (20/36)² ] = 0.383,

where the first bracketed term corresponds to C1 and the second to C2.

The goal of finding this value is to compare it with the modularity of other
partitions of the same network, as we will see later, in Section 21.6.
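
As a check on hand calculations like the one above, (21.1) can be evaluated mechanically. The sketch below uses plain numpy (an assumption, as is the toy network of two triangles joined by one edge; the edge list of Gclus is not reproduced here).

    # A minimal sketch evaluating the modularity formula (21.1) with numpy;
    # the toy network below is illustrative, not Gclus.
    import numpy as np

    def modularity(A, clusters):
        k = A.sum(axis=1)        # node degrees
        m = A.sum() / 2.0        # total number of edges
        Q = 0.0
        for cluster in clusters:
            V = list(cluster)
            Ek = A[np.ix_(V, V)].sum() / 2.0             # links inside the cluster
            Q += Ek / m - (k[V].sum() / (2.0 * m)) ** 2  # one term of (21.1)
        return Q

    # Two triangles joined by a single edge.
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1.0
    print(modularity(A, [{0, 1, 2}, {3, 4, 5}]))         # approximately 0.357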

21.5.1 The problem of resolution

Example 21.9

A network is pictured in Figure 21.10 and the partitioning that maximizes Q
is shown.

Figure 21.10 A manifestation of the resolution problem of modularity

In this case a more natural clustering would be achieved by splitting the
network into ten communities.

The problem highlighted in the last example is a phenomenon called the resolution
limit. The modularity index can be fooled if there are some well-defined
communities adjacent to bigger ones, or where many small communities are circularly
connected. Modularity tends to group pairs of the small communities together
instead of identifying them as independent ones. This suggests that it may be
better to use more than one quality measure when determining optimal partitions.

21.6 Communities based on communicability
In Chapter 19 we introduced a communicability function which accounts for
the volume of information transmitted from one node to another in a network
by using all possible routes between them. The concept of communicability
introduces an intuitive way of finding the structure of communities in complex
networks: a community is a group of nodes which have better communicability
among themselves than with the rest of the nodes in the network.
We start by considering the communicability function between a pair of nodes
p and q, written in terms of the eigenvalues and eigenvectors of the adjacency
matrix as

    G_pq = q_1(p) q_1(q) exp(λ_1) + Σ_{j=2}^{n} q_j(p) q_j(q) exp(λ_j).        (21.2)

Recall from Chapter 18 that if a network has a large spectral gap (λ_1 ≫ λ_2)
then we can expect it to be homogeneous, which indicates that it contains no
communities or clusters due to the lack of any cut-set in the network. If this is
the case, the term q_1(p) q_1(q) exp(λ_1) dominates (21.2). Whether there is a large
spectral gap or not, we can interpret this term as the extent to which the network
forms a single cohesive unit and use the other terms to indicate potential clusters.
One approach is to make use of the signs of the elements of eigenvectors. This
is illustrated in Figure 21.11, where we show the signs of the eigenvectors
corresponding to the four largest eigenvalues of Gclus. As we have seen, the sign
pattern of the eigenvector of the second largest eigenvalue of the adjacency matrix
induces a partition of the network. We represent this by using two different kinds
of arrows for positive and negative contributions for λ_2, λ_3, and λ_4.

Figure 21.11 Using eigenvector components to induce spin orientations in a network: (a) λ_1 = 3.4165; (b) λ_2 = 2.4562; (c) λ_3 = 1.4288; (d) λ_4 = 0.6868
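
A brief sketch of this sign-based grouping, assuming plain numpy (the helper name sign_partition and the four-node path used to test it are our own illustrations): compute the eigenvectors of the adjacency matrix and split the nodes according to the signs of the components of a chosen non-principal eigenvector.

    # A minimal sketch of partitioning by eigenvector sign patterns with numpy.
    import numpy as np

    def sign_partition(A, which=2):
        # Split nodes by the signs of the eigenvector belonging to the
        # which-th largest eigenvalue (the overall sign of an eigenvector
        # is arbitrary, but the induced bipartition is not).
        vals, vecs = np.linalg.eigh(A)   # eigenvalues in ascending order
        v = vecs[:, -which]
        plus = [i for i in range(len(v)) if v[i] >= 0]
        minus = [i for i in range(len(v)) if v[i] < 0]
        return plus, minus

    # Example: a path on four nodes splits into its two halves.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    print(sign_partition(A))             # ([0, 1], [2, 3]) or the reverse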

We can use this sign pattern of the eigenvectors to express the sum of the
contributions of the non-principal eigenvalues and eigenvectors to the
communicability function as

    Σ_{j=2}^{n} q_j(p) q_j(q) exp(λ_j)
        = [ Σ_{j=2}^{n} q_j^+(p) q_j^+(q) exp(λ_j) + Σ_{j=2}^{n} q_j^−(p) q_j^−(q) exp(λ_j) ]
        − | Σ_{j=2}^{n} q_j^+(p) q_j^−(q) exp(λ_j) + Σ_{j=2}^{n} q_j^−(p) q_j^+(q) exp(λ_j) |,        (21.3)

where q_j^+(p) indicates that the entry corresponding to node p of the jth
eigenvector is positive.
Now, consider a cluster formed by nodes which have the same sign contribution
to the communicability function. A physical interpretation is that two nodes
with the same sign in one eigenvector are 'vibrating' in the same direction, so
should be coupled together. Conversely, if two nodes have different signs
for the same eigenvector they are in different clusters, because they are vibrating
out of phase. This analogy allows us to group the two contributions of this
term to the communicability into the intra- and inter-cluster communicabilities,

    Σ_{j=2}^{n} q_j(p) q_j(q) exp(λ_j) = G_pq^{intra-cluster} − |G_pq^{inter-cluster}|.        (21.4)

This gives another way of defining a cluster.

Definition 21.3 A community is a subset of nodes C ⊂ V for which the intra-cluster
communicability among every pair of nodes is larger than the inter-cluster one.

The difference between intra- and inter-cluster communicability is given by

    ΔG_pq = G_pq − q_1(p) q_1(q) exp(λ_1) = G_pq^{intra-cluster} − |G_pq^{inter-cluster}|,        (21.5)

which means that we only need to calculate ΔG_pq = G_pq − q_1(p) q_1(q) exp(λ_1) in order
to determine the difference between intra- and inter-cluster communicabilities.
However, because of the definition of a community, we need to check ΔG_pq for
every pair of nodes. In order to do that we can use the following algorithm.

• Form the communicability matrix G = exp(A).
• Find the largest eigenvalue λ_1 of the adjacency matrix and its corresponding
eigenvector q_1.
• Form ΔG = G − q_1 q_1^T exp(λ_1) and then ΔG̃ whose entries are defined by

    ΔG̃_pq = 1 if ΔG_pq > 0;   ΔG̃_pq = 0 if ΔG_pq ≤ 0 or p = q.

• Define the communicability graph to be the network with adjacency matrix ΔG̃.
• Find the cliques in the communicability graph. Each clique represents a
community in the network.
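
A compact sketch of this algorithm, assuming numpy, scipy, and networkx are available (nx.find_cliques enumerates maximal cliques; from_numpy_array is the networkx constructor for a graph from a 0/1 matrix in recent versions of that library):

    # A minimal sketch of the communicability community algorithm,
    # assuming numpy, scipy, and networkx.
    import numpy as np
    import networkx as nx
    from scipy.linalg import expm

    def communicability_communities(A):
        G = expm(A)                                   # communicability matrix exp(A)
        vals, vecs = np.linalg.eigh(A)
        lam1, q1 = vals[-1], vecs[:, -1]              # principal eigenpair
        Delta = G - np.exp(lam1) * np.outer(q1, q1)   # Delta G
        B = (Delta > 0).astype(int)                   # binarize: Delta G tilde
        np.fill_diagonal(B, 0)                        # exclude p = q
        CG = nx.from_numpy_array(B)                   # the communicability graph
        return list(nx.find_cliques(CG))              # each clique is a community

Applied to the adjacency matrix of Gclus, this should reproduce the four overlapping cliques found in Example 21.10 below.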

Example 21.10

We return to Gclus, labelled as in Figure 21.12.

Figure 21.12 A labelling of the network Gclus

The communicability matrix for this network is

    G =
    [ 4.48  3.05  1.64  0.57  1.01  0.68  0.52  1.75  3.07  2.77  2.68  1.94 ]
    [ 3.05  3.76  3.01  1.36  1.96  1.08  0.72  2.28  2.35  2.80  1.84  0.95 ]
    [ 1.64  3.01  6.14  3.84  5.75  3.61  2.34  5.70  2.25  1.57  0.74  0.38 ]
    [ 0.57  1.36  3.84  3.31  3.98  2.18  1.25  3.24  0.97  0.55  0.21  0.11 ]
    [ 1.01  1.96  5.75  3.98  6.82  4.87  3.08  6.53  2.11  0.97  0.38  0.19 ]
    [ 0.68  1.08  3.61  2.18  4.87  4.99  3.54  5.66  1.81  0.66  0.25  0.13 ]
    [ 0.52  0.72  2.34  1.25  3.08  3.54  3.31  4.26  1.42  0.50  0.19  0.10 ]
    [ 1.75  2.28  5.70  3.24  6.53  5.66  4.26  8.04  3.47  1.67  0.78  0.39 ]
    [ 3.07  2.35  2.25  0.97  2.11  1.81  1.42  3.47  3.85  2.82  1.85  0.95 ]
    [ 2.77  2.80  1.57  0.55  0.97  0.66  0.50  1.67  2.82  3.61  2.44  0.74 ]
    [ 2.68  1.84  0.74  0.21  0.38  0.25  0.19  0.78  1.85  2.44  2.69  0.87 ]
    [ 1.94  0.95  0.38  0.11  0.19  0.13  0.10  0.39  0.95  0.74  0.87  1.71 ]



and

    ΔG =
    [  3.49   1.84  −0.58  −0.78  −1.37  −1.18  −0.81  −0.93   1.73   1.86   2.12   1.65 ]
    [  1.84   2.29   0.30  −0.28  −0.93  −1.19  −0.90  −0.97   0.72   1.70   1.16   0.59 ]
    [ −0.58   0.30   1.16   0.83   0.43  −0.57  −0.64  −0.29  −0.75  −0.47  −0.51  −0.27 ]
    [ −0.78  −0.28   0.83   1.49   0.76  −0.35  −0.55  −0.38  −0.84  −0.68  −0.54  −0.29 ]
    [ −1.37  −0.93   0.43   0.76   1.15   0.41  −0.10   0.14  −1.09  −1.20  −0.95  −0.50 ]
    [ −1.18  −1.19  −0.57  −0.35   0.41   1.48   1.05   0.63  −0.70  −1.04  −0.80  −0.42 ]
    [ −0.81  −0.90  −0.64  −0.55  −0.10   1.05   1.53   0.68  −0.37  −0.71  −0.55  −0.29 ]
    [ −0.93  −0.97  −0.29  −0.38   0.14   0.63   0.68   0.84  −0.13  −0.78  −0.72  −0.39 ]
    [  1.73   0.72  −0.75  −0.84  −1.09  −0.70  −0.37  −0.13   2.05   1.60   1.10   0.56 ]
    [  1.86   1.70  −0.47  −0.68  −1.20  −1.04  −0.71  −0.78   1.60   2.78   1.93   0.47 ]
    [  2.12   1.16  −0.51  −0.54  −0.95  −0.80  −0.55  −0.72   1.10   1.93   2.38   0.71 ]
    [  1.65   0.59  −0.27  −0.29  −0.50  −0.42  −0.29  −0.39   0.56   0.47   0.71   1.63 ]

In Figure 21.13 we have reordered the nodes of ΔG in order to illustrate the existence of a positive diagonal
block formed by nodes 1, 2, 9, 10, 11, 12, which has a negative level of communication with the rest of the nodes. This
contour plot fails to highlight the community formed by nodes 3, 4, 5, 6, 7, 8.

Figure 21.13 Contour plot of ΔG for Gclus, with the nodes reordered as 1, 2, 9, 10, 11, 12, 3, 4, 5, 6, 7, 8

We transform ΔG into a communicability graph by replacing nonpositive values by zeroes and positive values by
ones to give


    ΔG̃ =
    [ 0 1 0 0 0 0 0 0 1 1 1 1 ]
    [ 1 0 1 0 0 0 0 0 1 1 1 1 ]
    [ 0 1 0 1 1 0 0 0 0 0 0 0 ]
    [ 0 0 1 0 1 0 0 0 0 0 0 0 ]
    [ 0 0 1 1 0 1 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 0 1 1 0 0 0 0 ]
    [ 0 0 0 0 0 1 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 1 1 0 0 0 0 0 ]
    [ 1 1 0 0 0 0 0 0 0 1 1 1 ]
    [ 1 1 0 0 0 0 0 0 1 0 1 1 ]
    [ 1 1 0 0 0 0 0 0 1 1 0 1 ]
    [ 1 1 0 0 0 0 0 0 1 1 1 0 ]

This (0, 1)-matrix represents a new graph consisting of the same set of nodes as the network under analysis, but in
which two nodes are connected if and only if their intra-cluster communicability is larger than the inter-cluster one.
This communicability graph is illustrated in Figure 21.14.

Figure 21.14 The communicability graph for Gclus

We now find the cliques in the communicability graph, which correspond to the following groups of nodes:

C1 = {1, 2, 9, 10, 11, 12}, C2 = {3, 4, 5}, C3 = {5, 6, 8}, C4 = {6, 7, 8}.

These cliques correspond to the communities identified by the communicability method and they can be represented
by overlapping sets as shown in Figure 21.15.

Figure 21.15 Overlapping communities in Gclus

We want to analyse whether this partition of Gclus into four (overlapping) communities is a better representation than
those obtained previously, which partition the network into two communities, so we calculate the modularity of the new
partition. The number of edges and the sum of degrees in each community are |E_C1| = 7, |E_C2| = |E_C3| = |E_C4| = 3, and

    Σ_{j∈V_C1} k_j = 16,   Σ_{j∈V_C2} k_j = 10,   Σ_{j∈V_C3} k_j = 12,   Σ_{j∈V_C4} k_j = 10.

Substituting these values into (21.1) gives

    Q = [ 7/18 − (16/36)² ] + [ 3/18 − (10/36)² ] + [ 3/18 − (12/36)² ] + [ 3/18 − (10/36)² ] = 0.426,

where the four bracketed terms correspond to C1, C2, C3, and C4, respectively. This value is larger than the Q = 0.383
previously found for the partitions produced by the methods considered earlier: Kernighan–Lin, Girvan–Newman, and
the two spectral clustering techniques. Consequently, in this particular case, the partition obtained from communicability
is, at least in terms of modularity, better than the bipartition previously considered.

21.7 Anti-communities
In Chapter 18 we showed how to measure the degree of bipartivity in a network.
We now show how to find bipartitions in complex networks. For obvious reasons
we call these clusters anti-communities in the network.

Examples 21.11

(i) In Figure 21.16 we illustrate a protein–protein interaction (PPI) network
in which a certain level of bipartivity is expected, since some proteins
can act as locks and others as keys in the formation of noncovalent
interactions.
(ii) In Figure 21.17 we illustrate a network of employees in a workplace,
some of whom can be classified as advisors and others as advisees. A
degree of bipartite structure is expected: most (but not all) interactions
will be between advisors and advisees.


Figure 21.16 The PPI network of A. fulgidus and a schematic lock-and-key representation: (a) PPI network; (b) lock-and-key interactions

Figure 21.17 A social network of advisors and advisees in a sawmill

To find such anti-communities, we start by defining an anti-communicability
function by

    G̃_pq = [exp(−A)]_pq = [cosh(A) − sinh(A)]_pq.        (21.6)

Consider a bipartite network and let p and q be two nodes which are in the
two different disjoint partitions of the network. Since there are no walks of even
length starting at p and ending at q,

    G̃_pq = [− sinh(A)]_pq < 0.        (21.7)

However, if p and q are in the same partition then, since there are no walks of
odd length connecting them, due to the lack of odd cycles in the bipartite graph,

    G̃_pq = [cosh(A)]_pq > 0.        (21.8)

Consequently the sign of G̃_pq determines whether the corresponding pair of
nodes are in the same partition or not. That is, G̃_pq > 0 if and only if p and q are
in the same partition.

Definition 21.4 Let C ⊆ V be a cluster of nodes in the network. Then C is a
quasi-bipartite cluster if and only if [cosh(A)]_pq > [sinh(A)]_pq for all p, q ∈ C.

We can adapt the procedure we developed for detecting communities based on
the communicability function, this time using the anti-communicability function
G̃ in place of G.

• Form the anti-communicability matrix G̃ = exp(−A).
• Define

    G̃′_pq = 1 if G̃_pq > 0;   G̃′_pq = 0 if G̃_pq ≤ 0 or p = q.

• Let G̃′ represent the adjacency matrix of an anti-communicability graph.
Find the cliques in this graph.
• Each clique in the anti-communicability graph is an anti-community in the
network.
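
Under the same assumptions as the earlier sketch (numpy, scipy, and networkx), the only changes are the sign of the exponent and the absence of the principal-eigenvector correction:

    # A minimal sketch of anti-community detection via exp(-A).
    import numpy as np
    import networkx as nx
    from scipy.linalg import expm

    def anti_communities(A):
        Gt = expm(-A)                      # anti-communicability matrix
        B = (Gt > 0).astype(int)           # keep only positive entries
        np.fill_diagonal(B, 0)             # exclude p = q
        CG = nx.from_numpy_array(B)        # anti-communicability graph
        return list(nx.find_cliques(CG))   # each clique is an anti-community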

Example 21.12

Consider the network illustrated in Figure 21.18. This graph has spectral bipartivity b_S = 0.64. Then

    G̃ =
    [ 10.15   6.83   7.09   7.11   6.66   8.19  −5.62  −7.98  −7.29  −7.28  −9.02  −8.73 ]
    [  6.83   8.03   4.17   5.87   5.43   6.72  −6.31  −4.32  −6.56  −5.65  −7.28  −7.04 ]
    [  7.09   4.17   8.05   5.62   5.50   6.73  −5.42  −6.62  −4.30  −6.56  −7.29  −7.05 ]
    [  7.11   5.87   5.62   8.17   3.93   6.80  −5.68  −5.96  −6.62  −4.32  −7.98  −7.10 ]
    [  6.66   5.43   5.50   3.93   7.37   6.36  −5.39  −5.68  −5.42  −6.31  −5.62  −6.77 ]
    [  8.19   6.72   6.73   6.80   6.36   9.38  −6.77  −7.10  −7.05  −7.04  −8.73  −7.28 ]
    [ −5.62  −6.31  −5.42  −5.68  −5.39  −6.77   7.37   3.93   5.50   5.43   6.66   6.36 ]
    [ −7.98  −4.32  −6.62  −5.96  −5.68  −7.10   3.93   8.17   5.62   5.87   7.11   6.80 ]
    [ −7.29  −6.56  −4.30  −6.62  −5.42  −7.05   5.50   5.62   8.05   4.17   7.09   6.73 ]
    [ −7.28  −5.65  −6.56  −4.32  −6.31  −7.04   5.43   5.87   4.17   8.03   6.83   6.72 ]
    [ −9.02  −7.28  −7.29  −7.98  −5.62  −8.73   6.66   7.11   7.09   6.83  10.15   8.19 ]
    [ −8.73  −7.04  −7.05  −7.10  −6.77  −7.28   6.36   6.80   6.73   6.72   8.19   9.38 ]

Figure 21.18 A network with anti-communities

and

    G̃′ =
    [ 0 1 1 1 1 1 0 0 0 0 0 0 ]
    [ 1 0 1 1 1 1 0 0 0 0 0 0 ]
    [ 1 1 0 1 1 1 0 0 0 0 0 0 ]
    [ 1 1 1 0 1 1 0 0 0 0 0 0 ]
    [ 1 1 1 1 0 1 0 0 0 0 0 0 ]
    [ 1 1 1 1 1 0 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 0 0 1 1 1 1 1 ]
    [ 0 0 0 0 0 0 1 0 1 1 1 1 ]
    [ 0 0 0 0 0 0 1 1 0 1 1 1 ]
    [ 0 0 0 0 0 0 1 1 1 0 1 1 ]
    [ 0 0 0 0 0 0 1 1 1 1 0 1 ]
    [ 0 0 0 0 0 0 1 1 1 1 1 0 ]

The anti-communicability graph with adjacency matrix G̃′ is the disconnected network illustrated in Figure 21.19.
This anti-communicability graph has only the two cliques

    C1 = {1, 2, 3, 4, 5, 6},   C2 = {7, 8, 9, 10, 11, 12}.

These two cliques are the anti-communities in the network. This can be emphasized by redrawing the network as
in Figure 21.20.

Figure 21.19 The anti-communicability graph with adjacency matrix G̃′

Figure 21.20 The best partition into anti-communities of the network in Figure 21.18

Using the approach detailed in this section, we can find the best bipartitions
for the PPI and advisor–advisee networks illustrated in Figures 21.16 and 21.17.
The results are shown in Figure 21.21.

Figure 21.21 Anti-community partitions of (a) the PPI network in Figure 21.16 and (b) the sawmill network in Figure 21.17

FURTHER READING

Estrada, E., Community detection based on network communicability, Chaos
21:016103, 2011.
Estrada, E., Higham, D.J., and Hatano, N., Communicability and multipartite
structure in complex networks at negative absolute temperature, Physical
Review E 78:026102, 2008.
Fortunato, S., Community detection in graphs, Physics Reports 486:75–174,
2010.
Leskovec, J., Lang, K.J., and Mahoney, M., Empirical comparison of algorithms
for network community detection, WWW '10: Proc. 19th Int. Conf. on World
Wide Web, ACM, New York, 2010, 631–640.
Ma, X., Gao, L., and Yong, X., Eigenspaces of networks reveal the overlapping
and hierarchical community structure more precisely, Journal of Statistical
Mechanics: Theory and Experiment P08012, 2010.
Porter, M.A., Onnela, J.-P., and Mucha, P.J., Communities in networks, Notices of
the AMS 56:1082–1097, 2009.