Brain Network Analysis 1st Edition Moo K. Chung


This tutorial reference serves as a coherent overview of various statistical and math-
ematical approaches used in brain network analysis, where modeling the complex
structures and functions of the human brain often poses many unique computational
and statistical challenges. This book fills a gap as a textbook for graduate students while
simultaneously articulating important and technically challenging topics. Whereas most
available books are graph theory centric, this text introduces techniques arising from
graph theory and expands to include other advanced models in its discussion on
network science, regression, and algebraic topology. Links are included to the sample
data and codes used in generating the book’s results and figures, helping empower
methodological understanding in a manner immediately usable to both researchers and
students.

Moo K. Chung is an Associate Professor in the Department of Biostatistics and
Medical Informatics at the University of Wisconsin–Madison and is also affiliated with
the Department of Statistics and Waisman Laboratory for Brain Imaging and Behavior.
He has received the Vilas Associate Award for his research in applied topology to
medical imaging, the Editor’s Award for best paper published in the Journal of Speech,
Language, and Hearing Research for a paper that analyzed computed tomography (CT)
images, and a National Institutes of Health (NIH) Brain Initiative Award for work
on persistent homological brain network analysis. He has written numerous papers in
computational neuroimaging and two previous books on computation on brain image
Contents

Preface
1 Statistical Preliminary
1.1 General Linear Models
1.2 Logistic Regression
1.3 Random Fields
1.4 Statistical Inference on Fields
2 Brain Network Nodes and Edges
2.1 Brain Templates
2.2 Brain Parcellations
2.3 Deterministic Connectivity
2.4 Probabilistic Connectivity
2.5 Parcellation-Free Brain Network
2.6 Structural Covariates
3 Graph Theory
3.1 Trees and Graphs
3.2 Minimum Spanning Trees
3.3 Node Degree
3.4 Shortest Path Length
3.5 Clustering Coefficient
3.6 Small-Worldness
3.7 Fractal Dimension
4 Correlation Networks
4.1 Pearson Correlations
4.2 Partial Correlations
4.3 Averaging Correlations
4.4 Correlation as Metric
4.5 Statistical Inference on Correlations
4.6 Cosine Series Representation
4.7 Correlating Functional Signals
4.8 Thresholding Correlation Networks
5 Big Brain Network Data
5.1 Big Data
5.2 Sparsity
5.3 Hierarchy
5.4 Computing Large Correlation Matrices
5.5 Online Algorithms
6 Network Simulations
6.1 Multivariate Normal Distributions
6.2 Multivariate Linear Models
6.3 Mixed Effects Models
6.4 Simulating Dependent Images
6.5 Dependent Correlation Networks
7 Persistent Homology
7.1 Simplicial Homology
7.2 Morse Filtrations
7.3 Graph Filtrations
7.4 Betti Plots
8 Diffusions on Graphs
8.1 Diffusion as a Cauchy Problem
8.2 Finite Difference Method
8.3 Laplacian on Planar Graphs
8.4 Graph Laplacian
8.5 Fiedler Vectors
8.6 Heat Kernel Smoothing on Graphs
8.7 Laplace Equation
9 Sparse Networks
9.1 Why Sparse Models?
9.2 Sparse Likelihood
9.3 Sparse Correlation Network
9.4 Partial Correlation Network
10 Brain Network Distances
10.1 Matrix Norms
10.2 Bottleneck Distance
10.3 Gromov–Hausdorff Distance
10.4 Kolmogorov–Smirnov Distance
10.5 Performance Analysis
10.6 Comparisons on Modules
10.7 Hypernetworks
11 Combinatorial Inferences for Networks
11.1 Permutation Test
11.2 Exact Combinatorial Inference
11.3 Bootstrap
12 Series Expansion of Connectivity Matrices
12.1 Spectral Decomposition
12.2 Iterative Residual Fitting
12.3 Spectral Decomposition with Different Bases
12.4 Spectral Permutation
12.5 Karhunen–Loève Expansion
12.6 Vandermonde Matrix Expansion
12.7 The Space of Positive Definite Symmetric Matrices
13 Dynamic Network Models
13.1 Dynamic Causal Model
13.2 Dynamic Time Series Models
13.3 Persistent Homological Dynamic Network Model
Bibliography
Index

Preface

Brain network analysis is an emerging field that utilizes various noninvasive
brain imaging modalities such as magnetic resonance imaging (MRI),
functional MRI (fMRI), positron emission tomography (PET), diffusion tensor
imaging (DTI), and electroencephalography (EEG) in mapping out the four-
dimensional (4D) spatiotemporal dynamics of the human brain networks in
both normal and clinical populations at the macroscopic level. There has been
substantial progress in the past decade on this topic. A major challenge in
the field is caused by the massive amount of nonstandard high-dimensional
network data that are difficult to analyze using available standard techniques.
This requires new computational approaches and solutions.
The main goals of this book are to provide a coherent overview of various
statistical and mathematical approaches used in brain network analysis to
a wide range of researchers and students, and to articulate important yet
technically challenging topics further. It is hoped that the book presents a
coherent mathematical treatment of the underlying methods. The book is mainly
focused on methodological issues beyond widely used graph theory–based
approaches. We wish to provide methodological understanding in a manner
immediately usable to researchers and students. Concepts and methods are
illustrated with brain imaging applications and examples. Some of the brain
network data sets along with MATLAB and R codes used in the book can
be downloaded from the author’s website. The web links are provided in
appropriate places. By making some of the data and codes available, we tried
to make the book more accessible to a wide range of readers.
Although I am indebted to many colleagues and students in writing this
book, I would particularly like to thank the following individuals, in no
particular order. Richard Davidson, Andrew Alexander, Seth Pollak, Hill
Goldsmith of the University of Wisconsin–Madison; David Zald of Vanderbilt
University; and Benjamin Lahey of the University of Chicago provided various
brain imaging data used in illustrating the methods. Hyekyoung Lee of Seoul
National University and Yuan Wang of the University of South Carolina helped me
write chapters related to persistent homology and topological distances.
Hernando Ombao of the King Abdullah University of Science and Technology and
Dustin Pluta of the University of California–Irvine helped me write chapters
related to the dynamic network models. Andrey Gritsenko of the University of
Wisconsin–Madison performed some of the basic image processing on the
resting-state fMRI from Human Connectome Project data and helped compile the
list of Automatic Anatomical Labeling (AAL) parcellation. Although most
figures are produced by myself using MATLAB, some figures are generated
by my current and former students, postdocs, and colleagues. Such figures
are identified in figure captions and the proper credits are given. I am also
indebted to Fred Boehm of the University of Wisconsin–Madison and Feng
Liu of Harvard University for proofreading a few chapters.
1 Statistical Preliminary

This chapter covers the basic statistical methods that are mostly used in
univariate voxel-level approaches. However, these basic methods are equally
useful in brain network analysis. Most network modeling techniques are
based on the voxel-level methods. Readers familiar with univariate statistical
methods can skip this chapter.

1.1 General Linear Models

General linear models (GLM) have been widely used in brain imaging and
network studies. The GLM is a very flexible and general statistical framework
encompassing a wide variety of fixed-effect models such as multiple regres-
sions, the analysis of variance (ANOVA), the multivariate analysis of variance
(MANOVA), the analysis of covariance (ANCOVA), and the multivariate
analysis of covariance (MANCOVA) (Timm and Mieczkowski, 1997). More
complex multilevel or hierarchical models such as the mixed-effects models
and structural equation models (SEM) are also viewed as special cases of
general linear models.
GLM provides a framework for testing various associations and hypotheses
while accounting for nuisance covariates in the model in a straightforward
fashion. Age, sex, brain size, and possibly IQ may have severe
confounding effects on the final outcome of many brain network studies.
Older populations’ reduced functional activation could be the consequence of
age-related atrophy of neural systems (Mather et al., 2004). Brain volumes
are significantly larger for children with autism 12 years old and younger
compared with normally developing children (Aylward et al., 1999). Therefore,
it is desirable to account for various confounding factors such as age and
sex. This can be done directly within the GLM. The parameters of the GLM are
mainly estimated by least squares estimation, which has been implemented
in many statistical packages such as R [1] (Pinheiro and Bates, 2002), statistical
parametric mapping (SPM) [2], and fMRI-STAT [3].
We assume there are $n$ subjects. Let $y_i$ be the response variable at a node
or edge, which mainly comes from images, and let $x_i = (x_{i1}, \cdots, x_{ip})$
be the variables of interest and $z_i = (z_{i1}, \cdots, z_{ik})$ be the nuisance
variables corresponding to the $i$th subject. Then we have the GLM
$$y_i = z_i \lambda + x_i \beta + \epsilon_i,$$
where $\lambda = (\lambda_1, \cdots, \lambda_k)^\top$ and
$\beta = (\beta_1, \cdots, \beta_p)^\top$ are unknown parameter vectors to be
estimated. We assume $\epsilon_i$ to be the usual zero mean Gaussian noise.
The significance of the variables of interest $x_i$ is determined by testing the
null hypothesis
$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \neq 0.$$
The fit of the reduced model corresponding to $\beta = 0$, i.e.,
$$y_i = z_i \lambda, \tag{1.1}$$
is measured by the sum of the squared errors (SSE):
$$\mathrm{SSE}_0 = \sum_{i=1}^{n} (y_i - z_i \hat{\lambda}^0)^2,$$
where $\hat{\lambda}^0$ is the least squares estimate obtained from the reduced
model. The reduced model (1.1) can be written in the matrix form $y = Z\lambda$:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} z_{11} & \cdots & z_{1k} \\ \vdots & \ddots & \vdots \\ z_{n1} & \cdots & z_{nk} \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix}.$$
By multiplying $Z^\top$ on both sides, we obtain
$$Z^\top y = Z^\top Z \lambda.$$
The matrix $Z^\top Z$ is invertible when $Z$ has full column rank, which requires
$n \geq k$, i.e., at least as many subjects as parameters. The matrix equation
can then be solved by performing a matrix inversion
$$\hat{\lambda}^0 = (Z^\top Z)^{-1} Z^\top y.$$
[1] www.r-project.org
[2] www.fil.ion.ucl.ac.uk/spm
[3] www.math.mcgill.ca/keith/fmristat

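Similarly, the fit of the full model corresponding to $\beta \neq 0$, i.e.,
$$y_i = z_i \lambda + x_i \beta,$$
is measured by
$$\mathrm{SSE}_1 = \sum_{i=1}^{n} (y_i - z_i \hat{\lambda}^1 - x_i \hat{\beta}^1)^2,$$
where $\hat{\lambda}^1$ and $\hat{\beta}^1$ are the least squares estimates from
the full model. The full model can be written in a matrix form by concatenating
the row vectors $z_i$ and $x_i$ into a larger row vector $(z_i, x_i)$, and the
column vectors $\lambda$ and $\beta$ into a larger column vector
$(\lambda^\top, \beta^\top)^\top$, i.e.,
$$y_i = (z_i, x_i) \begin{pmatrix} \lambda \\ \beta \end{pmatrix}.$$
Then the parameters of the full model can be estimated in the least squares
fashion. Note that
$$\mathrm{SSE}_1 = \min_{\lambda^1, \beta^1} \sum_{i=1}^{n} (y_i - z_i \lambda^1 - x_i \beta^1)^2
\leq \min_{\lambda^0} \sum_{i=1}^{n} (y_i - z_i \lambda^0)^2 = \mathrm{SSE}_0.$$
So the larger the value of $\mathrm{SSE}_0 - \mathrm{SSE}_1$, the more significant
the contribution of the coefficients $\beta$ is. Under the null hypothesis $H_0$,
the test statistic is the ratio
$$F = \frac{(\mathrm{SSE}_0 - \mathrm{SSE}_1)/p}{\mathrm{SSE}_1/(n - p - k)} \sim F_{p, n-p-k}. \tag{1.2}$$
The denominator uses $\mathrm{SSE}_1$, the residual sum of squares of the full
model, which estimates $\sigma^2$ on $n - p - k$ degrees of freedom. The larger
the $F$ value, the stronger the evidence against $H_0$.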

1.1.1 T-Statistic
When $p = 1$, the test statistic $F$ is distributed as $F_{1, n-1-k}$, which is the
square of the Student t-distribution with $n - 1 - k$ degrees of freedom, i.e.,
$t^2_{n-1-k}$. In this case, it is better to use the t-statistic. The advantage of
using the t-statistic is that it provides the direction of the group difference,
which the F-statistic cannot provide. Let
$$c = (\underbrace{0, \cdots, 0}_{k}, 1, \underbrace{0, \cdots, 0}_{p-1})^\top$$
be the contrast vector of size $k + p$. The incorporation of the contrast vector
makes the algebraic derivation straightforward. Consider testing the
significance of $H_0: \beta_1 = 0$. The least squares estimate of $\beta_1$ can be
written as
$$\hat{\beta}_1 = c^\top \begin{pmatrix} \hat{\lambda} \\ \hat{\beta} \end{pmatrix}.$$
Under the assumption $\epsilon_i \sim N(0, \sigma^2)$,
$$E \hat{\beta}_1 = \beta_1.$$
Further, the variance is
$$V \hat{\beta}_1 = c^\top V \begin{pmatrix} \hat{\lambda} \\ \hat{\beta} \end{pmatrix} c
= \sigma^2 c^\top \left( [Z\, X]^\top [Z\, X] \right)^{-1} c.$$
The unbiased estimator of $\sigma^2$ is given by
$$\mathrm{SSE}_1 / (n - 1 - k).$$
We plug this estimator into $\sigma^2$. Then the test statistic under the null
hypothesis is
$$T = \frac{\hat{\beta}_1}{\sqrt{\dfrac{\mathrm{SSE}_1}{n - 1 - k}\, c^\top \left( [Z\, X]^\top [Z\, X] \right)^{-1} c}} \sim t_{n-1-k}.$$
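The F-statistic (1.2) and the t-statistic above can be checked numerically. The following is a minimal Python sketch and not the MATLAB or R code distributed with the book: the data are simulated, the helpers solve, gram, and least_squares are hypothetical names introduced here, and the F-ratio uses the residual sum of squares of the full model in its denominator, which is the standard form. With $p = 1$ the sketch also illustrates that $T^2 = F$.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gram(D):
    """Return D^T D for a design matrix stored as a list of rows."""
    q = len(D[0])
    return [[sum(r[a] * r[b] for r in D) for b in range(q)] for a in range(q)]

def least_squares(D, y):
    """Return (estimate, SSE) from the normal equations D^T D theta = D^T y."""
    q = len(D[0])
    rhs = [sum(r[a] * yi for r, yi in zip(D, y)) for a in range(q)]
    theta = solve(gram(D), rhs)
    sse = sum((yi - sum(t * v for t, v in zip(theta, r))) ** 2
              for r, yi in zip(D, y))
    return theta, sse

# Hypothetical data: nuisance z_i = (1, age_i), variable of interest x_i = group_i.
random.seed(0)
n, k, p = 40, 2, 1
Z = [[1.0, random.uniform(20, 60)] for _ in range(n)]
X = [[float(i % 2)] for i in range(n)]
y = [0.5 + 0.02 * z[1] + 0.8 * x[0] + random.gauss(0, 0.5) for z, x in zip(Z, X)]

ZX = [z + x for z, x in zip(Z, X)]      # rows (z_i, x_i) of the full model
_, sse0 = least_squares(Z, y)           # reduced model, beta = 0
theta1, sse1 = least_squares(ZX, y)     # full model

F = ((sse0 - sse1) / p) / (sse1 / (n - p - k))

# t-statistic for beta_1 with contrast c = (0, ..., 0, 1).
c = [0.0] * k + [1.0]
v = solve(gram(ZX), c)                  # ([Z X]^T [Z X])^{-1} c
var_scale = sum(ci * vi for ci, vi in zip(c, v))
T = theta1[k] / math.sqrt(sse1 / (n - 1 - k) * var_scale)

print(f"F = {F:.3f}, T = {T:.3f}, T^2 = {T * T:.3f}")
```

The normal equations are solved directly here only to keep the sketch dependency-free; a numerically safer route in practice would be a QR-based least squares routine.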

1.1.2 R-Square
The R-square of a model is the proportion of variability in the measurement
that is accounted for by the model. Sometimes R-square is called the coefficient
of determination, and it is given as the square of a correlation coefficient for a
very simple model. For a linear model involving the response variable $y_i$, the
total sum of squares (SST) measures the total variation in the response $y_i$ and
is defined as
$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$
where $\bar{y}$ is the sample mean of $y_i$.

On the other hand, SSE measures the amount of variability in $y_i$ that is not
explained by the model. Since SSE is the minimum of the sum of squared
residuals over the parameters of the linear model, and the constant fit $\bar{y}$
is one candidate whenever the model contains an intercept, SSE is never larger
than SST. Therefore, the amount of variability explained by the model is
SST − SSE. The proportion of variability explained by the model is then
$$R^2 = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}},$$
which is the coefficient of determination. The R-square ranges between 0 and
1. A value larger than 0.5 is often taken to indicate a substantial fit, although
this is a rule of thumb rather than a formal significance test.
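The statement that R-square equals the square of a correlation coefficient for a very simple model can be illustrated with a line fit. This is a small sketch on made-up numbers; r_square and pearson are hypothetical helper names, and the model is assumed to be $y = a + bx$ with an intercept.

```python
import math

def r_square(x, y):
    """R^2 = (SST - SSE) / SST for the least squares line y = a + b x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return (sst - sse) / sst

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_square(x, y)
r = pearson(x, y)
print(r2, r ** 2)   # the two values agree up to rounding
```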

1.1.3 Sum of T-Statistics

Often there is a situation such as a meta-analysis, where we have to sum the
t-statistic images or networks (Chung et al., 2017b). Note that a t-statistic for
large degrees of freedom (above 30) is very close to standard normal, i.e.,
$N(0,1)$. For $n$ identically distributed, possibly dependent t-statistics
$t^1, \cdots, t^n$, the variance of the sum $\sum_{j=1}^{n} t^j$ is approximately
given by (Billingsley, 1995)
$$V\Big( \sum_{j=1}^{n} t^j \Big) \approx n + \sum_{i \neq j} E(t^i t^j),$$
where $E(t^i t^j)$ is the correlation between $t^i$ and $t^j$. We used the fact
$E t^j = 0$. Then, we have the aggregated t-statistic given by
$$T = \frac{\sum_{j=1}^{n} t^j}{\sqrt{n + \sum_{i \neq j} E(t^i t^j)}} \sim N(0,1).$$
If the statistics $t^j$ are all independent, then
$E(t^i t^j) = E t^i \, E t^j \approx 0$ for $i \neq j$. Positive dependency
increases the variance estimate and reduces the aggregated t-statistic value.
Unfortunately, it is difficult to estimate the correlations directly since only
one t-statistic map is available for each $t^j$. Instead, $E(t^i t^j)$ can be
empirically estimated by computing correlations over the entries of the
t-statistic maps $t^i$ and $t^j$ (see Figure 1.1).

Figure 1.1 (a)–(c) t-statistic results of the group difference between maltreated
children and normal controls for three different connectivity methods (Chung
et al., 2017b). Only the connections with p-values less than 0.01 (uncorrected)
are shown. (d) The three t-statistic maps are aggregated to form a single t-statistic.
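The aggregation can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure used to produce Figure 1.1: each map is a flat list of entries, the helper names correlation and aggregated_t are hypothetical, and $E(t^i t^j)$ is replaced by the sample correlation computed over the entries of the two maps.

```python
import math

def correlation(a, b):
    """Sample correlation computed over the entries of two maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def aggregated_t(maps):
    """Entrywise sum of the maps divided by sqrt(n + sum_{i != j} corr(t^i, t^j))."""
    n = len(maps)
    cross = sum(correlation(maps[i], maps[j])
                for i in range(n) for j in range(n) if i != j)
    total = n + cross
    if total <= 0:
        raise ValueError("variance estimate is not positive")
    scale = math.sqrt(total)
    return [sum(vals) / scale for vals in zip(*maps)]

# Hypothetical t-statistic maps with six entries each.
t1 = [2.1, -0.3, 1.4, 0.2, -1.1, 2.8]
t2 = [1.7, 0.1, 1.9, -0.4, -0.6, 2.2]
t3 = [2.5, -0.8, 0.9, 0.5, -1.5, 3.1]
print(aggregated_t([t1, t2, t3]))
```

Two limiting cases show the effect of dependency: three identical maps are returned unchanged, because the denominator becomes $\sqrt{3 + 6} = 3$, while uncorrelated maps are divided by $\sqrt{n}$.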

1.2 Logistic Regression

Logistic regression is useful for setting up a probabilistic model on the strength
of connectivity and performing classification (Subasi and Ercelebi, 2005).
Suppose k regressors X1, · · · ,Xk are given. These are both imaging and
nonimaging biomarkers such as gender, age, education level, and memory test
score. Let xi1, · · · ,xik denote the measurements for the ith subject. Let the
response variable $Y_i$, which indicates whether the edge is connected, be modeled
as a Bernoulli random variable with parameter $\pi_i$, i.e.,
$$Y_i \sim \mathrm{Bernoulli}(\pi_i).$$
$Y_i = 1$ indicates that the edge is connected and $Y_i = 0$ that it is
disconnected. $\pi_i$ is then the probability that the edge is connected, i.e.,
$\pi_i = P(Y_i = 1)$.
Now consider the linear model
$$Y_i = x_i^\top \beta + \epsilon_i, \tag{1.3}$$
where $x_i^\top = (1, x_{i1}, \cdots, x_{ik})$ and
$\beta = (\beta_0, \cdots, \beta_k)^\top$. We may assume
$$E \epsilon_i = 0, \quad V \epsilon_i = \sigma^2.$$
However, the linear model (1.3) is no longer appropriate since
$$E Y_i = \pi_i = x_i^\top \beta$$
but $x_i^\top \beta$ may not be in the range $[0,1]$. The inconsistency is caused
by trying to match continuous variables $x_{ij}$ to the categorical variable $Y_i$
directly. To address this problem, we introduce the logistic regression function $g$:
$$\pi_i = g(x_i) = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}. \tag{1.4}$$
Using the logit function, we can write (1.4) as
$$\mathrm{logit}(\pi_i) = \log \frac{\pi_i}{1 - \pi_i} = x_i^\top \beta.$$

1.2.1 Maximum Likelihood Estimation

The unknown parameters $\beta$ are estimated via maximum likelihood estimation
(MLE) over $n$ subjects at each edge. The likelihood function is
$$L(\beta \mid y_1, \cdots, y_n) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}
= \prod_{i=1}^{n} \left[ \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \right]^{y_i}
\left[ \frac{1}{1 + \exp(x_i^\top \beta)} \right]^{1 - y_i}.$$
The loglikelihood function is given by
$$\log L(\beta) = \text{const.} + \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \right]
= \text{const.} + \sum_{i=1}^{n} \left[ y_i x_i^\top \beta + \log(1 - \pi_i) \right]$$
and its maximum is obtained when
$$\frac{\partial \log L(\beta)}{\partial \beta} = \sum_{i=1}^{n} x_i (y_i - \pi_i) = 0.$$
In simplifying the expression, we used the following identities
$$\frac{\partial \pi_i}{\partial (x_i^\top \beta)} = \pi_i (1 - \pi_i), \qquad
\frac{\partial \pi_i}{\partial \beta} = x_i \pi_i (1 - \pi_i).$$
Since the logistic regression function is nonlinear in $\beta$, the maximum
is obtained numerically. Define the information matrix $I(\beta)$ to be
$$I(\beta) = -\frac{\partial^2 \log L(\beta)}{\partial \beta^\top \partial \beta}
= \sum_{i=1}^{n} \pi_i (1 - \pi_i) x_i x_i^\top.$$
Then the Newton–Raphson algorithm is used to find the MLE in an iterative
fashion. Starting with an arbitrary initial vector $\beta^0$, we estimate iteratively
$$\beta^{j+1} = \beta^j + I(\beta^j)^{-1} \frac{\partial \log L(\beta)}{\partial \beta}(\beta^j).$$
Many computational packages such as R and MATLAB have the logistic
regression model fitting procedure.
Although we do not have explicit formulas for the MLE, using the
asymptotic normality of the MLE, the distributions of the estimators can be
approximately determined. For large sample size $n$, the distribution of
$\hat{\beta}$ is approximately multivariate normal with mean $\beta$ and
covariance matrix $I(\hat{\beta})^{-1}$.
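The Newton–Raphson iteration can be written out for the smallest nontrivial case. The sketch below assumes a single regressor plus an intercept, so that the information matrix is 2 × 2 and can be inverted by hand; the function names score and fit_logistic and the data are hypothetical, and the starting value is taken to be zero.

```python
import math

def score(b0, b1, x, y):
    """Score vector sum_i x_i (y_i - pi_i) for x_i = (1, x_i)."""
    pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    g0 = sum(yi - p for yi, p in zip(y, pi))
    g1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
    return g0, g1

def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE for logit(pi_i) = b0 + b1 * x_i."""
    b0 = b1 = 0.0
    for _ in range(iters):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        g0, g1 = score(b0, b1, x, y)
        # Information matrix: sum_i pi_i (1 - pi_i) x_i x_i^T.
        w = [p * (1.0 - p) for p in pi]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Update beta <- beta + I^{-1} * score, with the 2 x 2 inverse written out.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1

# Hypothetical data; the two groups overlap, so the MLE is finite.
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(x_data, y_data)
print(b0_hat, b1_hat, score(b0_hat, b1_hat, x_data, y_data))
```

A fixed number of iterations is used for brevity. If the two groups were perfectly separable, the likelihood would have no finite maximizer and the iteration would not settle, so a practical implementation would add a convergence check.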

1.2.2 Best Model Selection

Consider the following full model:
$$\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p.$$
Let $\beta^{(1)} = (\beta_0, \cdots, \beta_q)^\top$ and
$\beta^{(2)} = (\beta_{q+1}, \cdots, \beta_p)^\top$. The parameter $\beta^{(1)}$
corresponds to the parameters of the reduced model. Then we are interested in testing
$$H_0: \beta^{(2)} = 0.$$
Define the deviance $D$ of a model as $D = -2 \log L(\hat{\pi})$, which is
distributed asymptotically as $\chi^2_{n-p-1}$. Let $\hat{\pi}^{(p)}$ and
$\hat{\pi}^{(q)}$ be the estimated success probabilities for the full and reduced
models, and let $D_p$ and $D_q$ be the associated deviances. Then the
log-likelihood ratio statistic for testing $\beta^{(2)} = 0$ is
$$2\left[ \log L(\hat{\pi}^{(p)}) - \log L(\hat{\pi}^{(q)}) \right] = D_q - D_p \sim \chi^2_{p-q}.$$

1.2.3 Logistic Discriminant Analysis

Discriminant analysis resulting from the estimated logistic model is called the
logistic discrimination. We classify the ith subject according to a classification
rule. The simplest rule is to assign the $i$th subject to group 1 if
$$P(Y_i = 1) > P(Y_i = 0).$$
This statement is equivalent to $\pi_i > 1/2$. Depending on the bias and the error
of the estimation, the value 1/2 can be adjusted. For the fitted logistic model,
we classify the $i$th subject as group 1 if $x_i^\top \hat{\beta} > 0$ and as
group 0 if $x_i^\top \hat{\beta} < 0$.
The plane $x^\top \hat{\beta} = 0$ is the classification boundary that separates
the two groups.
The performance of a classification technique is measured by the error rate $\gamma$,
the overall probability of misclassification. Cross-validation is used to
estimate the error rate. This is done by randomly partitioning the data into
the training and the testing sets. In the leave-one-out scheme, the training set
consists of n−1 subjects, while the testing set consists of one subject. Suppose
the ith subject is taken as the test set. Then using the training set, we determine
the logistic model. Using the predicted model, we test if the ith subject is
correctly classified. The error obtained in this fashion is denoted as $e_{-i}$.
Note that $e_{-i} = 0$ if the subject is classified correctly while $e_{-i} = 1$
if the subject is misclassified. The leave-one-out error rate is then given by
$$\gamma = \frac{1}{n} \sum_{i=1}^{n} e_{-i}.$$
To formally test the statistical significance of the discriminant power, we
use Press’s Q statistic (Hair et al., 1998), which is given by
$$n (2\gamma - 1)^2 \sim \chi^2_1.$$
Press’s Q statistic is asymptotically distributed as $\chi^2$ with one degree of
freedom.
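The leave-one-out scheme and Press's Q statistic can be sketched as follows. To keep the example short, the classifier is a deliberately simple stand-in (a threshold halfway between the two group means) and not the fitted logistic model of the text; any fit/predict pair with the same interface could be substituted. All function names and the data are hypothetical.

```python
def leave_one_out_errors(x, y, fit, predict):
    """Return the indicators e_{-i}: 0 if subject i is classified correctly, else 1."""
    errors = []
    for i in range(len(y)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # train on n - 1 subjects
        errors.append(0 if predict(model, x[i]) == y[i] else 1)
    return errors

def press_q(errors):
    """Return the error rate gamma and Press's Q = n (2 gamma - 1)^2."""
    n = len(errors)
    gamma = sum(errors) / n
    return gamma, n * (2.0 * gamma - 1.0) ** 2

# Stand-in classifier (NOT a logistic model): threshold halfway between group means.
def fit_midpoint(xs, ys):
    m0 = sum(v for v, g in zip(xs, ys) if g == 0) / ys.count(0)
    m1 = sum(v for v, g in zip(xs, ys) if g == 1) / ys.count(1)
    return (m0 + m1) / 2.0, m1 > m0

def predict_midpoint(model, value):
    threshold, group1_is_above = model
    return int((value > threshold) == group1_is_above)

# Hypothetical data: one regressor and a binary group label per subject.
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_data = [0, 0, 1, 0, 1, 0, 1, 1]
errors = leave_one_out_errors(x_data, y_data, fit_midpoint, predict_midpoint)
gamma, q = press_q(errors)
print(f"error rate = {gamma:.3f}, Press's Q = {q:.3f}")
# Q is compared with the chi-square(1) critical value, about 3.84 at the 0.05 level.
```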

1.3 Random Fields

At the voxel level, it is often necessary to model measurements at each voxel as
a random field. For instance, the deformation field of warping a brain to another
brain is often modeled as a continuous random field (Chung et al., 2001b).
The generalization of a continuous stochastic process defined in R to a higher
dimensional abstract space is called a random field. For an introduction to
random fields, see (Yaglom, 1987; Dougherty, 1999; Adler and Taylor, 2007).
In the random field theory as introduced in (Worsley, 1994; Worsley et al.,
1996b), measurement Y at voxel position x ∈ M is modeled as
$$Y(x) = \mu(x) + \epsilon(x),$$
where $\mu$ is the unknown functional signal to be estimated and $\epsilon$ is the
measurement error, which is modeled as a random variable at each fixed $x$.
Then the collection of random variables $\{\epsilon(x) : x \in M\}$ is called a
stochastic process or random field. The more precise measure-theoretic definition
can be found in (Adler and Taylor, 2007). Random field modeling can be done
beyond the usual Euclidean space to curved cortical and subcortical manifolds
(Joshi, 1998; Chung et al., 2003a). Most concepts in random fields are the
continuous generalization of random vectors.

Definition 1.1 Given a probability space, a random field $T(x)$ defined in
$\mathbb{R}^n$ is a function such that for every fixed $x \in \mathbb{R}^n$,
$T(x)$ is a random variable on the probability space.
Definition 1.2 The covariance function $R(x,y)$ of a random field $T$ is
defined as
$$R(x,y) = E\big[ (T(x) - ET(x)) (T(y) - ET(y)) \big].$$
If the joint distribution of $T$ at points $x_1, \cdots, x_m$
$$P\big( T(x_1) \leq z_1, \cdots, T(x_m) \leq z_m \big)$$
is invariant under the translation
$$(x_1, \cdots, x_m) \to (x_1 + \tau, \cdots, x_m + \tau),$$
$T$ is said to be stationary or homogeneous.
For a stationary random field T , its covariance function is
R(x,y) = f (x − y)
for some function f . A special case of stationary fields is an isotropic field,
which requires the covariance function to be rotation invariant, i.e.,
R(x,y) = f (|x − y|)
for some function f (Yaglom, 1987).

1.3.1 Gaussian Fields

The most important class of random fields is Gaussian fields. A more rigorous
treatment can be found in Adler and Taylor (2007). Let us start defining a
multivariate normal distribution from a Gaussian random variable.
Definition 1.3 A random vector $T = (T_1, \cdots, T_m)$ is multivariate normal if
$\sum_{i=1}^{m} c_i T_i$ is Gaussian for every possible $c_i \in \mathbb{R}$.

Then a Gaussian random field can be defined from a multivariate normal
distribution.

Definition 1.4 A random field T is a Gaussian random field if T (x1 ), · · · ,
T (xm ) are multivariate normal for every (x1, · · · ,xm ) ∈ Rm .
An equivalent definition to Definition 1.4 is as follows. T is a Gaussian random
field if the finite joint distribution
P (T (x1 ) ≤ z1, · · · ,T (xm ) ≤ zm )
is a multivariate normal for every (x1, · · · ,xm ).
$T$ is a mean zero Gaussian field if $ET(x) = 0$ for all $x$. Because any mean
zero multivariate normal distribution can be completely characterized by its
covariance matrix, a mean zero Gaussian random field $T$ can be similarly
determined by its covariance function $R$. Two fields $T$ and $S$ are independent
if $T(x)$ and $S(y)$ are independent for every $x$ and $y$. Mean zero Gaussian
fields $T$ and $S$ are independent if and only if the cross-covariance function
$$R(x,y) = E\big[ T(x) S(y) \big]$$
vanishes for all $x$ and $y$, which is a very strong assumption.

The Gaussian white noise is a Gaussian random field with the Dirac-delta
function $\delta$ as the covariance function. Note the Dirac-delta function is
defined as $\delta(x) = \infty$ if $x = 0$ and $\delta(x) = 0$ if $x \neq 0$.
Further, $\int \delta(x)\, dx = 1$. Numerically we can simulate the Dirac-delta
function as the limit of the sequence of Gaussian kernels $K_\sigma$ when
$\sigma \to 0$. The Gaussian white noise is simulated as an independent and
identical Gaussian random variable at each voxel.

1.3.2 Derivative of Gaussian Fields

Suppose G is a collection of Gaussian random fields. For given X,Y ∈ G, we
have c1 X + c2 Y ∈ G again for all c1 and c2 . Therefore, G forms an infinite-
dimensional vector space. Any linear combination of Gaussian fields is again
a Gaussian field. We can show that the derivatives of Gaussian fields are also
Gaussian. To see this, we define mean-square convergence.

Definition 1.5 A sequence of random fields $T_h$, indexed by $h$, converges to $T$
as $h \to 0$ in mean-square if
$$\lim_{h \to 0} E|T_h - T|^2 = 0.$$
We will denote the mean-square convergence using the usual limit notation:
$$\lim_{h \to 0} T_h = T.$$
The convergence in mean square implies the convergence in mean. This can be
seen from
$$E|T_h - T|^2 = V|T_h - T| + \big( E|T_h - T| \big)^2.$$
Now let $T_h \to T$ in mean square. Since both terms on the right-hand side are
nonnegative, each of them must also converge to zero, proving the statement.
Now we define the derivative of the field in the mean-square sense as
$$\frac{dT(x)}{dx} = \lim_{h \to 0} \frac{T(x + h) - T(x)}{h}.$$
If $T(x)$ and $T(x + h)$ are Gaussian, $T(x + h) - T(x)$ is again Gaussian. Thus,
the limit on the right-hand side is also Gaussian. If $R$ is the covariance function
of the mean zero Gaussian field $T$, the covariance function of its derivative field
is given by
$$E\left[ \frac{dT(x)}{dx} \frac{dT(y)}{dy} \right] = \frac{\partial^2 R(x,y)}{\partial x \partial y}.$$
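As a worked instance of this formula, suppose (purely as an illustrative assumption, not a case taken from the text) that $T$ is a one-dimensional field with squared-exponential covariance and length scale $\ell$. Differentiating once in $y$ and then in $x$ gives the covariance of the derivative field:

```latex
\begin{align*}
R(x,y) &= \exp\!\Big( -\frac{(x-y)^2}{2\ell^2} \Big), \\
\frac{\partial R(x,y)}{\partial y} &= \frac{x-y}{\ell^2}\, R(x,y), \\
\frac{\partial^2 R(x,y)}{\partial x\, \partial y}
  &= \frac{1}{\ell^2} \Big( 1 - \frac{(x-y)^2}{\ell^2} \Big) R(x,y).
\end{align*}
```

Setting $x = y$ shows that the derivative field has variance $1/\ell^2$, so a shorter length scale produces a rougher field with larger derivatives.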

1.3.3 Integration of Gaussian Fields

The integration of Gaussian fields is also Gaussian. To see this, define the
integration of a random field as the limit of a Riemann sum. Let
$\cup_{i=1}^{n} M_i$ be a partition of $M$, i.e.,
$$M = \cup_{i=1}^{n} M_i \quad \text{and} \quad M_i \cap M_j = \emptyset \ \text{if} \ i \neq j.$$
Let $x_i \in M_i$ and $\mu(M_i)$ be the volume of $M_i$. Then we define the
integration of field $T$ as
$$\int_M T(x)\, dx = \lim_{n \to \infty} \sum_{i=1}^{n} T(x_i) \mu(M_i),$$
where the limit is taken as $n \to \infty$ and $\mu(M_j) \to 0$ for all $j$. When we
integrate a Gaussian field, it is the limit of a linear combination of Gaussian
random variables so it is again a Gaussian random variable. In general, any
linear operation on Gaussian fields will result in Gaussian fields with different
covariance structures.
We can use a collection of Gaussian fields to construct $\chi^2$-, t-, F-fields
(Worsley, 1994; Worsley et al., 1996b, 2004; Cao and Worsley, 1999a). The
$\chi^2$-field with $m$ degrees of freedom is defined as
$$T(x) = \sum_{i=1}^{m} X_i^2(x),$$
where $X_1, \cdots, X_m$ are independent, identically distributed Gaussian fields
with zero mean and unit variance. Similarly, we can define t and F fields as
well as Hotelling’s $T^2$ field (Thompson et al., 1997; Collins et al., 1998; Joshi,
1998; Cao and Worsley, 1999a; Gaser et al., 1999).

1.3.4 Simulating Gaussian Fields

We show how to simulate smooth Gaussian fields by performing Gaussian
kernel smoothing on white noise. This is perhaps the easiest way of simulating
Gaussian fields.
White noise is defined as a random field whose covariance function is
proportional to the Dirac-delta function $\delta$, i.e.,
$$R(x,y) \propto \delta(x - y).$$
For instance, we may take
$$R(x,y) = \lim_{\sigma \to 0} K_\sigma(x - y),$$
the limit of the usual isotropic Gaussian kernel. White noise is usually
characterized via generalized functions. One example of white noise is the
generalized derivative of Brownian motion (Wiener process) called Gaussian
white noise.

Definition 1.6 Brownian motion (Wiener process) $B(x)$, $x \in \mathbb{R}^+$, is a
zero mean Gaussian field with covariance function
$$R_B(x,y) = \min(x,y).$$
Following Definition 1.6, we have $VB(x) = x$. The increments of the Wiener
process over nonoverlapping intervals of equal length are independent and
identically distributed (i.i.d.) Gaussian. Further, the paths of the Wiener
process are continuous while they are not differentiable (Øksendal, 2010).
Brownian motion can be generalized to higher dimensions by taking the
components of a vector field to be i.i.d. Brownian motions.
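The covariance function in Definition 1.6 can be checked by simulation. The sketch below builds sample paths from independent Gaussian increments and compares the empirical covariance with $\min(x,y)$; the sampling times, the number of paths, and the names brownian_path and sample_cov are arbitrary choices made for illustration.

```python
import math
import random

def brownian_path(times, rng):
    """Sample B at increasing times using independent Gaussian increments."""
    path, b, prev = [], 0.0, 0.0
    for t in times:
        b += rng.gauss(0.0, math.sqrt(t - prev))  # increment has variance t - prev
        path.append(b)
        prev = t
    return path

rng = random.Random(1)
times = [0.25, 0.5, 0.75, 1.0]
paths = [brownian_path(times, rng) for _ in range(20000)]

def sample_cov(i, j):
    """Empirical E[B(t_i) B(t_j)]; the mean of B is zero by construction."""
    return sum(p[i] * p[j] for p in paths) / len(paths)

for i, j in [(0, 2), (1, 3), (3, 3)]:
    print(times[i], times[j], round(sample_cov(i, j), 3), min(times[i], times[j]))
```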
Although the path of the Wiener process is not differentiable, we can define
the generalized derivative via integration by parts with a smooth function $f$
called a test function in the following way:
$$f(x) B(x) = \int_0^x f(y) \frac{dB(y)}{dy}\, dy + \int_0^x B(y) \frac{df(y)}{dy}\, dy.$$
Taking the expectation on both sides and using $EB(x) = 0$, we have
$$\int_0^x f(y)\, E \frac{dB(y)}{dy}\, dy = 0.$$
This should be true for all smooth $f$, so $E\, dB(y)/dy = 0$. Further, it can be
shown that the covariance function of the process $dB(x)/dx$ is proportional to
$\delta(x - y)$.

The Gaussian white noise can be used to construct smooth Gaussian random
fields of the form
$$X(x) = K * W(x) = K * \frac{dB(x)}{dx},$$
where $K$ is a Gaussian kernel and $W$ is the generalized derivative of Brownian
motion. Since Brownian motion is a zero-mean Gaussian process, $X(x)$ is
a zero-mean field with the covariance function
$$R_X(x,y) = E[K * W(x)\, K * W(y)] \tag{1.5}$$
$$\propto \int K(x - z) K(y - z)\, dz. \tag{1.6}$$
The case when $K$ is an isotropic Gaussian kernel was investigated by Siegmund
and Worsley with respect to optimal filtering in scale space (Siegmund
and Worsley, 1996).
In numerical implementation, we use the discrete white Gaussian noise,
which is simply an i.i.d. Gaussian random variable at each sample point.
Example 1.1 Let $w$ be a discrete version of white Gaussian noise given by
$$w(x) = \sum_{i=1}^{m} Z_i \delta(x - x_i),$$
where i.i.d. $Z_i \sim N(0, \sigma_w^2)$ and $m$ denotes the number of sample
points $x_i$. Note that
$$K * w(x) = \sum_{i=1}^{m} Z_i K(x - x_i). \tag{1.7}$$
The collection of random variables $K * w(y_1), \cdots, K * w(y_l)$ forms a
multivariate normal at arbitrary points $y_1, \cdots, y_l$. Hence, the field
$K * w(x)$ is a Gaussian field.
The covariance function of the field (1.7) is given by
$$R(x,y) = \sum_{i,j=1}^{m} E(Z_i Z_j) K(x - x_i) K(y - x_j) \tag{1.8}$$
$$= \sigma_w^2 \sum_{i=1}^{m} K(x - x_i) K(y - x_i). \tag{1.9}$$

As usual, we may take K to be a Gaussian kernel. Let us simulate some

Gaussian fields.
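Example 1.2 Gaussian white noise is generated using $w \sim N(0, 0.4^2)$,
which is shown in the top-left of Figure 1.2. With a Gaussian kernel with
bandwidth 1, iteratively smoother versions of Gaussian random fields are
constructed by repeatedly convolving the field with the kernel. The smoothing
step inside the loop did not survive extraction and is reconstructed here as an
assumption: smooth_w is taken to be initialized to the white noise w, and K is
taken to hold the discretized Gaussian kernel.

for i=1:10
    smooth_w = conv2(smooth_w, K, 'same'); % assumed step: one more pass of kernel smoothing
    figure; imagesc(smooth_w)
end

Figure 1.2 shows one, four, and nine iterations.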
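The same construction can be sketched in one dimension without any toolbox. The code below is a Python illustration under stated assumptions, not the MATLAB code behind Figure 1.2: the field lives on 200 grid points, the kernel is truncated at four grid steps and normalized to sum to one, boundaries are zero-padded, and the helper names gaussian_kernel, smooth, and roughness are hypothetical. Smoothness is summarized by the mean squared difference between neighboring values.

```python
import math
import random

def gaussian_kernel(bandwidth, radius):
    """Discrete Gaussian kernel on offsets -radius..radius, normalized to sum to 1."""
    k = [math.exp(-(d * d) / (2.0 * bandwidth ** 2))
         for d in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, kernel):
    """Convolution with zero padding; the output has the same length as the input."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for d in range(-r, r + 1):
            j = i - d
            if 0 <= j < n:
                acc += kernel[d + r] * signal[j]
        out.append(acc)
    return out

def roughness(f):
    """Mean squared difference between neighbors; smaller means smoother."""
    return sum((a - b) ** 2 for a, b in zip(f, f[1:])) / (len(f) - 1)

rng = random.Random(0)
w = [rng.gauss(0.0, 0.4) for _ in range(200)]   # discrete white noise N(0, 0.4^2)
K = gaussian_kernel(bandwidth=1.0, radius=4)

fields = [w]                                    # fields[m] is w after m smoothing passes
for _ in range(9):
    fields.append(smooth(fields[-1], K))

for m in (0, 1, 4, 9):
    print(m, roughness(fields[m]))
```

Each additional pass widens the effective kernel, so the roughness drops from the white noise level toward zero as the number of iterations grows.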

Figure 1.2 Gaussian random field simulation. Starting with Gaussian white noise
$N(0, 0.4^2)$ (top-left), we iteratively apply Gaussian kernel smoothing one, four,
