Spatial Interpolation: A Brief: Eugene Brusilovskiy
Spatial Interpolation: A Brief: Eugene Brusilovskiy
Spatial Interpolation: A Brief: Eugene Brusilovskiy
Introduction
Eugene Brusilovskiy
General Outline
Introduction to interpolation
Deterministic interpolation methods
Some basic statistical concepts
Autocorrelation and First Law of Geography
Geostatistical Interpolation
Introduction to variography
Kriging models
2
What is Interpolation?
Assume we are dealing with a variable which has meaningful values at every point
within a region (e.g., temperature, elevation, concentration of some mineral).
Then, given the values of that variable at a set of sample points, we can use an
interpolation method to predict values of this variable at every point
For any unknown point, we take some form of weighted average of the values at
surrounding points to predict the value at the point where the value is unknown
In other words, we create a continuous surface from a set of points
As an example used throughout this presentation, imagine we have data on the
concentration of gold in western Pennsylvania at a set of 200 sample locations:
3
Appropriateness of Interpolation
Interpolation should not be used when there isnt a
meaningful value of the variable at every point in space
(within the region of interest)
That is, when points represent merely the presence of events
(e.g., crime), people, or some physical phenomenon (e.g.,
volcanoes, buildings), interpolation does not make sense.
Whereas interpolation tries to predict the value of your
variable of interest at each point, density analysis (available,
for instance, in ArcGISs Spatial Analyst) takes known
quantities of some phenomena and spreads it across the
landscape based on the quantity that is measured at each
location and the spatial relationship of the locations of the
measured quantities.
Source:
http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Un
derstanding_density_analysis
4
Interpolation vs. Extrapolation
Interpolation is prediction within the range of our data
E.g., having temperature values for a bunch of
locations all throughout PA, predict the temperature
values at all other locations within PA
Note that the methods we are talking about are strictly
those of interpolation, and not extrapolation
Extrapolation is prediction outside the range of our data
E.g., having temperature values for a bunch of
locations throughout PA, predict the temperature
values in Kazakhstan
5
First Law of Geography
Everything is related to everything else, but
near things are more related than distant
things.
Waldo Tobler (1970)
Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf 7
Exact vs. Inexact Interpolation
Interpolators can be either exact or inexact
At sampled locations, exact interpolators yield values identical to the
measurements.
I.e., if the observed temperature in city A is 90 degrees, the point
representing city A on the resulting grid will still have the temperature of
90 degrees
At sampled locations, inexact interpolators predict values that are
different from the measured values.
I.e., if the observed temperature in city A is 90 degrees, the inexact
interpolator will still create a prediction for city A, and this prediction will
not be exactly 90 degrees
The resulting surface will not pass through the original point
Can be used to avoid sharp peaks or troughs in the output surface
Model quality can be assessed by the statistics of the differences between
predicted and measured values
Jumping ahead, the two deterministic interpolators that will be briefly
presented here are exact. Kriging can be exact or inexact.
Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of geographical information systems. Oxford University Press, 8
Oxford. 333pp.
Part 1. Deterministic Interpolation
9
Inverse Distance Weighted (IDW)
IDW interpolation explicitly relies on the First Law of
Geography. To predict a value for any unmeasured location,
IDW will use the measured values surrounding the prediction
location. Measured values that are nearest to the prediction
location will have greater influence (i.e., weight) on the
predicted value at that unknown point than those that are
farther away.
Thus, IDW assumes that each measured point has a local influence
that diminishes with distance (or distance to the power of q > 1), and
weighs the points closer to the prediction location greater than those
farther away, hence the name inverse distance weighted.
Inverse Squared Distance (i.e., q=2) is a widely used interpolator
For example, ArcGIS allows you to select the value of q.
Weights of each measured point are proportional to the
inverse distance raised to the power value q. As a result, as
the distance increases, the weights decrease rapidly. How fast
the weights decrease is dependent on the value for q.
10
Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
Inverse Distance Weighted - Continued
Because things that are close to one another are more alike
than those farther away, as the locations get farther away, the
measured values will have little relationship with the value of
the prediction location.
To speed up the computation we might only use several points that
are the closest
As a result, it is common practice to limit the number of measured
values that are used when predicting the unknown value for a location
by specifying a search neighborhood. The specified shape of the
neighborhood restricts how far and where to look for the measured
values to be used in the prediction. Other neighborhood parameters
restrict the locations that will be used within that shape.
The output surface is sensitive to clustering and the presence
of outliers.
11
Search Neighborhood Specification
5 nearest neighbors
with known values
(shown in red)
of the unknown point
(shown in black)
will be used to
determine its value
Points with known values of elevation that are outside the circle are just too far from the
target point at which the elevation value is unknown, so their weights are pretty much 0.
12
The Accuracy of the Results
One way to assess the accuracy of the interpolation is known
as cross-validation
Remember the initial goal: use all the measured points to create a
surface
However, assume we remove one of the measured points from our
input, and re-create the surface using all the remaining points.
Now, we can look at the predicted value at that removed point and
compare it to the points actual value!
We do the same thing for all the points
If the average (squared) difference between the actual value and the
prediction is small, then our model is doing a good job at predicting
values at unknown points. If this average squared difference is large,
then the model isnt that great. This average squared difference is
called mean square error of prediction. For instance, the Geostatistical
Analyst of ESRI reports the square root of this average squared
difference
Cross-validation is used in other interpolation methods as well
13
A Cross-Validation Example
Assume you have measurements at 15 data points,
from which you want to create a prediction surface
The Measured column tells you the measured value
at that point. The Predicted column tells you the
prediction at that point when we remove it from the
input (i.e., use the other 14 points to create a
surface). The Error column is simply the difference
between the measured and predicted values.
Because we can have an over-prediction or under-
prediction at any point, the error can be positive or
negative. So averaging the errors wont do us much
good if we want to see the overall error well end
up with a value that is essentially zero due to these
positives and negatives
Thus, in order to assess the extent of error in our
prediction, we square each term, and then take the
average of these squared errors. This average is called
the mean squared error (MSE)
For example, ArcGIS reports the square root of this
mean squared error (referred to as simply Root-
Mean-Square in Geostatistical Analyst). This root
mean square error is often denoted as RMSE.
14
Examples of IDW with Different qs
q=1
Gold concentrations at locations
in western PA
q=2
q=3
16
Before we do any Geostatistics
17
Normality
A lot of statistical tests including many in
geostatistics rely on the assumption that the
data are normally distributed
When this assumption does not hold, the
results are often inaccurate
18
N=140
19
Data Transformations
Sometimes, it is possible to transform a variables distribution by
subjecting it to some simple algebraic operation.
The logarithmic transformation is the most widely used to achieve
normality when the variable is positively skewed (as in the image on
the left below)
Analysis is then performed on the transformed variable.
20
The Mean and the Variance
The mean (average) of a variable is also known as the
expected value
Usually denoted by the Greek letter
As an aside, for a normally distributed variable, the mean
is equal to the median
The variance is a measure of dispersion of a variable
Calculated as the average squared distance of the possible
values of the variable from mean.
Standard deviation is the square root of the variance
Standard deviation is generally denoted by the Greek letter
, and variance is therefore denoted by
21
Example: Calculation of Mean and Variance
Person Test Score Distance from the Mean (Distance from the Mean) Squared
1 90 15 225
2 55 -20 400
3 100 25 625
4 55 -20 400
5 85 10 100
6 70 -5 25
7 80 5 25
8 30 -45 2025
9 95 20 400
10 90 15 225
Variance: 445 (Average of the
Mean: 75
entries in this column)
Standard deviation (Square root of
the variance): 21.1
22
Covariance and Correlation
Defined as a measure of how much two variables X
and Y change together
The units of Cov (X, Y) are those of X multiplied by those of Y
The covariance of a variable X with itself is simply the
variance of X
Since these units are fairly obscure, a dimensionless
measure of the strength of the relationship between
variables is often used instead. This measure is known
as the correlation.
Correlations range from -1 to 1, with positive values close to
one indicating a strong direct relationship and negative
values close to -1 indicating a strong inverse relationship
23
Spatial Autocorrelation
Sometimes, rather than examining the association
between two variables, we might look at the relationship
of values within a single variable at different time points
or locations
There is said to be (positive) autocorrelation in a variable
if observations that are closer to each other in space
have related values (recall Toblers Law)
As an aside, there could also be temporal autocorrelation
i.e., values of a variable at points close in time will be
related
24
Examples of Spatial Autocorrelation
(Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
25
Examples of Spatial Autocorrelation (Contd)
(Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif) 26
Regression
A statistical method used to examine the relationship
between a variable of interest and one or more
explanatory variables
Strength of the relationship
Direction of the relationship
Often referred to as Ordinary Least Squares (OLS)
regression
Available in all statistical packages
Note that the presence of a relationship does not
imply causality
27
For the purposes of demonstration, lets focus
on a simple version of this problem
28
But what does a regression do? An example with a single predictor
29
The example on the previous page can be
easily extended to cases when we have
more than one predictor
When we have n>1 predictors, rather than getting a line in 2 dimensions,
we get a line in n+1 dimensions (the +1 accounts for the dependent
variable)
Each independent variable will have its own slope coefficient which will
indicate the relationship of that particular predictor with the dependent
variable, controlling for all other independent variables in the regression.
The equation of the best fit line becomes
Dep. Variable = m1*predictor1 + m2*predictor2 + + m3*predictor 3 + b + residuals
where the ms are the coefficients of the corresponding predictors and b is
the y-intercept term
The coefficient of each predictor may be interpreted as the amount by
which the dependent variable changes as the independent variable
increases by one unit (holding all other variables constant)
30
Some (Very) Basic Regression Diagnostics
R-squared: the percent of variance in the dependent variable that
is explained by the independent variables
The so-called p-value of the coefficient
The probability of getting a coefficient (slope) value as far from zero as we
observe in the case when the slope is actually zero
When p is less than 0.05, the independent variable is considered to be a
statistically significant predictor of the dependent variable
One p-value per independent variable
The sign of the coefficient of the independent variable (i.e., the
slope of the regression line)
One coefficient per independent variable
Indicates whether the relationship between the dependent and
independent variables is positive or negative
We should look at the sign when the coefficient is statistically significant
31
Some (but not all) regression assumptions
32
Part 3. Geostatistical Interpolation
33
Origins
Involve a set of statistical techniques called
Kriging (there are a bunch of different Kriging
methods)
Kriging is named after Danie Gerhardus Krige, a
South African mining engineer who presented
the ideas in his masters thesis in 1951. These
ideas were later formalized by a prominent
French mathematician Georges Matheron
For more information, see: Danie Gerhardus Krige
Krige, Danie G. (1951). "A statistical
approach to some basic mine valuation
problems on the Witwatersrand". J. of the Georges Matheron
Chem., Metal. and Mining Soc. of South
Africa 52 (6): 119139.
Matheron, Georges (1962). Trait de
gostatistique applique, Editions Technip,
France
Kriging has two parts: the quantification of the
spatial structure in the data (called variography)
and prediction of values at unknown points
35
Global and Local Structure
Without any a priori knowledge about the distribution of gold in Western PA,
we have no theoretical reason to expect to find different concentrations of
gold at different locations in that region.
I.e., theoretically, the expected value of gold concentration should not vary with
latitude and longitude
In other words, we would expect that there is some general, average, value of
gold concentration (called global structure) that is constant throughout the region
(even though we assume its constant, we do not know what its value is)
Of course, when we look at the data, we see that there is some variability in
the gold concentrations at different points. We can consider this to be a local
deviation from the overall global structure, known as the local structure or
residual or error term.
In other words, geostatisticians would decompose the value of gold Y(p) into
the global structure (p) and local structure (p).
Y(p) = (p) + (p)
36
(p)
As per the First Law of Geography, the local
structures (p) of nearby observations will often be
correlated. That is, there is still some meaningful
information (i.e., spatial dependencies) that can be
extracted from the spatially dependent component
of the residuals.
So, our ordinary kriging model will:
Estimate this constant but unknown global structure (p),
and
Incorporate the dependencies among the residuals (p).
Doing so will enable us to create a continuous surface of
gold concentration in western PA.
37
Assumptions of Ordinary Kriging
For the sake of the methods that we will be employing, we need to
make some assumptions:
Y(p) should be normally distributed
The global structure (p) is constant and unknown (as in the
gold example)
Covariance between values of depends only on distance
between the points,
To put it more formally, for each distance h and each pair of
locations p and t within the region of interest that are h units are
apart, there exists a common covariance value, C(h), such that
covariance [(p), (t)] = C(h).
This is called isotropy
38
Covariance and Distance
From the First Law of Geography it would then follow that as distance
between points increases, the similarity (i.e., covariance or correlation)
between the values at these points decreases
If we plot this out, with inter-point distance h on the x-axis, and
covariance C(h) on the y-axis, we get a graph that looks something like
the one below. This representation of covariance as a function of distance
is called as the covariogram
Alternatively, we can plot correlation against distance (the correlogram)
39
Covariograms and Weights
Geostatistical methods incorporate this covariance-
distance relationship into the interpolation models
More specifically, this information is used to calculate the
weights
As IDW, kriging is a weighted average of points in the
vicinity
Recall that in IDW, in order to predict the value at an unknown
point, we assume that nearer points will have higher weights (i.e.,
weights are determined based on distance)
43
Mathematical definition of a variogram
(h) average Y (i ) Y ( j ) 2
1
2
In other words, for each distance h between 0 and the bandwidth
Find all pairs of points i and j that are separated by that distance h
For each such point pair, subtract the value of Y at point j from the
value of Y at point i, and square the difference
Average these square distances across all point pairs and divide the
average by 2. Thats your variogram value!
Division by 2 -> hence the occasionally used name semi-variogram
However, in practice, there will generally be only one pair of points that
are exactly h units apart, unless were dealing with regularly spaced
samples. Therefore, we create bins, or distance ranges, into which we
place point pairs with similar distances, and estimate only for midpoints
of these bins rather than at all individual distances.
These bins are generally of the same size
Its a rule of thumb to have at least 30 point pairs per bin
We call these estimates of (h) at the bin midpoints the empirical
variogram 44
Fitting a Variogram Model
Now, were going to fit a variogram model (i.e., curve) to the
empirical variogram
That is, based on the shape of the empirical variogram,
different variogram curves might be fit
The curve fitting generally employs the method of least
squares the same method thats used in regression analysis
48
The Spherical Model
The spherical model is the most widely used variogram model
Monotonically non-decreasing
I.e., as h increases, the value of (h) does not decrease - i.e., it goes up (until hr) or
stays the same (h>r)
(hr)=s and C(hr)=0
That is, covariance is assumed to be exactly zero at distances hr
49
The Exponential Model
The exponential variogram looks very similar to the spherical model, but assumes
that the correlation never reaches exactly zero, regardless of how great the
distances between points are
In other words, the variogram approaches the value of the sill asymptotically
Because the sill is never actually reached, the range is generally considered to be
the smallest distance after which the covariance is 5% or less of the maximum
covariance
The model is monotonically increasing
I.e., as h goes up, so does (h)
50
The Wave (AKA Hole-Effect) Model
On the picture to the left, the waves exhibit a
periodic pattern. A non-standard form of spatial
autocorrelation applies. Peaks are similar in values
to other peaks, and troughs are similar in values to
other troughs. However, note the dampening in the
covariogram and variogram below: That is, peaks
that are closer together have values that are more
correlated than peaks that are father apart (and
same holds for troughs).
51
Variograms and Kriging Weights
52
Reviewing Ordinary Kriging
Again, ordinary kriging will:
Give us an estimate of the constant but unknown global
structure (p), and
Use variography to examine the dependencies among the
residuals (p) and to create kriging weights.
We calculate the distances between the unknown point at which
we want to make a prediction and the measured points that are
nearby and use the value of the covariogram for those distances to
calculate the weight of each of these surrounding measured
points.
The end result is, of course, a continuous prediction
surface
Prediction standard errors can also be obtained this
is a surface indicating the accuracy of prediction
53
Universal Kriging
Now, take another example: imagine we have data on the
temperature at 100 different weather stations (call them
w1..w100) throughout Florida, and we want to predict the
values of temperature (T) at every point w in the entire state
using these data.
Notation: temperature at point w is denoted by T(w)
We know that temperature at lower latitudes are expected to
be higher. So, T(w) will be expected to vary with latitude
Ordinary kriging is not appropriate here, because it assumes that the
global structure is the same everywhere. This is clearly not the case
here.
A method called universal kriging allows for a non-constant global
structure
We might model the global structure as in regression:
w 0 1 latitude ( w)
Everything else in universal kriging is pretty much the same as in ordinary
kriging (e.g., variography)
54
Some More Advanced Techniques
Indicator Kriging is a geostatistical interpolation
method does not require the data to be normally
distributed.
Co-kriging is an interpolation technique that is used
when there is a second variable that is strongly
correlated with the variable from which were trying to
create a surface, and which is sampled at the same set
of locations as our variable of interest and at a number
of additional locations.
For more details on indicator kriging and co-kriging,
see one of the texts suggested at the end of this
presentation
55
Isotropy vs. Anisotropy
When we use isotropic (or omnidirectional)
covariograms, we assume that the covariance
between the point values depends only on distance
Recall the covariance stationarity assumption
Anisotropic (or directional) covariograms are used
when we have reason to believe that direction plays
a role as well (i.e., covariance is a function of both
distance and direction)
E.g., in some problems, accounting for direction is
appropriate (e.g., when wind or water currents might be a
factor)
57
These are interpolation results using the gold data in Western PA (IDW vs. Ordinary Kriging)
Some Widely Used Texts on Geostatistics
Bailey, T.C. and Gatrell, A.C. (1995) Interactive
Spatial Data Analysis. Addison Wesley Longman,
Harlow, Essex.
Cressie, N.A.C. (1993) Statistics for Spatial Data.
(Revised Edition). Wiley, John & Sons, Inc.,
Isaaks, E.H. and Srivastava, R.M. (1989) An
Introduction to Applied Geostatistics. Oxford
University Press, New York, 561 p.
58