

Niches and distributional areas: Concepts, methods, and assumptions
Jorge Soberón^a,1 and Miguel Nakamura^b
^a Biodiversity Institute, University of Kansas, Dyche Hall, 1345 Jayhawk Boulevard, Lawrence, KS 66045; and ^b Centro de Investigación en Matemáticas, A. C. Jalisco s/n, Col. Valenciana, Guanajuato, 36240, México

Edited by Elizabeth A. Hadly, Stanford University, Stanford, CA, and accepted by the Editorial Board August 28, 2009 (received for review March 31, 2009)

Estimating actual and potential areas of distribution of species via ecological niche modeling has become a very active field of research, yet important conceptual issues in this field remain confused. We argue that conceptual clarity is enhanced by adopting restricted definitions of "niche" that enable operational definitions of basic concepts like fundamental, potential, and realized niches and potential and actual distributional areas. We apply these definitions to the question of niche conservatism, addressing what it is that is conserved and showing with a quantitative example how niche change can be measured. In this example, we display the extremely irregular structure of niche space, arguing that it is an important factor in understanding niche evolution. Many cases of apparently successful models of distributions ignore biotic factors: we suggest explanations to account for this paradox. Finally, relating the probability of observing a species to ecological factors, we address the issue of what objects are actually calculated by different niche modeling algorithms and stress the fact that methods that use only presence data calculate very different quantities than methods that use absence data. We conclude that the results of niche modeling exercises can be interpreted much better if the ecological and mathematical assumptions of the modeling process are made explicit.

distributions modeling | Grinnellian niches | niche modeling | scenopoetic variables

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, "Biogeography, Changing Climates and Niche Evolution," held December 12–13, 2008, at the Arnold and Mabel Beckman Center of the National Academies of Sciences and Engineering in Irvine, CA. The complete program and audio files of most presentations are available on the NAS web site at www.nasonline.org/Sackler_Biogeography.
Author contributions: J.S. and M.N. performed research and J.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. E.A.H. is a guest editor invited by the Editorial Board.
^1 To whom correspondence should be addressed. E-mail: [email protected].
19644–19650 | PNAS | November 17, 2009 | vol. 106 | suppl. 2 | www.pnas.org/cgi/doi/10.1073/pnas.0901637106

Some of the most fundamental ideas about distributional areas of species were presented by Joseph Grinnell >90 years ago (1–3), in a series of papers describing the contrasting roles of different types of environmental factors acting at different scales. More recently, George Evelyn Hutchinson developed seminal ideas about ecological niches and their relation with areas of distribution (4, 5). However, it took much more time to develop the data and analytical techniques required to put these ideas into widespread practice. The suite of methods variously called species distribution modeling, habitat modeling, or ecological niche modeling (ENM) (6–9) all have a similar purpose: to identify places suitable for the survival of populations of a species via identification of their environmental requirements. Although, as we will discuss, strictly speaking, modeling a habitat or a distribution is not synonymous with modeling a niche, for the sake of brevity we will refer to all of these methods as ENM and will make the appropriate distinctions when necessary.

ENM has received greatly increased attention in the last 10–15 years. Essentially, it is a technique used to estimate actual or potential areas of distribution, or sets of favorable habitats for a given species, on the basis of its observed presences and (sometimes) absences. These methods relate "niches" to "areas of distribution." The quotes are used to indicate that rigorous definitions of those concepts have not as yet been presented.

Although in the past few years the field has matured considerably, several conceptual problems still remain. It is not an exaggeration to say that no consensus exists about what it is that the different methods model (10, 11). Even without widespread agreement about terminology and concepts, researchers in the field of modeling niches, habitats, and distributions have managed to develop and apply sophisticated software to widely available databases. The resulting explosion of work focused on applying these methods to a large number of species and problems and addressing important, but mostly methodological, issues, like sensitivity to number of occurrence records (12), ratio of presences to absences (13), grain of the environmental layers (14), different types of absences (15), and other technical points (16). A smaller number of papers have focused on the fundamental ecological and mathematical issues underlying the working of ENM in any of its forms (8, 10, 11, 17–20).

In this article we focus explicitly on concepts. First, we argue in favor of restricted definitions of niche, inspired by Grinnell's early use of the term, because, at the cost of losing some generality, this step leads to operational and straightforward concepts for niches and corresponding distributional areas. Second, we use these operational niche concepts to clarify the term "niche conservatism" and propose ways of measuring it. Third, we discuss the role of biotic interactions as factors that need to be understood thoroughly for a correct appreciation of the potential and limitations of ENM in estimating distributional areas. Finally, using a probabilistic approach, we discuss major differences between several main types of ENM algorithms.

A Tale of Two Niches

The first step is to state explicitly what is meant by the word niche. Niche concepts are numerous (21). To offer two examples from recent literature, one view (21, 22) sees niche as the joint specification of requirements of resources that permit positive growth rate of a population, together with its impacts. Chase and Leibold (21) also included effects of predators, parasites, and noninteractive stressors in their definition. This niche is defined by sets of zero-growth isoclines in resource space, together with impact vectors and resource supply points. The other view (23) is niche as a subset of environmental conditions under which populations of a species have positive growth rates. These environmental dimensions mostly characterize climatic or other physical factors.

Many other meanings have been applied to the term (21). In their broadest sense, most definitions of niche intend to specify the environments that allow a population to survive, but they differ in the emphasis placed on key points. For example, the two niche concepts cited above differ in types of variables used [resources or other dynamically linked requirements, vs. relatively static conditions, which are the bionomic and scenopoetic variables of Hutchinson (5), respectively]; the abstract objects
constituting niches (sets of vectors vs. regions in phase-spaces); and the spatial and temporal scales at which the definitions are meaningful (local, and commensurate with the activities of individuals vs. biogeographic, and commensurate with species' distributions). Using a single term to denote such disparate meanings considerably hinders understanding.

In this article we distinguish between Eltonian niches (24), which are spatially fine-grained and based on variables related to ecological interactions and resource consumption, and Jackson and Overpeck's meaning (23), the Grinnellian niches (24, 25). Using only noninteractive, nonconsumable scenopoetic variables as axes in the multidimensional niche space is what characterizes Grinnellian niches. Not mixing scenopoetic and bionomic axes is a simplification that departs from tradition (4, 26) and requires qualifications and caveats. We refer the reader to published literature about these details (19, 23, 24), but we stress that restricting niche definitions by type of variable leads to comparative simplicity of definitions and operations and permits the use of terabytes of freely accessible datasets characterizing scenopoetic variables.

In Fig. 1, we illustrate some important concepts of Grinnellian niche theory. The cloud of points represents combinations of the two first principal components extracted from the 19 bioclimatic variables of the project WorldClim (27) across North and South America, estimated at a resolution of 10 min of arc. This set of points is a 2D view of the environmental space of Jackson and Overpeck (23), denoted by E. It is important to notice how irregular it is in both shape and internal structure. The blue ellipse represents a hypothetical fundamental niche (FN), $N_F$, namely the set of combinations of those two variables for which the intrinsic population growth rate (the growth rate at low densities of itself and of negative interactors) of a species would be positive (24). This meaning is identical to Hutchinson's FN, if restricted to scenopoetic variables. FNs can best be calculated experimentally, or on the basis of biophysical first principles (28). The subset $E \cap N_F$ was called by Jackson and Overpeck (23) the potential niche (PN). It is simply the part of the FN that actually exists in a given region and time, represented in Fig. 1A by the points within the ellipse. It is clear that a PN may often be substantially smaller than its corresponding FN. Finally, the realized niche (RN) is the part of the PN that the species would actually use, after the effects of competitors and predators are taken into account: it is the subset of E in which the species would have source populations even in the presence of competitors and other negative interactors. In other words, the RN corresponds to areas of existing source populations (24, 29). Besides interactions, dispersal disequilibrium can also prevent a species from fully occupying its PN (23).

Fig. 1. The duality of environmental and geographical spaces. (A) An example of an E-space in two dimensions (the two first principal components of 19 bioclimatic variables across the Americas) at a resolution of 10 min of arc. Each of 156,932 black dots represents an existing combination of principal components. Notice the irregular shape and structure of the E-space. The blue ellipse represents a hypothetical FN. The set of blue dots inside the ellipse is the PN, which in this instance contains 2,232 elements. (B) The projection of the PN in A to geographical space. The environmental combinations contained in the PN project to four disjoint geographic areas, Mexico, Brazil, Ecuador–Peru, and Colombia. A species with FN as depicted in A would have potentially favorable conditions in every blue region in the map in B.

A duality (see ref. 77) of environmental and geographic spaces exists that was first formally expressed by Hutchinson (4). By restricting discussion to Grinnellian niches, the duality immediately becomes operational, because scenopoetic variables are easily made to correspond to cells in geographic grids. Hence, in Fig. 1 each point in the graph corresponds to a single cell in a grid (denoted by G) of 10 min of arc resolution covering the entirety of the Americas. In general, every cell in geographic space can be characterized uniquely by using enough environmental variables, so it is possible to establish one-to-one relationships between G and E. However, geographic projections of the environmental subsets can have complicated structures (8, 24). This structure is illustrated in Fig. 1, where the blue regions in the map correspond to the climates enclosed in the blue ellipse, which by definition constitute the PN. Note that regular shapes in E may correspond to rather irregular and fragmented shapes in geographical space.

A brief digression is needed to highlight the point that scaling presents some thorny issues. Ideally, grid resolution should be established by biological considerations of the size, mobility, and ecology of the species. However, considerations of data availability often become dominant (30). Also, changing the resolution of the geographic grid creates an instance of the modifiable areal unit problem (31, 32), a difficult conceptual problem in geography that in our context means that varying resolution may lead to different estimates of niches. Given this general problem, niche modelers should always report the specifics of the grids they use and the precision at which variables are measured.

A heuristic scheme useful for analyzing the interplay between movements, biotic, and scenopoetic (abiotic) environments is the BAM (biotic, abiotic, and movements) diagram (19), shown in Fig. 2. We denote by A the region in the geographic space where the PN scenopoetic conditions occur. B is the region where biotic conditions would allow existence of viable populations, determined mainly by Eltonian factors. Finally, M is the region that has been accessible to dispersal or colonization by the species over some relevant time interval. The intersection $G_O = A \cap B \cap M$ represents the area actually occupied by the species, and $G_I = A \cap B \cap M^c$ represents a potentially invasible area (it has the correct abiotic and biotic conditions but remains outside reach). The regions in the BAM diagram represent a static view of a complicated, spatially explicit, multispecies model (24), but their simplicity is helpful in discussing several conceptual problems. The different subfields of niche or distributions modeling represent different approaches to estimating the regions in the BAM diagram and/or their corresponding environmental features.

Evolving the Niche, or Not

Evolutionary factors are not included in the previous framework. Still, a fundamental assumption underlying many applications of

ENM, and specifically in transferring niche predictions across space or in time, is that niches are "conserved" (33). Niche conservatism refers to the empirical evidence (34) and theoretical arguments (35, 36) showing that, to some extent, niches appear to evolve relatively slowly within lineages. Evidence includes phylogenetic inertia in ecological characters and the capacity to predict the geographic potential of invasions of species using data on the environments used by ancestral populations (37–42). The term "niche conservatism" is, unfortunately, too vague: what features of niche are being conserved, and what is the meaning of conservatism? One way of making the term specific and quantifiable is by measuring features of specific types of niches and studying their rates of change through time. Focus on Grinnellian niches permits these steps. Both RNs and FNs can change, but because the causes of change in RN may be ecological (i.e., release from competitors) rather than evolutionary (43), conservatism in a strict sense should refer to the FN.

Fig. 2. A BAM diagram (24), which is an abstract representation of geographic space. Set A represents regions in space where the FN (or PN) occurs. The probability $P_A(g)$ is high for cells belonging to A. Region B represents regions where the biological conditions (competitors, predators, diseases) are favorable, and the value of $P_B(g)$ would be high for cells within B. The M region represents regions to which the species has access because of its movement and colonizing capacities and the structure of barriers and distances, within a specified period, with corresponding high values of $P_M(g)$. $G_O$ represents the actual area of distribution of the species, where abiotic and biotic conditions are favorable and within reach to dispersing individuals. $G_I$ is a potential area of distribution, invasible if the structure of M changes. ●, observations of presence; ○, observations of true absences of the species.

Because Grinnellian niches are subsets of a multidimensional space, several things can change independently in them, for example, position, size, and shape (23, 40, 42). Niche conservatism means "slow" temporal changes in the position, size, or shape of the FN. We do not know the shapes of FNs, but it may be reasonable to hypothesize that they are convex in the existing multidimensional space; hence, as a preliminary hypothesis, we visualize a FN as a multidimensional ellipsoid of the form $N_F(t) = [x - \mu(t)]\,M^{-1}(t)\,[x - \mu(t)]^{T} - 1$, where x is a vector of values of the scenopoetic variables being used, the vector $\mu(t)$ represents the centroid (the position) of $N_F(t)$, the matrix $M(t)$ is the variance-covariance matrix of $N_F(t)$, and T indicates matrix transposition. The determinant of $M^{-1}(t)$ would be a measure of niche size, because it is directly related to the volume of the ellipsoid (the volume is proportional to $|M^{-1}(t)|^{-1/2}$). With these conventions, change in size of the FN in a time interval $\Delta t$ is proportional to the number $(|M^{-1}(t)| - |M^{-1}(t + \Delta t)|)/\Delta t$. The change in position per unit time is a vector, $[\mu(t) - \mu(t + \Delta t)]/\Delta t$, displaying the magnitude of change along each dimension.

It would be desirable to define units by which to measure niche evolution (see ref. 78). Following Haldane (44), in principle, it is feasible to measure the rate of change of both size and position of the FN in terms of a proportional change per year. We illustrate this idea by using data of the prickly pear moth, Cactoblastis cactorum, a native of southern South America, which has now established populations in northern Florida (45). We use standardized mean annual temperature and annual precipitation at 10 min of arc of spatial resolution (27) to represent the niches of C. cactorum. Merely for the purpose of illustration, a simple enclosing ellipsoid of the observed data is used to estimate the position of C. cactorum's niche in the E-space in Argentina, ca. 1920. The same ellipsoid is then plotted in the E-space in Florida, with data of C. cactorum presences taken there in 2000. The position of the niche is seen to have changed with the invasion to the new distributional area (see Fig. 3). Notice that, for the purpose of illustration, we are using a very simple approximation to the RN of C. cactorum, rather than an unknown FN.

Fig. 3. The same niche in two regions of E-space (horizontal axis, temperature; vertical axis, precipitation). (A) A subset of E-space is shown in light gray. The two dimensions are standardized mean annual temperature and annual precipitation, centered on the Argentine region where the moth C. cactorum occurs. The blue squares are reported occurrences of C. cactorum. The blue ellipse is a hypothetical representation of the FN of C. cactorum. (B) Another subset of the same E-space, now centered on the region of northern Florida that C. cactorum has invaded in the last 10 years. The units and the scale of A and B are the same. The hypothetical FN is placed exactly in its original Argentine position. The red diamonds are reported occurrences of C. cactorum in Florida. It is apparent that the structure of the E-space in the two regions is very different. C. cactorum in Florida occupies regions of similar temperature but higher precipitation than in Argentina. Whether this difference reflects different availability of climates or a true evolution in the FN of the moth cannot be determined with the available data.

The proportional rate of change can be computed for each of these two scenopoetic variables (standardized mean temperature and yearly precipitation). For temperature it is $|\mathrm{Temp}_{2000} - \mathrm{Temp}_{1920}| / (80\ \text{years} \times \mathrm{Temp}_{1920}) = 1.4 \times 10^{-3}$/year.
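The quantities defined for the ellipsoid niche, together with the proportional rate just given, can be illustrated with a small numerical sketch. The Python code below is only a minimal two-dimensional illustration under stated assumptions: the function names are hypothetical, the matrices, centroids, and variable values are invented for the example and are not the C. cactorum data, and the proportional rate follows the formula in the text literally (so it presumes a positive starting value). Because the area of the ellipse is proportional to the square root of |M|, the determinant of the inverse matrix decreases as the niche grows, so a positive size rate in this convention indicates expansion.

```python
def det2(M):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return a * d - b * c


def inv2(M):
    """Inverse of a 2x2 matrix; raises ZeroDivisionError if M is singular."""
    (a, b), (c, d) = M
    k = det2(M)
    return ((d / k, -b / k), (-c / k, a / k))


def niche_form(x, mu, M):
    """[x - mu] M^-1 [x - mu]^T - 1; values <= 0 lie inside the ellipse."""
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    (p, q), (r, s) = inv2(M)
    return dx * (p * dx + q * dy) + dy * (r * dx + s * dy) - 1.0


def size_change_rate(M_t, M_later, dt):
    """(|M^-1(t)| - |M^-1(t + dt)|) / dt, the size-change quantity in the text."""
    return (det2(inv2(M_t)) - det2(inv2(M_later))) / dt


def position_change_rate(mu_t, mu_later, dt):
    """[mu(t) - mu(t + dt)] / dt, one component per niche axis."""
    return tuple((a - b) / dt for a, b in zip(mu_t, mu_later))


def proportional_rate(v_start, v_end, years):
    """|v_end - v_start| / (years * v_start), a proportional change per year."""
    return abs(v_end - v_start) / (years * v_start)


# Hypothetical values, chosen only for illustration (not the paper's data).
M_1920 = ((4.0, 0.0), (0.0, 1.0))
M_2000 = ((8.0, 0.0), (0.0, 1.0))
mu_1920, mu_2000 = (0.0, 0.0), (0.1, 0.8)

print(niche_form((1.0, 0.5), mu_1920, M_1920) <= 0)  # is the point inside?
print(size_change_rate(M_1920, M_2000, 80.0))
print(position_change_rate(mu_1920, mu_2000, 80.0))
print(proportional_rate(10.0, 11.12, 80.0))
```

Keeping position and size as separate functions mirrors the distinction drawn in the text: the centroid and the extent of the niche can change independently, and each rate can be reported per niche dimension or per unit time.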



For mean precipitation, the corresponding change is $|\mathrm{Precip}_{2000} - \mathrm{Precip}_{1920}| / (80\ \text{years} \times \mathrm{Precip}_{1920}) = 1.29 \times 10^{-2}$/year. Change has thus been roughly 10 times faster in one of the dimensions of the niche (precipitation) than in temperature. These rates were estimated for the RN by using the centroid in E of observations in the Argentine range in 1920 versus the centroid of observations in the Florida area of expansion in the 2000s. Is this shift an example of niche evolution (33, 46)? In this example, only the RN was estimated, and it may be difficult to disentangle ecological and evolutionary factors affecting changes in it (33, 40, 46). Moreover, the same FN may yield quite different PNs in different regions of the planet, even in the absence of competitive or predator release. In the case of C. cactorum, actual distributions of values of precipitation and temperature in northern Florida and northern Argentina are radically different, as displayed in Fig. 3. Therefore, the same FN (i.e., same position and size), hypothetically illustrated by the blue ellipses in Fig. 3, is expressed as quite different PNs in Argentina (Fig. 3A) versus Florida (Fig. 3B). The position of the estimated niche has indeed shifted over the span of time and space (blue squares, Fig. 3A versus red diamonds, Fig. 3B), but this change may result simply from the fact that available E-space in Florida has a structure very different from that of the original in Argentina. This effect should be borne in mind when interpreting niche shifts based on correlative ENMs, because, as we will see, they estimate niches that only in very specific cases can be unequivocally identified as PN or RN.

The evolutionary meaning of changes in position versus changes in size of fundamental Grinnellian niches is different. If the centroid of the FN changes very slowly, predictions of geographic potential of invasion of new zones become more feasible. If it is just the size of the FN that changes slowly, however, and the centroid is mobile, then prediction of the potential geography of invasions becomes problematic. Stating the problem of measuring niche evolution in Grinnellian terms clarifies concepts significantly. Unfortunately, studying relative rates of change in niche size versus niche position of FNs is fraught with difficulties because FNs can only be estimated by experimental methods (28).

The Eltonian Noise Hypothesis

For many decades after Hutchinson (4) first proposed the distinction between the FNs and the RNs, it was almost axiomatic in ecological theory to assume that competitive interactions (more generally, negative interactions) would reduce FN to the RN. Is this assumption valid for Grinnellian niches (17)? There is a complicating factor. The milieu (47) of Eltonian factors represented by B is in practice very difficult to represent as static values assigned to a grid, the way that scenopoetic variables are used to construct the space E. For most species, their biological milieu is simply too fine-grained and dynamic in time and space (48) to permit mapping it at high sampling density over an entire geographic distribution.

The immediate question, then, is how feasible is predicting distributional areas without resorting to data pertinent to B? Lack of documentation of the role of biotic interactions in ENM has often been mentioned as an important limitation to reliable predictions of species' distributions (49), and examples exist in which their inclusion improves predictions of distributions (50). Still, ENMs based entirely on scenopoetic variables have demonstrated considerable predictive value in a variety of cases (37, 51–53). To explain this apparent paradox, two extreme explanations come to mind. First, perhaps Eltonian factors correlate closely with scenopoetic variables, which thus capture an important part of the biotic signature (54). Alternatively, in some cases, Eltonian factors like competition may not affect distributions at the large extents and low resolutions characteristic of geographic distribution maps (55). We call this the Eltonian Noise Hypothesis. For example, for some species, interactions may be extremely important determinants of abundance at spatial resolutions much smaller than the coarse-grained scales typically used in ENM, but at these coarse resolutions the effect may be averaged out, leading to simple "presence" of the species in the much larger cells of a distribution map (24, 56). In cases in which local interactions do have impacts on distributions at geographic extents (57, 58), the Eltonian Noise Hypothesis is falsified.

It is perfectly feasible to map parts of A based on species' distributional characteristics in relation to the values of coarse-grained scenopoetic variables. Such maps mostly show smoothly changing patterns, with obvious effects of elevation, slope orientation, climate patterns, etc. How would a map of B look? In one of the scarce works reporting on the spatial structure of mortality causes over the entire distribution of a species, Brewer and Gaston (54) showed that biotic mortality factors affecting populations of a leaf-miner varied significantly across localities. Thompson (48) reviewed other examples, confirming the possibility that the details and effects of bionomic interactions, such as the presence and impact of mutualists, competitors, and predators, can change dramatically across the geographic distribution of a species. These findings suggest that the BAM diagram's abstract representation of B as a compact circle may be misleading. The sets A and B probably have rather contrasting spatial structures, A with long-ranged autocorrelations, and B requiring fine-grained spatial characterization. Without a much larger empirical database about such factors, the relative roles of Grinnellian and Eltonian factors in determining distributions would be difficult to assess.

No Silver Bullets

A number of recent papers have set out to compare the performance of different algorithms (among the dozens available) for estimating ecological niches (15, 59, 60). Such analyses generally resort to a few measures of performance, like the receiver operating characteristic (61), which are applied indiscriminately to diverse niche modeling methods. However, different methods calculate different objects, have different assumptions, and may use different kinds of data. The depth of these differences is perhaps not widely appreciated. In correlative approaches to modeling distributions, or their corresponding niches, the input data consist of observations of presences of the species and sometimes its absence. The lack of absence data is often regarded as simply a case of low-quality data (15); however, as we will see, the lack of absence data fundamentally alters the nature of the problem. What can and cannot be estimated is different when absence data are lacking. Moreover, there are different types of absences that should not be treated symmetrically. An absence from an inaccessible area with suitable environment is not the same as one from the reciprocal situation. If one is modeling an area of distribution ($G_O$ in Fig. 2), all absences are informative. If the objective is to model niches, absences from the region $G_I$ constitute incomplete data. Therefore, presence-only and presence–absence problems are rather different, and the algorithms (7) applied to them should be conceptually appropriate to their particularities (15).

To clarify this we resort to a probabilistic interpretation of the BAM diagram. Denote by $P(Y = 1 \mid g)$ the probability of presence, interpreted as presence of source populations in a randomly chosen cell g. We model presence of sources (that is, $Y = 1$) by using three binary random variables called I, J, K that represent random access of a site by the species, abiotic suitability, and biotic suitability, respectively. The event $\{Y = 1\}$ is equivalent to $\{I = 1\} \cap \{J = 1\} \cap \{K = 1\}$. Specific segments in the BAM diagram can be labeled with a succinct notation for the associated conditional probabilities.
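The equivalence between presence and the joint occurrence of access, abiotic suitability, and biotic suitability can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions: the function name, the parameter names, and the three probability values are hypothetical, and a single cell is treated in isolation. By the multiplication rule for conditional probabilities, the simulated frequency of Y = 1 should approach the product of the three conditional probabilities.

```python
import random


def simulate_presence(p_access, p_abiotic, p_biotic, n_draws, seed=0):
    """Estimate P(Y = 1) for one cell by drawing I, then J given I = 1,
    then K given J = I = 1 (all parameter names are hypothetical)."""
    rng = random.Random(seed)
    present = 0
    for _ in range(n_draws):
        i = rng.random() < p_access          # I: the cell is accessible
        j = i and rng.random() < p_abiotic   # J: drawn only when I = 1
        k = j and rng.random() < p_biotic    # K: drawn only when J = I = 1
        present += (i and j and k)           # Y = 1 exactly when I = J = K = 1
    return present / n_draws


# Hypothetical conditional probabilities for a single cell.
estimate = simulate_presence(0.8, 0.5, 0.25, n_draws=200_000)
print(estimate, 0.8 * 0.5 * 0.25)
```

Values of J and K outside the conditioning events are never drawn here, because they cannot change whether Y = 1; only the three conditional probabilities enter the result.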

These conditional probabilities are

$$P_M(g) = P(I = 1 \mid g),$$

$$P_A(g) = P(J = 1 \mid I = 1, g), \text{ and}$$

$$P_B(g) = P(K = 1 \mid J = 1, I = 1, g).$$

Thus, $P_M(g)$ is the probability that cell g belongs to the area that, in an appropriate time interval, has been accessible for the species. It can in principle be estimated by classifying cells in terms of ecological barriers, remoteness, and capacities for dispersal of a species over a given time period. For highly dispersive species in small regions, it may be the case that $P_M(g) = 1$ for every cell g. $P_A(g)$ depends on the scenopoetic variables characterizing g, and can, in principle, be estimated experimentally (28). The values of $P_A(g)$ will be high for cells with environments within the PN. Finally, $P_B(g)$ depends on the Eltonian factors of g. This probability would be very difficult to estimate for large numbers of cells (17), but as we saw, what evidence exists indicates that it may vary dramatically over space (48, 54, 62). The Eltonian Noise Hypothesis means that $P_A(g)$ is uncorrelated with $P_B(g)$.

By a multiplication rule for conditional probabilities, we propose the following equation:

$$P(Y = 1 \mid g) = P_B(g)\,P_A(g)\,P_M(g). \tag{1}$$

Eq. 1 relates what may be called a statistical representation of probability of presence (the left side) to a more ecological representation that is based on causal factors (the right side). Godsoe (personal communication) has arrived at a similar equation using different reasoning.

Now, it is well known that the probability $P(Y = 1 \mid g)$ can be estimated directly if true absences are available (6, 63). In this case any of many multivariate regression methods (generalized linear models, generalized additive models, regression trees, logistic regression, etc.) will estimate that probability as a function of the environment in g. From $P(Y = 1 \mid g)$ the area of distribution (region $G_O$ in Fig. 2) is immediately available. By Eq. 1, therefore, incorporation of true-absence data allows estimation of the combined effects of scenopoetic and Eltonian variables, and dispersal, namely $P_B(g)\,P_A(g)\,P_M(g)$.

However, if absence information is missing, then $P(Y = 1 \mid g)$ cannot be estimated reliably. Application of Bayes' rule to $P(Y = 1 \mid g)$ allows obtaining a second equation:

$$P(g \mid Y = 1)\,\frac{P(Y = 1)}{P_v(g)} = P_A(g)\,P_B(g)\,P_M(g) = P(Y = 1 \mid g). \tag{2}$$

$P(g \mid Y = 1)$ is the probability of the observer being at g given that the species is present (63, 64), which essentially provides information on how to classify sites by their similarity to those already known as containing the species. Some methods can estimate $P(g \mid Y = 1)$, but the relationship between $P(g \mid Y = 1)$ and the

to suppose that Maxent estimates a quantity proportional to the probability of presence $P(Y = 1 \mid g)$ (64).

Another method that resorts to background absences is the genetic algorithm for rule set production (GARP) (70), a machine learning method that does not estimate probabilities. When multiple GARP models are generated and combined via consensus approaches (71), an estimate of the concordance between different stochastic solutions to an optimization problem is obtained. This number sometimes but not always correlates well with the Maxent-estimated probability $P(g \mid Y = 1)$ (72). Ideally, one would expect the outputs of both Maxent and GARP to be high when the environments in a cell are similar to those in presence-observed cells, which by hypothesis should also have large values of $P_A(g)$. How well presence-only methods approximate $P_A(g)$, however, is determined by the unknown form of the sampling bias term $P_v(g)$. Perhaps these algorithms characterize "lower bounds" to the PN, so the areas of distribution modeled by them are intermediate to $G_O$ and A (11). Unfortunately, without further information, presence-only algorithms alone do not specify exactly what area was estimated.

Finally, presence-only data (without background absences) can be used by envelope techniques like BIOCLIM (73), support vector machines (74), or similarity methods like Mahalanobis distance classification (75), which simply surround presence points in environmental space with different geometrical shapes and assume implicitly that points within the shape are also favorable to the species. Although capable of producing indices of similarity to observed environments, these methods most often just identify a subset of E that is regarded as a niche. Which niche? Probably, again, something in between the RN and the PN, which is related to the probability $P_A(g)$. In other words, presence-only envelope methods classify cells in ways that probably would have a large intersection with a classification based on $P_A(g)$. Like presence-background methods, then, they predict areas likely to be bounded by $G_O$ and A.

We see that different classes of methods estimate different terms of Eq. 2. It is therefore inadvisable to treat them as conceptual equivalents, to be tested only in terms of their capacity to predict independent datasets. Different methods are differently suited to different biological problems, an idea that can be stated explicitly by using Eq. 2 and a BAM diagram.

Conclusions

Grinnell was among the first to speak of niches as related to areas of distribution of species (2). He also was among the first to discuss factors affecting the shape of distributions of species (1). His analysis provides many of the elements we have discussed here, including a hierarchical view of processes, the importance of climatic variables in defining coarse-grained features of distributions, and finer-grained habitat structure and biotic interactions determining the details of the whereabouts of organisms. As we have seen, by defining Grinnellian niches
actual probability of presence is obscured by the term P(Y ⫽ according to this general philosophy, it is possible to make many
1)/Pv(g). The so-called prevalence, P(Y ⫽ 1), cannot be estimated concepts operational and visualize with great agility the niche-
without absence data (61, 65), and the term Pv(g) (the probability distribution duality anticipated by Hutchinson (4).
of an observer randomly visiting cell g), is not only is seldom We extract several lessons from the above discussion. First, the
known, but in general should have strong spatial biases, because niche-distributional area duality (77) is composed of several
most biological exploration is concentrated along roads, rivers, related, but quite distinct, objects. The FN, PN, and RN are
around biological stations, etc (66, 67). Biases in visitation and different entities, and they correspond in explicit but compli-
detection probabilities can alter interpretation of modeling cated ways to different actual and potential distributional areas
results significantly (63, 64, 68), but reasons of space prevent (A, GO, GI). Being specific about what niches and what areas are
further discussion of this problem here. being studied and modeled is not pedantic nit-picking, but a
To estimate P(gⱍY ⫽ 1) Maxent and other methods resort to simple consequence of the complexity of the subject. This lesson
so-called background absences (65, 69), which are randomly carries over to discussions about niche conservatism, as we saw
sampled pseudoabsences taken from the region G. However, the that the term may refer to very different features of the FN, with
existence of the term P(Y ⫽ 1)/Pv(g) prevents us from simplis- different ecological and evolutionary properties.
tically assuming that P(gⱍY ⫽ 1) estimates the probability of The second lesson derives from the fact that different mod-
presence. Only by assuming that Pv(g) is unbiased is it possible eling algorithms estimate different parts of Eq. 2 and thus

19648 兩 www.pnas.org兾cgi兾doi兾10.1073兾pnas.0901637106 Soberón and Nakamura


different sectors of the BAM diagram. Methods that estimate GO (these are species distribution models, in the strict sense) also permit estimating the RN and therefore may be inappropriate for study of issues of FN evolution, unless it is proven that for that particular case the potential and actual areas coincide (19), making FN and RN similar. Other methods, or different configurations of factors represented in the BAM diagram, may estimate environmental subsets closer to the FN. Therefore, specification of the ecological assumptions of the problem and selection of the modeling method should go hand in hand, as Austin (76) has suggested in a slightly different context. Ideally, independent estimations of A (mechanistically) and M (from considerations about history and/or movement patterns of the species) in tandem with ENM can lead to more rigorous estimation of the different actual and potential areas and environments in Hutchinson’s duality.

Finally, the actual, physical structure of both the environmental and geographic spaces, in the present and the past, should be taken into account when interpreting the results of niche modeling. The structure of environmental space is hugely irregular, in both its boundaries and the density of points inside, and changes in time. Grinnellian niches, being subsets of these spaces that are defined by the activities and physiology of species, inherit these irregularities, and interpretation of ENM exercises ignores them at its peril. Similarly, expressing the M and B regions of Fig. 2 realistically is seldom done, hindering interpretation of the results of species distribution modeling. In particular, documenting the structure of B empirically remains a serious methodological challenge.

ACKNOWLEDGMENTS. We thank Alma Solis for providing data on the Smithsonian Institution Argentine specimens of C. cactorum; John Madsden and Clifton Abbot for data on the observations of C. cactorum in Florida; David Wake, Elizabeth Hadly, and David Ackerly for inviting J.S. to speak at the Sackler Colloquium, providing the opportunity to express the ideas in this article to an expert audience; A. Townsend Peterson for reading the manuscript and making many thoughtful comments; and two anonymous referees for positive criticism that allowed us to improve the article. J.S. was partially supported by grants from the National Science Foundation–Experimental Program to Stimulate Competitive Research and Microsoft Research.
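As a rough numerical illustration of Eqs. 1 and 2, the following Python sketch builds four hypothetical grid cells and compares the probability of presence P(Y = 1|g) with the presence-only quantity P(g|Y = 1) under unbiased and biased visitation. Everything in it is an assumption made for illustration: the cell values are invented, the function names are not from the article, and the prevalence P(Y = 1) is obtained here by summing P(Y = 1|g)Pv(g) over cells, treating Pv(g) as a probability distribution over the cells.

```python
# Toy illustration of Eq. 1 and Eq. 2 on four hypothetical grid cells.
# All numbers are invented for illustration; none come from the article.

def presence_probability(p_b, p_a, p_m):
    """Eq. 1: P(Y=1|g) as the product of the biotic, abiotic, and movement terms."""
    return p_b * p_a * p_m

def presence_only_density(p_presence, p_visit):
    """Eq. 2 rearranged: P(g|Y=1) = P(Y=1|g) * Pv(g) / P(Y=1)."""
    # Assumption: prevalence P(Y=1) is the visitation-weighted sum over cells.
    prevalence = sum(p * v for p, v in zip(p_presence, p_visit))
    return [p * v / prevalence for p, v in zip(p_presence, p_visit)]

def ranking(values):
    """Cell indices ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Hypothetical (PB, PA, PM) triples for cells 0..3.
cells = [(0.9, 0.8, 1.0), (0.9, 0.5, 1.0), (0.6, 0.9, 0.5), (0.8, 0.2, 1.0)]
p_presence = [presence_probability(*c) for c in cells]

uniform_visits = [0.25, 0.25, 0.25, 0.25]  # unbiased Pv(g)
biased_visits = [0.05, 0.15, 0.10, 0.70]   # e.g. cell 3 lies next to a road

density_uniform = presence_only_density(p_presence, uniform_visits)
density_biased = presence_only_density(p_presence, biased_visits)

# Does ranking cells by P(g|Y=1) agree with ranking them by P(Y=1|g)?
same_ranking_uniform = ranking(density_uniform) == ranking(p_presence)
same_ranking_biased = ranking(density_biased) == ranking(p_presence)
```

With these made-up values, the unbiased case ranks cells identically under both quantities, because P(g|Y = 1) is then simply proportional to P(Y = 1|g). Under the biased visitation vector, the heavily visited but poorly suited cell 3 moves to the top of the presence-only ranking, which is the kind of distortion that the term P(Y = 1)/Pv(g) in Eq. 2 can introduce.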

1. Grinnell J (1914) Barriers to distribution as regards birds and mammals. Am Nat 48:248–254.
2. Grinnell J (1917) The niche-relationships of the California Thrasher. Auk 34:427–433.
3. Grinnell J (1917) Field tests of theories concerning distributional control. Am Nat 51:115–128.
4. Hutchinson GE (1957) Concluding remarks. Cold Spring Harbor Symp Quant Biol 22:415–427.
5. Hutchinson GE (1978) An Introduction to Population Ecology (Yale Univ Press, New Haven, CT).
6. Pearce J, Boyce MS (2006) Modeling distribution and abundance with presence-only data. J Appl Ecol 43:405–412.
7. Guisan A, Zimmermann N (2000) Predictive habitat distribution models in ecology. Ecol Model 135:147–186.
8. Hirzel AH, Le Lay G (2008) Habitat suitability modeling and niche theory. J Appl Ecol 45:1372–1381.
9. Peterson AT (2001) Predicting species’ geographic distributions based on ecological niche modeling. Condor 103:599–605.
10. Kearney M (2006) Habitat, environment, and niche: What are we modeling? Oikos 115:186–191.
11. Jiménez-Valverde A, Lobo JM, Hortal J (2008) Not as good as they seem: The importance of concept in species distribution modeling. Diversity Distributions 14:885–890.
12. Wisz MS, et al. (2008) Effects of sample size on the performance of species distribution models. Diversity Distributions 14:763–773.
13. Jiménez-Valverde A, Lobo JM (2006) The ghost of unbalanced species distribution data in geographical model predictions. Diversity Distributions 12:521–524.
14. Araújo MB, Thuiller W, Williams PH, Reginster I (2005) Downscaling European species atlas distributions to a finer resolution: Implications for conservation planning. Global Ecol Biogeogr 14:17–30.
15. Brotons L, Thuiller W, Araújo M, Hirzel A (2004) Presence-absence versus presence-only modeling methods for predicting bird habitat suitability. Ecography 27:437–448.
16. Peterson AT, Papes M, Soberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol Model 213:63–72.
17. Araújo MB, Guisan A (2006) Five (or so) challenges for species distribution modelling. Global Ecol Biogeogr 33:1677–1688.
18. Austin M (1996) An ecological perspective on biodiversity investigations: Examples from Australian eucalypt forests. Ann Mo Bot Gard 85:2–17.
19. Soberón J, Peterson AT (2005) Interpretation of models of fundamental ecological niches and species’ distributional areas. Biodiversity Informatics 2:1–10.
20. Pulliam HR (2000) On the relationship between niche and distribution. Ecol Lett 3:349–361.
21. Chase JM, Leibold M (2003) Ecological Niches: Linking Classical and Contemporary Approaches (Univ Chicago Press, Chicago).
22. Leibold M (1996) The niche concept revisited: Mechanistic models and community context. Ecology 76:1371–1382.
23. Jackson ST, Overpeck JT (2000) Responses of plant populations and communities to environmental changes of the late Quaternary. Paleobiology 26(Suppl):194–220.
24. Soberón J (2007) Grinnellian and Eltonian niches and geographic distributions of species. Ecol Lett 10:1115–1123.
25. James FC, Johnston RF, Warner NO, Niemi G, Boecklen W (1984) The Grinnellian niche of the Wood Thrush. Am Nat 124:17–47.
26. Colwell RK, Fuentes E (1975) Experimental studies of the niche. Annu Rev Ecol Syst 6:281–310.
27. Hijmans RJ, Cameron S, Parra J, Jones PG, Jarvis A (2005) Very high-resolution interpolated climate surfaces for global land areas. Int J Climatol 25:1965–1978.
28. Kearney M, Porter WP (2009) Mechanistic niche modeling: Combining physiological and spatial data to predict species’ ranges. Ecol Lett 12:334–350.
29. Pulliam R (2000) On the relationship between niche and distribution. Ecol Lett 3:349–361.
30. Guisan A, Thuiller W (2005) Predicting species distribution: Offering more than simple habitat models. Ecol Lett 8:993–1009.
31. Openshaw S, Taylor PJ (1981) The modifiable areal unit problem. Quantitative Geography: A British View, eds Wrigley N, Bennet RJ (Routledge, London).
32. Fotheringham SA, Brundson C, Charlton M (2000) Quantitative Geography (SAGE, London).
33. Broennimann O, et al. (2007) Evidence of climatic niche shift during biological invasion. Ecol Lett 10:701–709.
34. Huntley B, Bartlein PJ, Prentice IC (1989) Climatic control of the distribution and abundance of beech (Fagus L.) in Europe and North America. J Biogeogr 16:551–560.
35. Holt RD, Gaines MS (1992) Analysis of adaptation in heterogeneous landscapes: Implications for the evolution of fundamental niches. Evol Ecol 6:433–447.
36. Holt RD, Gomulkiewicz R (1997) in Case Studies in Mathematical Modeling: Ecology, Physiology, and Cell Biology, eds Othmer HG, Adler F, Lewis M, Dillon JM (Prentice–Hall, Englewood Cliffs, NJ), pp 25–50.
37. Peterson AT (2003) Predicting the geography of species’ invasions via ecological niche modeling. Q Rev Biol 78:419–433.
38. Peterson AT, Soberón J, Sánchez-Cordero V (1999) Conservatism of ecological niches in evolutionary time. Science 285:1265–1267.
39. Losos JB (2008) Phylogenetic niche conservatism, phylogenetic signal, and the relationship between phylogenetic relatedness and ecological similarity among species. Ecol Lett 11:995–1007.
40. Pearman PB, Guisan A, Broennimann O, Randin C (2007) Niche dynamics in space and time. Trends Ecol Evol 23:149–158.
41. Wiens J, Graham C (2005) Niche conservatism: Integrating evolution, ecology, and conservation biology. Annu Rev Ecol Syst 36:519–539.
42. Ackerly DD (2003) Community assembly, niche conservatism, and adaptive evolution in changing environments. Int J Plant Sci 164:S165–S184.
43. Colwell RK, Futuyma D (1971) On the measurement of niche breadth and overlap. Ecology 52:567–576.
44. Haldane JBS (1949) Suggestions as to a quantitative measurement of rates of evolution. Evolution (Lawrence, Kans) 3:51–56.
45. Hight SD, et al. (2002) Expanding geographical range of Cactoblastis cactorum (Lepidoptera: Pyralidae) in North America. Florida Entomol 85:527–529.
46. Randin CF, et al. (2006) Are niche-based models transferable in space? J Biogeogr 33:1689–1703.
47. McGill BJ, Enquist BJ, Weiher E, Westoby M (2006) Rebuilding community ecology from functional traits. Trends Ecol Evol 21:179–185.
48. Thompson JN (2005) The Geographic Mosaic of Coevolution (Univ Chicago Press, Chicago).
49. Davis AJ, Jenkinson LS, Lawton JH, Shorrocks B, Wood S (1998) Making mistakes when predicting shifts in species range in response to global warming. Nature 391:783–786.
50. Heikkinen RK, Luoto M, Virkkala R, Pearson RG, Korber J-H (2007) Biotic interactions improve prediction of boreal bird distributions at macro scales. Global Ecol Biogeogr 16:754–763.
51. Feria P, Peterson AT (2002) Prediction of bird community composition based on point-occurrence data and inferential algorithms: A valuable tool in biodiversity assessments. Diversity Distributions 8:49–56.
52. Raxworthy CJ, et al. (2003) Predicting distributions of known and unknown reptile species in Madagascar. Nature 426:837–841.
53. Sanchez-Cordero V, Martínez-Meyer E (2000) Museum specimen data predict crop damage by tropical rodents. Proc Natl Acad Sci USA 97:7074–7077.
54. Brewer A, Gaston KJ (2003) The geographical range structure of the holly leaf-miner. II. Demographic rates. J Anim Ecol 72:82–93.
55. Prinzing A, Durka W, Klotz S, Brandl R (2002) Geographic variability of ecological niches of plant species: Are competition and stress relevant? Ecography 25:721–729.
56. Pearson RG, Dawson TP (2003) Predicting the impacts of climate change on the distribution of species: Are bioclimatic envelopes useful? Global Ecol Biogeogr 12:361–371.

57. Leathwick JR, Austin M (2001) Competitive interactions between tree species in New Zealand’s old-growth indigenous forest. Ecology 82:2560–2573.
58. Bullock JM, Edwards RJ, Carey PD, Rose RJ (2000) Geographical separation of two Ulex species at three spatial scales: Does competition limit species’ ranges? Ecography 23:257–271.
59. Lawler JJ, White D, Neilson RP, Blaustein AR (2006) Predicting climate-induced range shifts: Model differences and model reliability. Global Change Biol 12:1568–1584.
60. Elith J, Burgman M (2002) in Predicting Species Occurrences: Issues of Scale and Accuracy, eds Scott JM, Heglund PJ, Morrison ML (Island, Washington, DC), pp 303–313.
61. Elith J, et al. (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151.
62. Sagarin R, Gaines S, Gaylord B (2006) Moving beyond assumptions to understand abundance distributions across ranges of species. Trends Ecol Evol 21:524–530.
63. Phillips S, et al. (2009) Sample selection bias and presence-only distribution models: Implications for background and pseudo-absence data. Ecol Appl 19:181–197.
64. Phillips S, Dudík M (2008) Modeling of species distributions with Maxent: New extensions and a comprehensive evaluation. Ecography 31:161–175.
65. Ward G, Hastie T, Barry S, Elith J, Leathwick JR (2009) Presence-only data and the EM algorithm. Biometrics 65:554–563.
66. Graham C, Ferrier S, Huettman F, Moritz C, Peterson AT (2004) New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol Evol 19:497–503.
67. Bojórquez-Tapia LA, Balvanera P, Cuaron AD (1994) Biological inventories and computer databases: Their role in environmental assessments. Environ Manage 18:775–785.
68. Argaez J, Christen A, Nakamura M, Soberón J (2005) Prediction of potential areas of species distributions based on presence-only data. Environ Ecol Stat 12:27–44.
69. Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecol Model 190:231–259.
70. Stockwell DRB, Peters DP (1999) The GARP modeling system: Problems and solutions to automated spatial prediction. Int J Geogr Information Syst 13:143–158.
71. Anderson RP, Lew D, Peterson AT (2003) Evaluating predictive models of species’ distributions: Criteria for selecting optimal models. Ecol Model 162:211–232.
72. Papes M, Gaubert P (2007) Modelling ecological niches from low numbers of occurrences: Assessment of the conservation status of poorly known viverrids. Diversity Distributions 13:890–902.
73. Busby JR, Margules CR, Austin MP (1991) in Nature Conservation: Cost Effective Biological Surveys and Data Analysis, eds Margules CR, Austin MP (CSIRO, Melbourne, Australia), p 64.
74. Guo Q, Kelly M, Graham CH (2005) Support vector machines for predicting distribution of Sudden Oak Death in California. Ecol Model 182:75–90.
75. Farber O, Kadmon R (2003) Assessment of alternative approaches for bioclimatic modeling with special emphasis on the Mahalanobis distance. Ecol Model 160:115–130.
76. Austin MP (2002) Spatial prediction of species distribution: An interface between ecological theory and statistical modeling. Ecol Model 157:101–118.
77. Colwell R, Rangel T (2009) Hutchinson’s duality: The once and future niche. Proc Natl Acad Sci USA 106:19651–19658.
78. Ackerley D (2009) Conservatism and diversification of plant functional traits: Evolutionary rates versus phylogenetic signal. Proc Natl Acad Sci USA 106:19699–19706.
