Environmental Modelling & Software 21 (2006) 602–614

www.elsevier.com/locate/envsoft
Position Paper¹

Ten iterative steps in development and evaluation of environmental models
A.J. Jakeman a,b,*, R.A. Letcher a,c, J.P. Norton a,c

a Integrated Catchment Assessment and Management Centre, Building 48A, The Australian National University, Canberra, ACT 0200, Australia
b Centre for Resource and Environmental Studies, The Australian National University, Canberra, ACT 0200, Australia
c Department of Mathematics, The Australian National University, Canberra, ACT 0200, Australia
Received 5 January 2005; accepted 10 January 2006
Available online 20 March 2006

Abstract

Models are increasingly being relied upon to inform and support natural resource management. They are incorporating an ever broader range of disciplines and now often confront people without strong quantitative or model-building backgrounds. These trends imply a need for wider awareness of what constitutes good model-development practice, including reporting of models to users and sceptical review of models by users. To this end the paper outlines ten basic steps of good, disciplined model practice. The aim is to develop purposeful, credible models from data and prior knowledge, in consort with end-users, with every stage open to critical review and revision. Best practice entails identifying clearly the clients and objectives of the modelling exercise; documenting the nature (quantity, quality, limitations) of the data used to construct and test the model; providing a strong rationale for the choice of model family and features (encompassing review of alternative approaches); justifying the techniques used to calibrate the model; serious analysis, testing and discussion of model performance; and making a resultant statement of model assumptions, utility, accuracy, limitations, and scope for improvement. In natural resource management applications, these steps will be a learning process, even a partnership, between model developers, clients and other interested parties.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Model testing; Verification; Uncertainty; Sensitivity; Integrated assessment; System identification

* Corresponding author. Integrated Catchment Assessment and Management Centre, Building 48A, The Australian National University, Canberra, ACT 0200, Australia. E-mail address: [email protected] (A.J. Jakeman).
¹ Position papers aim to synthesise some key aspect of the knowledge platform for environmental modelling and software issues. The review process is twofold – a normal external review process followed by extensive review by EMS Board members. See the Editorial in this issue.

1364-8152/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2006.01.004

1. Motivation

The pursuit of good practice in model development and application deserves thorough and sustained attention, whatever the field. Good practice increases the credibility and impact of the information and insight that modelling aims to generate. It is crucial for model acceptance and is a necessity for long-term, systematic accrual of a good knowledge base for both science and decision-making. The complexity and uncertainty inherent in management for better sustainability outcomes make the pursuit of good practice especially important, in spite of limited time and resources. Natural resource management confronts a complex set of issues, usually with environmental, social and economic trade-offs. These trade-offs are characterised by interactions at many scales and often by scarcity of good observed data. Thus natural resource managers commonly have to trade uncertain outcomes to achieve equitable results for various social groups, across spatial and temporal scales and across disciplinary boundaries. This must be achieved on the basis of information that varies in relevance, completeness and quality.

The complexity of these situations has led to model-based approaches for examining their components and interactions, and for predicting management outcomes. There is wide agreement on the potential of models for revealing the
implications of assumptions, estimating the impact of interactions, changes and uncertainties on outcomes, and enhancing communication between researchers from different backgrounds and between researchers and the broader community. Managers and interest groups can also potentially benefit from use of a model to define the scope of a problem, to make assumptions explicit, to examine what is known and what is not, and to explore possible outcomes beyond the obvious ones. If models are accessible enough, they can act as a medium for wider participation in environmental management. However, the pressing need to use models in managing complex situations, rather than in sharply defined areas of research, has resulted in people with little modelling or quantitative background having to rely on models, while not being in a position to judge their quality or appropriateness. Caminiti (2004) provides a resource manager's perspective on the difficulties of choosing the best modelling approach for catchment management, concluding that "[m]odellers can help by trying to understand the needs and expectations of the resource manager, who may not have the technical knowledge or language to express them." Managers may also not initially understand their own needs fully, so modelling must be an iterative learning process between modeller and manager.

The uses of models by managers and interest groups, as well as by modellers, bring dangers. It is easy for a poorly informed non-modeller to remain unaware of limitations, uncertainties, omissions and subjective choices in models. The risk is then that too much is read into the outputs and/or predictions of the model. There is also a danger that a model is used for purposes different from those intended, making invalid conclusions very likely. Taking a longer-term perspective, such inadvertent abuses detract from and distort the understanding on which science and decision-making are built.

The only way to mitigate these risks is to generate wider awareness of what the whole modelling process entails, what choices are made, what constitutes good practice for testing and applying models, how the results of using models should be viewed, and what sorts of questions users should be asking of modellers. This amounts to specifying good model practice, in terms of development, reporting and critical review of models.

As a move in that direction, this paper outlines ten steps in model development, then discusses minimum standards for model development and reporting. The wide range of model types and potential applications makes such an enterprise prone to both over-generalisation and failure to cover all cases. So the intention is to name the main steps and give examples of what each includes, without attempting the impossible task of compiling a comprehensive checklist or map of the model-development process. Such checklists have been developed within certain modelling communities where particular paradigms are dominant. Thus the Good Modelling Practice Handbook (STOWA/RIZA, 1999), financed by the Dutch government and executed by Wageningen University, has a well developed checklist for deterministic, numerical models. The guidelines for modelling groundwater flow developed by the Murray-Darling Basin Commission (2000) in Australia provide another example. Our purpose, by contrast, is to point to considerations and practices that apply in a broad range of natural resource modelling situations.

It is hoped that this paper will prompt modellers to codify their practices and to be more creative in their examination of alternatives and rigorous in their model testing. It is intended to provide a synoptic view for model builders and model users, applying to both integrated models and models within distinct disciplines. It does not deal with the surrounding issue of the appropriate development and use of environmental decision support systems (e.g. Denzer, 2005), which in addition involve issues of user interfacing, software usability and software and data integration. The paper discusses good practice in construction, testing and use of models, not in their imbedding and use in decision support systems or with software interfaces more widely.

As already indicated, the idea of guidelines for good model practice is not new. Parker et al. (2002) call for the development of guidelines for situations where formal analysis and testing of a model may be difficult or unfeasible. They state that "the essential, contemporary questions one would like to have answered when seeking to evaluate a model (are):

i) Has the model been constructed of approved materials, i.e., approved constituent hypotheses (in scientific terms)?
ii) Does its behaviour approximate well that observed in respect of the real thing?
iii) Does it work, i.e. does it fulfil its designated task, or serve its intended purpose?"

Risbey et al. (1996) call for the establishment of quality-control measures in the development of Integrated Assessment (IA) models for climate change, and suggest several features that must be considered:

- a clear statement of assumptions and their implications;
- a review of 'anchored' or commonly accepted results and the assumptions that created them;
- transparent testing and reporting of the adequacy of the whole model, not only each of the component parts;
- inclusion of the broadest possible range of diverse perspectives in IA development;
- supply of instructions to model end-users on the appropriate and inappropriate use of results and insights from the analysis;
- 'A place for dirty laundry', that is, for open discussion of problems experienced in constructing complex integrative modelling, in order for solutions to these problems to be found, and to facilitate the appropriate level of trust in model results.

Ravetz (1997), considering integrated models, argues for validation (or evaluation) of the process of development rather than the product, stating that in such circumstances "the inherently more difficult path of testing of the process may actually be more practical". Ravetz finds that in general "the quality of a model is assured only by the quality of its production".
However, he does not define the essential components or steps in model development that would make up such a quality-assurance process, nor does he discuss how far the quality of production can be assessed without assessing the quality of the product.

Caminiti (2004) outlines a number of potential pitfalls in using models for management, and proposes steps that resource managers should take to avoid them.

Refsgaard et al. (2005) address the issue of quality assurance (QA), defined as protocols and guidelines to support the proper application of models. They argue that "Model credibility can be enhanced by a proper modeller-manager dialogue, rigorous validation tests against independent data, uncertainty assessments, and peer reviews of a model at various stages throughout its development."

In promoting responsible and effective use of model information in policy processes, Van der Sluijs et al. (2005) discuss four case-study experiences with the NUSAP system for uncertainty assessment. This system, due to Funtowicz and Ravetz (1990), offers analysis and diagnosis of uncertainty in the knowledge base of complex policy problems. Van der Sluijs et al. (2005) show that extending the scheme beyond mainstream technical methods of sensitivity and uncertainty analysis, by complementing it with qualitative approaches, further promotes reflection and collective learning. Thus they cover societal aspects such as differences in framing of the problem, inadequacy of institutional arrangements at the science-policy interface, and controversy.

These authors argue that good practice in the development of integrated models is made all the more necessary by the inherent difficulties in validating them. As implied in the opening paragraph, many disciplinary modelling studies lack elements of good model practice, such as a clear statement of modelling objectives, adequate setting out of model assumptions and their implications, and reporting of model results, including validation/evaluation. Cross-disciplinary models for influencing management should be tested against additional criteria such as fitness for purpose, flexibility to respond to changing management needs, and transparency so that stakeholders can see how the results were derived.

2. Improving the modelling process

2.1. Introduction

Broad areas where better modelling practice can improve models and their adoption are suggested below, before more detailed discussion of ten steps in model development.

Wider and more strategic application of good models, comparison of models and associated long-term data acquisition can assist not only in exploiting existing knowledge but also in accruing new knowledge. An example is the current Prediction in Ungauged Basins program of the International Association of Hydrological Sciences. It has several groups, one the Top-Down Working Group (http://www.stars.net.au/tdwg/). The groups are tackling questions of how to predict streamflow in ungauged catchments through systematic studies, typically involving comparison of traditional and novel models and dataset benchmarking across a range of hydroclimatologies. The Top-Down Working Group expects to improve understanding of the drivers of catchment processes and how they relate to fluxes from river basins. Its success will depend on attention to the areas outlined below.

2.2. Proper definition of scope and objectives of the model

In making a case for modelling to help managers respond to a problem in natural resources, it is all too easy:

- to extend the scope beyond what is needed to answer the questions at hand;
- to promise more than can be delivered in the time available;
- to ignore or underestimate the difficulties and the limitations in data and techniques;
- to oversimplify or overelaborate;
- to push a particular approach not well suited to the job;
- to rely too much on existing, familiar but less-than-ideal models, and conversely;
- to overlook existing knowledge and previous experience;
- to take too little note of the need for consultation and cooperation;
- to commit to a time scale preventing unforeseen factors from being adequately dealt with, and, most crucially;
- to obfuscate the objectives, knowingly or inadvertently.

How often does one see objectives explicitly stated and iterated upon? Refinement of an objective can lead to a simpler task, as some factors are found to be unimportant, others critical, and the available information becomes clearer. Assessment of uncertainty plays a crucial role in such refinement; better a useful answer to a simple question than too uncertain an answer to a more ambitious question.

2.3. Stakeholder participation in model development

Stakeholders comprise all those with an interest. For natural resources, this is especially the managers and the various sectoral interests. Stakeholder participation is a key requirement of good model development, particularly when models are to address management questions. Aside from equity and justice, there are two main reasons for increased stakeholder participation in model development. The first is to improve the modeller's understanding, allowing a broader and more balanced view of the management issue to be incorporated in the model. The second is to improve adoption of results from the assessment, increasing the likelihood of better outcomes, as model development becomes an opportunity for stakeholders to learn about interactions in their system and likely consequences of their decisions. Both reasons work iteratively. That is, continued involvement is necessary because neither the modeller nor the manager usually has a clear and comprehensive idea at the outset of what the model must do.
Stakeholder participation in the past has often been limited to researchers wishing to exploit the results of the modelling exercise. A better approach, increasingly employed, is to involve all stakeholders throughout model development in a partnership, actively seeking their feedback on assumptions and issues and exploiting the model results through feedback and agreed adoption. This approach is expensive in effort, time and resources, but the aim of modelling is often to achieve management change, and the learning process for modellers, managers and other stakeholders inherent in this approach is essential to achieving change. Examples of such participation in model development can be found in Fath and Beck (2005), Hare et al. (2003) and Letcher and Jakeman (2003). Beck (2005) "examines the implications of the ongoing shift – from the technocracy of the past century to the democracy of stakeholder participation in the present century – for the more widespread use of information and technologies in managing water quality in urban environments." An excellent overview of participation as part of integrated assessment can be found in Mostert (in press).

2.4. Conceptualising the system

Consideration and justification of options in defining the system warrant attention by modellers and their clients. What to include and what not to incorporate in a modelling activity should be addressed explicitly at the outset and iteratively revisited as far as resources allow. The system being modelled should be defined clearly, including its boundaries (e.g. physical, socioeconomic and institutional). Boundary conditions can then be modelled as constraints or as input scenarios, whose values can be perturbed in line with stipulated assumptions.

2.5. Embracing alternative model families and structures

Comparisons between alternative model families and structures are sometimes advocated (as above), but seldom performed systematically against specified criteria or, indeed, at all in environmental modelling. Failure to carry out comparisons is understandable, given that most modellers have strong preferences for particular model structures and model-development approaches. Such preferences may be built on experience and constrained by resource limitations or lack of open-mindedness. In an ideal world, a modelling project would be let out to two or more groups to encourage rigorous comparison. In the real world, with limited resources, sponsors of modelling could have a strong influence by demanding comparisons, if they took the view that a limited but thorough exercise is preferable to a more ambitious but less well tested one.

A growing risk is that the wider community, decision-makers and politicians are effectively disfranchised by inability to weigh up conclusions drawn from models. Inadequate reporting and absence of discussion of alternatives can result in unsystematic, specialised representation of accrued knowledge, not open to challenge. This becomes profoundly unsatisfactory when model-based conclusions are susceptible to gross error through lack of good practice. In some areas where there is a consensus on modelling issues but not solutions, a remedy may be to seek more collaborative and strategic science, funded to bring groups together internationally to execute comparative studies. The EU Research Frameworks have such aims among others and are beginning to take a wider perspective outside Europe, but there is a need for more flexible, rapidly responding, heterogeneous, informal yet long-term arrangements. Long-term, consistent collaboration is needed across a range of modelling communities, to generate systematic knowledge representation and testing, gradually developing a widely understood and accepted methodological platform on which to build and test models.

2.6. More comprehensive testing of models

Environmental models can seldom be fully analysed, if only because of the heterogeneity of their data and the range of factors influencing usefulness of their outputs. In the case of groundwater models, Konikow and Bredehoeft (1992) argue from a philosophical and practical viewpoint that the strong term "validation" has no place in hydrology. They indicate that Hawking (1988) has generalised this further to state that "Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it." Oreskes et al. (1994) examine the philosophical basis of the terms "verification" and "validation" as applied to models. What typically passes for these terms is at best confirmation to some degree. The two terms imply a stark choice between acceptance and rejection. On the contrary we recognise that model performance may be assessed against many criteria, and that often no sharp acceptance threshold exists. We urge discussion of performance, recommending that a wide range of performance indicators be examined. The problem-dependent indicators selected may include:

- satisfactory reproduction of observed behaviour;
- high enough confidence in estimates of model variables and parameters, taking into account the sensitivity of the outputs to all the parameters jointly, as well as the parameter uncertainties;
- plausibility of the model properties, e.g. values which conform with experience for biophysical and socioeconomic parameters and means or extremes of associated variables;
- absence of correlation between model residuals (output errors) and observed inputs, since correlation indicates unmodelled input-output behaviour;
- time- and space-invariance of parameter estimates, since variation may indicate poorly or incompletely specified parameters (unmodelled behaviour again);
- satisfactory properties of the residuals, such as absence of significant structure over time and space, e.g. constant mean and variance;
- consistency of the model in cross-validation against different sections of the input-output records (Janssen et al.,
  1988) and perhaps also against perturbations of the data typical of their errors;
- along with these technical aspects, a range of model characteristics important to managers and stakeholders, including transparency and flexibility.

One could take this a step further by not only performing and reporting on model checks, but also asking for independent model auditing to provide safeguards to end-users.

2.7. Detection and reduction of overfitting

Model structures with too many parameters are still endemic. Models with too many degrees of freedom incur serious risks. Among them are: fitting to inconsistent or irrelevant "noise" components of records; severely diminished predictive power; ill defined, near-redundant parameter combinations; and obscuring of significant behaviour by the spurious variation allowed by too much freedom. Even so, model testing for redundancies and possible model reduction are seldom reported. Data paucity should limit the model complexity. For example, in modelling of flow and transport for prediction, spatial data on landscape attributes may be useful to structure and discretise a model in fine detail, but detail is unwarranted if the flux measurements available for model calibration cannot support it (Jakeman and Hornberger, 1993).

A related sin is the use of a favourite model even when it is over-parameterized for the data available. Indeed there are instances in the literature of simple models with well identified parameters working better than complex models where less formal attention is paid to the parameters. One is Marsili-Libelli and Checchi (2005). They observe that "The current trend in horizontal subsurface constructed wetlands (HSSCW) modelling advocates structures of increasing complexity, which however have produced a limited improvement in the understanding of their internal functioning or in the reliable estimation of their parameters." Their proposed use of simple model structures in combination with robust identification algorithms deserves attention in a wider domain than HSSCW modelling.

3. Ten steps

Whatever the type of modelling problem, certain common steps must be considered if the goals are credible results and knowledge acquisition, for the immediate purpose of the exercise and for the wider community and the longer term. Major steps have been elucidated, for example, by Jorgensen and Bendoricchio (2001) for ecological modelling, Seppelt (2003) for landscape ecology, Grafton et al. (2004) for economic-environmental systems and Wainwright and Mulligan (2004) for environmental modelling. Young (1993) summarizes a detailed set of steps for a "typical statistical environmental modelling procedure" and comments that it is an interpretation of the scientific method from the Popper viewpoint. The guidance offered by these authors partly complements and partly overlaps ours. We are trying to be more generic and to suggest guidelines for a wide range of model types. It would be futile to try to categorise families of models comprehensively, but the list below serves to illustrate the breadth of choice. In the main we also avoid reference to real-life examples. Model families and their features include:

- empirical, data-based, statistical models, with structures chosen primarily for their versatility and assuming little in advance, e.g. data-mined clusters, parametric or non-parametric time series models, regressions and their generalisations such as autoregressive moving-average exogenous models, power laws, neural nets;
- stochastic, general-form but highly structured models which can incorporate prior knowledge, e.g. state-space models and hidden Markov models;
- specific theory-based or process-based models (often termed deterministic), as often used in environmental physics and economics, e.g. specific types of partial or ordinary differential or difference equations;
- conceptual models based on assumed structural similarities to the system, e.g. Bayesian (decision) networks, compartmental models, cellular automata;
- agent-based models allowing locally structured emergent behaviour, as distinct from models representing regular behaviour that is averaged or summed over large parts of the system;
- rule-based models, e.g. expert systems, decision trees;
- a spectrum of models which represent dynamics (time-spread responses to the inputs at any given instant) in differing degrees of detail. This spectrum spans instantaneous (static, non-dynamical), discrete-event and discrete-state models (e.g. Petri nets, Markov transition matrices), lumped dynamical (finite-state-dimensional, ordinary differential equation), distributed (partial differential equation) and delay-differential infinite-state-dimensional models;
- a corresponding spectrum of spatial treatments, comprising non-spatial, 'region-based' or 'polygon-based' spatial, and more finely (in principle continuously) spatially distributed models (e.g. finite-element/grid-based discretisations of partial differential equations).

Many authors also find it useful to distinguish between white box (theory-based), black box (empirical) and grey box (theory-influenced empirical) models (e.g. Seppelt, 2003). The steps we shall delineate are appropriate whether the exercise employs traditional models, e.g. the dynamical-statistical families of models considered by Ljung (1999), Norton (1986), Söderström and Stoica (1989), and Young (1984); the empirical, deterministic or conceptual families covered by Jakeman et al. (1993); more recent artificial-intelligence or "knowledge-based" model types (e.g. Davis, 1995; Forsyth, 1984; Kidd, 1987; Schmoldt and Rauscher, 1996); or a mixture. Most of the essential features of development practice outlined in this section are shared by all these types of model. In addition we broaden the context to include the specification of objectives, choice of approach for finding model structures, involvement of interest groups, and choice of parameter estimation methods and algorithms. Although
examples will be given, the focus throughout is mainly on what questions must be addressed, not what alternatives exist.

The steps sketched in Fig. 1 and listed below are largely iterative, involving trial and error. If there is pressure to use an already developed model for all or part of the exercise, attention to all steps remains warranted. That is, the steps proposed are not just of relevance for developing a new model. Depending on the purpose, some steps may involve end-users as well as modellers. The steps are not always clearly separable. For instance, it is a matter of taste where the line is drawn between model-structure selection and parameter estimation, as model structures are partly defined by structural parameters.

3.1. Definition of the purposes for modelling

It is a truism that the reasons for modelling should have a large influence on the selecting of a model family or families (see Section 2.5) to represent the system, and on the nature and level of diagnostic checking and model evaluation. However, it is not necessarily easy to be clear about what the purposes are. Different stakeholders will have different degrees of interest in the possible purposes of a single model. For example, a resource manager is likely to be most concerned with prediction, while a model developer or scientific user may place higher stress on the ability of the model to show what processes dominate behaviour of the system. That said, better understanding is valuable for all parties as part of defining the problem and possible solutions, and as a means of assessing how much trust to place in the model. It is important to recognize that some purposes, particularly increased understanding of the system and data, may be realised well even if the final model is poor in many respects. An inaccurate model may still throw light on how an environmental system works.

Purposes include:

- gaining a better qualitative understanding of the system (by means including social learning by interest groups);
- knowledge elicitation and review;
- data assessment, discovering coverage, limitations, inconsistencies and gaps;
- concise summarising of data: data reduction;
- providing a focus for discussion of a problem;
- hypothesis generation and testing;
- prediction, both extrapolation from the past and "what if" exploration;
- control-system design: monitoring, diagnosis, decision-making and action-taking (in an environmental context, adaptive management);
- short-term forecasting (worth distinguishing from longer-term prediction, as it usually has a much narrower focus);
- interpolation: estimating variables which cannot be measured directly (state estimation), filling gaps in data;
- providing guidance for management and decision-making.

These motives are not mutually exclusive, of course, but the modeller has to establish the purposes and priorities within the

Fig. 1. Iterative relationship between model building steps.
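The iterative relationship that Fig. 1 depicts can be sketched as a simple loop: propose a model structure, calibrate it, evaluate its performance, and revisit earlier choices if performance is inadequate. The toy Python sketch below illustrates only that loop shape; the choice of polynomial "structures", the least-squares calibration, the tolerance, and all function names are illustrative assumptions of this sketch, not anything prescribed by the paper.

```python
# Toy illustration of the iterative loop in Fig. 1: select a model structure,
# estimate its parameters, evaluate performance, and revisit the structure
# choice until the fit is adequate. The candidate structures here are
# polynomials of increasing order; all names are illustrative only.

def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit via the normal equations."""
    n = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Solve a c = b by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= f * a[col][c]
            b[row] -= f * b[col]
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(a[row][c] * coeffs[c] for c in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs

def rms_error(coeffs, xs, ys):
    """Root-mean-square residual of the fitted polynomial."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return (sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)) ** 0.5

def develop_model(xs, ys, tolerance, max_order=5):
    """Iterate structure selection -> calibration -> evaluation (cf. Fig. 1)."""
    for order in range(1, max_order + 1):          # revisit the structure choice
        coeffs = fit_polynomial(xs, ys, order)     # parameter estimation
        if rms_error(coeffs, xs, ys) < tolerance:  # performance evaluation
            return order, coeffs
    return order, coeffs                           # best attempt within resources

# Synthetic "observations" generated by a quadratic process.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x ** 2 - x + 1 for x in xs]
order, coeffs = develop_model(xs, ys, tolerance=1e-6)
print(order)  # the loop stops at the quadratic structure: prints 2
```

A real application would replace the single in-sample error threshold with the richer battery of criteria discussed in Section 2.6 (residual structure, cross-validation, plausibility of parameter values); the point here is only the shape of the iteration.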
list, because of their influence on the choices to be made at later stages. For example, economy in the degrees of freedom of a prediction model ("parsimony") is important if the model is to register the consistent behaviour observed in the data but not the ephemeral, inconsistent "noise." Experience confirms that it is often counterproductive to include much detail in a prediction model for a restricted purpose (Jakeman and Hornberger, 1993). Conversely, a model designed to increase insight into the processes which determine the system's overall behaviour has to be complex enough to mimic those processes, even if only very approximately. A model intended for knowledge elicitation or hypothesis generation may have a provisional structure too elaborate to be validated by the data, but may be simplified when the knowledge or hypotheses have been tested. Reichert and Omlin (1997) point out possible difficulties in prediction using a parsimonious model with too little flexibility to accommodate changes in perception of which processes are significant. They discuss how to identify and employ non-parsimonious models for prediction.

… or unknown) or as outputs (observed or not). The choice of a boundary is closely tied in with the choice of how far to aggregate the behaviour inside it. Classical thermodynamics gives an object lesson in the benefits of choosing the boundary and degree of aggregation well, so as to discover simple relations between a small number of aggregated variables (e.g. energy) crossing the boundary, without having to describe processes inside the boundary in detail. In environmental management, deciding on the boundary and degree of aggregation is a critical but very difficult step. It can usually only be learnt through trial and error, since managers and stakeholders usually do not initially know the boundaries of what should be modelled.

Flexibility can be a major practical issue in matching the scope of the model to resources. For example, the time taken to introduce a new management practice proposed by an interest group might be an issue, given that, for instance, data/GIS layers need to be redrawn. A further concern is the resources to operate the model. In this example, can it be operated by
ling of wastewater treatment plants, Gernaey et al. (2004) give people without GIS training and equipment? More generally,
some excellent examples of how model purpose influences what specialist knowledge does a user need in order to modify
model selection, data selection and model calibration. a model parameter?
It is worth stressing that improvement of understanding of
the system is almost always a purpose of modelling, even 3.3. Conceptualisation of the system, specification of
when the users say otherwise. The quality of management de- data and other prior knowledge
cisions rests ultimately on how well the system is understood,
not merely on the quality of model predictions: insight must, Conceptualisation refers to basic premises about the work-
on average, improve decisions. Moreover, increased under- ing of the system being modelled. It might employ aids to
standing is often the useful outcome of a modelling exercise thinking such as an influence diagram, linguistic model, block
which is, by its stated criteria, a failure. diagram or bond graph (Gawthrop and Smith, 1996; Well-
stead, 1979), showing how model drivers are linked to internal
3.2. Specification of the modelling context: scope (state) variables and outputs (observed responses). Initially the
and resources conceptualisation may be rudimentary, with details postponed
until the results of knowledge elicitation and data analysis can
This second step identifies: be exploited. A tentative initial conceptualisation and a visual-
isation such as a block diagram may be a great help in showing
 the specific questions and issues that the model is to what else must be found out about the system.
address; The conceptualisation step is important even if a model is
 the interest groups, including the clients or end-users of not designed from scratch because time and money (as well
the model; as the clients’ beliefs) restrict one to using a ‘canned’ model.
 the outputs required; Conceptualisation exposes the weaknesses of the canned ap-
 the forcing variables (drivers); proach and perhaps ways to mitigate them.
 the accuracy expected or hoped for; This third step defines the data, prior knowledge and as-
 temporal and spatial scope, scale and resolution (but see sumptions about processes. The procedure is mainly qualita-
also Section 3.3); tive to start with, asking what is known of the processes,
 the time frame to complete the model as fixed, for exam- what records, instrumentation and monitoring are available,
ple, by when it must be ready to help a decision; and how far they are compatible with the physical and tempo-
 the effort and resources available for modelling and oper- ral scope dictated by the purposes and objectives. However, it
ating the model, and; becomes quantitative as soon as we have to decide what to in-
 flexibility; for example, can the model be quickly recon- clude and what can be simplified or neglected. What variables
figured to explore a new scenario proposed by a manage- are to be included, in how much detail? Once the outputs are
ment group? selected, a rough assessment is needed of which drivers they
are sensitive to and what internal processes influence the rela-
A crucial step here is to decide the extent of the model, i.e. tions between the drivers and outputs; this will usually be
where the boundary of the modelled system is. Everything out- partly a quantitative assessment.
side and not crossing the boundary is ignored. Everything The degree of aggregation and the spatio-temporal resolu-
crossing the boundary is treated as external forcing (known tion (intervals and accuracy) of the outputs also have to be
A.J. Jakeman et al. / Environmental Modelling & Software 21 (2006) 602e614 609

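A tentative conceptualisation can be sketched even before any equations are written. The fragment below is a minimal illustration of recording an influence diagram as a directed graph and tracing which outputs each driver can reach; the catchment-style variable names are invented for this example, not taken from the paper.

```python
# Minimal influence-diagram sketch: drivers -> internal (state) variables -> outputs.
# All variable names here are hypothetical, chosen only for illustration.
links = {
    "rainfall":           ["soil_moisture"],
    "temperature":        ["evapotranspiration"],
    "evapotranspiration": ["soil_moisture"],
    "soil_moisture":      ["runoff", "recharge"],
    "recharge":           ["baseflow"],
    "runoff":             ["streamflow"],
    "baseflow":           ["streamflow"],
}

def reachable(start, graph):
    """All variables influenced, directly or indirectly, by `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Which outputs does each driver influence? A crude first pass at the rough
# sensitivity assessment asked for when selecting variables and outputs.
outputs = {"streamflow"}
for driver in ("rainfall", "temperature"):
    print(driver, "->", sorted(outputs & reachable(driver, links)))
```

Even a toy graph like this makes missing links conspicuous and shows what else must be found out about the system before structure is fixed.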
This third step defines the data, prior knowledge and assumptions about processes. The procedure is mainly qualitative to start with, asking what is known of the processes, what records, instrumentation and monitoring are available, and how far they are compatible with the physical and temporal scope dictated by the purposes and objectives. However, it becomes quantitative as soon as we have to decide what to include and what can be simplified or neglected. What variables are to be included, in how much detail? Once the outputs are selected, a rough assessment is needed of which drivers they are sensitive to and what internal processes influence the relations between the drivers and outputs; this will usually be partly a quantitative assessment.

The degree of aggregation and the spatio-temporal resolution (intervals and accuracy) of the outputs also have to be chosen but, as for all these decisions, the choices may have to be revised as experience grows. The time-step and the bounds of what is to be modelled may have to be modified part way through an application, perhaps more than once. This is not trivial. Few models are flexible enough to respond to these evolving needs, which are commonly passed off by modellers as due to the client "not thinking their problem through properly at the beginning".

The first part of this step is just to state what degree of detail is needed in the outputs. However, the next step is to follow up the implications: the internal resolution of the model must be sufficient to produce outputs at the required resolution, and the time and spatial intervals throughout the model must be compatible with the range of rates of change of the variables. The only way to ensure that these requirements are met is by a careful quantitative assessment. Such assessment takes considerable effort and insight into the processes operating in the system, so it is often given too little attention. Too often sampling intervals in time and space are chosen by guesswork or simply because data are available at those intervals. Ill-chosen intervals can destroy the validity of the model, but once recognized can be amended as part of the learning process.

"Prior knowledge" can be genuinely known in advance, found from experiments or analyses performed as part of model development, or assumed, with reservations, on the basis of experience. It includes observational data and their properties (including error characteristics), structural information (e.g. coupling or independence, additivity of effects or interaction, existence of feedbacks), the nature of processes (e.g. stationarity, correlations, directionality of flows, conservation laws, switching between modes), the extent and nature of spatio-temporal forcing, and parameter values and their uncertainties. Quantitative information on uncertain parameters and errors may consist of point estimates and variances or covariances, bounds (ranges) or, if you are lucky, probability distributions.

For some environmental systems one has the luxury of optimal experimental design where inputs (such as to a bioreactor) can be manipulated to enhance the identifiability of a model (e.g. Versyck et al., 1998; Walter and Pronzato, 1997). For most systems, however, we must at any given time accept the data that are available. On the other hand, modellers can play a more proactive role in designing future data collection exercises. Monitoring efforts in the global change community are amongst the most striking.

3.4. Selection of model features and families

Any modelling approach requires selection of model features, which must conform with the system and data specification arrived at above. Major features such as the types of variables covered and the nature of their treatment (e.g. white/black/grey box, lumped/distributed, linear/non-linear, stochastic/deterministic) place the model in a particular family or families. Model structure specifies the links between system components and processes. Structural features include the functional form of interactions, data structures or measures used to specify links, spatial and temporal scales of processes and their interactions, and bin sizes for AI techniques such as data-mining. Features help to sharpen the conceptualisation and determine what model synthesis and calibration techniques are available. In simpler models, a common set of features will apply throughout, but a more complex integrated model may well be a hybrid, with the feature set varying from one part to another. For example, a deterministic or statistical climate-prediction model might interface with a non-statistical but empirical rainfall-runoff model, then with an irrigation model consisting of predetermined rules.

Families and features often overlap, and in some cases families can even be transformed into each other. For instance linear, constant-coefficient, ordinary differential equations can be transformed into, or from, Laplace or Fourier transfer functions. The choice depends on the purpose, objectives, prior knowledge and convenience.

For prediction and/or management, a key question is what the subjects of predictive or management interest are. For example, is a qualitative idea of behaviour (e.g. direction of change) required, or a rough indication of the extent of a response, an extreme value, a trend, a long-term mean, a probability distribution, a spatial pattern, a time series, the frequency or location of an event? These questions aren't asked thoroughly enough at the beginning of model projects. That said, the initial answers can easily change as the project develops, especially when managers are involved, emphasizing again the need for iteration.

The selection of model family should also depend on the level (quantity and quality) of prior information specified in step 3.3. It must take account of what can be determined and how far, i.e. to which accessible and inaccessible variables the model outputs are sensitive, what aspects of their behaviour must be considered, and the associated spatial dimensions and sampling intervals in space and time.

At this stage a first judgement has to be made of how prominent uncertainty is likely to be. It will help to set reasonable expectations of capability (e.g. predictive power), and to decide whether and how randomness should be included in the model formulation. It may include an estimate of how far past observed behaviour can be extrapolated into the future or into changed circumstances.

Selection of model features and families should be flexible, prepared for revision according to evaluation of the reasonableness of initial guesses. However, in practice it is usually difficult to change fundamental features of a model beyond quite an early stage, for understandable but regrettable human reasons like unwillingness to admit a poor choice or abandon something into which much effort has already gone. A preference for a particular model, due to familiarity, established acceptance by the technical community or availability of tools for it, often impedes change.
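The transformation between families noted in this section can be made concrete with the simplest case, a first-order linear store (a standard illustrative building block, not an example worked in the paper):

```latex
% First-order linear store with time constant \tau > 0: time-domain family member
\frac{dx(t)}{dt} = -\frac{1}{\tau}\,x(t) + \frac{1}{\tau}\,u(t)

% Taking the Laplace transform with zero initial conditions gives the
% equivalent transfer-function family member:
\frac{X(s)}{U(s)} = \frac{1}{\tau s + 1}
```

The two forms carry the same information; which is more convenient depends on the purpose, prior knowledge and the calibration techniques available.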
The difficulty is exacerbated by uncertainty and changes of mind about the factors which define model features and family (part of the learning process). The problem is that expenditure and commitment to models based on the initial judgements are usually too powerful to allow any significant changes to be made. The result may well be an inappropriate model. An initial exploration with a crude, cheap, disposable model would often be a better start, so long as there is enough time and flexibility of mind to allow later choices.

Model structure covers the degree of detail permitted. It may include the choice of spatial units (e.g. hydrological response units or grid cells) and corresponding variables (e.g. points where flows and precipitation are represented), the order of a differential equation representing a process, and whether or not non-linearity or time variation is included in a relation. Selection of model structure and parameter estimation jointly make up model calibration, discussed in Section 3.7. Before calibration, the methods for finding the structure and parameter values have to be selected.

3.5. Choice of how model structure and parameter values are to be found

In finding the structure, prior science-based theoretical knowledge might be enough to suggest the form of the relations between the variables in the model. This is often implicitly assumed to be so, even in complicated environmental systems where it is not. Shortage of records from a system may prevent empirical modelling from scratch and force reliance on scientific knowledge of the underlying processes. Choice of structure is made easier by such knowledge, and it is reassuring to feel that the model incorporates what is known scientifically about the parts of the system. However, empirical studies frequently find that a much simpler structure is adequate for a specified purpose. In some instances the structure may be found by trial and error among a modest number of possibilities, on the basis of credibility of model behaviour. Structural parameters, such as dynamical order or number and location of spatial subdivisions, may sometimes be treatable as extra parameters to be estimated along with the others. Parsimony (Ockham's razor) is an overriding principle: avoid more complication than is necessary to fulfil the objectives.

The next choice is of how to estimate the parameter values and supply non-parametric variables and/or data (e.g. distributed boundary conditions). The parameters may be calibrated all together by optimising the fit of the model outputs to observed outputs, or piecemeal by direct measurement or inference from secondary data, or both. Coarse parameter values indicating presence or absence of a factor or the rough timing of a seasonal event, for instance, might be found by eliciting expert opinion.

The choices of how to put the model together must take account not only of what data can be obtained, but also of its informativeness. Substantial quantitative data may be needed to identify parameter values even in a model with a very simple structure. Jakeman and Hornberger (1993) show how few parameters can be identified sharply from daily streamflow data. Substantial trial and error may be required to discover how much can be adequately modelled from a given data set.

In order to ensure uniqueness of parameter estimates, structural identifiability analysis has been undertaken quite actively in a few environmental system types, including activated sludge biochemical systems (Petersen et al., 2003; Checchi and Marsili-Libelli, 2005). Structural identifiability (Bellman and Åström, 1970) concerns what parameters can be identified, in principle, without ambiguity in the absence of measurement errors or deficiencies in model structure.

3.6. Choice of estimation performance criteria and technique

The parameter estimation criteria (hardly ever a single criterion) reflect the desired properties of the estimates. For example we might seek robustness to outliers (bad data), unbiasedness and statistical efficiency, along with acceptable prediction performance on the data set used for calibration. A great deal of effort in recent decades has gone into developing parameter-estimation algorithms with good theoretical properties (Norton, 1986; Söderström and Stoica, 1989; Ljung, 1999). Some of them make quite restrictive assumptions, not always realistic and verifiable, about the properties of the system and the imperfections in the data. Two texts that consider pertinent non-linear theory, at least from a regression analysis perspective, are Bates and Watts (1988) and Seber and Wild (1989).

In selecting an estimation algorithm, rounding errors and ill-conditioning may be a worry, especially when there is a risk that more parameters are being estimated than justified by the data. A further risk is numerical instability, which can arise through injudicious implementation of an algorithm that is stable and well-conditioned in another, exactly algebraically equivalent, implementation. An instance occurs among optimal smoothing algorithms to estimate time-varying parameters (Norton, 1975).

Well executed general-purpose parameter estimation (identification) packages and more specialised packages for hydrological and other uses have now been available for many years (e.g. Ljung, http://www.mathworks.com/products/sysid; http://www.mathworks.com/products/neuralnet). They may not be able to handle complex, integrated models with specialised structures. If, as a result, parameter-estimation software has to be written, careful testing of the model against criteria not used in the estimation is essential for at least three reasons. First, parameter-estimation algorithms are often predictor-correctors, capable of giving plausible results in the presence of coding errors. Second, parameter estimation for complex models usually involves non-convex numerical optimisation, with a risk that the global optimum is not found. Third, a model, especially one that is put together from several submodels, may well have more parameters than necessary to prescribe its overall behaviour (over-parameterisation), and may thus not be capable of yielding well-defined estimates of all parameters. Over-parameterisation can lead to misinterpretation, numerical ill-conditioning, excessive ability to fit the "noise" (inconsistent behaviour) in records and poor prediction performance.
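A minimal sketch of whole-model calibration by optimising fit, in standard-library Python. The one-parameter store and the synthetic record are invented for illustration, not taken from the paper; the sketch also follows the advice above by checking the calibrated model against data not used in the estimation.

```python
import random

def linear_store(rain, a):
    """Hypothetical one-parameter linear store: q[t] = a*q[t-1] + (1-a)*rain[t]."""
    q, out = 0.0, []
    for r in rain:
        q = a * q + (1.0 - a) * r
        out.append(q)
    return out

def sse(obs, sim):
    """Sum of squared errors, the single calibration criterion used here."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim))

random.seed(1)
rain = [random.random() * 10 for _ in range(200)]
observed = linear_store(rain, a=0.7)          # synthetic "truth" for the demo

# Split the record: calibrate on the first half only, by crude grid search.
half = len(rain) // 2
best_a = min((round(k * 0.01, 2) for k in range(1, 100)),
             key=lambda a: sse(observed[:half], linear_store(rain[:half], a)))

# Evaluate on the unseen second half: a criterion not used in estimation.
sim = linear_store(rain, best_a)
print("estimated a =", best_a,
      " validation SSE =", round(sse(observed[half:], sim[half:]), 6))
```

With noise-free synthetic data the grid search recovers the generating parameter exactly; real records would leave residual error and make the held-out check informative.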
In summary, the parameter estimation technique should be:

- computationally as simple as possible to minimise the chance of coding error;
- robust in the face of outliers and deviations from assumptions (e.g. about noise distribution);
- as close to statistically efficient as feasible (as reflected by the amount of data required for the estimates to converge);
- numerically well-conditioned and reliable in finding the optimum;
- able to quantify uncertainty in the results (not at all easy, as the underlying theory is likely to be dubious when the uncertainty is large); and
- accompanied by a test for over-parameterisation.

In an integrated model, a second area of choice for parameter estimation at this stage is of the sections into which the model is disaggregated. Disciplinary boundaries often define sections, for example hydrological, policy, economic and ecological components. Spatial sectioning, e.g. of a stream network, is also natural. Sectioning into time segments is much less common, even though many environmental phenomena have time-varying characteristics which should influence model applications such as prediction.

The last decade or so has seen a strong trend towards models explicitly divided into simpler sections for parameter estimation, an example being piecewise linear models. Simpler sections make for greater flexibility and easier testing, but pose a larger risk of producing a model more elaborate than necessary, e.g. having internal variables with little influence on external behaviour or higher resolution than needed to provide the required output resolution.

Practical convenience often dictates piecemeal identification of model components, and pre-existing models are often available for parts of the system (e.g. rainfall-runoff, flood, groundwater and/or water quality models for hydrological sections), but it is wise to test the overall model to see whether simplification is possible for the purposes in mind. Sensitivity assessment (Saltelli et al., 2000) plays a large rôle here.

3.7. Identification of model structure and parameters

Section 3.5 discussed choice of methods for finding model structure and parameters, and Section 3.6 the criteria and techniques. The present step addresses the iterative process of finding a suitable model structure and parameter values. This step ideally involves hypothesis testing of alternative model structures. The complexity of interactions proposed for the model may be increased or reduced, according to the results of model testing (steps 3.8-3.10). In many cases this process just consists of seeing whether particular parameters can be dropped or have to be added.

Formal statistical techniques for differentiating among different model structures are well developed. They provide criteria which trade the number of parameters against the improvement in model fit to observations (Veres, 1991). Because of their reliance on statistical assumptions, statistical model-structure tests are best treated as guides, checking the results of the structure recommended on other grounds such as prediction performance on other data sets, credibility of parameter estimates and consistency with prior knowledge (see Sections 3.8 and 3.10).

The underlying aim is to balance sensitivity to system variables against complexity of representation. The question is whether some system descriptors, for instance dimensionality and processes, can be aggregated to make the representation more efficient, worrying only about what dominates the response of the system at the scales of concern. Again it is important to avoid over-flexibility, since unrealistic behaviour, ill-conditioning and poor identifiability (impossibility of finding unique, or well enough defined, parameter estimates) are severe risks from allowing more degrees of freedom than justified by the data.

3.8. Conditional verification including diagnostic checking

Once identified, the model must be 'conditionally' verified and tested to ensure it is sufficiently robust, i.e. insensitive to possible but practically insignificant changes in the data and to possible deviations of the data and system from the idealising assumptions made (e.g. of Gaussian distribution of measurement errors, or of linearity of a relation within the model). It is also necessary to verify that the interactions and outcomes of the model are feasible and defensible, given the objectives and the prior knowledge. Of course, this eighth step should involve as wide a range of quantitative and qualitative criteria as circumstances allow.

Quantitative verification is traditionally attempted, but rarely against a wide range of criteria. Criteria may include goodness of fit (comparison of means and variances of observed versus modelled outputs), tests on residuals or errors (for heteroscedasticity, cross-correlation with model variables, autocorrelation, isolated anomalously large values) and, particularly for relatively simple empirical models, the speed and certainty with which the parameter estimates converge as more input-output observations are processed.

Qualitative verification preferably involves knowledgeable data suppliers or model users who are not modellers. Where the model does not act feasibly or credibly, the assumptions, including structure and data assumptions, must be re-evaluated. Indeed, this stage of model development may involve reassessment of the choices made at any previous stage. Checking of a model for feasibility and credibility is given little prominence in the literature because it is largely informal and case-specific, but it is plainly essential for confidence in the model's outputs. Again this is a very important step, not only to check the model's believability, but to build the client's confidence in the model. It assumes sufficient time for this checking and enough flexibility of model structure to allow modifications. Often these assumptions are not met.
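Two of the quantitative checks named above, comparison of means and variances of observed versus modelled outputs and a residual-autocorrelation test, can be sketched in a few lines of standard-library Python. The data are invented for illustration.

```python
import statistics

def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a series; near zero for 'white' residuals,
    large values flag unmodelled dynamics."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

# Illustrative observed and modelled output records (hypothetical numbers).
observed = [3.1, 2.9, 3.4, 3.0, 2.8, 3.2, 3.1, 2.7, 3.3, 3.0]
modelled = [3.0, 3.0, 3.3, 3.1, 2.9, 3.1, 3.2, 2.8, 3.2, 3.1]
residuals = [o - m for o, m in zip(observed, modelled)]

# Goodness of fit: do the modelled outputs reproduce the first two moments?
print("mean difference:", round(statistics.fmean(observed) - statistics.fmean(modelled), 3))
print("variance ratio: ", round(statistics.variance(observed) / statistics.variance(modelled), 3))

# Residual whiteness check.
print("residual lag-1 autocorrelation:", round(lag1_autocorr(residuals), 3))
```

In practice these checks would be run over a battery of criteria (heteroscedasticity, cross-correlation with model variables, anomalously large residuals), not the two shown.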
3.9. Quantification of uncertainty

Uncertainty must be considered in developing any model, but is particularly important, and usually difficult to deal with, in large, integrated models. Beven (2000) expresses the concept of model equifinality, recognising that there often is a wide range of models capable of yielding similar predictions. Uncertainty in models (Walker et al., 2003) stems from incomplete system understanding (which processes to include, which processes interact); from imprecise, finite and often sparse data and measurements; and from uncertainty in the baseline inputs and conditions for model runs, including predicted inputs. In Van der Sluijs et al. (2005) uncertainties are considered from a non-technical standpoint, to include those associated with problem framing, indeterminacies and value-ladenness. Their procedure is important if these attributes dominate. A diagnostic diagram can be used to synthesize results of quantitative parameter sensitivity analysis and qualitative review of parameter strength (so-called pedigree analysis). It is a reflective approach where process is as important as technical assessments.

Some modelling approaches are able explicitly to articulate uncertainty due to data, measurements or baseline conditions, by providing estimates of uncertainty, usually in probabilistic form such as parameter covariance. Others require comprehensive testing of the model to develop this understanding. Ideally the model would be exercised over the whole credible range of every uncertain input and parameter, suitably weighted by likelihood. Such comprehensive testing is a complex task even for relatively simple integrated models, so is very rarely performed because of time and resource constraints. For example, the sensitivity of model outputs to changes in individual parameters, and perhaps two at a time, may be tested, but analysis of the effects of bigger combinations of parameter changes is usually limited to crude measures such as contribution to mean-square variation in output, under some statistical assumptions. Funds are seldom available to cover the time that this testing takes, but even some crude error estimates based on output sensitivity to the most important variables are useful. Often modellers do not provide even this level of uncertainty estimation.

The results from extensive sensitivity testing can be difficult to interpret, because of the number and complexity of cause-effect relations tested. To minimise the difficulty, clear priorities are needed for which features of which variables to examine, and which uncertainties to cover. A good deal of trial and error may be required to fix these priorities.

Few approaches explicitly consider uncertainty introduced by the system conceptualisation or model structure. Alternative structures and conceptualisations are unlikely to be examined after an early stage. The reasons include preferences of the developer, compatibility with previous practice or other bodies' choices, availability of software tools, agency policy, peer pressure and fashion within technical communities, and shortage of time and resources. It is hard to see how this sort of uncertainty can be taken into account beyond remaining alert to any compromises and doubts in such choices.

On the positive side, the issue of uncertainty is widely recognised and increasing resources are being devoted to it. For example, Hession and Storm (2000) demonstrate a method for incorporating uncertainty analysis in watershed-level modelling and summarise a lot of the literature in this applied area. A recent special issue of this journal (Jolma and Norton, 2005) is also indicative of the attention given to uncertainty in environmental modelling. The papers there illustrate the breadth of the field and the eclectic way in which ideas, problem formulations and technical resources from many sources are being brought to bear.

Model uncertainty must be considered in the context of the purposes of the model. For example, discrepancies between actual output, model output and observed output may be important for forecasting models, where cost, benefit and risk over a substantial period must be gauged, but much less critical for decision-making or management models where the user may be satisfied with knowing that the predicted ranking order of impacts of alternative scenarios or management options is likely to be correct, with only a rough indication of their sizes.
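Exercising a model over the credible range of its uncertain parameters can be sketched, in miniature, by simple Monte Carlo sampling. The two-parameter store and the uniform (bounds-only) priors below are assumptions made purely for illustration, not a method prescribed in the paper.

```python
import random

def simulate(rain, a, gain):
    """Hypothetical two-parameter store; returns the peak output."""
    q, peak = 0.0, 0.0
    for r in rain:
        q = a * q + gain * r
        peak = max(peak, q)
    return peak

random.seed(42)
rain = [random.random() * 10 for _ in range(100)]

# Sample the credible range of each uncertain parameter (assumed bounds)
# and propagate the samples through the model.
peaks = []
for _ in range(500):
    a = random.uniform(0.5, 0.9)
    gain = random.uniform(0.2, 0.4)
    peaks.append(simulate(rain, a, gain))

peaks.sort()
print("5th-95th percentile of peak output:",
      round(peaks[len(peaks) // 20], 2), "to",
      round(peaks[-len(peaks) // 20], 2))
```

Even this crude propagation gives the rough indication of output uncertainty that, as noted above, modellers too often omit; weighting samples by likelihood and covering parameter interactions would be the next refinements.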
3.10. Model evaluation or testing (other models, algorithms, comparisons with alternatives)

Finally the model must be evaluated in the light of its objectives. For simpler, disciplinary models, a traditional scientific attitude can be taken towards "validation" (non-falsification or provisional confirmation, strictly). That is, confirmation is considered to be demonstrated by evaluating model performance against data not used to construct the model (Ljung, 1999, ch. 16; Söderström and Stoica, 1989, ch. 11). However, this style or level of confirmation is rarely possible (or perhaps even appropriate) for large, integrated models, especially when they have to extrapolate beyond the situation for which they were calibrated. If so, the criteria have to be fitness for purpose and transparency of the process by which the model is produced, rather than consistency with all available knowledge. More detailed assessment of the model 'for the purposes for which it has been constructed' must be considered (e.g. Ravetz, 1997).

Details of such an approach are still at an early stage of development, but should extend to: testing the sensitivity of the model to plausible changes in input parameters; where possible or desirable, changes in assumptions about model structure; as well as documentation and critical scrutiny of the process by which the model has been developed, including the assumptions invoked. A critical difference from traditional model "validation" is the openly subjective nature of such criteria. Fitness for purpose should also include 'softer' criteria like ability to accommodate unexpected scenarios and to report predictions under diverse categories (by interest group, by location, by time, etc.), and speed of responding to requests for modified predictions. In other words, model accuracy (the traditional modeller's criterion) is only one of the criteria important in real applications.

In summary, the modelling process is about constructing or discovering purposeful, credible models from data and prior knowledge, in consort with end-users, with every stage open to critical review and revision. Sadly, too often in reality it is the application of a predetermined model in a highly constricted way to a problem, and to the social dimensions of which the modeller is oblivious.
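The traditional quantitative side of this step, evaluating performance against data not used to construct the model and against a simpler alternative, can be sketched as follows. The held-out record and the naive persistence benchmark are illustrative assumptions, not data from the paper; the skill score used is the widely applied Nash-Sutcliffe efficiency.

```python
import statistics

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values <= 0 mean the
    model does no better than predicting the mean of the observations."""
    mean_obs = statistics.fmean(obs)
    sse_model = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sse_mean = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse_model / sse_mean

# Held-out evaluation record, not used in calibration (hypothetical numbers).
obs = [1.2, 1.8, 2.5, 2.0, 1.5, 1.1, 0.9, 1.4, 2.2, 1.7]
model_sim = [1.3, 1.7, 2.3, 2.1, 1.6, 1.2, 1.0, 1.3, 2.0, 1.8]

# A naive alternative model for comparison: persistence (yesterday's value).
persistence = [obs[0]] + obs[:-1]

print("model NSE:      ", round(nash_sutcliffe(obs, model_sim), 3))
print("persistence NSE:", round(nash_sutcliffe(obs, persistence), 3))
```

Beating a trivial alternative on unseen data is a minimal quantitative hurdle; as argued above, for large integrated models such accuracy scores are only one criterion among the softer fitness-for-purpose tests.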
4. Minimum standards and education

We conclude by noting that certain minimum standards suggest themselves in reporting on model development and performance and in progressing knowledge. Aber et al. (2003) summarise a workshop discussion on much-needed standards of practice, such as exist for ecological data, for the review and publication of models in ecology. They relate to reporting on model structure, parameterisation, testing and sensitivity analysis. Hoping to cover a wide range of modelling situations, we recommend that the standards include (but may not be limited to):

• a clear statement of the objectives and clients of the modelling exercise;
• documentation of the nature (identity, provenance, quantity and quality) of the data used to drive, identify and test the model;
• a strong rationale for the choice of model families and features (encompassing alternatives);
• justification of the methods and criteria employed in calibration;
• as thorough analysis and testing of model performance as resources allow and the application demands;
• a resultant statement of model utility, assumptions, accuracy, limitations, and the need and potential for improvement; and, quite obviously but importantly,
• fully adequate reporting of all of the above, sufficient to allow informed criticism.

Adoption of these standards by modellers, through fuller execution and reporting of the steps outlined in this paper, would benefit both the model-building community and those relying on model-based insight and model recommendations to make decisions.

In addition to adhering to standards, the education of modellers on further aspects is warranted, for instance on how to engage with clients and stakeholders, on the need to develop more flexible models, and on understanding the context in which the model will be used.

Acknowledgments

The authors are grateful to Jessica Spate, Stefano Marsili-Libelli, David Swayne, Richard Davis, Rodolfo Soncini Sessa, Ralf Seppelt, Alexey Voinov, Tom Chapman and several anonymous reviewers for their detailed comments on the manuscript. Thanks to Andrea Rizzoli for managing the manuscript review process and to Giorgio Guariso for inventing the Position Paper idea and process for EMS.

References

Aber, J.D., Bernhardt, E.S., Dijkstra, F.A., Gardner, R.H., Macneale, K.H., Parton, W.J., Pickett, S.T.A., Urban, D.L., Weathers, K.C., 2003. Standards of practice for review and publication of models: summary of discussion. In: Canham, C.D., Cole, J.J., Lauenroth, W.K. (Eds.), Models in Ecosystem Science. Princeton University Press, New Jersey.
Bates, D., Watts, D., 1988. Non-linear Regression Analysis and Its Applications. John Wiley and Sons, New York.
Beck, M.B., 2005. Vulnerability of water quality in intensively developing urban watersheds. Environmental Modelling and Software 20, 381–400.
Bellman, R., Åström, K.-J., 1970. On structural identifiability. Mathematical Biosciences 7, 329–339.
Beven, K.J., 2000. Rainfall-Runoff Modelling: The Primer. John Wiley and Sons Ltd, Chichester.
Caminiti, J.E., 2004. Catchment modelling – a resource manager's perspective. Environmental Modelling and Software 19, 991–997.
Checchi, N., Marsili-Libelli, S., 2005. Reliability of parameter estimation in respirometric models. Water Research 39, 3686–3696.
Davis, J.R., 1995. Expert systems and environmental modelling. In: Jakeman, A.J., Beck, M.B., McAleer, M.J. (Eds.), Modelling Change in Environmental Systems. Wiley, Chichester, pp. 505–517.
Denzer, R., 2005. Generic integration of environmental decision support systems – state-of-the-art. Environmental Modelling and Software 20, 1217–1223.
Fath, B.D., Beck, M.B., 2005. Elucidating public perceptions of environmental behavior: a case study of Lake Lanier. Environmental Modelling and Software 20, 485–498.
Forsyth, R., 1984. The expert systems phenomenon. In: Forsyth, R. (Ed.), Expert Systems: Principles and Case Studies. Chapman and Hall, London, pp. 3–8.
Funtowicz, S.O., Ravetz, J.R., 1990. Uncertainty and Quality in Science and Policy. Kluwer, Dordrecht.
Gawthrop, P.J., Smith, L.S., 1996. Metamodelling: Bond Graphs and Dynamic Systems. Prentice Hall, Englewood Cliffs, NJ, USA.
Gernaey, K.V., van Loosdrecht, M.C.M., Henze, M., Lind, M., Jorgensen, S.B., 2004. Activated sludge wastewater treatment plant modelling and simulation: state of the art. Environmental Modelling and Software 19, 763–783.
Grafton, R.Q., Adamowicz, W., Dupont, D., Nelson, H., Hill, R.J., Renzetti, S., 2004. The Economics of the Environment and Natural Resources. Blackwell Publishing Ltd, Oxford.
Hare, M., Letcher, R.A., Jakeman, A.J., 2003. Participatory modelling in natural resource management: a comparison of four case studies. Integrated Assessment 4 (2), 62–72.
Hawking, S.W., 1988. A Brief History of Time: From the Big Bang to Black Holes. Bantam Books, New York.
Hession, W.C., Storm, D.E., 2000. Watershed-level uncertainties: implications for phosphorus management and eutrophication. Journal of Environmental Quality 29, 1172–1179.
Jakeman, A.J., Hornberger, G.M., 1993. How much complexity is warranted in a rainfall–runoff model? Water Resources Research 29, 2637–2649.
Jakeman, A.J., Beck, M.B., McAleer, M.J. (Eds.), 1993. Modelling Change in Environmental Systems. Wiley, Chichester.
Janssen, P., Stoica, P., Söderström, T., Eykhoff, P., 1988. Model structure selection for multivariable systems by cross-validation methods. International Journal of Control 47, 1737–1758.
Jolma, A., Norton, J.P., 2005. Methods of uncertainty treatment in environmental models. Environmental Modelling and Software 20, 979–980.
Jorgensen, S.E., Bendoricchio, G., 2001. Fundamentals of Ecological Modelling. Elsevier, Amsterdam.
Kidd, A.L., 1987. Knowledge acquisition: an introductory framework. In: Kidd, A.L. (Ed.), Knowledge Acquisition for Expert Systems: A Practical Handbook. Plenum Press, New York.
Konikow, L.F., Bredehoeft, J.D., 1992. Ground-water models cannot be validated. Advances in Water Resources 15, 75–83.
Letcher, R.A., Jakeman, A.J., 2003. Application of an adaptive method for integrated assessment of water allocation issues in the Namoi River Catchment, Australia. Integrated Assessment 4, 73–89.
Ljung, L., 1999. System Identification – Theory for the User, second ed. Prentice Hall, Upper Saddle River, NJ.
Marsili-Libelli, S., Checchi, N., 2005. Identification of dynamic models for horizontal subsurface constructed wetlands. Ecological Modelling 187, 201–218.
Mostert, E., Participation for sustainable water management. In: Giupponi, C., Jakeman, A., Karssenberg, D., Hare, M. (Eds.), Sustainable Management of Water Resources: An Integrated Approach. Edward Elgar, Cheltenham, UK, in press.
Murray-Darling Basin Commission, 2000. Groundwater Flow Modelling Guideline. Murray-Darling Basin Commission, Canberra. Project no. 125.
Norton, J.P., 1975. Optimal smoothing in the identification of linear time-varying systems. Proceedings of the Institution of Electrical Engineers 122 (6), 663–668.
Norton, J.P., 1986. An Introduction to Identification. Academic Press, London.
Oreskes, N., Shrader-Frechette, K., Belitz, K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646.
Parker, P., Letcher, R., Jakeman, A.J., Beck, M.B., Harris, G., Argent, R.M., Hare, M., Pahl-Wostl, C., Voinov, A., Janssen, M., et al., 2002. Progress in integrated assessment and modeling. Environmental Modelling and Software 7 (3), 209–217.
Petersen, B., Gernaey, K., Devisscher, M., Dochain, D., Vanrollegem, P.A., 2003. A simplified method to assess structurally identifiable parameters in Monod-based activated sludge models. Water Research 38, 2893–2904.
Ravetz, J.R., 1997. Integrated Environmental Assessment Forum: Developing Guidelines for "Good Practice". Darmstadt University of Technology. ULYSSES WP-97-1, ULYSSES Project.
Refsgaard, J.C., Henriksen, H.J., Harrar, W.G., Scholte, H., Kassahun, A., 2005. Quality assurance in model based water management – review of existing practice and outline of new approaches. Environmental Modelling and Software 20, 1201–1215.
Reichert, P., Omlin, M., 1997. On the usefulness of overparameterized ecological models. Ecological Modelling 95, 289–299.
Risbey, J., Kandlikar, M., Patwardhan, A., 1996. Assessing integrated assessments. Climatic Change 34, 369–395.
Saltelli, A., Chan, K., Scott, E.M. (Eds.), 2000. Sensitivity Analysis. Wiley, Chichester, UK.
Schmoldt, D.L., Rauscher, H.M., 1996. Building Knowledge-based Systems for Natural Resource Management. Chapman and Hall, New York.
Seber, G.A.F., Wild, C.J., 1989. Non-linear Regression. Wiley, New York.
Seppelt, R., 2003. Computer-based Environmental Management. VCH-Wiley Verlag GmbH & Co, Weinheim, Germany.
Söderström, T., Stoica, P., 1989. System Identification. Prentice Hall International, UK.
STOWA/RIZA, 1999. Smooth Modelling in Water Management, Good Modelling Practice Handbook. STOWA Report 99-05. Dutch Department of Public Works, Institute for Inland Water Management and Waste Water Treatment, ISBN 90-5773-056-1. Report 99.036.
Van der Sluijs, J.P., Craye, M., Funtowicz, S., Kloprogge, P., Ravetz, J., Risbey, J., 2005. Combining quantitative and qualitative measures of uncertainty in model-based environmental assessment: the NUSAP system. Risk Analysis 25, 481–492.
Veres, S.M., 1991. Structure Selection of Stochastic Dynamic Systems. Gordon & Breach, London.
Versyck, K.J., Claes, J.E., Van Impe, J.F., 1998. Optimal experimental design for practical identification of unstructured growth models. Mathematics and Computers in Simulation 46, 621–629.
Wainwright, J., Mulligan, M. (Eds.), 2004. Environmental Modelling: Finding Simplicity in Complexity. Wiley, Chichester.
Walker, W.E., Harremoës, P., Rotmans, J., van der Sluijs, J.P., van Asselt, M.B.A., Janssen, P., Krayer von Krauss, M.P., 2003. Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support. Integrated Assessment 4, 5–18.
Walter, E., Pronzato, L., 1997. Identification of Parametric Models from Experimental Data. Springer Verlag, Berlin.
Wellstead, P.E., 1979. Introduction to Physical System Modelling. Academic Press, London.
Young, P.C., 1984. Recursive Estimation and Time Series Analysis. Springer Verlag, Berlin.
Young, P.C., 1993. Environmental modelling and the scientific method. In: Young, P.C. (Ed.), Concise Encyclopaedia of Environmental Systems. Pergamon, Oxford, pp. 204–206.