
Aldo Benini

A note for ACAPS

Documenting methods and data


in rapid needs assessments

Guidance for assessment teams

16 May 2012

Contents

Acknowledgement ........................................................................................................... 4
Summary ......................................................................................................................... 5
Recommendations - Short version ................................................................................... 6
Generalities ................................................................................................................. 6
Data............................................................................................................................. 6
Methods ...................................................................................................................... 7
Documentation in the final datasets ............................................................................. 8
Introduction ..................................................................................................................... 9
[Sidebar:] A close look at some reports .................................................................. 10
Methodology section in the report ................................................................................. 14
The section on data .................................................................................................... 15
The "who" ............................................................................................................. 15
The "what" ............................................................................................................ 16
The "when" and "where" ........................................................................................ 19
The section on methods ............................................................................................. 19
The "how" ............................................................................................................. 20
Design and design participation ............................................................................. 20
Data collection methods......................................................................................... 21
Sampling ............................................................................................................... 23
Data processing ..................................................................................................... 26
Analysis methods and participation........................................................................ 26

Documentation in the final datasets ............................................................................... 29
Generalities ............................................................................................................... 29
Spreadsheet database ................................................................................................. 30
Workbook level ..................................................................................................... 30
Worksheet level ..................................................................................................... 31
Descriptive statistics .............................................................................................. 32
The GIS project ......................................................................................................... 33
Project-level metadata............................................................................................ 33
Layer-level metadata ............................................................................................. 33
Outlook ......................................................................................................................... 33
Appendix....................................................................................................................... 34
Macro to document formulas in an Excel worksheet .................................................. 34
References ..................................................................................................................... 35

Tables and figures

Table 1: Methodological details in a sample of 37 assessment reports............................ 10

Figure 1: Map of affected provinces, Cambodia 2011 floods, by information density .... 18
Figure 2: Use of the INDIRECT function for descriptive statistics ................................. 33

Acknowledgement
For the structure of the sections on data and methods, I have been inspired by "The
Chicago Guide to Writing about Multivariate Analysis" (Miller 2005). Patrice
Chataigner, with ACAPS, drew my attention to the Guide.

The metadata recommendations regarding GIS projects largely follow advice from
Olivier Cottray, Geneva International Center for Humanitarian Demining, and Charles
Conley, iMMAP, Washington DC.

This and related notes are available from the Resources page of the ACAPS Web site, at
http://www.acaps.org/en/resources.

Summary
This note offers guidance for writing the methodology section of rapid needs assessment
reports. It also enumerates good practices in documenting the databases that the
assessment team leaves behind for its principals, successors and other interested parties.

The recommendations are presented in a short version for the rushed reader, and in a
longer one that provides rationales.

The section on data addresses the "who" and "what" of the universe and sample used in
the assessment. The section on methods deals with the "how", and specifically with

• Design and design participation
• Data collection methods
• Sampling process
• Data processing
• Analysis methods and participation

The database section assumes that data will be held in spreadsheet and GIS applications
and enumerates requisite metadata and additional good practices.

The data and methods chapter should be short, not exceeding six pages. It is kept short
because of time pressure and because of the fatigue that affects both those who write it
and those who read it. This has several consequences:

• Information about the data that is not of general interest should be described in the
metadata areas of the databases, not in the report.
• The chapter must not be a restatement of the terms of reference or a condensed
version of the substantive chapters.
• Outside the mandatory basic points, interest in the chapter is best stimulated by
focusing on the unusual and on methodological innovations that the assessment
contributed.

Our approach to "Data and methods", therefore, can be summarized as "culture before
checklists" - the chapter is to argue the methodological soundness of the assessment while
giving the analyst considerable freedom as to how he makes his case.

"Methodological soundness", however, is a backward-looking concept unless the analyst
makes an effort to formulate at least some points worth remembering when the next
disaster strikes, and the next assessment begins. This is chiefly a matter of style and
emphasis, of highlighting the unusual over the trivial, the innovative over the well-
rehearsed, the teaching moment over the accountant's mindset.

Recommendations - Short version
This is a short version of the recommendations, stripped of most of the rationales given in
the longer chapter. The recommendations regarding the metadata to be placed inside the
databases are not abbreviated here; instead, a reference points to the pages where they
appear in an already fairly condensed form.

Generalities
• Empathize with the readers of the assessment report, with your personal successor
in the team, with other assessment teams, and with others needing to re-use the data.
• Make limitations, strengths and innovative content of data and methods
transparent.
• Make the data reusable.
• Write the "Data and methods" chapter such that it strengthens the credibility of
the assessment.
• Do not re-write yet another substantive summary or restate the terms of reference.
• Give credit to participating stakeholders, in ways that highlight the networked
approach, not as a substitute for methodological accountability.

You determine the reasonable effort, space and detail. Write as little as you can, as much
as you must, but do provide the level of detail that enables the reader to understand the
limits of validity, reliability and completeness as well as the value-added to the wider
assessment culture and toolbox.

Information is required both on the data and on the methods by which they were collected
and analyzed. Detailed variable descriptions belong inside the databases, in spaces meant
for metadata, not in the report.

Data
The data section speaks to cases, content (variables) and values, in different detail.

Cases
• Define the universe, the set of units (affected groups, communities, areas) about
which the assessment was conducted and the resolution of the data (villages,
subdistricts, districts, etc.).
• Define the denominators if impacts are described in rates or ratios.
• Describe sample size and structure (by area, group, data-collecting agency,
whichever is of chief interest) but reserve the description of the sampling method
for the methods section.
• [As a style reminder for the entire report: If the sample size is below 100, use
absolute numbers, not percentages.]

• A map colored by coverage intensity (e.g., "Full enumeration in all sub-districts"
/ "Some enumerations, some estimates" / "Estimates for all sub-districts" /
"Global district estimate" / "No estimate so far") may be helpful.

Content
• Briefly enumerate the topics on which detailed data were collected and refer to
the database(s) that contain more detailed descriptions of the variables.
• Enumerate the topics and sources of important secondary data sets ("Data on the
number of mobile phone calls made out of sub-districts in two periods
immediately prior to and after the earthquake - an indicator of service disruption
- were obtained from ..").
• Give more detail if measures are used that are not intuitive, unconventional, or
derived in complex ways.

Values
• Restrict remarks to missing values and problematic outliers. Refer to places where
the users may find descriptive statistics (annex, tables in Excel workbook).
• Note missing value problems if significant numbers of cases are excluded from
some or all analyses because of missing data. If imputations were made, describe
them in the methods section.
• If cases are disqualified or adjusted because of suspect values (extreme outliers),
note extent and coping measures (".. will be resurveyed next month ..").
[Caveats about problematic data have to appear also in the substantive chapters
(e.g., "The districts for which we have data reported that ..")].

Methods
Design
• Unless taken care of in other chapters, describe how the assessment was designed,
and which organizations were active in the design.
• Describe the data collection methods in minimally instructive detail ("Teams of
interviewers sat with small groups of key informants in each of the visited affected
villages, conducting semi-structured interviews in Haussa and Fulani, using a
common thematic guideline printed in English. In addition, the teams conducted
short interviews with, and collected attendance statistics from, personnel of
schools and health care facilities found operating within a one-kilometer radius
from the village centers ..").
• Elaborate on the unusual and innovative, the highly successful or unexpectedly
difficult, and on language issues.
• State which instruments were pretested, and with what consequences.
• Do not claim multi-method synergies (e.g., triangulation) unless you can
demonstrate some non-trivial benefit.

• If questionnaire interviews were one of the principal methods of data collection,
attach the questionnaire(s). Mostly this has value for others only if you annotate
them for what they should know (problems, rationales for category sets, etc.).

Sample
• Describe the sampling process.
• For purposive samples, state the dimensions in which the assessment sought to
fathom range and diversity of impacts and needs, the planned and effective
sample sizes (and reasons for significant deviations), the sampling frame (or
frames) and the stratification. Describe convenience-sampling aspects, such as
those imposed by logistics.
• A table of sample strata, with planned and effective units, is helpful.

Data processing and analysis


• Describe field editing, post-coding, translation (if any), routing, error-checking
and data entry arrangements, the use of novel media, and problems of more than
fleeting interest.
• Basic descriptive statistics do not warrant listing, save for a reference to where tables
can be found. However, describe analytical statistics (= anything that uses
probabilities), composite measures, spatial analyses, uncommon measurement
units if any, together with the software used.
• If missing values are imputed, document method and extent (cases, variables).
This also holds when statistics are based on zeros that de facto replace missing values.
[In the database, create new variables for imputations and keep the original ones.]
• Unless taken care of elsewhere in the report, state data owner and public data
source, or address at which to apply for releases.
• Just as with design and data collection, if stakeholders gave significant input to
the analysis, describe it, in terms of events (workshops, etc.), translation support,
interpretation and connections with third parties.

Documentation in the final datasets


• Provide minimum metadata in spreadsheet workbooks and in GIS projects.
• In particular, for every important data table in spreadsheet workbooks, create a
separate sheet listing all variables, with column number, variable names and
labels, formulas, as well as other comments (see the sketch below).
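As an illustration only (not part of the original toolkit), the following sketch shows how the
skeleton of such a Variables sheet could be generated with Python and the pandas library. The
workbook name, sheet names and column set are hypothetical, and the label, formula and comment
fields would still be completed by hand.

import pandas as pd

# Read the main data table from the (hypothetical) assessment workbook
data = pd.read_excel("assessment_data.xlsx", sheet_name="Data")

# One row per variable: column number, name, type, number of non-missing cases
variables = pd.DataFrame({
    "column_no": range(1, len(data.columns) + 1),
    "variable_name": data.columns,
    "data_type": [str(t) for t in data.dtypes],
    "non_missing": data.notna().sum().values,
})
# Fields to be completed manually by the analyst
variables["label"] = ""              # human-readable label
variables["formula_or_source"] = ""  # formula, questionnaire item, or source
variables["comments"] = ""

# Append the documentation as a separate "Variables" sheet
with pd.ExcelWriter("assessment_data.xlsx", mode="a", engine="openpyxl",
                    if_sheet_exists="replace") as writer:
    variables.to_excel(writer, sheet_name="Variables", index=False)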

The requirements as well as recommended good practices are enumerated starting on


page 29.

Introduction
This note details good practices that should be followed in documenting the methods
used and the major data sets assembled during a rapid needs assessment. Methodological
notes and annotated data sets - data about data, "metadata" - allow users to better see the
scope, strengths and limitations of the assessment. They can be used also to highlight
innovative qualities as well as critical connections to other endeavors, documents, data
sets or standards.

Purposes of documentation
Such notes and metadata have both social and technical functions. Socially, they mitigate
the loss of organizational memory when humanitarian personnel turn over. This is true
regardless of whether persons come and go within the same assessment team, across
cooperating teams, or across related assessments. The documentation is valuable
particularly for later arrivals and for remote users who are not in direct contact with those
initially collecting and analyzing the data. It helps them to make sense of the data and of
their conceptual and institutional milieux.

Technically, the documentation makes the data reusable, linkable to data from other
studies and safe from confusion with outdated, defective or unauthorized versions that
tend to clutter hard drives and e-mail attachments.

Too little or too much guidance


Most needs assessment reports place their methodological notes in a distinct chapter
although some minimal information regarding methods and data appears, by necessity, in
substantive chapters and in the executive summary. It is reasonable to assume that the
methodological chapter is the last one written, under time pressure, and sometimes after
essential substantive chapters have already been shared and debated with some of the
stakeholders. Time pressure and wind-down atmosphere may explain why there seems to
be little or nothing in the way of minimum requirements for documentation and data
preservation. In an informal review of 37 rapid assessment reports, all from major
international disasters between 2004 and 2011, ACAPS noted that key methodological
elements were treated very unequally.

Table 1: Methodological details in a sample of 37 assessment reports

Methodological element        Reports detailing it
Assessment date               34
Assessment objective          33
Data collection method        28
Sample size                   22
Data limitations              16
Sampling design               15
Questionnaire attached        12

Even the customary assumption that reports would routinely carry, in appendices, the
questionnaires used in the principal data collection proved incorrect.

[Sidebar:] A close look at some reports


In order to put a bit of flesh on the dry bones of prescriptive methodology, and also to honor those
many assessment team members who took the trouble to give accounts of the how's of their work, we
briefly scan four reports for their extent, style and notable takeaways. Here we are not so much
concerned with the completeness of the data and methods sections, as detailed in the above table,
but with their orientation in time. Does the section essentially look backward, on what was
finished, or does it (attempt to) send messages for future assessments?

The multi-cluster assessment in the Pakistan floods of 2010


The "Multi-Cluster Rapid Humanitarian Needs Assessment" (UNOCHA 2010) was administered
in four provinces of Pakistan struck by massive floods in 2010. It dovetailed its methodological
accounting between sections of the report and a collection of appendices made available, for a
while, on a public Web site. In the report itself, the introduction has as much of a methodological
thrust than the dedicated "Note on sampling and methodology". A "note on gender
mainstreaming" elaborates on the gender-segregated information collection in villages and the
subsequent merging of the male/female records in the analysis. Those sections together fill about
five of the 54 pages.

With multi-stage selections made of districts (in Sindh also sub-districts), villages and
households, the focus is almost entirely on sampling. Interestingly, districts were selected on the
strength of information supplied by local NGOs and by preceding single-agency assessments
such as a WFP Initial Vulnerability Assessment. Villages were selected, in three provinces, at
random from a list of affected villages, and, in the more turbulent Sindh, by snowballing within the
two subdistricts reported to have the most displaced people within each selected district.

While the account is informative, there is little to be taken away in terms of lessons learned. The
"geographically dispersed purposive sample of the population in the areas most affected", with
383 settlements visited and 2,442 households interviewed, is massive by the standards of purposive
sampling. As usual for this type, the authors affirm that their findings "cannot be statistically
extrapolated to arrive at firm numeric conclusions." Is this really so? One is left to wonder why
certain other options were not taken: 1. a much smaller sample of villages, without household-
level data collections, and with an emphasis on speed, or 2. an attempt at representative
estimates of some kind, using auxiliary information (e.g. hydrological data) for some sensible
post-stratification.

The Initial Rapid Assessment after the Haiti earthquake of 2010


The draft version of the "CDC Summary of Initial Rapid Assessment (IRA) conducted by UN
OCHA in Haiti" (UNOCHA and CDC 2010) (121 pages) carries a brief summary description of the
methods used (1 page) and a discussion of "limitations" (1 page). These are followed by a more
detailed presentation in the appendix (5 pages with maps) and an extensive discussion of
variables (as interview questions and responses entered in PDAs) (11 pages). No fewer than 223
sites were visited; analytic sample size for substantive sections varied between 158 and 212.

This assessment is unusual in the sense that it drew and implemented a systematic, stratified,
two-stage sample. The systematic selection was based on a complete tiling of the country by 10-
by-10 km squares with random starting points and fixed intervals. The stratification was between
the highest earthquake intensity area, which was Port-au-Prince (sections with complete census),
and the rest of the country (sample of quadrants). The two stages were of sections in Port-au-
Prince / quadrants outside and then two sites or settlements within each first-stage unit.

This data and methods section clearly has learning value for users engaged in future
assessments. To the extent that the IRA tool had been standardized, the detailed, question by
question, accounting of what went well and what less so, suggested areas to be revised and
others to be kept flexible. The limited programmability of PDAs should give pause for the use of
such instruments or, alternatively, fire up the development of rapidly reprogrammable survey
software. The sampling is carefully documented, and reasons why, despite the systematic sample
type, population-level estimates were not feasible are noted.

Bangladesh - Flooding and water-logging (2011)


This regionally limited disaster triggered a joint assessment in which 12 NGOs joined forces and
collaborated with local government officials. The 87-page final report (Anonymous 2011) devotes
two pages to the methodology. The section gives a succinct overview of different types of data
(quantitative area-based versus qualitative site-based) and of the geographical coverage. Some
methodological points such as the scoring system for priority interventions are explained, briefly,
in the substantive chapters.

Of note, the division of labor among NGOs was geographic for the data collection, but by sector
for the analysis. Besides the site-specific focus group discussions, supplementary information
was obtained through emergency market mapping and nutrition surveillance. All in all, 61 sites in
41 local government areas were assessed. The 41 Unions were designated by the government;
the mode of selecting the 61 sites within them for focus group discussions is not explained.

The lessons to be taken away for others are essentially about the ability to rapidly form an
effective task force from engaged NGOs and to create, within a two-day workshop, a common
instrument that all will master in the field. Since participation, speed and consistency are often
thought to be in conflict, some greater detail about this successful dynamic and its challenges, if
any, would have been instructive. Also, the characterization of data as "qualitative" vs.
"quantitative" is in part misplaced, here as well as in other rapid assessment literature, hopefully
motivating the search for more appropriate labels.

Yemen - Joint rapid assessment of a regional conflict zone (2011)


In this assessment, six international NGOs with a tradition of working in the country were
supported by ACAPS. The 157-page report is one of the densest in terms of quantitative
information, most of it presented in charts (ACAPS 2011a) (available at
https://sites.google.com/site/ierpjna11/home). The 15-page methodology section carefully
balances that orientation with institutional and assessment process information. The sampling of
ultimately 46 focus group sites is derived over several pages, detailing the four sample strata (by
type of conflict outcome) as well as the regional distribution (of IDPs and of the individual NGOs'
mandates). Since the statistics are about estimated population totals per district, the sampling
frame (we expect a list of sites to select from) is not entirely clear, despite the amount of detail.
The effective sample, however, is fully transparent, by district, agency and affected group type.

In considerable detail, topics of security and access, training and ethics, data management and
data collection instruments as well as the assessment timeline are covered. Little is said about
the analysis although early on in the report the reader receives precise guidance as to how to
read the charts.

In terms of accountability, the methodology section is more than sufficient, even generous.
Doubts, however, arise as to the learning value for the reader who submits to parsing these
fifteen pages. What transpires forcefully from them is that the assessment succeeded in large
part because responsibilities, functional and regional, were allocated very precisely among the
partners, in a conflict region where security and movements were always precarious. By contrast,
the report makes no attempt to include any critical reflections on the analytic aspects. In fact, it
excuses itself with the, for outsiders, mystifying reference that "ACAPS followed a 'Rubik Cube'
data management and analysis model that has been tested in previous emergencies (Bolivia)".
One must assume that such questions were discussed, at least to some degree in the workshop
and in the organized feedback, but that the short timeline forbade their elaboration in the report.

Looking backward or forward?


Although these four examples come from a small fraction of rapid needs assessment reports,
they make it obvious that the accounting for the methodology poses two related dilemmas:

For the assessment agencies, the methodology section serves to show that the work was done correctly.
The reader, however, would like to take away non-trivial implications for a critical reading of the
substantive findings and as reminders for future assessments. With the exception of Haiti, the
tension between accountability and teachable moments was largely resolved in favor of the
former.

The data and methods chapter may lean more towards the institutional setup or more towards the
internal analytic logic. Since most assessments bring together coalitions of the willing, in other
words of temporary partners who otherwise each live in their own organizational universe, some
detail about the cooperation framework is appropriate. However, (again with the exception of the
Haiti report) little is said about measurement issues and validity challenges, about how scales
and rankings were handled, or composite measures, if any, defined.

In sum, while brevity versus lengthiness is not of great concern, one may say that the reports, by
their argumentative choices, emphasize defensive accountability over aggressive projection, an
orientation towards the past over one towards the future. Their authors can rightly point out that
they had neither the time nor the mandate to teach others lessons. Also, collective learning is
better served by the movement of experienced team members to future assessments than by
spilling more sweat and ink on documents that generally are short-lived. But to the extent that one
writes on methodology anyway, he might just as well strive to transcend the single moment.

International standards and tools
On the other extreme, in academic research, commercial survey and government IT
milieux, "the world of metadata has become greatly elaborated in the last few years"
(Groves, Fowler et al. 2004: 338). Several international initiatives have been working to
define standards for data documentation (IHSN 2009a, 2009b; DDI 2011; Wikipedia
2012a, 2012b). Researchers find detailed guidance in a freely downloadable book by the
Organisation for Economic Co-operation and Development (OECD 2007). The
International Household Survey Network (IHSN) offers a freeware application for data
documentation useful for other survey types as well. Statistical applications such as
STATA facilitate documenting data structures with semi-automatic codebooks. It is
conceivable that Excel add-ins for similar purposes exist; we have not found any yet.
Metadata standards for geographic data have been developed by the International
Organization for Standardization (ISO 19115 and affiliates) and, in the USA, by the
Federal Geographic Data Committee.

Reasonable effort
While the internationally promoted metadata standards provide useful pointers, they are
too detailed and too laborious for the purposes of rapid assessments. We need a
pragmatic system of good practices and tools, adequate for the purpose and appropriate
given time and work pressures, low user priority and limited shelf life. While common
practices are ideal, the particular situation of each assessment has to be recognized.
Ultimately, it is up to the assessment team to determine the extent and emphases of the
documentation. The team will determine the right amount of information, in order to lend
credence to the report and to make the data further usable.

Culture rather than checklists


It also seems productive to give considerable freedom to the team member who writes
most of the chapter on data and methods, or who is the one synthesizing the contributions
by others. This person will often be the one in the team who led the analysis, and who
knows the data landscape best. She may have a clear notion of the key points to
communicate to users and successors in a methodological narrative of her own that
should be helped, but not hindered by standard checklists.

Our guidance for describing data and methods, therefore, is "culture, not checklists". This
is not true to the same degree of the final shape expected of the databases, which ought
indeed to be edited with some specific requirements in mind.

Accordingly, the rest of this note is divided into a section that guides the description, in
the assessment report, of methods and data and another concerning metadata to be given
in the databases.

Methodology section in the report
Data and methods can be described in different places in the assessment report. The
default case will be the production of a separate chapter devoted to both. The chapter is to
speak to the majority of assessment users, in ways that clarify how data limitations,
choice of method, innovations and special efforts affect the substantive findings. In
addition, where subsequent assessments, evaluations or studies by specialist experts are
anticipated, separate notes may be helpful, such as on the specifics of a GIS project
that was created to visualize assessment findings.

The "must have"s


Chapter sections may focus on the data and on the methods used to collect and analyze
them. The separation will remain incomplete because both the generation of primary data
and the incorporation of secondary data depend on methods. However, we appreciate the
advice of the "Chicago Guide to Writing about Multivariate Analysis" (Miller 2005: 272-
300) to organize the data description by sections inspired by W-words and honorary W-
words, such as

• Who
• When
• Where
• How, and
• What.

In practice, the distinctions will not always be as clear-cut as one wishes. The interpretation
of what belongs in the "who", and what in the "what", remains somewhat uncertain. In
the Chicago Guide, the "who" relates to the cases (the disaster-affected entities), which,
technically, become the rows of the data tables. The "what" comprises the variables that
express properties of the cases (in particular measures of specific impacts), handled in the
columns of the data tables.

Also, one has to keep in mind that in needs assessments there is no strict analog to the
academic "study design" and that they are often carried out by networks of government
and NGO participants, with local variations in methods. If these variations are
consequential (such as for the pattern of missing information), the methodological
chapter needs to report them. The presence or absence of multiple participants, their
functional or regional affiliations, and their differences vis-à-vis data types and data
quality will influence how and where in the chapter we want to list the data sources.

This is another way of saying that a listing of participating organizations, or of local
officials interviewed, is not a sufficient description of the assessment methodology.
Similarly, the broad consensus implied by the logos, displayed on the cover pages, of
agencies that have signed up to the assessment does not excuse the team from presenting
the soundness of methods and the scope of the data.

Despite these caveats, in most cases the chapter can indeed be divided, more or less
neatly, into a data section and a methods section.

The section on data


The sequence of data aspects that this section addresses - the "who", "what", etc. - is to a
certain degree malleable. We give preference to context and coherence over a fixed
treatment. The sequence followed here therefore gives an example rather than a firm rule.

The "who"
The universe of the assessment - the subject of interest and the sites, groups or
individuals concerned by it - will already have been described in the main part of the
study. The sample on which the findings are based will likely have been mentioned in
several places in the report, if not fully described. The data-and-methods chapter is to
make the definitions more precise.

Defining the universe


The default assumption is that the assessment is purely about disaster-affected regions
and groups. This in itself is not clear enough. Three factors need attention:

• The degree of resolution in defining groups or regions: For example, the
population of all communes affected by a flood is not the same as that of all
affected districts (there may be unaffected communes in parts of affected
districts).
• The specificity of the defining impacts: To stay with the same example,
communes affected by the flood are not necessarily the same as communes with
households displaced by the flood (there may be villages on higher ground whose
low-lying crops were destroyed while the built-up area remained dry). For
different impacts, different units will come into play - persons for deaths,
households for displacements, hectares for crop loss, etc.
• The impact threshold for inclusion: During the assessment, information will
surface also about non-affected and lightly affected groups and areas; it will help
to delimit the significantly affected ones. Lightly affected units may pose
definitional problems that need to be decided one way or another.

One of the tasks in this chapter is to make the actual operational definitions and
denominators clear.

The result of sampling, not yet its method


That concerns the universe - that about which the assessment speaks. As for the sample -
the units within the universe that provided the evidence -, the data section should
summarize the final analytic sample, i.e. the number and distribution of surveyed units
about which enough data was collected in order to be included in all or in most of the
relevant statistics. How the sample was drawn belongs in the methods section.

Problems may arise when the analytic sample fluctuates by variable, and the missing
values are not justified by substantive differences (such as between stable communities
and camps for displaced persons). This happens, for example, in networked assessments
when some participating organizations drop questions from their survey interviews. If the
differences can be succinctly described, it should be done here, in a phrase such as
"Information about most disaster impacts was collected in 45 sample villages in all five
affected districts; however, information about the education situation is available for 35
villages in four districts". If the analytic sample is intricate, with arcane differences, it
should be described in the methods section, perhaps with a detailed tabular breakdown in
an appendix.

Regardless, the process of drawing the sample and the differences between planned,
executed and analytic samples are to be described under "methods".

Data sources
At some point in this section, the data sources should be described. There are no
universal rules where this is done best. One might proceed, for example, starting from the
"what this is exactly about", followed by "the major portion of data was collected in a
sample of .." and "these communities were visited by workers of ..", and supplemented by
"in addition, these sources contributed [primary, secondary] information on ..", with an
overview table, possibly stating public file locations. The section can then conclude with
the discussion of the analytic sample, as suggested above.

The "what"
The detailed description of variables is best left to the Variables sheet or other metadata
sections in the database, such as in the final Excel workbook. In the data section of the
report, the assessment team should briefly enumerate the major sub-topics (e.g., by sector,
data source, or by data table if there are several tables for distinct entities). Greater detail
should be reserved for explanations of variables that are

• not intuitive
• measured in unconventional ways
• derived from others (such as a composite measure of impact)
• (if they are important) excluded from analysis on account of poor data.

This is also the section where reliability and missing information are appropriately
discussed. In broad terms, we evaluate measures as valid / invalid and measurements as

• precise and reliable
• precise, but unreliable
• coarse estimates
• missing for this particular variable ("item non-response"), or
• missing for all variables in the case ("unit non-response").

If missing values were imputed, the imputation methods belong in the methods section.

When the data are problematic


Unreliable, coarse and missing information, and their various mixtures, provoke
interpretation problems. These must be signaled already in the main parts of the report.
There, appropriate language qualifiers suffice (e.g., "The districts for which we have data
reported approximately 12,000 displaced households"). The methodological chapter
should delimit these challenges in greater detail. Two major considerations apply:

• Formally, statistics computed from mixtures of precise and coarse values create a
wrong impression of precision and reliability when in fact none or few cases
returned precise and reliable values. "A total of 12,341 displaced households"
may result from 341 in District A (counted by officials, but the true value may be
closer to 500), 12,000 in B (rough estimate) and missing in C (because no counts
or estimates were communicated).
• Substantively, unreliable, imprecise and missing data are not random. Units with
no reports, incomplete reports, or reports with broad estimates only may be facing
particular difficulties:
o They may be severely impacted; the disaster impairs data collection or
data transmission.
o They may have reservations towards the assessment and choose to respond
selectively or not at all.
o They may place high estimates of casualties and damage in order to attract
attention and relief.
o They may be only lightly touched by the disaster and therefore may not
have received much in the way of assessment support.

Significant patterns of unreliability (which we often can judge subjectively only),
imprecision and missingness should be detailed. This can be done in tabular form or,
more compellingly, in maps colored by information quality. For example, districts in
which floods displaced people could be graded on a scale like "Full enumeration in all
sub-districts" / "Some enumerations, some estimates" / "Estimates for all sub-districts" /
"Global district estimate" / "No estimate so far". Obviously, an acceptable terminology
has to be found for labeling the grades. This example is from Cambodia.

Figure 1: Map of affected provinces, Cambodia 2011 floods, by information density

Source: UNOCHA (2011: 27, Figure 19)
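As an illustration of how such a grading can be prepared for mapping, the sketch below (Python
with pandas; the district names and grade assignments are hypothetical) stores the grade as an
ordered categorical and tabulates districts per grade. The graded column can then be joined to a
boundary layer in the GIS for coloring.

import pandas as pd

# Ordered information-quality grades, from best to least documented
GRADES = ["Full enumeration in all sub-districts",
          "Some enumerations, some estimates",
          "Estimates for all sub-districts",
          "Global district estimate",
          "No estimate so far"]

# Hypothetical tracking table: one row per district
districts = pd.DataFrame({
    "district": ["A", "B", "C", "D"],
    "info_quality": ["Full enumeration in all sub-districts",
                     "Some enumerations, some estimates",
                     "Global district estimate",
                     "No estimate so far"],
})

# Store the grade as an ordered categorical so that tables and legends
# always list the categories in the same, meaningful order
districts["info_quality"] = pd.Categorical(districts["info_quality"],
                                           categories=GRADES, ordered=True)

# Number of districts per grade - the basis for the map legend
print(districts["info_quality"].value_counts().sort_index())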

Attach questionnaires, but ..


If a questionnaire was used in the principal data collection, it should be placed in an
annex. Readers can peruse it for the kinds of questions asked and for the categorizations
that the coding instructions reveal. However, the learning value from reading a
questionnaire is limited unless it is presented with detailed annotations. Teams will likely
not have the time to produce them. Instead, for users who wish to go deeper into the
methodological aspects (for example, while designing a similar assessment elsewhere), it
may be more helpful to create a column "Refers to question no." in the Variables sheet of
the database.

"Know as you need", no more


The tenor of our guidance for this section is: Say as little as you can, say as much as you
must, but do provide the level of detail that enables the reader to understand the limits of
validity, reliability and completeness. Refer to metadata provided as part of the database
or in separate technical notes.

The "when" and "where"
The "when" and "where" are elements that one may treat as subordinate to, or properties
of, the "who" and "what". Notably, units have locations, and variables are measured for
certain points in time or periods of time. Relevant temporal and spatial information is
therefore attached to units and variables. The "who" and "what" sections can take care of
the details that need to be communicated. They do not justify separate sections.

Special situations
Here we wish to point out some special situations that should be described in appropriate
places in the chapter and/or in the databases. Such arise when there are temporal or
spatial discrepancies between units or variables that the analysis combines. Frequently,
the points in time when data on numerators and denominators were collected differ.

Rates of affected populations, denominated to census figures of administrative units,
provide an example. The census may have taken place several years back. If we make
adjustments for population growth, we need to state assumptions and parameters.
Temporal differences may occur also in slowly evolving disasters. In such situations -
say, the Pakistan floods moving from north to south -, the dates of first community
assessments may extend over a considerable period. If this is of consequence for the
assessment findings, the way of dealing with it analytically should be noted.
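Where such a growth adjustment is made, the assumption can be stated, and computed, in a single
line; a minimal sketch with purely illustrative figures:

# Project a census denominator forward under an assumed constant growth rate
census_population = 52400    # commune population at the last census (illustrative)
annual_growth_rate = 0.021   # assumed inter-censal annual growth rate
years_since_census = 7       # years between census and assessment

adjusted_population = census_population * (1 + annual_growth_rate) ** years_since_census
# adjusted_population then serves as the denominator for rates of affected persons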

In the spatial dimension, a common problem arises when impact variables are aggregated
to different administrative levels. For example, displaced households may be counted at
the lowest local government level, such as the commune or union, whereas crop damage
is estimated for districts only. Practically, if these impact types are to be compared side
by side, one needs to aggregate upward to the lowest common level. More finely grained
information on crop damage may become available later, which may again change
analytic opportunities.
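A minimal sketch of such an upward aggregation, assuming Python with pandas and hypothetical
figures: displaced households reported by commune are summed to the district level so that they
can be set beside district-level crop damage.

import pandas as pd

# Hypothetical commune-level counts of displaced households
displaced = pd.DataFrame({
    "district": ["D1", "D1", "D2", "D2", "D2"],
    "commune":  ["C1", "C2", "C3", "C4", "C5"],
    "displaced_hh": [120, 45, 300, 80, 0],
})

# Hypothetical district-level crop damage estimates (hectares)
crop_damage = pd.DataFrame({
    "district": ["D1", "D2"],
    "crop_damage_ha": [1500, 4200],
})

# Aggregate the finer-grained variable up to the district level ...
displaced_by_district = displaced.groupby("district", as_index=False)["displaced_hh"].sum()

# ... and merge, so both impacts sit side by side at the common level
combined = displaced_by_district.merge(crop_damage, on="district")
print(combined)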

The diversity of special situations in the temporal and spatial aspects is not foreseeable.
The only firm rule is that the chapter needs to explain what is needed to understand
current findings and to connect to subsequent assessment work.

The section on methods


This section tells the "how" - the ways and tools used to collect, process and analyze the
information on which the findings are based. This information may be broader than the
conventional aspect of "data". Some of it may consist, for example, of collections of
notes that team members jotted down in villages, outside the more structured interviews.
If village transects and the ensuing notes, however informally compiled and used, were
an important source of information, they should at least be mentioned ("In addition to
formal interviews, notes taken during village transects were used for.."). De facto, many
assessments may be "multi-method"; this should be brought out in the methods section.

The "how"
Key aspects of the "how" that should be described include:

• Design and design participation
• Data collection methods
• Sampling process
• Data processing
• Analysis methods and participation

The extent to which stakeholder participation during design and analysis are to be
elaborated in this section depends on the actual input from outside the assessment team
and on whether the institutional set-up (scope, partners, coordination, etc.) was already
given sufficient space in preceding chapters. Even if this has already been taken care of,
stakeholder participants may have made important methodological contributions, such as
providing template elements from their own assessment work, questionnaire translation-
retranslation tests, or setting weights for different disaster impacts when preliminary
findings were discussed in a workshop. If their participation did influence specific
methodological choices, the methods section needs to reflect them.

The sequence in which the key elements are presented may vary. For example, some may
want to describe the sampling process before the data collection methods.

Design and design participation


Once the assessment objectives have been defined, they are translated into a mental
model of how the effects of the disaster work their way through the affected
communities. In the pragmatic world of rapid assessments, these models remain implicit.
For the most part, they are not put on paper because they are understood intuitively, both
at the design point and later by readers and users. In this, rapid assessments differ from
conventional academic research, which requires an explicit translation from theory to
measurement.

Indicators and uncommon measures


An explicit treatment, however, is required of standardized indicators that were selected
at the design phase and supposedly collected uniformly in all assessed sites. The
indicators need to be presented in the methods section, such as in a conceptual tree chart.
Some may be straightforward (the number of casualties as the expression of physical
impact), others will need justification. Two examples:

• The fraction of operating health centers after the disaster is less informative than
the number of consultations compared to pre-disaster levels, but easier to
establish.

• The celebration vs. cancellation among villages, in the weeks after the disaster, of
an important annual festival may attract ridicule as an indicator in a needs
assessment, but, if properly explained, it may be legitimate as a quick measure of
community stress.

As a general rule, any measure designed to proxy for something else needs description
and rationale in this section.

In a similar vein, multiple measurements of the same impacts are a design element that
needs explanation. In the Pakistan floods, for example, identical questionnaires were
administered to separate male and female groups in the community, out of concern to
include the women's perspective and knowledge. If composite measures were foreseen at
design, they are to be described here. If they were constructed ad-hoc later, the
description belongs in the analysis section.
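Where male and female group records were collected separately and later combined, as in the
Pakistan example, the merging step itself is a documentable choice. A minimal sketch (Python with
pandas, hypothetical variables) of how the two records per community can be merged while keeping
both originals:

import pandas as pd

# Hypothetical village-level records from separate male and female groups
male = pd.DataFrame({"village": ["V1", "V2"], "priority_need": ["water", "food"]})
female = pd.DataFrame({"village": ["V1", "V2"], "priority_need": ["health", "food"]})

# Merge side by side, keeping both perspectives as distinct columns
merged = male.merge(female, on="village", suffixes=("_male", "_female"))

# A combined variable can then be derived by an explicit, documented rule,
# e.g. flagging villages where the two groups disagree
merged["groups_agree"] = merged["priority_need_male"] == merged["priority_need_female"]
print(merged)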

Local contributions
The participation of stakeholders is acknowledged not only as a matter of respect and
integrity (in a spirit similar to scientific citations). It will also help explain substantive
and methodological choices informed by local knowledge. Participants from outside the
core team may offer pre-calibrated instruments. For example, a partner NGO may, pre-
disaster, have been using a "most significant change" method in its routine monitoring
(Davies and Dart 2005). Its tested local implementation is likely to work better in the
needs assessment than an imported question battery. Such adoptions should be explained,
both to give due authorship and to rule out the impression of incongruous foreign bodies.

Data collection methods


The opinions of assessment teams are probably formed more by informal information
than by that produced with formal methods. Visual contact, ad-hoc conversations, "bingo!"
stories, media images are more powerful than statistical tables. "What is necessary in
sensemaking is a good story" (Weick 1995: 60-61; as quoted by Davenport and Prusak
1998: 82).

These informal ways of sense-making and belief-testing shade into semi-formal
acquisition and processing, as in personal notes, and are finally "hardened" by the use of
formal methods. In this sense, needs assessments are multi-method, with less formal, but
more efficient "give me the big picture"-activities supplemented by more structured and
therefore more tedious formal methods.

What is a method?
On the informal end, information gathering is hardly ever documented. Recognized
formal methods, such as standardized questionnaire interviews, focus group discussions,
hydrological modeling in GIS, are candidates for the methods section (even though, as
we have seen, far from all reports describe their methods). Semi-formal areas form a
grey zone; brief visits to villages, en route to some other place, may not count as
methodical work unless they follow some common pattern that qualifies them as
"transects" in the rapid appraisal lingo.

The minimum of formalization needed in order to earn description in the methods section
is not a universal constant. As a vague guideline, one may want to note as data collection
methods anything that

• results in data tables or maps, or
• follows a repeated query pattern, or
• was taught in a training event as a way to collect information.

GIS-based analyses and interview-based surveys exemplify methods of the first kind,
frequent informal questioning during fieldtrips about disruptions of weekly markets the
second (Benini 1992), instructing village teachers during payday how to organize cattle
counts the third.

Multi-method claims
Researchers, particularly of the qualitative and participatory creed, have taken to
"triangulation", the claim that they corroborate findings by testing them under several
methods. Few such claims deserve to be taken seriously, in the sense of concurrent
methods meaningfully probing the falsifiability of clearly stated hypotheses. Nevertheless,
if assessment findings were honed in a multi-method approach, this section should
elaborate on it. For example, estimates of displaced households supplied by commune
councils may be correlated with proportions of built-up zones that are flooded, calculated
from aerial photography. Not only the concordance between methods, but also the
divergence - in this example, why people persevere in flooded homes, or why they leave
in anticipation of worse to come - is of interest.
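To make the example concrete, such a comparison can be reduced to a single correlation
coefficient computed over the sub-districts for which both measures exist; the sketch below uses
Python with pandas and entirely hypothetical figures. Cases that diverge strongly from the
overall pattern are then worth a closer look.

import pandas as pd

# Hypothetical sub-district table combining the two independent sources
df = pd.DataFrame({
    "subdistrict": ["S1", "S2", "S3", "S4", "S5"],
    "displaced_hh_council": [850, 120, 400, 60, 1500],  # commune council estimates
    "flooded_builtup_pct": [62, 10, 35, 30, 88],        # from aerial photography
})

# Rank correlation is robust to the coarse, skewed nature of such estimates
rho = df["displaced_hh_council"].corr(df["flooded_builtup_pct"], method="spearman")
print(f"Spearman correlation between the two sources: {rho:.2f}")

# Flag divergent cases (e.g. high flooding but few reported displacements)
df["rank_gap"] = df["displaced_hh_council"].rank() - df["flooded_builtup_pct"].rank()
print(df.sort_values("rank_gap"))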

Finally, under data collection methods, the personnel applying them (were there any
female interviewers?), the training they received, translations and translation tests,
pretests (what, when and where, what consequences) need to be described in adequate detail.

Adequate detail
"Adequate" is a matter of common sense, context and competing tasks. For example,
information on whether the team had time to pretest the principal questionnaire or not is a
"must-have" piece in the methods section. To write, in a description of focus groups, that
"numerous socio-professional groups were represented" may be scant. "Farmers were in
the majority, but traders, teachers, health care personnel, and imams were also present.
Separate sessions were held with women, most of whom were engaged in farming, but in many
villages midwives, students and domestic servants also attended" may be enough. "Self-rated
occupation in attendance sheets revealed that 54 percent were male farmers, etc." in a
long listing would tire the reader.

Sampling
The preferred sampling method in rapid needs assessments is the purposive sample. We
select units (such as affected village and urban communities) because we have reason to
assume that they, more than others, serve the purpose of the assessment. This purpose is
to narrow the information gaps considered, at this point of stakeholder debate and
assessment design, the most pressing. An ACAPS technical brief gives the rationale
together with a simulated case study and detailed instructions (ACAPS 2011b). The
bottom line is that purposive sampling aims to maximize our learning about the range and
diversity of disaster impacts rather than estimating averages representative of the whole
population of affected units.

Aspects of the sampling process to be reported include:

1. purpose (a brief restatement of what we were after)
2. sample size and deviations from plan
3. information about the units from which the sample was drawn (sampling frame)
4. stratification if any

The appropriate sequencing depends on material, style and best coherence.

Purposive and convenience samples


De facto, in most situations, the sample is a mixture of purposive and convenience
elements, and the task of the methods section is to summarize the planned, executed and
analytic samples and to explain how much they deviate from each other, and why.

Convenience competes with purpose chiefly for logistics and security reasons. Visits to
selected sites may not be feasible; yet teams have to make the best use of their time,
transportation and other support resources. They may thus have to substitute achievable
site visits for initially planned ones, sometimes en route and based on shaky information.[1]
Inaccessible selected sites and substitutions should be noted and commented on for the
possible bias that these changes may induce, for example if the team ends up meeting
communities that are plausibly less affected.

[1] In random sample-based surveys, substitutes are sometimes fixed in advance (also based on random
selection, usually within a short distance from the first choice). In disaster-affected areas, this is not
practical, and anyway not called for in purposive designs. But the purposive vs. convenience sampling
challenge remains.

Reporting the sample size


The effective sample is the set of sites actually visited. The analytic sample is the set of
visited sites that yield useful information for the analysis. In data terms, the analytic
sample size is the number of cases with non-missing values. This number varies
depending on whether we consider a single variable or a set of variables. One can
conceivably define the analytic sample on the basis of one pre-eminent indicator, such as
conceivably define the analytic sample on the basis of one pre-eminent indicator, such as
physical destruction in an earthquake: "The assessment is based on visits to X affected
communities for which on-site visits produced estimates of the relative degree of habitat
destruction." In this example, X would not be influenced by the number of communities
for which, say, landslide data are available.

Conversely, the sample size can be model-based. This obtains, for example, when a
composite measure is used. If reasonable in the circumstances, some missing values
might be replaced with imputed ones, and units with imputed values kept in the analytic
sample. The language must reflect this: "The assessment is based on visits to, and reports
from, 25 sub-districts for which we have data on displaced households, crop damage and
health service disruption. For five sub-districts with no crop damage data, area-weighted
median values were substituted as temporary estimates (see details below)."

The point is that the sample size of interest is more than a simple number; it has to be
qualified, if briefly, for adjustments such as en-route substitutions of sites or imputed
values.
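A minimal sketch of such reporting, assuming Python with pandas and hypothetical site data:
analytic sample sizes are counted per variable and for the set of variables used together, and an
imputed value is stored in a new column, with a flag, so that the original data remain untouched.

import pandas as pd

# Hypothetical site-level data with missing values
sites = pd.DataFrame({
    "site": ["V1", "V2", "V3", "V4", "V5"],
    "displaced_hh": [120, None, 300, 80, None],
    "crop_damage_ha": [40, 15, None, 22, 60],
    "health_disruption": [1, 0, 1, None, 1],
})
impact_vars = ["displaced_hh", "crop_damage_ha", "health_disruption"]

# Analytic sample size per variable ...
print(sites[impact_vars].notna().sum())
# ... and for analyses that need all three variables at once
print("complete cases:", sites[impact_vars].dropna().shape[0])

# Imputation: keep the original column, add a new imputed one and a flag
median_damage = sites["crop_damage_ha"].median()
sites["crop_damage_ha_imp"] = sites["crop_damage_ha"].fillna(median_damage)
sites["crop_damage_imputed"] = sites["crop_damage_ha"].isna()
print(sites)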

From what did we select?


Samples are drawn from enumerable sets of units that reflect the target population. Such
a set is known as a sampling frame. Most commonly, this is a list of all the units that have
a chance to be selected, such as the villages that are marked by name and location on a
map, or a list of affected communities provided by authorities.

Sometimes the selectable units have an indirect relation to the sites that we endeavor to
investigate. This may be the case, for example, when we only have map grid squares, in
the absence of village points. On arriving at selected squares, teams would then select
localities within, as a second stage of the sampling. It can also happen that end-stage
localities are not selectable because there is not enough information on them; instead,
teams select from a number of presumably feasible circuits or itineraries on which they
hope to find instances of the entities of interest. This applies chiefly to shifting
populations such as nomads and refugees.

Multiple-frame sampling occurs when teams have information on affected sites from
several sources and based on different entities, such as a list of affected villages and an
aerial map (or hydrographic model) of the flooding. Several types of information are then
put together in making on-site visit plans. If this occurs, it is to be described in
formulations such as: "The sample is composed of two parts. At first, 15 sites were
selected from a list of 145 affected communities in seven districts on which authorities
provided some initial information. The locations were identified, as far as possible, on
existing maps and were compared to flood maps derived from aerial photography. In a
second step, in two districts with large pockets of unreported flooded communities, a
separate list was made, from maps, of 55 place names in those areas, and a
supplementary sample of ten sites was selected."

Why we selected them


The reasons for the purposive selection should be stated if they have not already been
given earlier in the report. If dealt with earlier, they may simply be reiterated in one
sentence and then used in row or column titles of an overview table.

The sample may have been stratified ex-ante, i.e. sub-samples were defined by categories
of interest (e.g. by target group, from ex-ante lists of IDP camps, host communities,
communities with returnees, etc.). The effective sub-samples will most likely have been
reported in the preceding substantive chapters. Here, in the methods section, the frame
and sample size figures for the strata should be presented side by side. They can be
crossed with other factors of relevance, e.g. the participating agencies, but such tables
quickly become unwieldy.

In other situations, stratification will not be relevant (e.g., in an earthquake affecting a
relatively homogenous set of communities) or not possible ex-ante. Also, the effective
sample may contain cases that mix the categories, e.g. communities that host significant
numbers of IDPs from other areas as well as of returnees. While inconvenient for neat
tabulation, such discoveries are welcome in purposive-sample designs that probe the
diversity of situations. Common sense will dictate what can and must be described at this
point, in words, tables or short references to preceding chapters.

In fact, "stratification" is a bit of a misnomer in purposive sampling since any type of


information of interest may justify inclusion in the sample even if it attaches to one unit
only.

Generalizing from purposive samples to the population


Under tenuous assumptions - that local contexts and disaster impacts are similar within
certain population subsets -, one may consider generalizing from a purposive sample to
the wider affected population. The areas of presumed similarity would have to be defined
- e.g. concentric rings of similar Mercalli Index values around the epicenter, elevation
bands in a flood model - with enough sample members in each of them for a sample
reweighting. This is similar to poststratification in other contexts (Olsen, Orr et al. 2010:
19-21). The estimates would still be biased, but less so than if we generalized from the
unweighted sample.

To the extent that this procedure uses information from the sampling frame - notably size
and coordinates of populated places -, this should be mentioned here, in the section on
sampling, for the first time. The precise method should be described under "analysis
method".

We have not found any such reweighting applications in rapid needs assessments and will
therefore not pursue this for the moment. Readers interested in the generalization from
purposive samples may also consult Shadish et al. (2002: 374-389).
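
For readers who nevertheless want to experiment, a minimal spreadsheet sketch of such a
band-wise reweighting, under the similarity assumptions stated above; the bands, frame
shares and sample means are hypothetical:

   Band (from frame)    FrameShare    BandMean (sample)
   0-50 km              0.20          0.75
   50-100 km            0.30          0.50
   over 100 km          0.50          0.20

   Band means, e.g.:     =AVERAGEIF(Band, "0-50 km", Damage_share)
   Reweighted estimate:  =SUMPRODUCT(FrameShare, BandMean)     returns 0.40

The reweighted figure counters the deliberate over-representation of heavily affected
sites; the unweighted sample mean would weight each band by its share of visited sites
rather than by its share of the frame population.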

Data processing
If the assessment ran more or less smoothly, interest in data processing will be mild. The
report might briefly describe error-minimizing mechanisms and data entry organization.

While in large-scale sample surveys field editing arrangements may be elaborate (Sana
and Weinreb 2008), in rapid assessments it must suffice for somebody to have a look at
completed questionnaires while those collecting and those processing them are still
around and their memories fresh. The data processing segment should briefly describe the
checks - by whom, at what point, plus any problems of more than fleeting interest.

Similarly, the segment should devote a short paragraph to the data flow - which may be as
trivial as one team returning to its base and entering notes into computers every evening -
and to where and how data were entered and partial tables were combined into final
comprehensive tables. If translation occurred between interviews and database, describe
where (when interviewers filled in questionnaires, or later at data entry) and by whom.
Mention the use of modern data entry and transmission media - whether computer- or
mobile-phone-assisted interviewing or e-mail transmission of decentralized batch-entry
files.

Where are the data now?


If this has not been done elsewhere in the report, this segment should mention the names
and types of essential master data files left with the assessment authority, the ownership,
and how to obtain copies (from a Web site, or by applying to an office): "The assessment
team submitted final master tables of the community survey data, the sampling frame,
and auxiliary matter in [x-application, presumably Excel]. A GIS project with
administrative, populated places, river and road layers was created in [y-application],
with record identifiers that link to the substantive data tables." Etc.

A small photo and map image library may be part of the files deposited. If so, proper
captions and credits will keep them usable.

Analysis methods and participation


In theory, analysis follows collection and entry. In practice, some exploratory analysis
activity may take place already while collection and entry are ongoing. Descriptive
statistics and outliers can be usefully established on partial data while field teams and
entry personnel are still available for questions of plausibility and understanding.

Such clarification can later be helpful for, say, the recoding of variables. Most such
deliberations will likely remain informal and merit discussion in the methods section only
if they fundamentally contradict something that readers would take for granted, or if they
lead to a mid-course change of practice that affects data collection or variable definitions
for the later part of the sample.

Mention the unusual


The analysis operations themselves will for the most part remain simple, given the nature
of the sample. Sample weights and survey estimation do not come into play. Excel pivot
tables may be the workhorse pulling together all sorts of cross-tabulations, but this hardly
needs mentioning. Only if particular statistical packages were used should they be noted.

More detail is recommended when advanced statistical procedures were employed, or
when derived variables are not self-explanatory. Examples include:

• Exploratory factor analysis or other data reduction techniques
• Spatial statistical analyses in the GIS application
• Composite measures of disaster impacts.

Measurement units that are not commonly used need to be explained, e.g. miles outside
maritime and US usage, the above-mentioned Mercalli Index in earthquakes, etc.
Recoded variables, if the changes are innocuous, are best documented in worksheets
within the Excel workbook that hold the original and recoded data. Only if the recoding is
motivated by important conceptual changes or produces highly surprising results should
it be discussed in the methods section.

All descriptive statistics on purposive samples are legitimate and need no justification in
this section. They speak for the sample, not for the population. The challenge is in the
language appropriate in the substantive chapters and in the psychology of writer and
reader, who will not easily resist generalizations unless properly warned.

Descriptive statistics do not need justification ..


Conditional descriptive statistics are equally valid if clearly marked or formulated as
based on this sample. The statement in a substantive chapter that "teams visited affected
villages at different distances from the epicenter. In the ten villages seen within 50 km of
the epicenter, the estimated proportion of destroyed or severely damaged residential
buildings varied between 60 and 80 percent. In 15 villages located in the next 50 km out,
we estimated a range between 40 and 70 percent. Etc." is both legitimate and useful. It
does not require comment in the methods section. All the same it may be reassuring for
some to read here that "where statistics were presented in the substantive chapters, we
repeatedly reminded the reader that they came from a non-random sample".
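
A minimal sketch of how such conditional figures can be computed directly from named
column vectors in the workbook (the names are hypothetical); the formulas are array-entered
with Ctrl+Shift+Enter, which adds the braces:

   Lowest damage share among villages within 50 km of the epicenter:
      {=MIN(IF(Dist_epicenter_km<=50, Damage_share))}
   Highest:
      {=MAX(IF(Dist_epicenter_km<=50, Damage_share))}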

Ranges of purposive sample statistics are particularly useful. Ruling out measurement
error, the population maximum is never smaller than the sample maximum. The
population minimum is never higher than the sample minimum. Finding one village with
80 percent destroyed buildings implies that the rate of destruction in part of the affected
area climbed as high as 80 percent. Admitting the possibility of considerable
measurement error (but not systematic bias!), we can still uphold that "from summary
estimates at multiple sites, we conclude that damage in areas near the epicenter is
extensive".

.. but extrapolations do
The burden of explanation would be much higher if the team went on to extrapolate that,
for example, "based on our xyz statistics and the pre-disaster population distribution, we
estimate that between one and two million affected persons will need shelter assistance
before the onset of winter." Something like this would call for detailed demonstration in
the analysis section.

Imputing missing values


It is not uncommon to see missing values de facto treated as zeros. For example, from
among twenty communities surveyed, the team may have collected estimates of displaced
households in 15. A statement to the effect that "we estimate that 2,500 families from the
20 villages have sought shelter elsewhere" is therefore incorrect if 2,500 is the sum of the
15 estimates. Correctly, one might say something like "In the 15 villages for which
estimates of displacement were available, a total of 2,500 ..".

The analyst may have reason to replace missing values with imputed ones. The
motivation may be stylistic (qualifiers like the above example are unwieldy and grow
tedious over the length of the report) or analytic. The latter may apply particularly with
composite measures, in order to minimize loss of cases.

If any imputations were made - including the use of zeros -, their extent (variables, cases)
and method should be noted. The literature appears silent on appropriate methods for
purposive samples. Spreadsheet users may be constrained to simple methods such as
using the median or a population-weighted median. Statistical applications offer more
adaptive methods such as predictive mean matching.

In the database, the original variables should be kept as such, and imputations made in a
different variable.
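
A minimal spreadsheet sketch of a simple median imputation, assuming the original
estimates sit in a named column Displaced_HH and, for the current row, in cell E2 (both
names hypothetical):

   Displaced_HH_imp (new column; formula for row 2, copied down):
      =IF(COUNT(E2)=1, E2, MEDIAN(Displaced_HH))
   Imputed_flag (1 = value imputed, 0 = original estimate):
      =IF(COUNT(E2)=1, 0, 1)

MEDIAN ignores blank and text cells and is therefore computed from the available
estimates only. A population-weighted median or a method such as predictive mean
matching would need a helper table or a statistical package.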

The use of imputed values should already be signaled in the data section; the specific
methods belong here.

Stakeholder participation
Analysis and interpretation of analysis results are intertwined in an iterative process.
Initial data analysis throws up questions; these may be shared with assessment
participants and observers with local or functional expertise. Their response opens new
perspectives in which the data, ideally, can be analyzed from different angles.

The typical event in which statistical findings and special knowledge are brought together
is the stakeholder workshop prior to finalizing the report. Such meetings, if well done, are
extremely valuable, but are seldom presented with more than a perfunctory reference to
the fact that they happened. In the analysis segment of this chapter, the team should
endeavor to say something specific and meaningful about how emphases and perceptions
changed in response to local input, and how findings were connected with broader agendas
or qualified with specific local insights.

Participation in analysis is particularly valuable in determining the relative importance
that the various disaster impacts and unmet needs have in the minds of different
stakeholder groups. Under time pressure and with larger groups of participants, methods
such as the Nominal Group Technique (Delbecq, Van de Ven et al. 1975; SAC 2003;
Makundi, Manongi et al. 2006) can reveal preference orders. If such are used in events
with stakeholders, their preparation, conduct and results should be described summarily
in this chapter, and, if wider interest justifies, more extensively in a separate note.

Unfinished business
By the time the report is to be finalized, some parts of the assessment - more sites being
assessed, supplementary data awaited from another source, etc. - may be outstanding.
Arrangements may exist to update database and analysis for stakeholders after
publication. If this is the case, the report should mention it, both in the introduction and,
for the likely consequences for the current analysis results, in this section as well. A
couple of phrases like "as of this writing [date], … we are expecting .. Linkages to
upcoming assessments are made easier by [e.g., some feature of the GIS project left
behind] .." may be enough.

Documentation in the final datasets


Generalities
For the following, we assume that data are held, shared and archived in two applications:
MS Excel for substantive data, and a GIS application for geographically referenced data.

We also assume that the data have been cleaned. For example, numeric variables no
longer have any text-format cells. This section does not deal with elementary formatting
issues. It recommends a number of practices for data management while recognizing that
analysts may have different habits and customs.

Metadata should to a large extent be held in the databases - Excel and GIS - in order to
keep the data and methods chapter in the report short. Also, the small number of users
who will ever venture into the databases should not have to constantly cross-reference
between these and the report.

Our recommendations are for substantially shorter metadata than those required by some
of the international standards. For illustration, footnote 2 describes the "Dublin Core"
metadata standard, vaunted for its simplicity, yet comparatively much more demanding.

Spreadsheet database
Workbook level

Minimum metadata
The required workbook-level metadata include:

• Meaningful file name, including country, assessment nickname, year-month-day
date of closure
• Document properties: lead organization as author, full assessment title, comments
including team leader, person finalizing this file, other participating organizations,
organization that owns the data.

[Footnote 2]
The DCMI Metadata Element Set (ISO standard 15836), also known as the Dublin Core metadata
standard, is a simple set of elements for describing digital resources. This standard is particularly useful to
describe resources related to microdata such as questionnaires, reports, manuals, data processing scripts and
programs, etc. It was initiated in 1995 by the Online Computer Library Center (OCLC) and the National
Center for Supercomputing Applications (NCSA) at a workshop in Dublin, Ohio. Over the years it has
become the most widely used standard for describing digital resources on the Web and was approved as an
ISO standard in 2003. The standard is maintained and further developed by the Dublin Core Metadata
Initiative - an international organization dedicated to the promotion of interoperable metadata standards.

A major reason behind the success of the Dublin Core metadata standard is its simplicity. From the outset it
has been the goal of the designers to keep the element set as small and simple as possible to allow the
standard to be used by non-specialists. The purpose of the standard is to make it easy and inexpensive to
create simple descriptive records for information resources, while providing for effective retrieval of those
resources on the Web or in any similar networked environment. In its simplest form the Dublin Core
consists of 15 metadata elements, all of which are optional and repeatable. The 15 elements are:

1. Title
2. Subject
3. Description
4. Type
5. Source
6. Relation
7. Coverage
8. Creator
9. Publisher
10. Contributor
11. Rights
12. Date
13. Format
14. Identifier
15. Language

From: http://www.surveynetwork.org/home/index.php?q=tools/documentation/standards


Good practices
• Use R1C1 notation, particularly in documented formulas
• If the workbook contains numerous sheets, create a hyperlinked table of contents,
manually or using an Excel add-in such as "excel-it"3
• If the workbook has many named ranges, create a listing in a separate auxiliary
sheet (helpful for successors / users wanting to re-calculate something) 4.

Worksheet level

Minimum metadata
• For each major datasheet in the workbook, create a variables sheet:

1. If the datasheet is called "CommuneData", name the variables sheet
"CommuneVariables", etc.
2. The minimum fields in the variables sheets include: ColNo (the column
number that the variable has in the datasheet), VarName (variable name),
VarLabel (variable label), one or several Comment1 Comment2 etc. fields 5.
3. If columns in the data sheet are colored, these colors should also be transferred to
the Variables sheet.

• Each datasheet has to have a unique record identifier.

An example can be found in the demo workbook distributed with the ACAPS note
"A template for managing data in needs assessments".

Good practices
• Use strictly unique variable names, short, no spaces, no special characters except
underscore (the variable labels in the variables sheet should be descriptively long).
• Name all column vectors in the data sheets so that they can be readily referenced
in formulas 6.
• In addition to the unique record identifier, include linkage identifiers (particularly
to the GIS project) and administrative identifiers (e.g., p-codes), as needed.

[Footnote 3] http://www.excel-it.com/freeadd-ins/AddTOC.zip.
[Footnote 4] Automate the process with Name Manager, from
http://www.jkp-ads.com/OfficeMarketPlaceNM-EN.asp.
[Footnote 5] VarName is easy to fill by copying the data table header row and pasting it (transpose,
values only) into this column.
[Footnote 6] Collectively done by selecting the table, then, in the menu: Formulas - Defined names -
Create from Selection (check top row only).

• Create tag variables for analytic sample definitions that will be used frequently.
Tags are binary variables that take the value 1 if the case is included in the
sample, else 0. They allow other users to understand / reproduce Pivot tables (see
the sketch after this list).
• Tags must be provided for multi-level or multi-record structures in a table, such as
when values for the village are repeated in separate records from male and female
focus group discussions. This avoids double counting.
• In data tables, strictly avoid in-column derived variables (subtotals, column totals,
counts of missing, etc.)
• In-row derived new variables are ok and often necessary. Formulas may be
replaced with values (unless in simulations where they depend on parameters and
changing inputs). If the formulas used are not obvious, give them in a
comment field of the variables sheet.
• If any worksheet - data table or other - uses a considerable variety of formulas,
document them in a separate formula sheet using the formula-documenting macro
in the appendix.
• Use cell comments sparingly. Use this feature only if they remain few (you cannot
sort on comments) and do not critically affect the interpretation of any finding. If
many cells need annotation with repetitive text, consider conditional formatting
with explanations given outside the table or sheet. For numerous comments
concerning entire records, create one or several text variables for this purpose.
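
As an illustration of the tag variables recommended above, a minimal sketch; the column
references, names and category labels are hypothetical:

   Tag_IDP_host (1 = community reports hosting IDPs and has a displacement estimate,
   else 0; formula for row 2, copied down):
      =IF(AND(D2="yes", COUNT(F2)=1), 1, 0)

Filtering a Pivot table on Tag_IDP_host = 1 then reproduces exactly the analytic sample
behind the corresponding statistics in the report.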

Descriptive statistics
The kinds of descriptive statistics that are routinely included in scientific papers are not
absolutely necessary in our context, but we may produce them, with little extra effort, as
a benefit for the user. In most assessments, it may be enough to leave the table in the Excel
workbook.

Descriptive statistics can be efficiently produced in several ways. We hint at two:

• Use SSC-Stat, a free Excel add-in offered by Reading University 7


• Calculate the statistics in the variables sheet, using the INDIRECT function

The screenshot in Figure 2 exemplifies the process. The column vectors in the data table had been
named with their field names (such as "Total_population"). INDIRECT references these
ranges one by one from the list of variable names. The ranges are then passed to the
statistical functions such as COUNTA 8. The formulas are the same in each column.

[Footnote 7] http://www.reading.ac.uk/ssc/n/software/sscstat/helpfile/ht_start.htm. This tool offers a
host of other applications as well, in data manipulation, visualization and analysis. Highly recommended.
[Footnote 8] Or simply COUNT if we want to exclude text entries.

Figure 2: Use of the INDIRECT function for descriptive statistics

Producing the descriptive statistics in Pivot tables is less efficient and not feasible at all
for certain ones such as the median.
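
To make the layout in Figure 2 concrete, a minimal sketch of the variables sheet, assuming
the column vectors are named after their headers and the VarName entries sit in column A
(names and placement are illustrative):

   VarName             N (non-missing)           Median
   Total_population    =COUNT(INDIRECT(A2))      =MEDIAN(INDIRECT(A2))
   Houses_destroyed    =COUNT(INDIRECT(A3))      =MEDIAN(INDIRECT(A3))

Copying the formula cells down the VarName column produces the full table in one pass.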

The GIS project


Project-level metadata
Properties required at this level are similar to those filled in for an Excel workbook:

• Author, title and comments need to be provided in appropriate detail.


• File names, too, should be informative, giving country, assessment short title and date.
• Data source locations must be specified.

GIS projects are path-dependent. Saving relative paths and archiving project and data
sources in a common directory / subdirectory structure makes the project more portable.

Layer-level metadata
For each layer, the names of the files encapsulating it should be as meaningful as those
chosen for the project. In addition, information is expected on

• the bounding coordinates (max X, max Y, min X, min Y), or failing that
• the country or region that the data pertains to
• the projection if any
• the original source/author of the data
• the date the data was produced
• the theme (admin boundaries, water, thematic surveys etc.)
• the user or distribution license.

Outlook
This note, spilling ink over thirty pages, offers guidance for a chapter which, if succinctly
written, should not exceed 4 - 6 pages. Admittedly, some of the advice pertains to
databases, which are documents undergirding, but formally separate from, the assessment
report. The reader must by now have noted our preference for keeping the data and
methods chapter short and for leaving in the databases those metadata elements that are
of no interest to the majority of assessment readers.

The chapter should be attractive and instructive, emphasizing not only the usual
challenges and achievements, but particularly also what the assessment contributes in
terms of methodological innovation.

"Data and methods", however, must not become a condensed new version of the
substantive chapters. Of the five characteristics that the statistician Robert Abelson
(1995: 11-14) underlined in well-written quantitatively supported studies:

• Magnitude
• Articulation
• Generality
• Interestingness
• Credibility

"Data and methods" essentially contributes to the credibility of the assessment. It is up to


the substantive part of the report to persuasively present the magnitude of disaster
impacts and unmet needs, to connect the various pieces in mutually reinforcing argument,
and to clarify the extent and limits of conclusions drawn from purposive-sample data.

It is in those chapters that the reader primarily expects to find an interesting story. On this
point - interestingness - the data and methods chapter can give succor. Discipline and
originality in methods, although seemingly opposite values, both project a train of
research making important and compelling points. Well groomed, easily accessible
databases in turn make the assessment attractive for secondary use beyond its original
context and instrumental for the further development of assessment tools.

Appendix
Macro to document formulas in an Excel worksheet
For R1C1 notation
Sub ListFormulasR1C1()

' Author: John Walkenbach, http://j-walk.com/ss/excel/tips/tip37.htm
' Adapted from his A1-style by Aldo Benini 12/29/2004

    Dim FormulaCells As Range, cell As Range
    Dim FormulaSheet As Worksheet
    Dim Row As Integer

'   Create a Range object for all formula cells
    On Error Resume Next
    Set FormulaCells = Range("A1").SpecialCells(xlFormulas, 23)

'   Exit if no formulas are found
    If FormulaCells Is Nothing Then
        MsgBox "No Formulas."
        Exit Sub
    End If

'   Add a new worksheet
    Application.ScreenUpdating = False
    Set FormulaSheet = ActiveWorkbook.Worksheets.Add
    FormulaSheet.Name = "Formulas in " & FormulaCells.Parent.Name

'   Set up the column headings
    With FormulaSheet
        .Range("A1") = "Address"
        .Range("B1") = "Formula"
        .Range("C1") = "Value"
        .Range("A1:C1").Font.Bold = True
    End With

'   Process each formula
    Row = 2
    For Each cell In FormulaCells
        Application.StatusBar = Format((Row - 1) / FormulaCells.Count, "0%")
        With FormulaSheet
            ' Walkenbach's original A1-style address:
            ' .Cells(Row, 1) = cell.Address(RowAbsolute:=False, ColumnAbsolute:=False)
            ' Replaced with R1C1-style:
            .Cells(Row, 1) = cell.Address(ReferenceStyle:=xlR1C1)
            ' Original: .Cells(Row, 2) = " " & cell.Formula
            .Cells(Row, 2) = " " & cell.FormulaR1C1
            .Cells(Row, 3) = cell.Value
            Row = Row + 1
        End With
    Next cell

'   Adjust column widths
    FormulaSheet.Columns("A:C").AutoFit
    Application.StatusBar = False

End Sub

References

Abelson, R. P. (1995). Statistics as Principled Argument. Hillsdale, New Jersey, USA


and Hove, UK, Lawrence Erlbaum.
ACAPS (2011a). Joint Rapid Assessment of the Northern Governorates of Yemen [9
October 2011]. Sana'a, Assessment Capacities Project (ACAPS), in collaboration
with ADRA Yemen, CARE International, Save the Children, OXFAM, and
Islamic Relief. Prepared for CARE International in Yemen.
ACAPS (2011b). "Purposive sampling and site selection in Phase 2 [Technical brief]."
Geneva, ACAPS. Retrieved 30 April 2012, from
http://www.acaps.org/img/documents/purposive-sampling-and-site-selection-
purposive-sampling-and-site-selection.pdf.
Anonymous (2011). Flooding & Prolonged Water-logging in South West Bangladesh -
Coordinated Recovery Assessment [presumably December]. Dhaka
Benini, A. A. (1992). "Armed Conflict, Access to Markets and Food Crisis Warning: A
Note from Mali." Disasters 16(3): 240 - 248.

Davenport, T. H. and L. Prusak (1998). Working knowledge: How organizations manage
what they know. Boston, Harvard Business School Press.
Davies, R. and J. Dart. (2005)."The ‘Most Significant Change’ (MSC) Technique. A
Guide to Its Use." Trumpington. from www.mande.co.uk/docs/MSCGuide.htm.
DDI. (2011)."Data Documentation Initiative - [Homepage:] A metadata specification for
the social and behavioral sciences ", DDI Alliance. from
http://www.ddialliance.org/.
Delbecq, A. L., A. H. Van de Ven, et al. (1975). Group techniques for program planning:
A guide to nominal group and Delphi processes, Scott, Foresman Glenview, IL.
Groves, R. M., F. J. Fowler, et al. (2004). Survey methodology. Hoboken, NJ, J. Wiley.
IHSN. (2009a)."The International Household Survey Network." Washington DC, The
International Household Survey Network (IHSN) and the World Bank Data
Group. Retrieved 24 April 2012, from http://www.surveynetwork.org/home/.
IHSN. (2009b)."Microdata documentation." Washington DC, The International
Household Survey Network (IHSN) and the World Bank Data Group. Retrieved
24 April 2012, from
http://www.surveynetwork.org/home/index.php?q=tools/documentation.
Makundi, E., R. Manongi, et al. (2006). "The use of nominal group technique in
identifying community health priorities in Moshi rural district, northern
Tanzania." Tanzania Journal of Health Research 7(3): 133-141.
Miller, J. E. (2005). The Chicago Guide to Writing about Multivariate Analysis. Chicago
and London, University of Chicago Press.
OECD. (2007, 26 April 2012)."Data and Metadata Reporting and Presentation
Handbook." Paris, Organisation for Economic Co-operation and Development.
from www.sourceoecd.org/statisticssourcesmethods/9789264030329.
Olsen, R. B., L. L. Orr, et al. (2010). A Conceptual Model of Purposive Site Selection in
Impact Evaluations. Washington DC and Baltimore, Abt Associates and Johns
Hopkins University.
SAC (2003). Landmine Impact Survey, Operational Protocol P08 v 3 – Impact Scoring
and Community Classification. Washington DC, Survey Action Center.
Sana, M. and A. A. Weinreb (2008). "Insiders, Outsiders, and the Editing of Inconsistent
Survey Data." Sociological Methods & Research 36(4): 515-541.
Shadish, W. R., T. D. Cook, et al. (2002). Experimental and Quasi-Experimental Designs
for Generalized Causal Inference. Boston, Houghton Mifflin.
UNOCHA (2010). Multi-Cluster Rapid Humanitarian Needs Assessment. Affects of
Severe Flooding on People in 4 Provinces of Pakistan: Information Collected in
the Field August 24-29, 2010 using the McRAM. Islamabad and Geneva, United
Nations Office for the Coordination of Humanitarian Affairs.
UNOCHA (2011). Kingdom of Cambodia: Evaluation of Post-Flood Needs Assessment
Data. Geneva, United Nations Office for the Coordination of Humanitarian
Affairs.
UNOCHA and CDC (2010). CDC Summary of Initial Rapid Assessment (IRA)
conducted by UN OCHA in Haiti [Draft 19 February 2010]. Port-au-Prince,
United Nations Office for the Coordination of Humanitarian Affairs and US
Center for Disease Control.
Weick, K. E. (1995). Sensemaking in organizations, Sage Publications, Inc.
Wikipedia. (2012a)."Metadata." Retrieved 24 April 2012, from
http://en.wikipedia.org/wiki/Metadata.
Wikipedia. (2012b)."Metadata standards." Retrieved 24 April 2012, from
http://en.wikipedia.org/wiki/Metadata_standards.

1st May 2012 / Revised 16 May 2012 / AB
