
Impact Evaluation Notes

No. 1. March 2012

INTRODUCTION TO IMPACT EVALUATION

Patricia J. Rogers, RMIT University (Australia) and BetterEvaluation

This is the first guidance note in a four-part series of notes related to impact evaluation developed
by InterAction with financial support from the Rockefeller Foundation. The other notes in this series
are: Linking Monitoring & Evaluation to Impact Evaluation; Introduction to Mixed Methods in Impact
Evaluation; and Use of Impact Evaluation Results. The complete series can be found on InterAction's
website at: http://www.interaction.org/impact-evaluation-notes.
ACKNOWLEDGEMENTS
This guidance note has benefited from feedback from members of InterAction's Evaluation and Program
Effectiveness Working Group. Information about evaluation methods in Guidance Note 1 has been drawn
from BetterEvaluation, an international collaboration for sharing information to improve evaluation.
More information about each of the methods can be found on the website www.betterevaluation.org.



Contents

Introduction
1. What Do We Mean by Impact Evaluation?
2. Why Should We Do Impact Evaluation?
3. What Questions Does Impact Evaluation Seek to Answer?
4. Who Should Conduct Impact Evaluation?
5. How Should We Choose Methods for Impact Evaluation?
6. Clarifying Values for an Impact Evaluation
7. Developing a Theory or Model of How the Intervention is Supposed to Work
8. Measuring or Describing Impacts (and other Important Variables)
9. Explaining to What Extent Observed Results Have Been Produced by the Intervention
10. Synthesizing Evidence
11. Reporting Findings and Supporting Use
12. When Should an Impact Evaluation Be Done?
13. What Is Needed for Quality Impact Evaluation?
14. Common Challenges in Impact Evaluation in Development
Summary
References and other Useful Resources



Introduction
Credible and appropriate impact evaluation can greatly improve
the effectiveness of development. The increasing emphasis on
impact evaluation in development has led to many questions.
What constitutes credible and appropriate impact evaluation? How
should impact evaluations be managed? What measures and data
sources are appropriate? How can qualitative and quantitative
data be effectively combined in impact evaluation? What should be
done to support the appropriate use of impact evaluations? What
are the implications of the increasing focus on impact evaluation
for other types of monitoring and evaluation (M&E)?

InterAction has produced a series of guidance notes addressing these questions to support management, program and M&E staff in international NGOs to plan, design, manage, conduct and use impact evaluations. These notes can also inform their discussions with external evaluators, partners and funders.

This first guidance note, Introduction to Impact Evaluation, provides an overview of impact evaluation, explaining how impact evaluation differs from and complements other types of evaluation, why impact evaluation should be done, when and by whom. It describes different methods, approaches and designs that can be used for the different aspects of impact evaluation: clarifying values for the evaluation, developing a theory of how the intervention is understood to work, measuring or describing impacts and other important variables, explaining why impacts have occurred, synthesizing results, and reporting and supporting use. The note discusses what is considered good impact evaluation, that is, evaluation that achieves a balance between the competing imperatives of being useful, rigorous, ethical and practical, and how to achieve this. Footnotes throughout the document contain references for further reading in specific areas.



1. What Do We Mean by Impact Evaluation?

Impact evaluation investigates the changes brought about by an intervention. Impact evaluation can be undertaken on interventions at any scale: a small, local HIV/AIDS project; an entire civil society strengthening program of an NGO; a sequence of natural resource management projects undertaken in a geographic area; or a collection of concurrent activities by different organizations aimed at improving a community's capacity.

The expected results of an intervention are an important part of an impact evaluation, but it is important to also investigate unexpected results. In this guidance note, impacts are defined as:

"the positive and negative, intended and unintended, direct and indirect, primary and secondary effects produced by an intervention." (OECD Development Assistance Committee definition)[1]

[1] Impacts are sometimes defined quite differently. For example, the W.K. Kellogg Foundation Logic Model Development Guide (www.wkkf.org/knowledge-center/resources/2006/02/WK-Kellogg-Foundation-Logic-Model-Development-Guide.aspx) distinguishes impact in terms of its spread beyond those immediately involved in the program. Specific changes in program participants' behavior, knowledge, skills, status and level of functioning are referred to as outcomes, and only changes to organizations, communities or systems as a result of program activities within seven to 10 years are described as impacts.

Impacts are usually understood to occur later than, and as a result of, intermediate outcomes. For example, achieving the intermediate outcomes of improved access to land and increased levels of participation in community decision-making might occur before, and contribute to, the intended final impact of improved health and well-being for women. The distinction between outcomes and impacts can be relative, and depends on the stated objectives of an intervention.

In practice, it is often helpful for an evaluation to include both outcomes and impacts. This allows earlier indication of whether or not an intervention is working and, if it is not working, helps to identify where, and perhaps why.

In this guidance note, an impact evaluation includes any evaluation that systematically and empirically investigates the impacts produced by an intervention. Some individuals and organizations use a narrower definition of impact evaluation, and only include evaluations containing a counterfactual of some kind (an estimate of what would have happened if the intervention had not occurred) or a particular sort of counterfactual (for example, comparisons with a group who did not receive the intervention). USAID, for example, uses the following definition: "Impact evaluations measure the change in a development outcome that is attributable to a defined intervention; impact evaluations are based on models of cause and effect and require a credible and rigorously defined counterfactual to control for factors other than the intervention that might account for the observed change." These different definitions are important when deciding what methods or research designs will be considered credible by the intended users of the evaluation or by partners or funders.

Impact evaluation is, of course, not the only type of evaluation that supports effective development. It is important to ensure that investments in impact evaluation (in terms of time and money) are not made at the expense of monitoring or other types of evaluation such as needs assessment, process evaluation and cost-benefit evaluation that are also needed to inform decisions about practice and policy. Guidance Note 2 discusses how impact evaluation and these other types of monitoring and evaluation can be done in ways that support each other.



For example, monitoring data can provide a good foundation for impact evaluation, and an impact evaluation can guide the development of monitoring systems. Impact evaluation provides necessary information for cost-benefit and cost-effectiveness evaluations.

2. Why Should We Do Impact Evaluation?

The best way to undertake a particular impact evaluation depends in part on its purpose and who its primary intended users are. Some common reasons for doing impact evaluation include:

- To decide whether to fund an intervention: ex-ante evaluation is conducted before an intervention is implemented, to estimate its likely impacts and inform funding decisions.
- To decide whether or not to continue or expand an intervention.
- To learn how to replicate or scale up a pilot.
- To learn how to successfully adapt a successful intervention to suit another context.
- To reassure funders, including donors and taxpayers (upward accountability), that money is being wisely invested, including that the organization is learning what does and doesn't work, and is using this information to improve future implementation and investment decisions.
- To inform intended beneficiaries and communities (downward accountability) about whether or not, and in what ways, a program is benefiting the community.

Guidance Note 4 discusses in more detail how to support these different ways of using impact evaluation.

3. What Questions Does Impact Evaluation Seek to Answer?

An impact evaluation should focus on a small number (five to seven) of specific key evaluation questions. These are the high-level questions that an evaluation addresses, not specific questions that might be asked in an interview or a questionnaire. It is better to focus on a small number of questions directly related to the purpose than to spread evaluation resources, and users' focus, across a large number of questions. (See the box below for examples of key evaluation questions for impact evaluation.)

4. Who Should Conduct Impact Evaluation?

Impact evaluation can be undertaken by: an external evaluator or evaluation team; an internal but separate unit of the implementing organization; those involved in an intervention (including community members); or a combined team of internal and external evaluators.

An external evaluator can bring a range of expertise and experience that might not be available within the organization, and may have more independence and credibility than an internal evaluator. For example, the USAID Evaluation Policy sets out an expectation that most evaluations will be done by an external evaluator.

However, for some stakeholders, external evaluators are not always perceived as unbiased, as their data gathering and interpretations may be affected by their lack of familiarity with the context.



Examples of key evaluation questions for impact evaluation

Overall impact
- Did it work? Did [the intervention] produce [the intended impacts] in the short, medium and long term?
- For whom, in what ways and in what circumstances did [the intervention] work?
- What unintended impacts (positive and negative) did [the intervention] produce?

Nature of impacts and their distribution
- Are impacts likely to be sustainable?
- Did these impacts reach all intended beneficiaries?

Influence of other factors on the impacts
- How did [the intervention] work in conjunction with other interventions, programs or services to achieve outcomes?
- What helped or hindered [the intervention] to achieve these impacts?

How it works
- How did [the intervention] contribute to [intended impacts]?
- What were the particular features of [the intervention] that made a difference?
- What variations were there in implementation?
- What has been the quality of implementation in different sites?
- To what extent are differences in impact explained by variations in implementation?

Match of intended impacts to needs
- To what extent did the impacts match the needs of the intended beneficiaries?

In some cases, involving program stakeholders and/or community members in conducting an evaluation can add rigor and credibility by supporting better access to data (especially key informants) and more appropriate interpretation of the data.

Three practices in particular can often produce the best quality evaluation: establishing a team of evaluators with external and internal perspectives; ensuring transparency in terms of what data are being used and how in the evaluation; and triangulation, using multiple sources of evidence (which have complementary strengths) and multiple perspectives in analysis and interpretation. It is especially useful to include local evaluation experts on the team who know the context, history and comparative interventions by other agencies.

An evaluation can be managed by an internal group (perhaps an internal steering committee, informed by an advisory group with external membership) or by a combined group. Participatory approaches to managing evaluations typically involve program staff, community members and development partners.[2] They participate not only in collecting data, but also in negotiating the purpose of the impact evaluation, developing the key evaluation questions, designing an evaluation to answer them and following through on the results.

[2] Additional sources on participatory methods include: Marisol Estrella et al. (eds), Learning from Change: Issues and Experiences in Participatory Monitoring and Evaluation (Brighton: Institute of Development Studies, 2000), http://www.idrc.ca/EN/Resources/Publications/Pages/IDRCBookDetails.aspx?PublicationID=348; Andrew Catley et al., Participatory Impact Assessment, Feinstein International Center, Tufts University: October 2008, http://sites.tufts.edu/feinstein/2008/participatory-impact-assessment; Robert Chambers, Who Counts? The Quiet Revolution of Participation and Numbers, Working Paper No. 296 (December 2007), Brighton: Institute of Development Studies, http://www.ids.ac.uk/files/Wp296.pdf.



5. How Should We Choose Methods for Impact Evaluation?

There has been considerable debate in development evaluation, and more broadly, about which methods are best for impact evaluation. These discussions reflect different views on what constitutes credible, rigorous and useful evidence, and who ought to be involved in conducting and controlling evaluations.

Some organizations and evaluators have argued that particular methods or research designs should be used wherever possible, for example randomized controlled trials or participatory methods. Others have argued for situational appropriateness. This means choosing methods that suit the purpose of the evaluation, the types of evaluation questions being asked, the availability of resources, and the nature of the intervention, in particular whether it is standardized or adaptive, and whether interventions work pretty much the same everywhere and for everyone or are greatly affected by context.

When choosing methods, it is important to address each of six different aspects of an impact evaluation:

- Clarifying the values that will underpin the evaluation: what will be considered desirable and undesirable processes, impacts and distribution of costs and benefits?
- Developing and/or testing a theory of how the intervention is supposed to work: these are sometimes referred to as theories of change, logic models or program theory.
- Measuring or describing these impacts and other relevant variables, including processes and context.
- Explaining whether the intervention was the cause of observed impacts.
- Synthesizing evidence into an overall evaluative judgment.
- Reporting findings and supporting their use.

This guidance note discusses each of these aspects and provides information about a range of methods that can be used for them. Links to additional sources of information are provided. Guidance Note 3 discusses how a mixed method approach, combining quantitative and qualitative data in complementary ways, can be used for both measurement/description and explanation.

6. Clarifying Values for an Impact Evaluation

The first step is to clarify the values that will underpin the evaluation. Impact evaluation draws conclusions about the degree of success (or failure) of an intervention, so it is important to clarify what success looks like in terms of:

- Achieving desirable impacts and avoiding (or at least minimizing) negative impacts. For example, will the success of a road development project be judged in terms of increased access to markets, or improved access to maternity hospitals? What level of loss of habitat and biodiversity would be considered a reasonable cost for the road? What level would not be an acceptable trade-off?
- Achieving a desirable distribution of benefits. For example, should we judge success in terms of the average educational outcome, improvements for the most disadvantaged, or bringing a vulnerable or disadvantaged group (like young girls) up to the same level as their more advantaged counterparts?


Formal stated goals (including the Millennium Development Goals) and organizational policies are an important start to clarifying values, but are by themselves usually not sufficient. Different stakeholders may well have different views about which values should be used in an evaluation.

Some methods for clarifying the values for an impact evaluation:

Methods that help people articulate tacit values
- Appreciative inquiry: key stakeholders (including program staff) recall times when the program worked particularly well, then identify the values it exemplified during those times (Using Appreciative Inquiry in Evaluation Practice).
- Community surveys: individuals in the community either nominate or rate the issues that they see as most important to address.
- Most significant change: a structured process for generating and selecting stories of change that identify what different individuals and groups see as the most important outcomes or impacts. (Most Significant Change)

Methods that help negotiate between different sets of values
- Delphi: a process that works through a series of written interactions without face-to-face contact, where key stakeholders provide their opinions about what they see as important, then respond to the aggregated results (Delphi Method | Delphi Method: Techniques and Applications | Delphi Survey - Europa).
- Sticky dot voting: in a face-to-face meeting, individuals allocate their multiple votes (in the form of sticky dots) across options (NRCOI Quick Tip). A small tallying sketch follows this list.
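As promised above, the tallying step of an exercise like sticky dot voting is simple enough to automate. The following is a minimal, hypothetical sketch: the options and ballots are invented for illustration and are not drawn from the guidance note.

```python
# Minimal sketch: tallying a sticky dot voting exercise.
# The options and ballots below are hypothetical.
from collections import Counter

# Each participant places three dots (votes) across the options they value most.
ballots = [
    ["access to markets", "access to markets", "road safety"],
    ["maternal health", "access to markets", "road safety"],
    ["maternal health", "maternal health", "access to markets"],
]

tally = Counter(dot for ballot in ballots for dot in ballot)
for option, dots in tally.most_common():
    print(f"{option}: {dots} dots")
```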
7. Developing a Theory or Model of How the Intervention is Supposed to Work

It is often helpful to base an impact evaluation on a theory or model of how the intervention is understood to produce its intended impacts. This might be called a program theory, a theory of change (ToC), a results chain or a logic model. It is best to develop the theory of change as part of planning an intervention, and then to review it and revise it as necessary while planning an impact evaluation. If this has not been done by the time the intervention starts, it is possible to retroactively develop an agreed theory of change.

Depending on when the theory of change is developed, it can draw on a combination of sources: official documents and stated objectives; research into similar interventions; observations of the intervention or similar interventions; or asking different stakeholders (including planners, staff and intended beneficiaries) how they think it works (or should work).

There can be multiple theories of change: different theories showing how the intervention works at different stages, in different contexts (acknowledging effects of external influences) and for different impacts; and different theories that are developed over time as better understanding develops.

Theories of change can improve impact evaluation by helping to:


- Identify intermediate outcomes or impacts that can be observed within the time frame of the evaluation, and that are precursors to the longer-term impacts that the intervention is intended to produce.
- Identify, if an intervention was unsuccessful, where in the process it stopped working or broke down.
- Distinguish between implementation failure (where impacts have not been achieved because the intervention has not been properly implemented) and theory failure (where the intervention does not lead to desired impact even when implemented well).
- Identify what aspects of the intervention make it work, and are therefore critical and need to be continued when an intervention is adapted for other settings.
- Identify important behavioral and contextual variables that should be addressed in data collection, analysis and reporting to understand variations in impacts.
- Provide a conceptual framework for bringing together diverse evidence about a program involving a large number of diverse interventions.

Some methods for representing a theory of change:

- Logical framework approach (logframe): the classic format used in many development organizations, which uses a 4x4 matrix. The four rows are activities, outputs, purpose and goal, and the four columns are a narrative description, objectively verifiable indicators (OVIs), means of verification (MoV) and assumptions. (The Logical Framework Approach | Logical Framework Analysis | Beyond Logframe: Critique, Variations and Alternatives)
- Results chain: the intervention is represented as a series of boxes in a sequence: inputs, activities, outputs, short-term outcomes, longer-term outcomes and impacts. (Results Chain: Enhancing Program Performance with Logic Models Guide | W.K. Kellogg Foundation Logic Model Development Guide)
- Outcomes chain/outcomes hierarchy/theory of change: the theory is represented as a series of intermediate outcomes leading to the final intended impacts. This format focuses attention on how change comes about, and is helpful for representing programs where different activities occur along the causal chain, not just up front. (Theory of change and logic model: Telling them apart)
- Outcome mapping: this focuses on identifying the boundary partners (organizations or groups whose actions are beyond the control of the intervention, but are essential for the impact to be achieved) and then articulating what these partners need to do and how the intervention can seek to influence them. (Outcome Mapping | Outcome Mapping: ILAC Brief 7)

Other useful resources for developing a theory of change can be found at Developing a Logic Model or Theory of Change.[3]

[3] The Community Toolbox, Developing a Logic Model or Theory of Change, http://ctb.ku.edu/en/tablecontents/sub_section_main_1877.aspx.

A theory of change can also be used to manage potential negative impacts, or to plan an impact evaluation that measures them.



For example, a program meant to improve agricultural productivity by encouraging farmers to apply fertilizer to their fields might lead to increased phosphate runoff and environmental damage to waterways. A balanced impact evaluation will investigate this possible impact in addition to the intended impact of improved productivity. A theory of change can be constructed to examine how an intervention might produce negative impacts. This can be used to adapt the intervention in order to minimize or avoid such negative impacts, to develop early warning indicators for monitoring purposes, and to ensure that these are included in the impact evaluation plan.
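One way to make a results chain (one of the formats listed in this section) explicit enough to evaluate against is to record, for each link, the indicators that would show it happened and the assumptions it depends on. The sketch below is a hypothetical illustration echoing the fertilizer example above; it is not a format prescribed by the guidance note.

```python
# A minimal sketch of a results chain as a data structure, so each link can be
# checked against evidence. All content is hypothetical.
from dataclasses import dataclass, field

@dataclass
class Link:
    stage: str                 # "input", "activity", "output", "outcome" or "impact"
    description: str
    indicators: list = field(default_factory=list)   # how the link would be observed
    assumptions: list = field(default_factory=list)  # what must hold for the next link

results_chain = [
    Link("activity", "Train farmers in fertilizer use",
         indicators=["number of farmers trained"]),
    Link("output", "Farmers apply fertilizer to their fields",
         indicators=["share of trained farmers applying fertilizer"],
         assumptions=["fertilizer is affordable and available"]),
    Link("outcome", "Crop yields increase",
         indicators=["yield per hectare against baseline"],
         assumptions=["rainfall within normal range", "runoff stays within safe limits"]),
    Link("impact", "Household food security improves",
         indicators=["months of adequate household food provisioning"]),
]

# Walking the chain shows where evidence is needed and, if results stop appearing
# at some link, whether the problem looks like implementation failure or theory failure.
for link in results_chain:
    print(f"{link.stage:>8}: {link.description}")
    for assumption in link.assumptions:
        print(f"          assumes: {assumption}")
```

Recording assumptions alongside each link also gives the evaluation a ready-made list of negative impacts and early warning indicators to monitor, in the spirit of the paragraph above.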
8. Measuring or Describing Impacts (and other Important Variables)

An impact evaluation needs credible evidence, and not only about impacts. Good information is also needed about how well an intervention has been implemented in order to distinguish between implementation failure and theory failure. Information is also needed about the context to understand if an intervention only works in particular situations.

It is useful to identify any data already available about impacts, implementation and context from existing sources, such as official statistics, program documentation, Geographic Information Systems (GIS), and previous evaluation and research projects. Additional data can be gathered to fill in gaps or improve the quality of existing data using methods such as interviews (individual and group; structured, semi-structured or unstructured), questionnaires (including web-based questionnaires and collecting data by cell phone), observation (structured, semi-structured or unstructured) and direct measurement (for example, of water quality against an international standard).

Descriptions of impacts should not only report the average, but also how varied the results were, and in particular report on patterns. Howard White discusses the importance of looking at heterogeneity in his 2010 article:

"A study which presents a single impact estimate (the average treatment effect[4]) is likely to be of less use to policy makers than one examining in which context interventions are more effective, which target groups benefit most, and what environmental settings are useful or detrimental to achieving impact. Hence it could be shown that an educational intervention, such as flip charts, works but only if teachers have a certain level of education themselves, or only if the school is already well equipped with reading materials, or the pupils' parents are themselves educated."[5]

(A minimal numerical sketch of the average treatment effect and of heterogeneous impacts appears at the end of this section.)

[4] The average treatment effect is an estimate of the average difference an intervention makes. For example, students in the program stayed in school an average of 2.5 years longer (compared to the control group).

[5] Howard White, "A Contribution to Current Debates in Impact Evaluation," Evaluation (April 2010, vol. 16, no. 2): 160.

Some sources for measures and indicators in particular sectors include:

- Catalog of survey questionnaires: International Household Survey Network, over 2,000 questionnaires that can be searched by country, date and survey types.
- Democratic governance: UNDP Oslo Governance Centre.



- Human Poverty Index: three indicators that relate to survival, knowledge and economic provisioning (UNDP).
- Millennium Development Goals: 48 technical indicators and 18 targets for the 8 goals.
- Sustainable Development: UN Commission on Sustainable Development, 130 indicators of social, economic, environmental and institutional aspects of sustainable development.
- World Development Indicators: The World Bank has data on more than 200 countries in terms of more than 1,000 indicators.

Guidance Note 3 provides more detail on specific methods for measuring or describing impacts, the use of mixed methods (quantitative and qualitative data used in complementary ways), and ways of addressing challenges in measuring or describing impacts.
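Footnote 4's definition of the average treatment effect, and White's point about heterogeneity, can be made concrete with a small calculation. This is a minimal sketch with invented numbers, loosely echoing the schooling and school-equipment examples above; a real analysis would also report uncertainty (standard errors, significance tests), which is omitted here.

```python
# Minimal sketch: average treatment effect (ATE) and a simple check for
# heterogeneous impacts. All data are invented for illustration.

# Each record: (group, subgroup, extra years of schooling observed)
records = [
    ("program", "well_equipped_school", 3.1),
    ("program", "well_equipped_school", 2.8),
    ("program", "poorly_equipped_school", 0.4),
    ("program", "poorly_equipped_school", 0.6),
    ("control", "well_equipped_school", 0.5),
    ("control", "well_equipped_school", 0.3),
    ("control", "poorly_equipped_school", 0.2),
    ("control", "poorly_equipped_school", 0.4),
]

def mean_outcome(group, subgroup=None):
    vals = [y for g, s, y in records
            if g == group and (subgroup is None or s == subgroup)]
    return sum(vals) / len(vals)

# ATE: difference in mean outcomes between the program and control groups.
ate = mean_outcome("program") - mean_outcome("control")
print(f"Average treatment effect: {ate:+.2f} years of schooling")

# Heterogeneity: the same intervention may only work in some contexts, which a
# single average would hide.
for s in ("well_equipped_school", "poorly_equipped_school"):
    effect = mean_outcome("program", s) - mean_outcome("control", s)
    print(f"  effect in {s}: {effect:+.2f}")
```

In this invented example the overall average masks a large effect in one context and almost none in the other, which is exactly the pattern White argues policy makers need to see.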
intervention. The third component is to inves-
9. Explaining to What Extent Observed Results tigate and rule out alternative explanations. In
Have Been Produced by the Intervention some cases, it will be possible to include all three
components in an impact evaluation. In complex
One of the important features of an impact evalu- situations, it might not be possible to estimate a
ation is that it does not just gather evidence that counterfactual, and causal analysis will need to
impacts have occurred, but tries to understand the depend on the other components.
interventions role in producing them. It is rarely
the case that an intervention is the sole cause of Possible methods for examining the factual (the
changes. Usually, an intervention works in combi- extent to which actual results match what was
nation with other programs, a favorable context or expected):
other factors. Often a group collaborates to produce
a joint impact, such as when international NGOs Comparative case studies did the
partner with local governments and communities. intervention produce results only in cases
Therefore, causal attribution does not usually refer when the other necessary elements were
to total attribution (that is, the intervention was the in place?



- Dose-response: were there better outcomes for participants who received more of the intervention (for example, attended more of the workshops or received more support)? (A minimal sketch of this check appears after these bullets.)
- Beneficiary/expert attribution: did participants/key informants believe the intervention had made a difference, and could they provide a plausible explanation of why this was the case?
- Predictions: did those participants or sites predicted to achieve the best impacts (because of the quality of implementation and/or favorable context) do so? How can anomalies be explained?
- Temporality: did the impacts occur at a time consistent with the theory of change, not before the intervention was implemented?
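As noted in the dose-response bullet above, the basic check is easy to compute: group participants by how much of the intervention they received and compare average outcomes. The sketch below uses invented data; a real analysis would also need to consider self-selection (keener participants may both attend more and do better).

```python
# Minimal dose-response sketch: do participants who received more of the
# intervention show better outcomes? Data are invented for illustration.
from collections import defaultdict

# (number of workshops attended, outcome score)
participants = [(1, 52), (2, 55), (2, 58), (4, 63), (5, 70), (6, 68), (8, 75)]

by_dose = defaultdict(list)
for workshops, score in participants:
    dose = "low (0-2)" if workshops <= 2 else "medium (3-5)" if workshops <= 5 else "high (6+)"
    by_dose[dose].append(score)

for dose in ("low (0-2)", "medium (3-5)", "high (6+)"):
    scores = by_dose[dose]
    print(f"{dose:>13}: mean outcome {sum(scores) / len(scores):.1f}  (n={len(scores)})")

# A rising gradient is consistent with the theory of change, but on its own it
# does not rule out alternative explanations such as self-selection.
```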
Possible methods for examining the counterfactual (an estimate of what would have happened in the absence of the intervention) include:

- Difference-in-difference: the before-and-after difference for the group receiving the intervention (where they have not been randomly assigned) is compared to the before-and-after difference for those who did not. (Difference-in-Differences) (A minimal numerical sketch appears after this list.)
- Logically constructed counterfactual: in some cases it is credible to use the baseline as an estimate of the counterfactual. For example, where a water pump has been installed, it might be reasonable to measure the impact by comparing time spent getting water from a distant pump before and after the intervention, as there is no credible reason that the time taken would have decreased without the intervention (White, 2007). Process tracing can support this analysis at each step of the theory of change. (Process Tracing in Case Study Research)
- Matched comparisons: participants (individuals, organizations or communities) are each matched with a nonparticipant on variables that are thought to be relevant. It can be difficult to adequately match on all relevant criteria. (Techniques for improving constructed matched comparison group impact/outcome evaluation designs)
- Multiple baselines or rolling baselines: the implementation of an intervention is staggered across time and intervention populations. Analysis looks for a repeated pattern in each community of a change in the measured outcome after the intervention is implemented, along with an absence of substantial fluctuations in the data at other time points. It is increasingly used for population-level health interventions. (The Multiple Baseline Design for Evaluating Population-Based Research)
- Propensity scores: this technique statistically creates comparable groups based on an analysis of the factors that influenced people's propensity to participate in the program; it is particularly useful when participation is voluntary (for example, watching a television show with health promotion messages). (Propensity Scores: What, How, Why | A Practical Guide to Propensity Score Models)



- Randomized controlled trial (RCT): potential participants (or communities, or households) are randomly assigned to receive the intervention or be in a control group (either no intervention or the usual intervention), and the average results of the different groups are compared. (Using Randomization in Development Economics Research)
- Regression discontinuity: where an intervention is only available to participants above or below a particular cutoff point (for example, income), this approach compares outcomes of individuals just below the cutoff point with those just above it. (Impact Evaluation: Regression Discontinuity)
- Statistically created counterfactual: a statistical model, such as a regression analysis, is used to develop an estimate of what would have happened in the absence of an intervention. This can be used when the intervention is already at scale, for example an impact evaluation of the privatization of national water supply services.
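As promised in the difference-in-difference bullet above, here is a minimal sketch of the arithmetic, with invented numbers. It is not a full analysis: real applications must justify the assumption that both groups would otherwise have followed the same trend.

```python
# Minimal difference-in-differences sketch with invented numbers.
# Outcome: average household income (in local currency units).

intervention = {"before": 100.0, "after": 130.0}   # group receiving the intervention
comparison   = {"before":  98.0, "after": 110.0}   # similar group not receiving it

# Each group's before-after change picks up general trends (inflation, weather,
# other programs). Subtracting the comparison group's change nets these out,
# assuming both groups would otherwise have followed the same trend.
change_intervention = intervention["after"] - intervention["before"]  # 30.0
change_comparison = comparison["after"] - comparison["before"]        # 12.0

did_estimate = change_intervention - change_comparison
print(f"Difference-in-differences impact estimate: {did_estimate:+.1f}")  # +18.0
```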
Developing a credible counterfactual can be difficult in practice. It is often difficult to match individuals or communities on the variables that really make a difference. Randomized controlled trials can, by chance, create nonequivalent groups. Other methods depend on various assumptions which might not be met. In situations of rapid and unpredictable change, it might not be possible to construct a credible counterfactual. It might be possible to build a strong, empirical case that an intervention produced certain impacts, but not to be sure about what would have happened if the intervention had not been implemented. For example, it might be possible to show that the development of community infrastructure for raising fish to be consumed or sold was directly due to a local project, without being able to confidently state that this would not have happened in the absence of the project (perhaps through an alternative project being implemented by another organization). What an impact evaluation can focus on in such cases is the other two elements of causal analysis: the factual and ruling out alternatives.

The third component of understanding causal linkages is to investigate and rule out alternative explanations. Apparent impacts (or lack thereof) might reflect methodological issues such as selection bias (where participants are systematically different from nonparticipants) and contamination effects (where nonparticipants benefit from the intervention as well, reducing the difference between them and participants in terms of impacts). They might reflect the influence of other factors, including other interventions or population movements between areas assigned to receive an intervention and those without one.

Possible methods for identifying and ruling out alternative possible explanations include:

- General elimination methodology: possible alternative explanations are identified and then investigated to see if they can be ruled out. (Can We Infer Causation from Cross-Sectional Data?)
- Searching for disconfirming evidence / following up exceptions.[6]

[6] Further reading: Matthew B. Miles and A. Michael Huberman, Qualitative Data Analysis: An Expanded Sourcebook, 2nd Edition (Thousand Oaks, California: Sage Publications, 1994).



Example: An evaluation of the impact of legislation for compulsory bicycle helmets found that there had been a significant decline in the number of head injuries among cyclists. While this was consistent with the theory of change, an alternative explanation was that the overall level of injuries had declined due to increased building of bicycle lanes during the same period. Examination of serious injuries showed that, while the level of head injuries had declined in this period, the number of other types of injuries had remained stable, supporting the theory that it was the helmets that had produced the change. (Walter et al., 2011)

Some approaches that combine these different elements of explanation include:

- Multiple lines and levels of evidence (MLLE): a wide range of evidence from different sources is reviewed by a panel of credible experts spanning a range of relevant disciplines. The panel identifies consistency with the theory of change while also identifying and explaining exceptions. MLLE reviews the evidence for a causal relationship between an intervention and observed impacts in terms of its strength, consistency, specificity, temporality, coherence with other accepted evidence, plausibility, and analogy with similar interventions.[7]
- Contribution analysis: a systematic approach that involves developing a theory of change, mapping existing data, identifying challenges to the theory (including gaps in evidence and contested causal links) and iteratively collecting additional evidence to address these. Guidance Note 2 provides some additional information on contribution analysis. (Contribution Analysis: ILAC Guide Brief 16 | Contribution Analysis)
- Collaborative outcomes reporting: this new approach combines contribution analysis and MLLE. It maps existing data against the theory of change and fills in important gaps in the evidence through targeted additional data collection. Then a combination of expert review and community consultation is used to check the evidence's credibility regarding what impacts have occurred and the extent to which these can be realistically attributed to the intervention. (Collaborative Outcomes Reporting Technique)

Example: An evaluation of a cross-government executive development program's impact could not use a randomized control group, because randomly assigning people to be in a control group, or even to participate in the program, was impossible. Neither could the evaluation use a comparison group, because the nature of the program was such that those accepted into it were systematically different to those who were not. Instead, the evaluation used other strategies for causal explanation, including attribution by beneficiaries, temporality and specificity (changes were in the specific areas addressed by the program). (Davidson, 2006)

[7] Further reading: Patricia Rogers, "Matching Impact Evaluation Design to the Nature of the Intervention and the Purpose of the Evaluation," Journal of Development Effectiveness, 1 (2009): 217-226. Working paper version available at: http://www.3ieimpact.org/admin/pdfs_papers/50.pdf.



10. Synthesizing Evidence

It is rare to base the overall evaluative judgment of an intervention on a single performance measure. It usually requires synthesizing evidence about performance across different dimensions.

A common way to do this is to develop a weighted scale, where a percentage of the overall performance rating is based on each dimension. However, a numeric weighted scale often has problems, including arbitrary weights and lack of attention to essential elements. (The Synthesis Problem)

An alternative is to develop an agreed global assessment scale (or rubric) with intended users that can then be used to synthesize evidence transparently. The scale includes a label for each point (for example, "unsuccessful," "somewhat successful," "very successful") and a description of what each of these looks like. (The Rubric Revolution)
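The contrast between a numeric weighted scale and a global assessment scale can be illustrated in a few lines. The dimensions, weights and decision rules below are hypothetical; in practice a rubric's levels and descriptions would be negotiated with intended users, as the section describes.

```python
# Minimal sketch contrasting two ways of synthesizing evidence across
# dimensions. All dimensions, weights and thresholds are hypothetical.

scores = {"reach": 4, "impact on wellbeing": 2, "sustainability": 3}  # each rated 1-5

# 1) Weighted scale: yields a single number, but the weights are hard to
#    justify and a failing essential dimension can be masked by strong scores
#    elsewhere.
weights = {"reach": 0.2, "impact on wellbeing": 0.5, "sustainability": 0.3}
weighted = sum(scores[d] * weights[d] for d in scores)
print(f"Weighted score: {weighted:.1f} / 5")

# 2) Global assessment scale (rubric): an agreed verbal description for each
#    level, applied transparently; essential dimensions can act as gates.
def rubric_rating(scores):
    if scores["impact on wellbeing"] <= 2:       # essential element not achieved
        return "unsuccessful"
    if all(v >= 4 for v in scores.values()):
        return "very successful"
    return "somewhat successful"

print(f"Rubric rating: {rubric_rating(scores)}")
```

Here the weighted scale produces a middling 2.7 even though the essential dimension failed, while the rubric makes the failure visible; this is the "lack of attention to essential elements" problem the section mentions.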
11. Reporting Findings and Supporting Use

The format of the evaluation report should be agreed on when the impact evaluation is being planned. Some organizations have standard report formats, including length requirements, that must be followed. In other cases, it is important to agree on a skeleton report of headings and subheadings well before the report is written.

Impact evaluation reports are most accessible when they are organized around the key evaluation questions, rather than reporting separately on the data from different components of data collection.[8]

[8] E. Jane Davidson, Improving Evaluation Questions and Answers: Getting Actionable Answers for Real-World Decision Makers (demonstration session at the 2009 Conference of the American Evaluation Association, Orlando, FL, November 18, 2009), http://comm.eval.org/resources/viewdocument/?DocumentKey=e5bac388-f1e6-45ab-9e78-10e60cea0666.

The quality of evaluation reports can be enhanced by appropriate stakeholder involvement. Even where an evaluation is being undertaken by an independent external evaluator, stakeholders can be involved by providing formal responses to findings and commenting on the data and how they have been interpreted.

Where recommendations are included in evaluation reports, they need to be supported by evidence from the evaluation findings and about the feasibility and appropriateness of the recommendations. Involving relevant stakeholders in developing the recommendations can not only improve the recommendations' feasibility, but can also lead the stakeholders to both own and commit to implementing them.

The use of impact evaluation reports can be enhanced by creative reporting formats, verbal presentations, opportunities to engage with others discussing the implications of impact evaluations, and by ensuring the reports remain accessible to potential users.

Guidance Note 4 addresses the need to communicate the findings well to intended audiences.

12. When Should an Impact Evaluation Be Done?

Impact evaluations should be undertaken when there is both a clear need and intent to use the findings. If all interventions were required to have an impact evaluation, evaluators would risk either requiring an excess of resources, or spreading those resources so thin as to make evaluations superficial. A more effective strategy is to focus impact evaluation resources on interventions where they are likely to be most useful:



- Innovative interventions and pilot programs that, if proven successful, can be scaled up or replicated.
- Interventions where there is not a good understanding of their impacts, and better evidence is needed to inform decisions about whether to continue funding them or to redirect funding to other interventions.
- Periodic evaluations of the impact of a portfolio of interventions in a sector or a region, to guide policy, future intervention design and funding decisions.
- Interventions with a higher risk profile, such as a large investment (currently or in the future), high potential for significant negative impacts, or sensitive policy issues.
- Interventions where there is a need for stakeholders to better understand each other's contributions and perspectives.

The timing of an impact evaluation is important. If it is done too soon, there may be insufficient evidence of impacts having occurred or being sustained. If it is done too late, it can be difficult to follow up with participants, and too late to influence decisions about the future direction of the intervention. In any case, it is better to plan the impact evaluation where possible from the beginning of the intervention. This allows for evidence to be gathered throughout the intervention, including baseline data, and allows the option of using methods like randomized controlled trials, which require creation of a randomly allocated control group from the beginning of implementation.

13. What Is Needed for Quality Impact Evaluation?

It can be helpful to think about quality evaluation in terms of five competing imperatives: utility, accuracy, ethics, practicality and accountability.[9] These five standards are often in tension: for example, a more comprehensive impact evaluation that will be more accurate might not be practical in terms of available resources, might be too intrusive in the data collected, or might take too long to complete for it to inform key decisions about the future of the intervention.

[9] Joint Committee Standards for Educational Evaluation, http://www.jcsee.org/program-evaluation-standards/program-evaluation-standards-statements. These were originally developed for educational evaluation but are widely used more broadly.

Utility: good impact evaluation is useful. The likely utility of an evaluation can be enhanced by planning how it will be used from the beginning, including linking it to organizational decision-making processes and timing, being clear about why it is being done and who will use it, engaging key stakeholders in the process, and then choosing designs and methods to achieve this purpose.

Accuracy: good impact evaluation is rigorous. It pays attention to all important impacts, noticing if any are unintended. It pays attention to the distribution of impacts, noticing if only some people benefit, and who those people are. Accuracy requires the use of appropriate evidence, including quantitative and qualitative data, appropriate interpretation, and transparency about the data sources that have been used and their limitations. Strategies to achieve accuracy include systems for checking the quality of the data at the point of collection and during processing, and checking that findings have been reported fairly, comprehensively and clearly.

Propriety (ethics): ethical issues need to be adequately addressed, including confidentiality and anonymity, as well as potential harmful effects of being involved in the evaluation.



Some ethical issues, such as the need to honor promises made about privacy and confidentiality, are common across different types of evaluations and research. There are other issues that are particular to impact evaluation. Concerns are sometimes raised about the ethics of using an RCT design, as it involves withholding an intervention from some people (the control group). There is less ethical concern when access to the intervention is going to be rationed in any case, and the concern can be addressed by allocating the control group to a queue so they do receive the intervention after the evaluation of the first phase has finished (if it is shown to be effective). However, this strategy is only feasible when the impacts (or credible predictors of them) will be evident early, and when the intervention will still be relevant for the control group by the time the evaluation has ended.

There are also potential ethical issues in terms of whose interests are served by an evaluation. The American Evaluation Association discusses this in its Guiding Principles in terms of Responsibilities for General and Public Welfare:

"Evaluators articulate and take into account the diversity of general and public interests and values, and thus should:

1. Include relevant perspectives and interests of the full range of stakeholders.
2. Consider not only immediate operations and outcomes of the evaluation, but also the broad assumptions, implications and potential side effects.
3. Allow stakeholders access to, and actively disseminate, evaluative information, and present evaluation results in understandable forms that respect people and honor promises of confidentiality.
4. Maintain a balance between client and other stakeholder needs and interests.
5. Take into account the public interest and good, going beyond analysis of particular stakeholder interests to consider the welfare of society as a whole."

Formal approval by the appropriate institutional review board is usually needed to undertake an impact evaluation. Applications for approval need to follow the prescribed format and address issues of beneficence, justice, and respect. (Evaluation Consent and the Institutional Review Board Process)

Practicality: impact evaluations need to be practical. They must take into account the resources that are available (time, money, expertise and existing data) and when evaluation results are needed to inform decisions. Partnering with one or more evaluation professionals, research organizations, universities and civil society organizations can leverage the necessary resources.

Accountability: evaluations need to make clear the evidence and criteria on which conclusions have been drawn, and acknowledge their limitations. Transparency about data sources is important, including showing which sources have been used for which evaluation questions. A formal process of meta-evaluation (having your own evaluation evaluated, by approving an evaluation plan and then an evaluation report by an expert reviewer or a committee of individuals with respected integrity and independence) can improve the accountability of an impact evaluation.

14. Common Challenges in Impact Evaluation in Development

A number of common challenges for development evaluation are described below, along with some suggestions for addressing them.



Variation in implementation and environment across different sites
An intervention may have been implemented in quite different ways to suit the different contexts in different country offices around the world, or in different geographic areas within a country. It can be useful to compare the theories of change for each site. In particular, identify whether different sites are using the same theory about how change happens (e.g., by increasing people's knowledge about their entitlements to services) but different action theories (e.g., printed brochures vs. community theater), or whether they are using different change theories altogether (e.g., increasing people's knowledge about their entitlements to services in one site vs. reducing barriers to service access, such as user fees, through advocacy in another).

Heterogeneous impacts
Development interventions often only work well for some people, and may be ineffective or even harmful for some other people. In addition, the success of an intervention in terms of achieving desirable impacts is often affected by the quality of implementation. It is therefore important to not only calculate and report on the average effect but to also check for differential effects. This requires gathering evidence where possible about the quality of implementation, and data about contextual factors that might affect impacts, including participant characteristics and the implementation environment.

Diverse components
A program might encompass a diverse range of projects, and yet an overall evaluation of the impact of the whole program is needed. It can be helpful to develop an overall theory of change for the program, bringing together different components. Sometimes it is possible to do this in the planning stage, but, especially where projects or components have varied over time, this might need to be done retroactively.

Long time scales
Often the intended impacts will not be evident for many years, but evidence is needed to inform decisions before then (e.g., on whether or not to launch a subsequent phase or replicate the model elsewhere). A theory of change can identify intermediate outcomes that might be evident in the life of an evaluation. In some cases, research evidence can be used to fill in later links, and estimate likely impacts given the achievement of intermediate outcomes. Consideration should also be given to the expected trajectory of change, that is, when impacts are likely to become evident. (Michael Woolcock on The Importance of Time and Trajectories in Understanding Project Effectiveness)

Influence of other programs and factors
The impacts of development interventions are heavily influenced by the activities of other programs and other contextual factors that might support or prevent impacts being achieved. For example, cash transfers that are conditional on school attendance will only lead to improved student achievement in situations where schools are teaching students adequately.



It is possible to identify these other programs and contextual factors as part of developing a theory of change, to gather evidence about them and to look for patterns in the data.

Resource constraints
Existing evidence (in the form of program documentation, baseline data and official statistics) may have gaps, and there may be few resources (in terms of funding, staff time or access to specialist technical expertise) to collect the types of evidence needed for quality impact evaluation. For a specific evaluation, when existing evidence is scarce and there are few resources to gather additional evidence, key informant interviews from diverse informants may provide sufficient data, including reconstructing baseline data. Planning ahead for impact evaluation can reduce resource constraints by building in sufficient resources at the design and budgeting stage, and/or strategically allocating evaluation resources across interventions so that they are concentrated on a smaller number of more comprehensive evaluations of strategically important interventions.

Summary

An impact evaluation should begin with a plan that clarifies its intended purposes, including identifying intended users, the key evaluation questions it is intended to answer, and how it will address the six components of impact evaluation: clarifying values, developing a theory of change, measuring or describing important variables, explaining what has produced the impacts, synthesizing evidence, and reporting and supporting use. Having this plan reviewed (including by intended users) will increase the likelihood of producing a high quality impact evaluation that is actually used.

References and other Useful Resources

Alton-Lee, A. (2003) Quality Teaching for Diverse Students in Schooling: Best Evidence Synthesis. Wellington, New Zealand: Ministry of Education. http://www.educationcounts.govt.nz/publications/series/2515/5959. An example of synthesizing evidence from diverse sources to understand what works for whom.

Catley, A., Burns, J., Abebe, D. and Sufi, O. (2008) Participatory Impact Assessment: A Guide for Practitioners. Boston: Tufts University. http://www.preventionweb.net/english/professional/publications/v.php?id=9679

Chambers, R. (2007) Who Counts? The Quiet Revolution of Participation and Numbers. Working Paper No. 296. Brighton: Institute of Development Studies. http://www.ids.ac.uk/files/Wp296.pdf

Davidson, E. J. (2006) Causal Inference Nuts and Bolts. Demonstration session at the 2006 Conference of the American Evaluation Association, Portland, Ore., Nov. 2006. http://realevaluation.com/pres/causation-anzea09.pdf

Davidson, E. J. (2009) Improving Evaluation Questions and Answers: Getting Actionable Answers for Real-World Decision Makers. Demonstration session at the 2009 Conference of the American Evaluation Association, Orlando, Fla., Nov. 2009. http://comm.eval.org/resources/viewdocument/?DocumentKey=e5bac388-f1e6-45ab-9e78-10e60cea0666

Funnell, S. and Rogers, P. (2011) Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. San Francisco: Jossey-Bass/Wiley.

Guijt, I. (1999) Participatory Monitoring and Evaluation for Natural Resource Management and Research. Socio-economic Methodologies for Natural Resources Research. Chatham, UK: Natural Resources Institute. http://www.nri.org/publications/bpg/bpg04.pdf

Miles, M. and Huberman, M. (1994) Qualitative Data Analysis: An Expanded Sourcebook (2nd ed.). Thousand Oaks, California: Sage Publications. Outlines strategies for checking causal explanations, including searching for disconfirming evidence, following up exceptions, and making and testing predictions.

Patton, M. Q. (2008) State of the Art in Measuring Development Assistance. Discusses the importance of interpretation and managing uncertainty in effective management.

Paz, R., Dorward, A. and Douthwaite, B. (2006) Methodological Guide for Evaluation of Pro-Poor Impact of Small-Scale Agricultural Projects. Centre for Development and Poverty Reduction, Imperial College, London. http://boru.pbworks.com/f/modulosjan07.pdf. Describes 22 methods and tools that can be used to evaluate the direct and indirect impacts of innovation adoption.

Roche, C. (1999) Impact Assessment for Development Agencies: Learning to Value Change. Oxford: OXFAM, Novib.

Rogers, P. J. (2009) Matching Impact Evaluation Design to the Nature of the Intervention and the Purpose of the Evaluation. Journal of Development Effectiveness, 1(3): 217-226. Working paper version available at: http://www.3ieimpact.org/admin/pdfs_papers/50.pdf

Walter, S., Olivier, J., Churches, T. and Grzebieta, R. (2011) The impact of compulsory cycle helmet legislation on cyclist head injuries in New South Wales, Australia. Accident Analysis and Prevention, 43: 2064-2071.

White, S. and Petit, J. (2004) Participatory Methods and the Measurement of Wellbeing. Participatory Learning and Action 50. London: IIED.

www.betterevaluation.org: information on evaluation methods for development, including user-contributed examples and comments.

www.mymande.org: information, videos and links to information about evaluation methods.

