Data Quality Audit Tool
Additional financial support was provided by the President’s Emergency Plan for AIDS Relief and the
Global Fund to Fight AIDS, TB and Malaria.
The author’s views expressed in this publication do not necessarily reflect the views of USAID or the
United States Government. This publication can be accessed online at the MEASURE Evaluation Web
site: http://www.cpc.unc.edu/measure.
Acknowledgements
This tool was developed with input from a number of individuals representing various organizations.
Those most directly involved in development of the tool include Ronald Tran Ba Huy of The Global
Fund to Fight AIDS, Tuberculosis and Malaria and Karen Hardee, J. Win Brown, Ron Stouffer,
Sonja Schmidt, Yoko Shimada, David Boone, and Philip Setel of the MEASURE Evaluation
Project. Einar Heldal, TB Consultant and Charlotte Kristiansson of the Swiss Tropical Institute
also contributed to the development of the tool. Others who were instrumental in its development
include: Bernhard Schwartländer, Bernard Nahlen, Daniel Low-Beer, Linden Morrison, John
Cutler, Itamar Katz, Gulshod Allabergenova, Marton Sziraczki, and George Shakarishvili from
The Global Fund to Fight AIDS, TB and Malaria; Kathy Marconi, Michelle Sherlock, and Annie
La Tour from the Office of the Global AIDS Coordinator. Others who provided technical input
and review included: Malgosia Grzemska, Christian Gunneberg, Pierre-Yves Norval, Catherine
Bilger, Robert Makombe, Yves Souteyrand, Tisha Mitsunaga, Cyril Pervilhac, Chika Hayashi,
Abdikamal Alisalad, Evelyn Isaacs, Thuy Nguyen Thi Thanh, Spes C. Ntabangana, Andrea Godfrey,
and Mehran Hosseini of the World Health Organization (WHO); Bilali Camara of PAHO/WHO,
Deborah Rugg and Saba Moussavi of UNAIDS, Bob Pond of Health Metrics Network (HMN),
Pepukai Chikudwa of the International HIV/AIDS Alliance, Arnaud Trebucq of the International
Union Against Tuberculosis and Lung Disease, Rene L’Herminez of KNCV Tuberculosis
Foundation, Rick Steketee of PATH, Verne Kemerer of MEASURE Evaluation, Abdallah Bchir
and Anshu Banerjee of the Global Alliance for Vaccines and Immunization (GAVI); John Novak
from USAID; Scott McGill and Gloria Sanigwa from Family Health International (FHI); Matthew
Lynch from Johns Hopkins University, and Lee Yerkes from the Elizabeth Glaser Pediatrics AIDS
Foundation. In addition, the tool greatly benefited from the participation of a number of individuals
during pilot tests in Tanzania, Rwanda, Vietnam, and Madagascar.
Contents

Acknowledgements
Introduction
  A. Background
  B. Objectives
  C. Conceptual Framework
  D. Methodology
  E. Selection of Sites
  F. Outputs
  G. Ethical Considerations
  H. Implementation
Phase 1. Preparation and Initiation
  Step 1. Select Country, Program/Project(s), Indicator(s), and Reporting Period
  Step 2. Notify Program, Request Documentation and Obtain National Authorizations
  Step 3. Select Sites to be Audited
  Step 4. Prepare for On-Site Audit Visits
  Step 5. Review Documentation
Phase 2. M&E Unit
  Step 6. Assessment of Data Management Systems (at the M&E Unit)
  Step 7. Trace and Verify Results from Intermediate Aggregation Levels (at the M&E Unit)
Phase 3. Intermediate Aggregation Level(s)
  Step 8. Assessment of Data Management Systems (at the Intermediate Aggregation Levels)
  Step 9. Trace and Verify Results from Site Reports (at the Intermediate Aggregation Levels)
Phase 4. Service Delivery Sites
  Step 10. Assessment of Data Collection and Reporting System (at the Service Delivery Points)
  Step 11. Trace and Verify Results from Source Documents (at the Service Delivery Points)
Phase 5. M&E Unit
  Step 12. Consolidate Assessment of Data Management Systems
  Step 13. Draft Preliminary Finding and Recommendation Notes
  Step 14. Conduct a Closeout Meeting
A. Background
National programs and donor-funded projects are working towards achieving ambitious goals
related to the fight against diseases such as Acquired Immunodeficiency Syndrome (AIDS),
Tuberculosis (TB), and Malaria. Measuring the success and improving the management of these
initiatives is predicated on strong monitoring and evaluation (M&E) systems that produce quality
data related to program implementation.
In the spirit of the “Three Ones,” the “Stop TB Strategy,” and the “RBM Global Strategic Plan,”
a number of multilateral and bilateral organizations have collaborated to jointly develop a Data
Quality Assessment (DQA) Tool. The objective of this harmonized initiative is to provide a
common approach for assessing and improving overall data quality. A single tool helps to ensure
that standards are harmonized and allows for joint implementation between partners and with
National Programs.
The DQA Tool focuses exclusively on (1) verifying the quality of reported data, and (2) assessing
the underlying data management and reporting systems for standard program-level output
indicators. The DQA Tool is not intended to assess the entire M&E system of a country’s response
to HIV/AIDS, Tuberculosis, or Malaria. In the context of Figure 1. Organizing Framework
HIV/AIDS, the DQA Tool relates to component 10 (i.e., for a Functional National HIV
supportive supervision and data auditing) of the “Organizing M&E System – 12 Components.
Framework for a Functional National HIV M&E System.1”
Two versions of the DQA Tool have been developed: (1) the
“Data Quality Audit Tool” which provides guidelines to be
used by an external audit team to assess a program/project’s
ability to report quality data; and (2) the “Routine Data
Quality Assessment Tool” (RDQA) which is a simplified
version of the DQA Tool for auditing that allows programs
and projects to assess the quality of their data and strengthen
their data management and reporting systems.
¹ UNAIDS (2008). Organizing Framework for a Functional National HIV Monitoring and Evaluation System. Geneva: UNAIDS.
The Data Quality Audit Tool is designed to:

• Verify the quality of reported data for key indicators at selected sites; and
• Assess the ability of data management systems to collect and report
quality data.
In addition, for the programs/projects being audited, the findings of the DQA can be very useful for strengthening their data management and reporting systems.
B. Objectives
The DQA Tool for auditing provides processes, protocols, and templates addressing how to:
• Determine the scope of the data quality audit. The DQA Tool begins with suggested
criteria for selecting the country, program/project(s), and indicators to be reviewed. In
most cases, the Organization Commissioning the DQA will select these parameters.
• Engage the program/project(s) and prepare for the audit mission. The DQA Tool
includes template letters for notifying the program/project of the data quality audit (and
for obtaining relevant authorizations), as well as guidelines for preparing the country
mission.
• Assess the design and implementation of the program/project’s data management
and reporting systems. The DQA Tool provides steps and a protocol to identify potential
risks to data quality created by the program/project’s data management and reporting
system.
• Trace and verify (recount) selected indicator results. The DQA Tool provides
protocol(s) with special instructions, based on the indicator and type of Service Delivery
Site (e.g. health facility or community-based). These protocols will direct the Audit Team
as it verifies data for the selected indicator from source documents and compares the
results to the program/project(s) reported results.
• Develop and present the Audit Team’s findings and recommendations. The
DQA Tool provides instructions on how and when to present the DQA findings and
recommendations to program/project officials and how to plan for follow-up activities to
ensure that agreed-upon steps to improve systems and data quality are completed.
Note: While the Data Quality Audit Tool is not designed to assess the quality of services provided,
its use could facilitate improvements in service quality as a result of the availability of better
quality data related to program performance.
C. Conceptual Framework

The conceptual framework for the DQA and RDQA is illustrated in Introduction – Figure 1 (below). Generally,
the quality of reported data is dependent on the underlying data management and reporting systems;
stronger systems should produce better quality data. In other words, for good quality data to be
produced by and flow through a data management system, key functional components need to be
in place at all levels of the system — the points of service delivery, the intermediate level(s) where
the data are aggregated (e.g. districts, regions), and the M&E unit at the highest level to which data
are reported. The DQA and RDQA tools are therefore designed to assess both these functional components of the data management and reporting system at each level and the quality of the reported data itself.

Introduction – Figure 1. Conceptual Framework for the (R)DQA: Data Management and Reporting Systems, Functional Areas, and Data Quality.
D. Methodology
The DQA and RDQA are grounded in the dimensions of data quality: programs and projects need accurate, reliable, precise, complete, and timely data reports that managers can use to direct available resources effectively and to evaluate progress toward established goals (see Introduction – Table 1 below). Furthermore, the data must have integrity to be considered credible and should be produced in a manner that ensures confidentiality.
Introduction – Table 1. Dimensions of Data Quality and Their Operational Definitions

Accuracy: Also known as validity. Accurate data are considered correct: the data measure what they are intended to measure. Accurate data minimize errors (e.g., recording or interviewer bias, transcription error, sampling error) to a point of being negligible.

Reliability: The data generated by a program’s information system are based on protocols and procedures that do not change according to who is using them and when or how often they are used. The data are reliable because they are measured and collected consistently.

Precision: This means that the data have sufficient detail. For example, an indicator requires the number of individuals who received HIV counseling & testing and received their test results, by sex of the individual. An information system lacks precision if it is not designed to record the sex of the individual who received counseling and testing.

Completeness: Completeness means that an information system from which the results are derived is appropriately inclusive: it represents the complete list of eligible persons or units and not just a fraction of the list.

Timeliness: Data are timely when they are up-to-date (current), and when the information is available on time. Timeliness is affected by: (1) the rate at which the program’s information system is updated; (2) the rate of change of actual program activities; and (3) when the information is actually used or required.

Integrity: Data have integrity when the system used to generate them is protected from deliberate bias or manipulation for political or personal reasons.

Confidentiality: Confidentiality means that clients are assured that their data will be maintained according to national and/or international standards for data. This means that personal data are not disclosed inappropriately, and that data in hard copy and electronic form are treated with appropriate levels of security (e.g., kept in locked cabinets and in password-protected files).
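Several of these dimensions lend themselves to simple automated checks. As an illustrative sketch only (the site names, dates, and deadline below are invented, not drawn from the tool), completeness and timeliness can be computed from a list of expected versus received site reports:

```python
from datetime import date

# Hypothetical reporting records: all names and dates are invented.
expected_sites = ["Site A", "Site B", "Site C", "Site D"]
received = {
    "Site A": date(2008, 4, 3),
    "Site B": date(2008, 4, 10),
    "Site D": date(2008, 4, 2),
}
deadline = date(2008, 4, 5)

def completeness(expected, received):
    """Share of expected reporting units that actually reported."""
    return len([s for s in expected if s in received]) / len(expected)

def timeliness(expected, received, deadline):
    """Share of expected reporting units whose report arrived by the deadline."""
    on_time = [s for s in expected if s in received and received[s] <= deadline]
    return len(on_time) / len(expected)

print(f"Completeness: {completeness(expected_sites, received):.0%}")  # 75%
print(f"Timeliness:   {timeliness(expected_sites, received, deadline):.0%}")  # 50%
```

The same pattern extends to other dimensions, for example flagging reports whose totals fall outside an expected range as potential accuracy problems.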
Based on these dimensions of data quality, the DQA Tool is comprised of two components: (1)
assessment of data management and reporting systems; and (2) verification of reported data for
key indicators at selected sites.
Accordingly, the implementation of the DQA is supported by two protocols (see ANNEX 1):
Protocol 1: System Assessment Protocol;
Protocol 2: Data Verification Protocol.
These protocols are administered at each level of the data-collection and reporting system
(i.e., program/project M&E Unit, Service Delivery Sites and, as appropriate, any Intermediate
Aggregation Level – Regions or Districts).
The assessment of the data management and reporting systems will take place in two stages:
1. Off-site desk review of documentation provided by the program/project;
2. On-site follow-up assessments at the program/project M&E Unit and at selected Service
Delivery Sites and Intermediate Aggregation Levels (e.g., Districts, Regions).
The assessment will cover five functional areas, as shown in Introduction – Table 2.
The first stage of the data-verification occurs at the Service Delivery Sites. There are five types of standard data-verification steps that can be performed at this level (Introduction – Table 3).
The second stage of the data-verification occurs at the Intermediate Aggregation Levels (e.g.,
Districts, Regions) and at the program/project M&E Unit. As illustrated in Introduction – Figure
3, the DQA evaluates the ability at the intermediate level to accurately aggregate or otherwise
process data submitted by Service Delivery Sites, and report these data to the next level in a timely
fashion. Likewise, the program/project M&E Unit must accurately aggregate data reported by
intermediate levels and publish and disseminate National Program results to satisfy the information
needs of stakeholders (e.g. donors).
The outcome of these verifications will be statistics on the accuracy, availability, completeness,
and timeliness of reported data.
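The tool’s Excel templates generate these statistics automatically, and ANNEX 5 defines the Composite Verification Factor methodology. Purely as a sketch of the underlying idea (all site names and figures below are invented), a per-site verification factor can be expressed as the ratio of the recounted result to the reported result:

```python
# Illustrative only: reported vs. recounted results for one indicator at
# several hypothetical Service Delivery Sites.
site_results = {
    "Site A": {"reported": 120, "recounted": 115},
    "Site B": {"reported": 80, "recounted": 82},
    "Site C": {"reported": 200, "recounted": 180},
}

for site, r in site_results.items():
    vf = r["recounted"] / r["reported"]  # verification factor: recounted / reported
    print(f"{site}: verification factor = {vf:.2f}")

# One simple way to summarize across sites (the tool's actual Composite
# Verification Factor methodology is defined in ANNEX 5, not here).
total_reported = sum(r["reported"] for r in site_results.values())
total_recounted = sum(r["recounted"] for r in site_results.values())
print(f"Aggregate ratio: {total_recounted / total_reported:.2f}")
```

A factor below 1 suggests over-reporting relative to the source documents; a factor above 1 suggests under-reporting.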
E. Selection of Sites
There are four methods for selecting sites for the Data Quality Audit:
1. Purposive selection: The sites to be visited are purposely selected, for example based on their
size, their geographical proximity or concerns regarding the quality of their reported data.
In this case, there is no need for a sampling plan. However, the data quality audit findings
produced from such a “purposive” or targeted sample cannot be used to make inferences or
generalizations about all the sites, or a group of sites, in that country.
2. Restricted site design: Only one site is selected for the DQA. The benefit of this approach
is that the team can maximize its efforts in one site and have a high degree of control over
implementation of the audit protocols and knowledge of the site-specific systems from
which the results are derived. This approach is ideal for measuring the change in data quality
attributable to an intervention (e.g. data management training). In this approach, the data
quality audit is implemented in a selected site; the intervention is conducted, and is followed
by another data quality audit in the same site. Any change in the quality of data could therefore
be most likely a result of the intervention.
The number of sites selected for a given DQA will depend on the resources available to conduct
the audit and the level of precision desired for the national level estimate of the Verification Factor.
A more precise estimate requires a larger sample of sites. The Audit Teams should work with the
Organization Commissioning the DQA to determine the right number of sites for a given program
and indicator.
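The tool’s own sampling guidance appears in its annexes. As a generic illustration only (the between-site standard deviation and confidence level below are assumptions, not values from the tool), a normal-approximation calculation shows why a more precise estimate of the Verification Factor requires more sites: halving the desired margin of error roughly quadruples the sample.

```python
import math

def sites_needed(sd, margin, z=1.96):
    """Normal-approximation sample size for estimating a mean verification
    factor to within +/- `margin` at ~95% confidence (z = 1.96), given an
    assumed between-site standard deviation `sd`. Illustrative only: it
    ignores finite-population correction and any stratification."""
    return math.ceil((z * sd / margin) ** 2)

# Assumed between-site standard deviation of the verification factor: 0.15
print(sites_needed(0.15, 0.10))  # 9 sites for a +/- 0.10 margin
print(sites_needed(0.15, 0.05))  # 35 sites for a +/- 0.05 margin
```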
F. Outputs
In conducting the DQA, the Audit Team will collect and document: (1) evidence related to the
review of the program/project’s data management and reporting system; and (2) evidence related
to data verification. The documentation will include:
• Completed protocols and templates included in the DQA Tool.
• Write-ups of observations, interviews, and conversations with key data quality officials
at the M&E Unit, at intermediary reporting locations, and at Service Delivery Sites.
Summary statistics, automatically generated in the Excel files, are also developed from the system assessment and data verification protocols included in this tool.
G. Ethical Considerations
The data quality audits must be conducted with the utmost adherence to the ethical standards of the
country and, as appropriate, of the Organization Commissioning the DQA. While the audit teams
may require access to personal information (e.g., medical records) for the purposes of recounting
and cross-checking reported results, under no circumstances will any personal information be
disclosed in relation to the conduct of the audit or the reporting of findings and recommendations.
The Audit Team should neither photocopy nor remove documents from sites.
In addition, the auditor shall not accept or solicit directly or indirectly anything of economic value
as a gift, gratuity, favor, entertainment or loan that is or may appear to be designed to in any manner
influence official conduct, particularly from one who has interests that might be substantially
affected by the performance or nonperformance of the auditor’s duty. This provision does not
prohibit the acceptance of food and refreshments of insignificant value on infrequent occasions in
the ordinary course of a meeting, conference, or other occasion where the auditor is properly in
attendance, nor the acceptance of unsolicited promotional material such as pens, calendars, and/or
other items of nominal intrinsic value.
² Please refer to ANNEX 5 for a description of the methodology for calculating the Composite Verification Factor.
The Data Quality Audit will be implemented chronologically in 14 steps conducted in five phases, as shown in Introduction – Figure 5.
PHASE 1 – Steps 1-5 are performed at the Organization Commissioning the DQA and at
the Audit Team’s Office.
• The Organization Commissioning the DQA determines the country and program/
project(s) to be audited. The Audit Team and/or the Organization Commissioning the
DQA then select(s) the corresponding indicators and reporting period (Step 1).
• The Organization Commissioning the DQA is responsible for obtaining national
authorization to conduct the audit, as appropriate, and for formally notifying the program/
project of the DQA. The Audit Team follows up with a request for documentation for its
review prior to visiting the program/project, including information from which to draw
the sample of sites (Step 2).
PHASE 2 – Steps 6-7 are conducted at the program/project M&E Unit.
• The Audit Team assesses the data management and reporting system at the M&E Unit (Step 6).
• The Audit Team traces and verifies the results reported from the Intermediate Aggregation Levels (Step 7).

PHASE 3 – Steps 8-9 are conducted at the Intermediate Aggregation Levels (such as district or regional offices), if the program/project data management system has such levels.
• The Audit Team assesses the data management and reporting system by determining how
data from sub-reporting levels (e.g., Service Delivery Sites) are aggregated and reported
to the program/project M&E Unit (Step 8).
• The Audit Team continues to trace and verify the numbers reported from the Service
Delivery Sites to the intermediate level (Step 9).
PHASE 4 – Steps 10-11 are conducted at Service Delivery Sites (e.g., in a health facility or a
community).
• The Audit Team continues the assessment of the data management and reporting system
at Service Delivery Sites by determining if a functioning system is in place to collect,
check, and report data to the next level of aggregation (Step 10).
• The Audit Team also traces and verifies data for the selected indicator(s) from source
documents to reported results from Service Delivery Sites (Step 11).
PHASE 5 – Steps 12-14 take place back at the program/project M&E Unit.
• The Audit Team finalizes the assessment of the data management and reporting system by
answering the final Audit Summary Questions (Step 12).
• The Audit Team then drafts its preliminary DQA findings and recommendations (Step
13) and shares them with the program/project M&E officials during an Audit Closeout
Meeting (Step 14). Emphasis is placed on reaching a consensus with M&E officers on
what steps to take to improve data quality.
PHASE 1 – Preparation and Initiation (Off-Site)

The first phase of the DQA occurs prior to the Audit Team being on site at the location of the program/project. Responsibility for PHASE 1 rests partly with the Organization Commissioning the DQA and partly with the Audit Agency. The steps in PHASE 1 are to:

1. Identify the country and program/project and select the indicator(s) and reporting period that will be the focus of the actual data verification work at a few Service Delivery Sites.

2. Notify the selected program/project(s) of the impending data quality audit and request documentation related to the data management and reporting system that the Audit Team can review in advance of the site visits. Obtain national authorization(s), if needed, to undertake the audit. Notify key country officials and coordinate with other organizations such as donors, implementing partners, and national audit agencies, as necessary.

3. Determine the type of sample and the number of sites to be the subject of on-site data quality verifications.

4. Prepare for the site visits, including determining the timing of the visit, constituting the Audit Team, and addressing logistical issues.

5. Perform a “desk review” of the provided documentation to begin to determine if the program/project’s data management and reporting system is capable of reporting quality data if implemented as designed.

The steps in PHASE 1 are estimated to take four to six weeks.
Step 1. Select Country, Program/Project(s), Indicator(s), and Reporting Period

Step 1 can be performed by the Organization Commissioning the DQA and/or the Audit Team.
In all likelihood, the Organization Commissioning the DQA will determine which country
and program/project should be the subject of the Data Quality Audit. This DQA Tool presents
strategies for selecting a program/project(s) for an audit by providing a list of relevant criteria and
other issues to be considered. There is no single formula for choosing program/project(s) to be
audited; international, local and programmatic circumstances must be taken into consideration in
the decision. The audit documentation should include information about who made the selection
and, to the extent known, the rationale for that decision.
An illustrative list of criteria to be used for the selection of a country and program/project is shown below in Step 1 – Table 1. If a National Program is itself commissioning the audit, it can also use these criteria to select which aspects of the program (e.g., indicators) will be audited.
Step 1 – Table 1. Illustrative Criteria for Selection of a Country, Disease/Health Area, and Program/Project

1. Amount of funding invested in the countries and programs/projects within the disease/health area.
2. Results reported from countries and programs/projects (such as number of people on ART, ITNs distributed, or Directly Observed Treatment, Short Course [DOTS] Detection Numbers).
3. Large differences in results reporting from one period to the next within a country or a program/project.
4. Discrepancies between programmatic results and other data sources (e.g., expenditures for health products that are inconsistent with the number of people reported on anti-retroviral [ARV] treatment).
5. Inconsistencies between reported data from a specific project and national results (e.g., reported number of ITNs distributed is inconsistent with national numbers).
6. Findings of previous M&E assessments indicating gaps in the data management and reporting systems within program(s)/project(s).
7. Opinion/references about perceived data quality weaknesses and/or risks within a program/project.
9. A desire to have some random selection of countries and programs/projects for audit.
This list should help the Organization Commissioning the DQA prioritize the countries or program/
project(s). ANNEX 2, Step 1 – Template 1 is illustrative of such an analysis.
Other important decisions in preparing for a Data Quality Audit are to determine: (1) which
indicators will be included in the audit; and (2) for what reporting period(s) the audit will be
conducted. It is recommended that up to two indicators be selected within a Disease/Health Area and, if multiple Diseases/Health Areas are included in a Data Quality Audit, that a maximum of four indicators be included. More than four indicators could lead to an excessive number of sites to be evaluated.
The decision regarding which indicators to include will generally be made by the Organization
Commissioning the DQA and can be based on a number of criteria, including an analysis of the
funding levels to various program areas (e.g., ARV, Prevention of Mother-to-Child Transmission
[PMTCT], ITN, DOTS, Behavior Change Communication [BCC]) and the results reported for the
related indicators. In addition, the deciding factor could also be program areas of concern to the
Organization Commissioning the DQA and/or to the National program (e.g., community-based
programs that may be more difficult to monitor than facility-based programs). In some cases, the
Audit Agency may be asked to do an initial selection of indicators to be proposed to the Organization
Commissioning the DQA. The analysis conducted in Step 1 can help guide the selection of indicators
to be included in the Data Quality Audit.
The criteria for selecting the indicators for the Data Quality Audit could be the following:
1. “Must Review” Indicators. Given the program/project(s) selected for auditing, the
Organization Commissioning the DQA may have a list of “must review” indicators that
should be selected first (e.g., indicators related to People on ARV Treatment, ITNs Distributed
[or re-treated], and DOTS Detection Numbers). These are generally the indicators that are
internationally reported to measure the global response to the disease. For example, for audits
undertaken through the Global Fund, the indicators to be audited will generally come from its
list of “Top 10 indicators.” Under the President’s Emergency Plan for AIDS Relief, the list
ANNEX 2, Step 1 – Template 2 contains an illustrative template for analyzing the relative magnitude
of the investments and indicator results per program area.
It is also important to clearly identify the reporting period associated with the indicator(s) to be
audited. Ideally, the time period should correspond to the most recent relevant reporting period
for the national system or to the program/project activities associated with the Organization
Commissioning the DQA. If the circumstances warrant, the time period for the audit could be less
(e.g., a fraction of the reporting period, such as the last quarter or month of the reporting period).
For example, the number of source documents in a busy VCT site could be voluminous, audit
staff resources may be limited, or the program/project’s Service Delivery Sites might produce
monthly or quarterly reports related to the relevant source documents. In other cases, the time
period could correspond to an earlier reporting period where large results were reported by the
program/project(s).
ANNEX 2, Step 1 – Template 3 provides a tool that can be used to document selection of the
country, program/project(s), indicator(s), and reporting period being audited.
Step 2. Notify Program, Request Documentation and Obtain National Authorizations

The Organization Commissioning the DQA should notify the program/project about the impending Data Quality Audit as soon as possible and obtain national and other relevant authorizations. It should also notify other organizations, as appropriate, about the audit and request their cooperation. The
Audit Team is expected to comply with national regulations regarding data confidentiality
and ethics. It is the Audit Team’s responsibility to identify such national regulations and adhere
to them.
ANNEX 2, Step 2 – Template 1 contains draft language for the notification letter. This letter can be
modified, as needed, in consultation with local stakeholders (e.g., the National Disease Commission,
the MOH, the CCM, relevant donors). It is important that the Organization Commissioning the
DQA stress the need for the relevant M&E Unit staff member(s) to accompany the Audit Team
on its site visits. The letter should be accompanied by the initial documentation request from the
M&E Unit, which is found in Step 2 – Table 1.
After the notification letter has been sent, the Organization Commissioning the DQA should send
a copy of the notification letter to all relevant stakeholders, including, for example:
• Host country officials related to the program/project being audited;
• National audit agency, as appropriate; and
• Donors, development partners, international implementing partner organizations, and
relevant M&E working-group representatives.
The Audit Agency should follow up with the selected program/project about the pending audit,
timeframes, contact points, and the need to supply certain information and documentation in
advance.
The Audit Team will need four types of documentation at least two weeks in advance of the country
mission:
1. A list of all service points with latest reported results related to the indicator(s);
2. A description of the data-collection and reporting system;
3. The templates of the data-collection and reporting forms; and
4. Other available documentation relating to the data management and reporting systems and a
description of the program/project (e.g., a procedures manual).
Once Service Delivery Sites and the related Intermediate Aggregation Levels are selected for the
audit, it is critical that the Audit Team work through the program/project to notify the selected
sites and provide them with the information sheets found in ANNEX 3, Step 2 – Templates 1, 2, 3.
This is meant to ensure that relevant staff are available and source documentation is accessible for the indicator(s) and reporting period being audited.
2) Description of the data-collection and reporting system related to the indicator(s). The Audit
Team should receive the completed template(s) found in ANNEX 2, Step 2 – Template 2 describing
the data-collection and reporting system related to the indicator(s) being audited.
3) Templates of the data-collection and reporting forms. The Audit Team should receive the
templates of all data-collection and reporting forms used at all levels of the data management
system for the related indicator(s) (e.g., patient records, client intake forms, registers, monthly
reports, etc.).
4) Other documentation for the systems review. The other documents requested are needed so
that the Audit Team can start assessing the data collection and reporting system for the selected
indicator(s). These documents are listed on the following page in Step 2 – Table 1. In the event
the program/project does not have such documentation readily available, the Audit Team should be
prepared to follow up with the program/project management once in country.
In addition, the Organization Commissioning the Audit should also provide the Audit Team with
relevant background documents regarding the country and program/project being audited.
In certain cases, special authorization for conducting the DQA may be required from another
national body, such as the National Audit Agency. ANNEX 2, Step 2 – Template 3 provides text
for the letter requesting such additional authorization to conduct the Data Quality Audit. This letter
should be sent by the Organization Commissioning the DQA. The recipient(s) of the authorization
letter will vary according to what program or project is being audited. The national authorization
and any other relevant permission to conduct the DQA from donors supporting audited sites or
program/project officials should be included in the Final Audit Report as an attachment.
Step 3 can be performed by the Organization Commissioning the DQA and/or the Audit Team.
In this section, four alternatives are presented for selecting the sites in which the data quality audit
teams will conduct the work. The alternatives are presented in order of complexity, from Sampling
Strategy A, which is completely non-statistical, to Sampling Strategy D, which is a multistage cluster
sampling method that can be used to make statistical inferences about data quality on a national
scale. Sampling Strategies B and C represent midpoints between the non-statistical and statistical
approaches and offer the audit team an opportunity to tailor the audit to a specific set of sites based
on need or interest.
The Organization Commissioning the DQA should decide on the sampling strategy based on the
objective of the DQA and available resources. The Audit Agency will determine, based on which
type of sample is used, the sites for the audit. The Organization Commissioning the DQA may want
to be involved in decisions regarding site selection, particularly if the sampling is not random.
Sampling Strategy A is a pre-determined sample that the Organization Commissioning the DQA
dictates to the Data Quality Audit Team. In some cases, there may be a need for a data quality audit to focus
specifically on a set of service delivery points that are predetermined. In this case, there is no
need for a sampling plan. However, the data quality audit findings produced from such a
“purposive” or targeted sample cannot be used to make generalized statements (or statistical
inferences) about the total population of sites in that country. The findings will be limited to
those sites visited by the audit team.
Sampling Strategy B is also called a restricted site design. It is commonly used as a substitute for
probability sampling (based on a random algorithm) and is a good design for comparison of audit
results over multiple periods. In the Restricted Site design, the audit team selects one site where all
the work will occur. The benefit of this approach is that the team can maximize its efforts in one
site and have a high degree of control over implementation of the audit protocols and knowledge
of the site-specific systems from which the results are derived. Sampling Strategy B is ideal for
evaluating the effects of an intervention to improve data quality. For example, the DQA is
implemented at a site and constitutes a baseline measurement. An intervention is conducted
(e.g., training), and the DQA is implemented a second time. Since all factors that can influence
data quality are the same for both the pre- and post-test (the same site is used), any difference
in data quality found on the post-test can most likely be attributed to the intervention. Such
a repeated-measures approach using the data quality audit tool might be prohibitively expensive if
used in conjunction with a sampling plan that involves many sites.
Sampling Strategy C is a sample drawn by the Data Quality Audit Team with the objective of
maximizing exposure to important sites while minimizing the amount of time and money spent
actually implementing the audit. In most cases, Sampling Strategy C involves the random selection of sites from within
a particular group, where group membership is defined by an attribute of interest. Examples
of such attributes include location (e.g. urban/rural, region/district), volume of service, type of
organization (e.g. faith-based, non-governmental), or performance on system assessments (e.g.
sites that scored poorly on the M&E Systems Strengthening Tool).
The stratified random sampling used in Sampling Strategy C allows the audit team to make
inferences from the audit findings to all the sites that belong to the stratification attribute
of interest (like all rural sites, all very large sites, all faith-based sites, etc.). In this way, the
audit findings can be generalized from the sample group of sites to a larger “population” of sites to
which the sampled sites belong. This ability to generate statistics and make such generalizations
can be important and is discussed in more detail in the section below describing Sampling Strategy
D.
The stratified sampling used in Sampling Strategy C is sub-national: the data quality auditors are
not attempting to make generalizations about national programs. In this sense, the strategy differs
from Sampling Strategy D mainly with respect to its smaller scope. Both strategies use random
sampling (explained in more detail in Annex 4), which means that within a particular grouping of
sites (sampling frame), each site has an equal chance of being selected into the audit sample.
A Verification Factor can be calculated that indicates the data quality for the group defined by the
attribute of interest; unlike the factor produced under Sampling Strategy D, however, it is not national in scope.
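To illustrate the mechanics of this kind of stratified random selection, the sketch below draws an equal-probability sample within each stratum. It is only a sketch: the site list, the "location" attribute, and the per-stratum sample size are invented for the example and are not prescribed by the DQA tool.

```python
import random

def stratified_sample(sites, attribute, per_stratum, seed=None):
    """Randomly sample `per_stratum` sites from each stratum.

    `sites` is a list of dicts; `attribute` names the stratification
    key (e.g., 'location'). Within a stratum, every site has an equal
    chance of selection, as random sampling requires.
    """
    rng = random.Random(seed)
    strata = {}
    for site in sites:
        strata.setdefault(site[attribute], []).append(site)
    sample = []
    for group in strata.values():
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical site list stratified on urban/rural location
sites = [{"name": f"Site {i}", "location": "urban" if i % 3 else "rural"}
         for i in range(1, 13)]
audit_sample = stratified_sample(sites, "location", per_stratum=2, seed=1)
```

With a fixed seed the draw is reproducible, which can help document how the audit sample was obtained.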
Sampling Strategy D is used to derive a national level Verification Factor for program-level
indicators. It is complex and requires updated and complete information on the geographical
distribution of sites (for whatever indicators have been selected) as well as the site-specific
reported results (counts) for the indicator that is being evaluated. Sampling Strategy D could also
be referred to as a modified two-stage cluster sample (modified in that a stratified random sample
of sites, rather than a simple random sample, is taken within the selected clusters).
Cluster sampling is a variation on simple random sampling (where all sites would be chosen
randomly) that permits a more manageable group of sites to be audited. Were all sites chosen at
random they would likely be dispersed all over the country and require much time and resources
to audit. Cluster sampling allows for the selection of a few districts, thereby reducing the amount
of travel required by the auditors.
The primary sampling unit for Sampling Strategy D is a cluster, which refers to the administrative
or political or geographic unit in which Service Delivery Sites are located. In practice, the selection
of a cluster is usually a geographical unit like a district. Ultimately, the selection of a cluster
allows the audit team to tailor the sampling plan according to what the country program looks like.
The strategy outlined here uses probability proportionate to size (PPS) to derive the final set of
sites that the audit team will visit. Sampling Strategy D generates a selection of sites to be visited
by the audit team that is proportionately representative of all the sites where activities supporting
the indicator(s) under study are being implemented.
Clusters are selected in the first stage using systematic random sampling, where clusters with
active programs reporting on the indicator of interest are listed in a sampling frame. In the second
stage, Service Delivery Sites from selected clusters are chosen using stratified random sampling
where sites are stratified on volume of service.
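As a rough illustration of the first-stage selection, the sketch below implements systematic PPS sampling over a cumulative size scale: a random start is drawn within a fixed sampling interval, so clusters reporting larger counts are more likely to be selected. The district names and counts are invented, and an actual DQA should rely on a sampling specialist to construct the frame.

```python
import random

def pps_systematic(clusters, n_select, seed=None):
    """Select `n_select` clusters with probability proportionate to size
    (PPS) via systematic sampling on the cumulative size scale.

    `clusters` is a list of (name, size) pairs, where size is the
    reported count for the indicator of interest. A very large cluster
    can be selected more than once, which is standard PPS behavior.
    """
    rng = random.Random(seed)
    total = sum(size for _, size in clusters)
    interval = total / n_select
    start = rng.uniform(0, interval)
    points = [start + i * interval for i in range(n_select)]
    selected, cumulative = [], 0
    it = iter(points)
    point = next(it)
    for name, size in clusters:
        cumulative += size
        while point is not None and point <= cumulative:
            selected.append(name)
            point = next(it, None)
    return selected

# Hypothetical districts with reported results for the audited indicator
districts = [("District A", 500), ("District B", 1500),
             ("District C", 300), ("District D", 700)]
chosen = pps_systematic(districts, n_select=2, seed=42)
```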
The number of sites selected for a given DQA will depend on the resources available to conduct
the audit and the level of precision desired for the national level estimate of the Verification Factor.
The Audit Teams should work with the Organization Commissioning the DQA to determine the
right number of sites for a given program and indicator. Annex 4 contains a detailed discussion
and an illustrative example of Sampling Strategy D for the selection of clusters and sites for the
DQA.
Note: The precision of estimates of the Verification Factor found using the GAVI sampling
methodology employed here has been questioned.3 It is strongly advised that the Auditing Agency
have access to a sampling specialist who can guide the development of representative samples and
that the Verification Factors generated using these methods be interpreted with caution.
3 Woodard S., Archer L., Zell E., Ronveaux O., Birmingham M. Design and Simulation Study of the Immunization Data Quality Audit (DQA). Ann Epidemiol. 2007;17:628–633.
The Audit Agency will need to prepare for the audit site visits. In addition to informing the
program/project and obtaining a list of relevant sites and requesting documentation (Steps 2-3),
the Audit Agency will need to: (1) estimate the timing required for the audit (and work with the
program/project to agree on dates); (2) constitute an Audit Team with the required skills; and (3)
prepare materials for the site visits. Finally, the Audit Agency will need to make travel plans for
the site visits.
A – ESTIMATE TIMING
Depending on the number and location of the sampled sites to be visited, the Audit Agency will
need to estimate the time required to conduct the audit. As a guideline:
• The M&E Unit will typically require two days (one day at the beginning and one day at
the end of the site visits);
• Each Intermediate Aggregation Level (e.g., District or Provincial offices) will require
between one-half and one day;
• Each Service Delivery Site will require between one-half and two days (i.e., more than
one day may be required for large sites with reported numbers in the several hundreds, for
sites that include satellite centers, or when “spot-checks” are performed); and
• The Audit Team should also plan for an extra work day after completion of the site visits
to prepare for the meeting with the M&E Unit.
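As a planning aid only, the guideline above can be expressed as a simple range calculation. The function below merely restates the day ranges from the bullets and is not part of the tool itself.

```python
def estimate_audit_days(n_intermediate, n_sites):
    """Return a (low, high) estimate of total audit days, following the
    DQA guideline: two days at the M&E Unit, 0.5-1 day per Intermediate
    Aggregation Level, 0.5-2 days per Service Delivery Site (large sites
    toward the upper bound), plus one wrap-up day before the final
    meeting with the M&E Unit. Travel days are excluded.
    """
    low = 2 + 0.5 * n_intermediate + 0.5 * n_sites + 1
    high = 2 + 1.0 * n_intermediate + 2.0 * n_sites + 1
    return low, high

# Example: 2 district offices and 4 Service Delivery Sites
low, high = estimate_audit_days(2, 4)  # roughly 6 to 13 working days
```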
Step 4 – Table 1 on the following page provides an illustrative daily schedule for the site visits
which will help the Audit Agency plan for the total time requirement.
Country: ________    Indicator: ________

Activity                                                      Estimated Time   Notes
4. Complete “DQA Protocol 1: System Assessment Protocol”      1-2 hrs          Morning – day 1
   a. Request additional documentation (if needed)
   b. Discuss and get answers to protocol questions
5. Complete “DQA Protocol 2: Data Verification Protocol”      2-4 hrs          Afternoon – day 1

Note: Add travel and DQA team work days, as needed.

4 The time required at the Service Delivery Points will vary between one and two days depending on the size of the reported numbers to be verified and whether or not spot-checks are performed.
B – CONSTITUTE THE AUDIT TEAM
While the Organization Commissioning the DQA will select the organization to conduct the data
quality audit, it is recommended that the following skills be represented in the audit teams:
• Public Health (closely related to the disease area and indicator(s) being audited);
• Program Auditing;
• Program Evaluation (e.g., health information systems, M&E systems design, indicator
reporting);
• Data Management (e.g., strong understanding of and skills in data models and querying/
analyzing databases);
• Excel (strong skills preferable to manipulate, modify and/or create files and worksheets);
and
• Relevant Country Experience (preferable).
Audit Team members can have a combination of the skills listed above. While the total number of
team members will vary by the size of the audit, it is recommended that the Audit Team comprise
a minimum of two to four consultants including at least one Senior Consultant. The team may be
comprised of international and/or regional consultants. In addition, if the consultants do not speak
the country language, one or more independent translator(s) should be hired by the Audit Team.
Finally, the Organization Commissioning the DQA may have other requirements for team members
or skills. It will be important for all Audit Team members to be familiar with the indicator-specific
protocols being used in the audit and to become familiar with the program/project being audited.
C – PREPARE LOGISTICS
Note: While the protocols in the DQA are automated Excel files, the Audit Team should be
prepared with paper copies of all needed protocols. In some cases, it may be possible to use
computers during site visits, but in other cases the Audit Team will need to fill out the protocols on
the paper copies and then transcribe the findings to the Excel file.
Planning Travel
The Audit Team should work with the program/project to plan for travel to the country (if the
Audit Team is external) and to the sampled sites — both to set appointments and to coordinate with
program/project staff that will accompany the audit team on the site visits. The Audit Team should
arrange for transportation to the sampled sites and for lodging for the team.
The purpose of reviewing and assessing the design of the program/project’s data management and
reporting system is to determine if the system is able to produce reports with good data quality if
implemented as planned. The review and assessment is accomplished in several steps, including a
desk review of information provided in advance by the program/project, and follow-up reviews at
the program/project M&E Unit, at selected Service Delivery Sites, and Intermediate Aggregation
Levels. During the off-site desk review, the Audit Team will work to start addressing the questions
in the DQA Protocol 1: System Assessment Protocol based on the documentation provided. The
Audit Team should nevertheless anticipate that not all required documentation will be submitted
by the program/project in advance of the country mission.
Ideally, the desk review will give the Audit Team a good understanding of the Program’s reporting
system — its completeness and the availability of documentation relating to the system and
supporting audit trails. At a minimum, the desk review will identify the areas and issues the Audit
Team will need to follow-up at the program/project M&E Unit (Phase 2).
Because the M&E system may vary among indicators and may be stronger for some indicators
than others, the Audit Team will need to fill out a separate DQA Protocol 1: System Assessment
Protocol for each indicator audited for the selected program/project. However, if indicators selected
for auditing are reported through the same data reporting forms and systems (e.g., ART and OI
numbers or TB Detection and Successfully Treated numbers), only one DQA Protocol 1: System
Assessment Protocol may be completed for these indicators.
ANNEX 1 shows the list of 39 questions included in the DQA Protocol 1: System Assessment
Protocol that the Audit Team will complete, based on its review of the documentation and the
audit site visits.
As the Audit Team is working, it should keep sufficiently detailed notes or “work papers” related
to the steps in the audit that will support the Audit Team’s final findings. Space has been provided
on the protocols for notes during meetings with program/project staff. In addition, if more detailed
notes are needed at any level of the audit to support findings and recommendations, the Audit
Team should identify those notes as “work papers” and the relevant “work paper” number should
be referenced in the appropriate column on all DQA templates and protocols. For example, the
“work papers” could be numbered and the reference number to the “work paper” noted in the
appropriate column on the DQA templates and protocols. It is also important to maintain notes
of key interviews or meetings with M&E managers and staff during the audit. Annex 3, Step 5
– Template 1 provides a format for the notes of those interviews.
6. Assess Data Management Systems: During PHASE 2, the Audit Team should meet the head of
the M&E Unit and other key staff who are involved in data management and reporting.
While the Data Quality Audit Team can determine a lot about the design of the data management and
reporting system based on the off-site desk review, it will be necessary to perform on-site follow-up
at three levels (M&E Unit, Intermediate Aggregation Levels, and Service Delivery Points) before
a final assessment can be made about the ability of the overall system to collect and report quality
data. The Audit Team must also anticipate the possibility that a program/project may have some data
reporting systems that are strong for some indicators, but not for others. For example, a program/
project may have a strong system for collecting ART treatment data and a weak system for collecting
data on community-based prevention activities.
The Excel-based DQA Protocol 1: System Assessment Protocol contains a worksheet for the Audit
Team to complete at the M&E Unit. The Audit Team will need to complete the protocol as well as obtain
documentary support for answers obtained at the program/project’s M&E Unit. The most expeditious
way to do this is to interview the program/project’s key data management official(s) and staff and to
tailor the interview questions around the unresolved systems design issues following the desk review
of provided documentation. Hopefully, one meeting will allow the Audit Team to complete the DQA
Protocol 1: System Assessment Protocol section (worksheet) for the M&E Unit.
It is important that the Audit Team include notes and comments on the DQA Protocol 1: System
Assessment Protocol in order to formally document the overall design (and implementation) of the
program/project data management and reporting system and identify areas in need of improvement.
Responses to the questions and the associated notes will help the Audit Team answer the 13 overarching
Audit Team Summary Questions towards the end of the DQA (see Step 12 – Table 2 for the list of
summary questions – which will be completely answered in PHASE 5 - Step 12).
As the Audit Team completes the DQA Protocol 1: System Assessment Protocol, it should keep in
mind the following two questions that will shape the preliminary findings (Step 13) and the Audit
Report (drafted in Step 15 and finalized in Step 17):
1. Does the design of the program/project’s overall data collection and reporting system ensure
that, if implemented as planned, it will collect and report quality data? Why/why not?
2. Which audit findings of the data management and reporting system warrant Recommendation
Notes and changes to the design in order to improve data quality? These should be documented
on the DQA Protocol 1: System Assessment Protocol.
Note: While the Audit Team is meeting with the M&E Unit, it should determine how the audit findings
will be shared with staff at the lower levels being audited. Countries have different communication
protocols; therefore in some countries, the Audit Team will be able to share preliminary findings at
each level, while in other countries, the M&E Unit will prefer to share findings at the end of the audit.
It is important for the Audit Team to comply with the communication protocols of the country. The
communication plan should be shared with all levels.
Step 7 is the first of three data verification steps that will assess, on a limited scale, if Service
Delivery Sites, Intermediate Aggregation Levels (e.g., Districts or Regions), and the M&E Unit
are collecting, aggregating, and reporting data accurately and on time.
The Audit Team will use the appropriate version of the DQA Protocol 2: Data Verification
Protocol—for the indicator(s) being audited—to determine if the sampled sites have accurately
recorded the service delivery on source documents. They will then trace those data to determine
if the numbers have been correctly aggregated and/or otherwise manipulated as the numbers are
submitted from the initial Service Delivery Sites, through Intermediary Aggregation Levels, to
the M&E Unit. The protocol has specific actions to be undertaken by the Audit Team at each level
of the reporting system (for more detail on the DQA Protocol 2: Data Verification Protocol,
see Steps 9 and 11). In some countries, however, Service Delivery Sites may report directly to
the central M&E Unit, without passing through Intermediate Aggregation Levels (e.g., Districts
or Regions). In such instances, the verifications at the M&E Unit should be based on the reports
directly submitted by the Service Delivery Sites.
While the data verification exercise implies recounting numbers from the level at which they are
first recorded, for purposes of logistics, the M&E Unit worksheet of the DQA Protocol 2: Data
Verification Protocol can be completed first. Doing so provides the Audit Team with the numbers
received, aggregated and reported by the M&E Unit and thus a benchmark for the numbers the
Audit Team would expect to recount at the Service Delivery Sites and the Intermediate Aggregation
Levels.
At the M&E Unit, the steps undertaken by the Audit Team on the DQA Protocol 2: Data
Verification Protocol are to:
1. Re-aggregate reported numbers from all Intermediate Aggregation Sites: Reported
results from all Intermediate Aggregation Sites (e.g., Districts or Regions) should be re-
aggregated and the total compared to the number contained in the summary report prepared
by the M&E Unit. The Audit Team should identify possible reasons for any differences
between the verified and reported results.
STATISTIC: Calculate the Result Verification Ratio for the M&E Unit.
STATISTIC: Calculate % of all reports that are A) available; B) on time; and C) complete.
C) % Complete Reports = (Number of reports that are complete from all Intermediate Aggregation
Sites) ÷ (Number of reports expected from all Intermediate Aggregation Sites)

That is to say, for a report to be considered complete, it should include at least (1) the reported
count relevant to the indicator; (2) the reporting period; (3) the date of submission of the report;
and (4) a signature from the staff having submitted the report.
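For illustration, the report statistics and the Result Verification Ratio can be computed as follows. The field names and sample records are invented, and the ratio is assumed here to be the recounted total divided by the reported total; the DQA Excel protocol computes these values automatically.

```python
def report_statistics(reports, expected):
    """Percentage of expected reports that are available, on time, and
    complete. `reports` lists the reports actually received; a report is
    complete when it carries the reported count, the reporting period,
    the submission date, and a signature (the four elements above).
    """
    required = ("count", "period", "submitted", "signature")
    available = len(reports)
    on_time = sum(1 for r in reports if r.get("on_time"))
    complete = sum(1 for r in reports
                   if all(r.get(f) is not None for f in required))
    return {
        "% available": 100.0 * available / expected,
        "% on time": 100.0 * on_time / expected,
        "% complete": 100.0 * complete / expected,
    }

def result_verification_ratio(recounted_total, reported_total):
    """Re-aggregated (verified) result divided by the reported result."""
    return recounted_total / reported_total

# Two of four expected reports received; one is unsigned and late.
reports = [
    {"count": 120, "period": "Q1", "submitted": "2008-04-05",
     "signature": "JK", "on_time": True},
    {"count": 80, "period": "Q1", "submitted": "2008-04-20",
     "signature": None, "on_time": False},
]
stats = report_statistics(reports, expected=4)
ratio = result_verification_ratio(190, 200)
```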
Warning: If there are any indications that some of the reports have been fabricated (for the purpose
of the audit), the Audit Team should record these reports as “unavailable” and seek other data
sources to confirm the reported counts (for example, an end-of-year report from the site containing
results for the reporting period being audited). As a last resort, the Audit Team may decide to visit
the site(s) for which reports seem to be fabricated to obtain confirmation of the reported counts. In
any event, if these reported counts cannot be confirmed, the Audit Team should dismiss the reported
counts and record “0” for these sites in the DQA Protocol 2: Data Verification Protocol.
Note: In no circumstances should the Audit Team record personal information, photocopy or
remove documents from the M&E Unit.
9. Trace and Verify Results from Site Reports: During PHASE 3, the Audit Team should meet with
key staff involved in program/project M&E at the relevant Intermediate Aggregation Level —
including the staff member(s) in charge of M&E and other staff who contribute to aggregating the
data received from Service Delivery Sites and reporting the aggregated (or otherwise manipulated)
results to the next reporting level.
The steps in PHASE 3 are estimated to take between one-half and one day.
In Step 8, the Audit Team continues the assessment of the data management and reporting system
at the intermediate aggregation levels at which data from Service Delivery Sites are aggregated
and manipulated before being reported to the program/project M&E Unit. Specific instructions
for completing the Intermediate Aggregation Level worksheet of the DQA Protocol 1: System
Assessment Protocol are found in the Excel file of the protocol.
The Audit Team will continue with the DQA Protocol 2: Data Verification Protocol for Steps 9
and 11.
At this stage of the audit, the Data Quality Audit seeks to determine whether the intermediary
reporting sites correctly aggregated the results reported by Service Delivery Points.
The Audit Team will perform the following data quality audit steps for each of the selected
indicators at the Intermediate Aggregation Level(s):
1. Re-aggregate reported numbers from all Service Delivery Points: Reported results from all
Service Delivery Points should be re-aggregated and the total compared to the number contained
in the summary report prepared by the Intermediate Aggregation Site. The Audit Team should
identify possible reasons for any differences between the verified and reported results.
STATISTIC: Calculate the Result Verification Ratio for the Intermediate Aggregation Site.
2. Review availability, completeness and timeliness of reports from all Service Delivery
Points. How many reports should there have been from all Service Delivery Points? How
many are there? Were they received on time? Are they complete?
STATISTIC: Calculate % of all reports that are A) available; B) on time; and C) complete.
C) % Complete Reports (i.e., contains all the relevant data to measure the indicator) = (Number of
reports that are complete from all Service Delivery Points) ÷ (Number of reports expected from all
Service Delivery Points)
That is to say, for a report to be considered complete, it should at least include (1) the reported
count relevant to the indicator; (2) the reporting period; (3) the date of submission of the report;
and (4) a signature from the staff having submitted the report.
Warning: If there are any indications that some of the reports have been fabricated (for the purpose
of the audit), the Audit Team should record these reports as “unavailable” and seek other data
sources to confirm the reported counts (for example, an end-of-year report from the site containing
results for the reporting period being audited). As a last resort, the Audit Team may decide to visit
the site(s) for which reports seem to be fabricated to obtain confirmation of the reported counts. In
any event, if these reported counts cannot be confirmed, the Audit Team should dismiss the reported
counts and record “0” for these sites in the DQA Protocol 2: Data Verification Protocol.
Note: In no circumstances should the Audit Team record personal information, photocopy or
remove documents from the Intermediate Aggregation Sites.
PHASE 4: The fourth phase of the DQA takes place at the selected Service Delivery Sites, where
the following data quality audit steps are performed:

10. Assess Data Collection and Reporting System: During PHASE 4, the Audit Team should meet
with key data collection and management staff at the Service Delivery Site — including the staff
involved in completing the source documents, in aggregating the data, and in verifying the reports
before submission to the next administrative level.

11. Trace and Verify Results from Source Documents: The steps in PHASE 4 are estimated to take
between one-half and two days. More than one day may be required for large sites (with reported
numbers in the several hundreds), sites that include satellite centers, or when “spot-checks” are
performed.
In Step 10, the Audit Team conducts the assessment of the data management and reporting system
at a selection of Service Delivery Sites at which services are rendered and recorded on source
documents. Data from Service Delivery Sites are then aggregated and manipulated before being
reported to the Intermediate Aggregation Levels. Specific instructions for completing the Service
Delivery Site worksheet of the DQA Protocol 1: System Assessment Protocol are found in the
Excel file of the protocol.
At the Service Delivery Site, each indicator-specific protocol begins with a description of the service(s)
provided in order to orient the Audit Team towards what is being “counted” and reported. This will
help lead the Audit Team to the relevant source documents at the Service Delivery Point, which can
be significantly different for various indicators (e.g., patient records, registers, training logs).
Regardless of the indicator being verified or the nature of the Service Delivery Site (health based/
clinical or community-based), the Audit Team will perform some or all of the following data
verification steps (Step 11 – Table 1) for each selected indicator:
3. Trace and Verification. Trace and verify reported numbers: (1) Recount the reported numbers
from available source documents; (2) Compare the verified numbers to the site-reported number;
(3) Identify reasons for any differences. (In all cases)
4. Cross-checks. Perform “cross-checks” of the verified report totals with other data sources
(e.g., inventory records, laboratory reports, other registers, etc.). (In all cases)
5. Spot-checks. Perform “spot-checks” to verify the actual delivery of services and/or
commodities to the target populations. (If feasible)
Before starting the data verifications, the Audit Team will need to understand and describe the
recording and reporting system related to the indicator being verified at the Service Delivery
Site (i.e., from initial recording of the service delivery on source documents to the reporting of
aggregated numbers to the next administrative level).
1. DESCRIPTION – Describe the connection between the delivery of the service and/or
commodity and the completion of the source document. This step will give the Audit Team
a “frame of reference” for the link between the service delivery and recording process, and
provide clues as to whether outside factors such as time delays and/or competing activities
could compromise the accurate and timely recording of program activities.
Note that the indicator-specific protocols have listed likely source document(s). If the Audit
Team determines that other source documents are used, the team can modify the protocol(s)
accordingly and document in its work papers the change that has been made to the protocol.
The Audit Team will need to maintain strict confidentiality of source documents.
3. TRACE AND VERIFICATION – Recount results from source documents, compare the
verified numbers to the site reported numbers and explain discrepancies.
STATISTIC: Calculate the Result Verification Ratio for the Service Delivery Site.
Possible reasons for discrepancies could include simple data entry or arithmetic errors. The
Audit Team may also need to talk to data reporting staff about possible explanations and
follow-up with program data-quality officials if needed. This step is crucial to identifying
ways to improve data quality at the Service Delivery Sites. It is important to note that the Audit
Team could find large mistakes at a site “in both directions” (i.e., over-reporting and under-
reporting) that results in a negligible difference between the reported and recounted figures
— but are indicative of major data quality problems. Likewise, a one-time mathematical error
could result in a large difference. Thus, in addition to the Verification Factor calculated for the
site, the Audit Team will need to consider the nature of the findings before drawing conclusions
about data quality at the site.
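The recount-and-compare step can be expressed as a simple calculation. In this sketch the function name and the site figures are hypothetical; the ratio is taken as the verified (recounted) result divided by the site-reported result:

```python
def result_verification_ratio(recounted: int, reported: int) -> float:
    """Verified (recounted) result divided by the result the site reported.

    A ratio near 1.0 means the recount matches the report; below 1.0
    suggests over-reporting, above 1.0 suggests under-reporting.
    """
    if reported == 0:
        raise ValueError("site reported zero results; ratio is undefined")
    return recounted / reported

# Hypothetical figures for one Service Delivery Site and reporting period.
reported_by_site = 120   # number the site sent up the reporting chain
recounted_by_team = 114  # number the Audit Team recounted from source documents

ratio = result_verification_ratio(recounted_by_team, reported_by_site)
print(f"Result Verification Ratio: {ratio:.2f}")  # 0.95, i.e., 5% over-reporting
```

As the text above cautions, offsetting errors can produce a ratio near 1.0 even where data quality problems are serious, so the ratio should be read alongside the nature of the findings.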
4. CROSS-CHECKS – Perform feasible cross-checks of the verified report totals with other
data sources. For example, the team could examine separate inventory records documenting
the quantities of treatment drugs, test-kits, or ITNs purchased and delivered during the reporting
period to see if these numbers corroborate the reported results. Other cross-checks could
include, for example, comparing treatment cards to unit, laboratory, or pharmacy registers.
The Audit Team can add cross-checks to the protocol, as appropriate.
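The cross-checks described above amount to comparing two independent counts for the same reporting period. A minimal sketch follows, in which the labels, figures, and the 10% tolerance are all hypothetical (the protocol does not prescribe a threshold):

```python
def cross_check(label, reported, corroborating, tolerance=0.10):
    """Compare a verified report total against an independent data source.

    Returns True if the relative difference is within the chosen tolerance.
    The 10% tolerance is illustrative, not prescribed by the protocol.
    """
    diff = abs(reported - corroborating) / max(reported, corroborating)
    status = "OK" if diff <= tolerance else "INVESTIGATE"
    print(f"{label}: reported={reported}, source={corroborating}, "
          f"difference={diff:.1%} -> {status}")
    return diff <= tolerance

# Hypothetical quarterly totals for one site.
cross_check("ITNs distributed vs. stock issued", 950, 1000)      # 5.0% -> OK
cross_check("Patients treated vs. pharmacy register", 430, 310)  # 27.9% -> INVESTIGATE
```

A flagged cross-check is a prompt for follow-up with site staff, not proof of error; inventory and register data have their own quality limitations.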
As noted above, while the five data verification steps of the DQA Protocol 2: Data Verification
Protocol should not change,5 within each verification step the protocol can be modified to better
fit the program context (e.g., adding cross-checks or modifying the reference source document).
Major modifications should be discussed with the Organization Commissioning the DQA.
Note: In no circumstances should the Audit Team record personal information, photocopy, or
remove documents from sites.
5 The five data verification steps are: 1. description, 2. documentation review, 3. trace and verification, 4. cross-checks, 5. spot-checks.
In the fifth phase of the DQA (PHASE 5), the Audit Team will return to the program/project
M&E Unit.
By Step 10, the Excel file worksheets of the DQA Protocol 1: System Assessment Protocol related
to the M&E Unit, the Intermediate Aggregation Levels, and the Service Delivery Sites will have
been completed. Based on all responses to the questions, a summary table (Step 12 – Table 1) will
be automatically generated, as will a summary graphic of the strengths of the data management and
reporting system (Step 12 – Figure 1). The results generated will be based on the number of “Yes,
completely,” “Partly,” and “No, not at all” responses to the questions on the DQA Protocol 1:
System Assessment Protocol.
Step 12 – Table 1. Summary Table: Assessment of Data Management and Reporting System
(Illustration)

Functional areas: I – M&E Structure, Functions, and Capabilities; II – Indicator Definitions
and Reporting Guidelines; III – Data-Collection and Reporting Forms/Tools; IV – Data
Management Processes; V – Links with National Reporting System.

                                        I      II     III    IV     V      Average (per site)
M&E Unit
- National M&E Unit                     1.80   1.83   1.80   1.82   1.67   1.78
Intermediate Aggregation Level Sites
1 Collines                              2.67   2.50   1.67   1.78   2.00   2.12
2 Atakora                               3.00   2.25   1.33   1.67   2.50   2.15
3 Borgu                                 2.33   2.00   1.67   1.90   2.50   2.08
Service Delivery Points/Organizations
1.1 Savalou                             2.67   2.00   1.67   1.86   2.00   2.04
1.2 Tchetti                             2.00   2.25   1.67   2.13   2.00   2.01
1.3 Djalloukou                          2.67   1.75   1.67   2.00   2.25   2.07
2.1 Penjari                             2.33   2.00   2.00   1.86   2.50   2.14
2.2 Ouake                               2.67   2.25   1.67   1.88   2.50   2.19
2.3 Tanagou                             2.67   2.75   1.67   1.88   2.75   2.34
3.1 Parakou                             2.33   2.00   2.00   1.86   2.25   2.09
3.2 Kandi                               2.33   2.25   1.67   2.00   2.25   2.10
3.3 Kalale                              2.67   2.25   1.67   1.88   2.50   2.19
Average (per functional area)           2.46   2.15   1.76   1.92   2.30   2.12

Color Code Key
Green   2.5 – 3.0   Yes, Completely
Yellow  1.5 – 2.5   Partly
Red     < 1.5       No, Not at All
Interpretation of the Output: The scores generated for each functional area on the Service
Delivery Site, Intermediate Aggregation Level, and M&E Unit pages are an average of the
responses which are coded 3 for “Yes, completely,” 2 for “Partly,” and 1 for “No, not at all.”
Responses coded “N/A” or “Not Applicable” are not factored into the score. The numerical value
of the score is not important in itself; the scores are intended to be compared across functional areas
as a means of prioritizing system strengthening activities. That is, the scores are relative to each other
and are most meaningful when comparing the performance of one functional area to another. For
example, if the system scores an average of 2.5 for ‘M&E Structure, Functions and Capabilities’
and 1.5 for ‘Data-collection and Reporting Forms/Tools,’ one would reasonably conclude that
resources would be more efficiently spent strengthening ‘Data-collection and Reporting Forms/
Tools’ rather than ‘M&E Structure, Functions and Capabilities.’ The scores should therefore not
be used exclusively to evaluate the information system. Rather, they should be interpreted within
the context of the interviews, documentation reviews, data verifications, and observations made
during the DQA exercise.
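The scoring rule described above can be sketched in a few lines. The response list is hypothetical, and the color bands follow the Color Code Key in Step 12 – Table 1 (with 2.5 treated here as the lower bound of Green):

```python
# Coding scheme from the protocol: "Yes, completely" = 3, "Partly" = 2,
# "No, not at all" = 1; "N/A" responses are excluded from the average.
CODES = {"Yes, completely": 3, "Partly": 2, "No, not at all": 1}

def functional_area_score(responses):
    """Average the coded responses for one functional area, ignoring N/A."""
    coded = [CODES[r] for r in responses if r != "N/A"]
    if not coded:
        return None  # every question was N/A for this site
    return sum(coded) / len(coded)

def color_band(score):
    """Map a score to the summary-table color code."""
    if score >= 2.5:
        return "Green"   # Yes, Completely
    if score >= 1.5:
        return "Yellow"  # Partly
    return "Red"         # No, Not at All

# Hypothetical responses for one functional area at one site.
responses = ["Yes, completely", "Partly", "N/A", "Yes, completely"]
score = functional_area_score(responses)
print(f"{score:.2f} {color_band(score)}")  # 2.67 Green
```

As the text stresses, such scores are relative: they help rank functional areas for strengthening, not grade the system in absolute terms.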
Using these summary statistics, the Audit Team should answer the 13 overarching questions on
the Audit Summary Question Worksheet of the protocol (see Step 12 – Table 2). To answer
these questions, the Audit Team will have the completed DQA Protocol 1: System Assessment
Protocol worksheets for each site and level visited, as well as the summary table and graph of
the findings from the protocol (see Step 12 – Table 1 and Figure 1). Based on these sources of
information, the Audit Team will need to use its judgment to develop an overall response to the
Audit Summary Questions.
Step 12 – Table 2. Audit Summary Question Worksheet (format)
Program Area: __________   Indicator: __________
Columns: Question | Answer (Yes – completely / Partly / No – not at all / N/A) | Comments
By Step 12, the Audit Team will have completed both the system assessment and data verification
protocols on selected indicators. In preparation for its close-out meeting with the M&E Unit, in
Step 13 the Audit Team drafts Preliminary Findings and Recommendation Notes for data quality
issues found during the audit. Annex 3, Step 13 – Template 1 provides a format for those
Recommendation Notes. These findings and issues are presented to the program/project M&E
Unit (Step 14) and form the basis for the Audit Report (Steps 15 and 17). The Audit Team should
also send a copy of the Preliminary Findings and Recommendation Notes to the Organization
Commissioning the DQA.
The preliminary findings and Recommendation Notes will be based on the results from the DQA
Protocol 1: System Assessment Protocol and the DQA Protocol 2: Data Verification Protocol
and will be developed by the Audit Team based on:
• The notes columns of the protocols in which the Audit Team has explained findings
related to: (1) the assessment of the data-management and reporting system; and (2) the
verification of a sample of data reported through the system. In each protocol, the final
column requests a check (√) for any finding that requires a Recommendation Note.
• Work papers further documenting evidence of the Audit Team’s data quality audit
findings.
The findings should stress the positive aspects of the program/project M&E system as it relates
to data management and reporting as well as any weaknesses identified by the Audit Team. It is
important to emphasize that a finding does not necessarily mean that the program/project is deficient
in its data collection system design or implementation. The program/project may have in place a
number of innovative controls and effective steps to ensure that data are collected consistently and
reliably.
Nevertheless, the purpose of the Data Quality Audit is to improve data quality. Thus, as the Audit
Team completes its data management system and data verification reviews, it should clearly
identify evidence and findings that indicate the need for improvements to strengthen the design
and implementation of the M&E system. All findings should be backed by documentary evidence
that the Audit Team can cite and provide along with its Recommendation Notes.
Examples of findings related to the design and implementation of data collection, reporting and
management systems include:
• The lack of documentation describing aggregation and data manipulation steps.
• Unclear and/or inconsistent directions provided to reporting sites about when or to whom
report data is to be submitted.
• The lack of designated staff to review and question submitted site reports.
Examples of findings related to verification of data produced by the system could include:
• A disconnect between the delivery of services and the filling out of source documents.
• Incomplete or inaccurate source documents.
• Data entry and/or data manipulation errors.
• Misinterpretation or inaccurate application of the indicator definition.
Step 13 – Table 1. Illustrative Findings and Recommendations for Country X’s TB Treatment Program:
Number of Smear Positive TB Cases Registered Under DOTS Who Are Successfully Treated
Country X runs an organized and long-established TB treatment program based on international treatment
standards and protocols. The processes and requirements for reporting results of the TB program are
specifically identified and prescribed in its Manual of the National Tuberculosis and Leprosy Programme.
The Manual identifies required forms and reporting requirements by service sites, districts, and regions.
Based on information gathered through interviews with key officials and a documentation review, the
Data Quality Audit Team identified the following recommendations for improving data quality.
RECOMMENDATION: That the MOH reinforce its requirement that submitted reports contain a
supervisory signature, perhaps by initially rejecting reports that have not been reviewed.
RECOMMENDATION: That the program office develop a specific document retention policy for
TB program source and key reporting documents in its new reporting system.
RECOMMENDATION: That the program identify steps to eliminate data entry errors wherever
report numbers are entered into the electronic reporting system.
RECOMMENDATION: That TB Service Delivery Sites systematically file and store TB
treatment source documents by specific reporting period so that they can be readily retrieved for
audit purposes.
At the conclusion of the site visits, the Audit Team Leader should conduct a closeout meeting with
senior program/project M&E officials and the Director/Program Manager to:
1. Share the results of the data-verifications (recounting exercise) and system review;
2. Present the preliminary findings and Recommendation Notes; and
3. Discuss potential steps to improve data quality.
A face-to-face closeout meeting gives the program/project’s data management staff the opportunity
to discuss the feasibility of potential improvements and related timeframes. The Audit Team Leader
should stress, however, that the audit findings at this point are preliminary and subject to change
once the Audit Team has had a better opportunity to review and reflect on the evidence collected
on the protocols and in its work papers.
The Audit Team should encourage the program/project to share relevant findings with the appropriate
stakeholders at the country level, such as multi-partner M&E working groups and the National
program. The Audit Team should also discuss how the findings will be shared
by the program/project M&E officials with the audited Service Delivery Sites and Intermediate
Aggregation Levels (e.g., Regions, Districts).
As always, the closeout meeting and any agreements reached on the identification of findings
and related improvements should be documented in the Audit Team’s work papers in order to be
reflected in the Final Audit Report.
The last phase of the DQA, PHASE 6 (Off-Site Completion), takes place at the offices of the
DQA Team and in face-to-face or phone meetings with the Organization Commissioning the
DQA and the program/project. The steps in PHASE 6 are to:
15. Draft the Audit Report.
16. Discuss the Draft Audit Report with the program/project and with the Organization
Commissioning the DQA (i.e., review and collect feedback from the country and the
Organization Commissioning the DQA).
17. Complete the Final Audit Report and communicate the findings, including the final
Recommendation Note(s), to the program/project and the Organization Commissioning
the DQA.
18. As appropriate, initiate follow-up procedures to ensure that agreed-upon changes are
made.
The steps in PHASE 6 are estimated to take between two and four weeks.
Within 1-2 weeks, the Audit Team should complete its review of all of the audit documentation
produced during the mission and complete a draft Audit Report with all findings and suggested
improvements. Any major changes in the audit findings made after the closeout meeting in country
should be clearly communicated to the program/project officials. The draft of the Audit Report
will be sent to the program/project management staff and to the Organization Commissioning the
DQA. Step 15 – Table 1 shows the suggested outline for the Audit Report.
Step 15 – Table 1: Suggested Outline for the Final Data Quality Audit Report
Section Contents
I Executive Summary
II Introduction and Background
Purpose of the DQA
Background on the program/project
Indicators and Reporting Period – Rationale for selection
Service Delivery Sites – Rationale for selection
Description of the data-collection and reporting system (related to the
indicators audited)
III Assessment of the Data Management and Reporting System
Description of the performed system assessment steps
Dashboard summary statistics (table and spider graph of functional areas – Step
12: Table 1 and Figure 1)
Key findings at the three levels:
• Service Delivery Sites
• Intermediate Aggregation Levels
• M&E Unit
Overall strengths and weaknesses of the Data-Management System (based on 13
Summary Audit Questions)
To build consensus and facilitate data quality improvements, the Audit Team needs to share the
draft Audit Report with the Organization Commissioning the DQA and with the program/project
management and M&E staff. The program/project will be given an opportunity to provide
a response to the audit findings. This response will need to be included in the Final Audit
Report.
Once the program/project and the Organization Commissioning the DQA have reviewed the Draft
Audit Report (within two weeks, unless a different time period has been agreed) and
provided feedback, the Audit Team will complete the Final Audit Report. While the Audit Team
should elicit feedback, it is important to note that the content of the Final Audit Report is
determined by the Audit Team exclusively.
Step 18 can be performed by the Organization Commissioning the DQA and/or the Audit Team.
The program/project will be expected to send follow-up correspondence once the agreed-upon
changes/improvements have been made. If the Organization Commissioning the DQA wants the
Audit Team to be involved in the follow-up of identified strengthening measures, an appropriate
agreement may be reached. The Organization Commissioning the DQA and/or the Audit Team
should maintain a “reminder” file to alert itself as to when these notifications are due (see ANNEX
3, Step 19 – Template 1). In general, minor data quality issues should be remedied in one to
six months and major issues in six to twelve months.
Checklist format: Component of the M&E System | Level(s) at which the question is asked
(√ under M&E Unit, Aggregation Levels, and/or Service Points) | Supporting documentation
required? (Yes / -)
I – M&E Structure, Functions, and Capabilities
1. There is a documented organizational structure/chart that clearly identifies positions that have data management responsibilities at the M&E Unit. | √ | Yes
2. All staff positions dedicated to M&E and data management systems are filled. | √ | -
3. There is a training plan which includes staff involved in data-collection and reporting at all levels in the reporting process. | √ | Yes
4. All relevant staff have received training on the data management processes and tools. | √ √ √ | -
5. A senior staff member (e.g., the Program Manager) is responsible for reviewing the aggregated numbers prior to the submission/release of reports from the M&E Unit. | √ | -
6. There are designated staff responsible for reviewing the quality of data (i.e., accuracy, completeness and timeliness) received from sub-reporting levels (e.g., regions, districts, service points). | √ √ | -
7. There are designated staff responsible for reviewing aggregated numbers prior to submission to the next level (e.g., to districts, to regional offices, to the central M&E Unit). | √ √ | -
8. The responsibility for recording the delivery of services on source documents is clearly assigned to the relevant staff. | √ | -
II – Indicator Definitions and Reporting Guidelines
9. The M&E Unit has documented and shared the definition of the indicator(s) with all relevant levels of the reporting system (e.g., regions, districts, service points). | √ | Yes
10. There is a description of the services that are related to each indicator measured by the program/project. | √ | Yes
The M&E Unit has provided written guidelines to each sub-reporting level on …
11. … what they are supposed to report on. | √ √ √ | Yes
12. … how (e.g., in what specific format) reports are to be submitted. | √ √ √ | Yes
13. … to whom the reports should be submitted. | √ √ √ | Yes
14. … when the reports are due. | √ √ √ | Yes
15. There is a written policy that states for how long source documents and reporting forms need to be retained. | √ | Yes
III – Data-collection and Reporting Forms/Tools
16. The M&E Unit has identified a standard source document (e.g., medical record, client intake form, register, etc.) to be used by all Service Delivery Points to record service delivery. | √ | Yes
17. The M&E Unit has identified standard reporting forms/tools to be used by all reporting levels. | √ | Yes
18. Clear instructions have been provided by the M&E Unit on how to complete the data collection and reporting forms/tools. | √ √ √ | Yes
19. The source documents and reporting forms/tools specified by the M&E Unit are consistently used by all reporting levels. | √ √ | -
20. If multiple organizations are implementing activities under the program/project, they all use the same reporting forms and report according to the same reporting timelines. | √ √ √ | -
21. The data collected by the M&E system has sufficient precision to measure the indicator(s) (i.e., relevant data are collected by sex, age, etc., if the indicator specifies disaggregation by these characteristics). | √ | -
22. All source documents and reporting forms relevant for measuring the indicator(s) are available for auditing purposes (including dated print-outs in case of computerized systems). | √ √ √ | -
IV – Data Management Processes
23. The M&E Unit has clearly documented data aggregation, analysis and/or manipulation steps performed at each level of the reporting system. | √ | Yes
24. There is a written procedure to address late, incomplete, inaccurate, and missing reports, including following up with sub-reporting levels on data quality issues. | √ √ | Yes
25. If data discrepancies have been uncovered in reports from sub-reporting levels, the M&E Unit or the Intermediate Aggregation Levels (e.g., districts or regions) have documented how these inconsistencies have been resolved. | √ √ | -
26. Feedback is systematically provided to all sub-reporting levels on the quality of their reporting (i.e., accuracy, completeness, and timeliness). | √ √ | -
27. There are quality controls in place for when data from paper-based forms are entered into a computer (e.g., double entry, post-data-entry verification, etc.). | √ √ √ | -
28. For automated (computerized) systems, there is a clearly documented and actively implemented database administration procedure in place. This includes backup/recovery procedures, security administration, and user administration. | √ √ √ | Yes
29. There is a written back-up procedure for when data entry or data processing is computerized. | √ √ √ | Yes
30. If yes, the latest date of back-up is appropriate given the frequency of update of the computerized system (e.g., backups are weekly or monthly). | √ √ √ | -
31. Relevant personal data are maintained according to national or international confidentiality guidelines. | √ √ √ | -
The reporting system avoids double counting people …
32. … within each point of service/organization (e.g., a person receiving the same service twice in a reporting period, a person registered as receiving the same service in two different locations, etc.). | √ √ √ | -
33. … across service points/organizations (e.g., a person registered as receiving the same service in two different service points/organizations, etc.). | √ √ √ | -
34. The reporting system enables the identification and recording of a “drop out,” a person “lost to follow-up,” and a person who died. | √ √ √ | -
35. The M&E Unit can demonstrate that regular supervisory site visits have taken place and that data quality has been reviewed. | √ | Yes
V – Links with National Reporting System
36. When available, the relevant national forms/tools are used for data-collection and reporting. | √ √ √ | Yes
37. When applicable, data are reported through a single channel of the national information systems. | √ √ √ | -
38. Reporting deadlines are harmonized with the relevant timelines of the National program (e.g., cut-off dates for monthly reporting). | √ √ √ | -
39. The service sites are identified using ID numbers that follow a national system. | √ √ √ | -
Disease: AIDS
Columns: Countries (or programs/projects), ranked by Dollar Invested | Dollar Investment |
Ranking of results reported per Program Area, with the reported result in parentheses:
Treatment (Indicator 1: Number of People on ARV), Behavioral Change Communication
(Indicator 2: Number of Condoms Distributed), OVC (Indicator 3: Number of OVC Receiving
Care and Support) | Notes/Comments

Country X | $66 Million | 2 (6,500) | 4 (3 million) | 8 (1,879) |
Country Y | $52 Million | 1 (7,000) | NA | 10 (1,254) |
Annex 2 – Step 1. Template 2. Illustrative Analysis of the Relative Magnitude of the Investments and Indicator Results per Program Area
Program/Project: _____________
Columns: Program Area | $ Invested in the Program Area | % of Total Invested in the
Program/Project | Key Indicator in the Program Area | Target or Reported Result for the
Indicator | % of Targets or Results Reported in the Country | Notes/Comments
Annex 2 – Step 1. Template 3. Documentation of the Selection of the Country, Disease/Health Area, Program/Project(s), Program
Area and Indicators
Annex 2 – Step 2. Template 1. Notification and Documentation Request Letter to the Selected
Program/Project
Date
Address
Dear__________________:
[Your organization] has been selected for a Data Quality Audit by [name of Organization
Commissioning the Audit] related to [Program/Project name].
The purpose of this audit is to: (1) assess the ability of the data management systems of the program/
project(s) you are managing to report quality data; and (2) verify the quality of reported data for
key indicators at selected sites. [Name of Audit Agency] will be conducting the audit and will
contact you soon regarding the audit.
This Data Quality Audit relates to [disease], [program area] and the verifications will focus on the
following indicators:
1 [indicator name]
2 [indicator name]
Prior to the audit taking place, [list name of Audit Agency] will need:
1. A list of all the Service Delivery Sites with the latest reported results (for the above
indicators);
2. The completed Template 2 (attached to this letter) describing the data-collection and
reporting system (related to the above indicators); and
3. Data-collection and reporting forms (related to the above indicators).
This information is critical for beginning the audit; therefore, it is requested within two weeks of
receipt of this letter and should be sent to [address of Audit Agency].
To help the Audit Team perform the initial phase of the review of your overall data management
system and to limit the team’s on-site presence to the extent possible, we also request that you
provide the Audit Agency with the existing and available documentation listed in Table 1 (attached
to this letter).
Thank you for submitting the requested documentation to ___________ at ______ by _________.
If any of the documentation is available in electronic form it can be e-mailed to _____________.
Because the time required for the audit depends on the number and location of sampled sites, the
Audit Agency will contact you with more specific information regarding timing after the sample
of sites has been selected. However, you should anticipate that the audit will last between 10 and
15 days (including two days at the M&E Unit and around one day per Service Delivery Site and
Intermediate Aggregation Level — e.g., Districts or Regions).
Finally, since the Audit Team will need to obtain and review source documents (e.g., client records
or registration logs/ledger), it is important that official authorization be granted to access these
documents. However, we would like to assure you that no details related to individuals will be
recorded as part of the audit — the team will only seek to verify that the counts from “source
documents” related to the service or activity are correct for the reporting period. The personal
records will neither be removed from the site nor photocopied.
We would like to emphasize that we will make every effort to limit the impact our audit will have
on your staff and ongoing activities. In that regard, it would be very helpful if you could provide
the Audit Agency with a key contact person early on in this process (your chief data management
official, if possible) so we can limit our communications to the appropriate person. If you have any
questions please contact ___________ at ____________.
Sincerely,
Table 1. General Documentation Requested
Columns: Functional Areas | General Documentation Requested | Check if provided (√)
Contact Information | Names and contact information for key program/project officials,
including key staff responsible for data management activities. |
Please complete this template form for each indicator being verified by the Data Quality Audit (DQA)
Indicator Name
Indicator Definition
1. Is there a designated person responsible for data management and analysis at the M&E
Management Unit at Central Level?  Yes / No
1.1. If “Yes,” please give the name and e-mail address of the contact person:  Name: ______  E-mail: ______
2. Is there a standardized national form that all Service Delivery Points use to record the
delivery of the service to target populations?  Yes / No
2.1. If “No,” how many different forms are being used by the Service Delivery Points?  Number: ______
3. What is the name of the form(s) used by the Service Delivery Points?  Name of the Form(s): ______
4. What are the key fields in the form that are relevant for the indicator?  Field 1: ______
Field 2: ______  Field 3: ______  Field 4: ______  Please add …
REPORTING FROM SERVICE DELIVERY POINTS UP TO THE NATIONAL M&E UNIT (through any intermediary levels – Districts,
Regions, etc.)
5. Please use this table to explain the reporting process in your country. In the first row, provide information about reports which are received
in the central office. Show where those reports come from, how many you expect for each reporting period, and how many times per year
you receive these reports.
Columns: Reports received by: | Sender | Number of senders (i.e., if reports are sent by
districts, put the number of districts here) | Number of times reports are received each year
(i.e., quarterly = 4 times)
6. What is the lowest level for which you have data at the M&E Management Unit at Central Level?
Individual patients / Health facilities / Districts / Region / Other … [please specify]
Health facilities / Districts / Region / National / Other … [please specify]
Finally, please attach the templates of the (1) source document; and (2) reports received by each level.
Annex 2 – Step 2. Template 3. Letter to Request National Authorization for the DQA
Date
Dear__________________:
As part of its ongoing oversight activities, [name of Organization Commissioning the Audit] has selected
[program/project(s)] in [country] for a Data Quality Audit. Subject to approval, the Data Quality Audit will
take place between [months and ], [Year].
The purpose of this Data Quality Audit is to assess the ability of the program’s data management system
to report quality data and to trace and verify reported results from selected service sites related to the
following indicators:
1 [indicator name]
2 [indicator name]
[Name of auditing firm] has been selected by [name of Organization Commissioning the Audit] to carry out
the Data Quality Audit.
Conducting this Data Quality Audit may require access to data reported through the national data reporting
system on [Disease and Program Area]. The audit will include recounting data reported within selected
reporting periods, including obtaining and reviewing source documents (e.g. client records or registration
logs/ledgers, training log sheets, commodity distribution sheets). While the Audit Team will potentially
require access to personal patient information, the Team will hold such information in strict confidence and
no audit documentation will contain or disclose such personal information. The purpose of access to such
information is strictly for counting and cross-checking purposes related to the audit. When necessary, the
Audit Team will need to access and use such information at Service Delivery Sites. The personal records
will neither be removed from the site nor photocopied.
If you have any questions about this Data Quality Audit, please contact ______ at ________.
[Name of Organization Commissioning the Audit] hereby formally requests approval to conduct this Data
Quality Audit.
Please indicate approved or not approved below (with reasons for non-approval) and return this letter to
______________________ at ________________________.
Sincerely,
Date:
Title
WARNING: In no circumstances should reports be fabricated for the purpose of the audit.
Annex 3, Step 2 – Template 2. Information Sheet for the Intermediate Aggregation Levels
Selected for the DQA
WARNING: In no circumstances should reports be fabricated for the purpose of the audit.
Site Manager.
Staff responsible for completing the source documents (e.g., patient treatment cards, clinic registers, etc.).
Staff responsible for entering data in registers or computing systems (as appropriate).
Staff responsible for compiling the periodic reports (e.g., monthly, quarterly, etc.).
Reported results to the next level for the selected reporting period (see Point 3 above).
All source documents for the selected reporting period, including source documents from auxiliary/
peripheral/satellite sites (see Point 3 above).
Description of aggregation and/or manipulation steps performed on data submitted to the next level.
WARNING: In no circumstances should source documents or reports be fabricated for the purpose of the
audit.
Columns: No. | Item | Check when completed (√)
1 Letter of authorization
10 Other
Contact Person:
Recommended Action for Correction (complete prior to closeout meeting with the program/project):
Final Recommended Action (complete after closeout meeting with the program/project):
6 The data quality dimensions are: accuracy, reliability, precision, completeness, timeliness, integrity, and confidentiality.
Contact Person:
In the following example, Sampling Strategy D (modified two-stage cluster sample) is used to
draw a sample of ART sites in “Our Country” in order to derive an estimate of data quality at
the national level. In a cluster sampling design, the final sample is derived in stages. Each stage
consists of two activities: (1) listing; and (2) sampling. Listing means drawing a complete list of
all the elements from which a number will be selected. Sampling is when a pre-determined number
of elements are chosen at random from the complete listing of elements. A sample is only as good
as the list from which it is derived. The list, also called a sampling frame, is “good” (valid) if it is
comprehensive, i.e. it includes all the known elements that comprise the population of elements.
For ART sites in a country, a good sampling frame means that every single ART site in the country
is properly identified in the list.
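The two activities that make up each stage, listing and sampling, can be sketched as follows. The district names and site counts are hypothetical, and a simple random draw stands in for the selection proportional to size described later in this annex:

```python
import random

random.seed(1)  # fixed seed so the illustrative draw is reproducible

# Stage 1 - listing: a sampling frame of every district that contains
# ART sites (hypothetical district names and ART-site counts).
district_frame = {
    "District 3": 4, "District 7": 2, "District 9": 6,
    "District 12": 1, "District 18": 3, "District 24": 5,
}

# Stage 1 - sampling: draw 3 districts. random.sample gives a simple
# random draw; the protocol described in this annex instead weights
# districts by the number of individuals receiving ART.
sampled_districts = random.sample(sorted(district_frame), k=3)

# Stage 2 - listing and sampling: within each sampled district, list
# its ART sites and draw sites at random (here, up to 2 per district).
for district in sampled_districts:
    sites = [f"{district} / site {i}"
             for i in range(1, district_frame[district] + 1)]
    chosen = random.sample(sites, k=min(2, len(sites)))
    print(district, "->", chosen)
```

The sketch makes the dependence on the frame explicit: any district missing from `district_frame` can never be sampled, which is why the comprehensiveness of the list matters so much.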
1. Illustrative Indicator for this application = Number of Individuals Receiving Anti-Retroviral
Therapy (ART)
2. Audit Objective: to verify the consistency of Our Country’s national reports of ART progress
based on administrative monitoring systems.
Illustrative grid display of the 30 districts in “Our Country” (highlighted cells in the original
identify the districts that contain ART sites):

 1   2   3   4   5
 6   7   8   9  10
11  12  13  14  15
16  17  18  19  20
21  22  23  24  25
26  27  28  29  30
6. Sampling Frame for Stage 1: The list in Annex 4, Table 2 on the following page is called
a sampling frame. It contains a complete list of districts that are relevant for auditing ART
sites, because only the districts in which ART sites are located are included in the list.
7. The first column of the frame contains a simple numbering scheme beginning with “1” and
ending with the final element in the list, which in this case is 12, because only 12 districts in
“Our Country” contain ART sites.
8. The second column of the frame contains the number of the district that corresponds to the
illustrative grid display shown in the previous table. These were the highlighted cells that
showed which districts contained ART sites. Column 2 (District Number) does not list the
selected districts. Rather, it lists only those districts in “Our Country” where ART sites are
located. The sample of three districts will be drawn from Column 2.
9. The third column shows how many ART sites are located in each district. This is important
because the selection of districts will be proportional to the number of individuals receiving
ART in each district.
10. The next step in this stage of sampling is to use the sampling frame to select the three districts
where the auditors will conduct the audit at specific ART sites. We are attempting to estimate
a parameter (data quality) for all the districts/sites in the country using a select few, so we
would like the few we select to be as “typical” as possible, providing an estimate as close to
the actual value as possible. Some districts contribute more than others to the average data
quality of the whole country. Since we are interested in selecting districts that are representative
of all districts with ART sites in the country, and we know that some districts with ART sites
may not be typical of the rest, we need to ensure that districts with a high volume of service
(which contribute more to the average data quality of all districts) are included in our sample.
Therefore, the sampling technique will select districts using “probability proportionate to size.”
11. In other words, the chance of a district being selected for the audit depends on the number of
individuals being treated in the district. This information can be found in column 4 of Annex
4, Table 2: “Number of Individuals Receiving ART per District.” Usually this number
corresponds to quarterly reports.
12. One way to link the probability of selection of a district to the volume of service is to inflate
the sampling frame according to the number of individuals receiving ART in each district.
For example, if in District 1 a total of 300 individuals are receiving ART, then District 1
should be listed in the sampling frame 300 times.
13. To make this easier, divide the values in Column 4 (Number of Individuals Receiving ART)
by 10. For example, District 1 should now appear 30 times instead of 300 times, District 3
should appear 10 times instead of 100 times, and so on. This inflated sampling frame is
shown in Annex 4, Table 3.
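The “inflated frame” technique of steps 12 and 13 can be sketched in a few lines of code. This is an illustrative sketch, not part of the tool itself: the district numbers and ART counts below are hypothetical stand-ins for Annex 4, Table 2 (12 districts with ART sites), and distinct districts are drawn at random from the inflated frame until the target number is reached.

```python
import random

# Hypothetical sampling frame: district number -> individuals receiving ART.
# Illustrative stand-ins for the real values in Annex 4, Table 2.
art_counts = {1: 300, 3: 100, 5: 200, 8: 150, 12: 400, 16: 500,
              19: 250, 21: 120, 24: 180, 26: 600, 28: 90, 30: 110}

def pps_sample(counts, n_districts, scale=10, seed=None):
    """Select distinct districts with probability proportionate to size,
    using the 'inflated frame' of steps 12-13: each district is listed
    once for every `scale` individuals receiving ART."""
    rng = random.Random(seed)
    frame = [d for d, c in counts.items() for _ in range(c // scale)]
    selected = []
    while len(selected) < n_districts:
        pick = rng.choice(frame)     # every frame entry is equally likely,
        if pick not in selected:     # so high-volume districts are picked
            selected.append(pick)    # more often; keep distinct districts
    return selected

print(sorted(pps_sample(art_counts, 3, seed=42)))
```

Because District 26 appears 60 times in the frame and District 28 only 9 times, District 26 is far more likely to enter the sample, which is exactly the behavior the text describes.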
Annex 4, Table 4. The Four Selected Districts and the Listing of ART Sites within District 12
24. The task is now to select three ART sites in each of the selected districts. But, as can be seen,
District 1 only has two ART sites; District 16 has three sites; and District 26 has five sites.
25. Depending on the population distribution of the country and the epidemiology of the disease
of interest, there may be many sites per district, or comparatively few. Given the relative
maturity of TB programs and the generalized distribution of both TB and Malaria, sites with
programs addressing these diseases are likely to be fairly numerous per district. On the other
hand, sites with HIV/AIDS programs will be relatively few, particularly in countries with
low prevalence or countries with concentrated epidemics (i.e., cases found primarily in high
risk groups). In our ART example there are very few sites per district. With these small
numbers, it may not be possible to select the required number of sites in every district.
Note: the combination of number of clusters and number of sites within clusters is not fixed; rather,
this combination should be based on the distribution of sites across a programmatic landscape.
Fewer sites per district can be selected when volume of services is heavily concentrated. For
example, in “Our Country” we could have selected four districts and then two sites per district in
order to ensure more geographical representation of sites. While increasing the number of districts
in the sample leads to greater statistical power of the analysis (i.e., greater precision of the estimate
of data quality), the expense and time required for traveling to the additional districts will likely
outweigh the marginal improvement in precision (see Woodard et al.7 for a discussion on the
precision of estimates using the GAVI DQA sampling methodology).
The total number of clusters and sites will be determined by the Organization Commissioning the
DQA in consultation with the Auditing Agency, but is ultimately dependent upon the resources
available to conduct the Data Quality Audit. The main constraints in this regard are: (1) the time that
an Audit Team can devote to the in-country work; (2) the composition (number and training) of the
audit team in-country; and (3) the funding available to support the implementation of the audit.
Accurate statistics in this case mean that the verification factors calculated for the sampled
districts are representative of the verification factors for all districts, including those that were
not selected into the data quality audit sample.
In other words, random sampling allows the DQA team to estimate a national Verification Factor
by verifying reported counts in only a fraction of the total (national) number of sites. How good is
this estimation? How closely do the results found by the auditors at this fraction of sites represent
the results that might be found for the whole?
7 Woodard S., Archer L., Zell E., Ronveaux O., Birmingham M. Design and Simulation Study of the Immunization
Data Quality Audit (DQA). Ann Epidemiol. 2007;17:628–633.
On the other hand, if the true national verification factor is 0.50, then it probably reflects a
combination of good and poor data quality across all sites in the country. It would take a larger
sample to ensure that enough of these “good” and “bad” sites were represented in the sample just
as they are distributed overall in the country.
The sampling error is a mathematical construct that permits the calculation of confidence intervals.
It expresses how far (in standard deviations, plus or minus) the sample results can be expected to
deviate from the “true” result (the parameter). Most statistical textbooks include appendix tables
of sampling errors, indicating the value of the sampling error for a given sample size and
variability of the parameter.
The key to reducing sampling errors in the context of the data quality audit is to remember that
sample size is not how many clusters (e.g. districts) are in the sample, nor is it how many sites are
in the sample; rather, sample size pertains to how many instances of a health service (a visit to the
site by an ART patient) are recorded at the site.
In Annex 4, we use an example where three districts are selected and three sites are selected per
district. The auditors are verifying reported counts of ART patients receiving ART services at the
selected sites. The total reported number of ART patients is 1,400. This is the actual number that
the data quality auditors are attempting to verify and it constitutes an effective sample size when
considering statistical issues of sample accuracy.
How big is this sample? In Uganda, the total reported number of individuals receiving ART
services directly from sites in 2005 was 49,600. Fourteen hundred individuals is about three
percent of that total, which under most conditions is a reasonable sample size for that population.
In Nigeria, the total direct number of individuals reached with ART services was 18,900 in 2005.
For Nigeria our hypothetical sample size of 1,400 individuals represents about seven percent of the
total – a 7% sample is robust in most applications.
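As a quick check of this arithmetic (national totals as stated above):

```python
# Sample-size fractions: 1,400 verified ART patients vs. 2005 national totals.
reported_sample = 1400
national_totals = {"Uganda": 49600, "Nigeria": 18900}

for country, total in national_totals.items():
    print(f"{country}: {reported_sample / total:.1%}")
# Uganda: 2.8%
# Nigeria: 7.4%
```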
So unless a country has a very large number of sites where important health services are occurring
(e.g., South Africa, Kenya, Uganda), it is usually possible to capture a robust fraction of services
by visiting 8-12 sites using a probability proportionate to size methodology.
That said, it is possible to gain an insight into the overall quality of data in a program/project
without reliance on the national estimate of verification factor. The qualitative aspects of the DQA
are adequate to determine the strengths and weaknesses of a given reporting system. For example,
if indicator definitions are poorly understood in a majority of a representative sample of sites, it is
quite likely that indicator definitions are poorly understood in non-sampled districts as well. The
recounting of indicators and comparison with reported values for a sample of sites is similarly
adequate to determine in a general sense whether data quality is good, mediocre or poor, even
without the benefit of a precise national estimate. Missing reports or large disparities between
recounted and reported results in a handful of sites are indicative of similar disparities elsewhere.
Ultimately, the national verification factor should be interpreted with caution. For the purposes of
the Data Quality Audit, it should be used as an indication of data quality (or lack of data quality),
rather than an exact measure.
From The Rand Corporation, A Million Random Digits with 100,000 Normal Deviates
(New York: The Free Press, 1955)
The use of Verification Factors can be applied to the full set of health indicators that this Data
Quality Audit Tool is designed to cover — provided that the sampling strategy used by the
Audit Team is statistically representative of the country-wide program (or an important
subset of the country-wide program) and that the actual number of sites in the sample is
large enough to generate robust estimates of reporting consistency.
The Verification Factor is an indicator of reporting consistency that is measured at three levels:
(1) the Service Delivery Site level; (2) the district administrative level; and (3) the national
administrative level. It is often called a district-based indicator of reporting consistency because
the primary sampling units for estimating Verification Factors are districts (or “intermediate
aggregation levels”): in the GAVI approach, Verification Factors are constructed at the district
level and then combined at the national level.
Step One:
For each audited site, divide the count verified by the Audit Team from source documents by the
count reported by the site. This result equals the proportion of reported counts at a selected site
that is verified by the Audit Team. This result can be called the Verified Site Count.
Step Two:
For each selected district, divide the district-level reported count (the count prepared for
submission to the national level) by the count for that district observed at the national level. This
result equals the proportion of the selected cluster or district-level reporting that is completely
consistent with the national-level reporting. This result is called the cluster consistency ratio, or
Adjustment Factor.
The adjustment factor answers the following question: “Were the results reported at the selected
district level (for all sites in the selected district — not just those sites that were visited by the Audit
Team) exactly the same as the results (for the selected district) that were observed at the national
level?”
Step Three:
For each sampled district, sum the recounted values for the audited sites and divide by the sum
of the reported values for the audited sites. Multiply this result for each sampled district by the
adjustment factor appropriate for each district. This result, when further adjusted with “district”
weights as shown below, is the National Verification Factor.
It is important to remember that the units of time should be equivalent across each of the factors
used to calculate the Verification Factor. What this means is that if the auditor is tracing and
verifying reported results for the past 12 months at a selected site, then this time period (past 12
months) should be used as the basis for the other factors in the equation.
National Verification Factor =
[ Σi wi × (Σj Xij / Σj Yij) × (Rdi / Rni) ] / Σi wi
where
i = selected district (i = 1, 16, 26)
j = selected site (j = 1, 2, 3)
Xij = the validated count from the jth site of the ith district
Yij = the observed reported count from the jth site of the ith district
Rdi = at the district level, the reported count from all the sites in the ith district that were prepared
for submission to the national level
Rni = at the national level, the observed count as reported from the ith district
wi = the weight assigned to the ith district (in this example, the district-level verified count)
In order to derive a National Verification Factor, it is necessary to first calculate Verification Factors
at the district level. The national Verification Factor is calculated as the weighted average of the
district Verification Factors.
The example showing how Verification Factors are derived assumes that the Data Quality Audit
Team is working in the three districts that were selected in the random sample section outlined
previously. These three districts (1, 16, 26) and the ART sites embedded within them are shown
in Annex 5, Table 1.
Annex 5, Table 1. The Flow of Reported ART Counts from the Selected Site Level
Up to the Selected District ( i = 1, 16, 26) Level and Up to the National Level

District Level: Selected District (i) and Reported ART Count
  District 1 (300)    District 16 (500)    District 26 (600)

Site Level: Selected Site Identification Number (j) and Reported ART Count (y)
  District 1:  Site 1 (150), Site 2 (150)
  District 16: Site 3 (100), Site 4 (350), Site 5 (50)
  District 26: Site 6 (200), Site 7 (100), Site 8 (100), NA* (100), Site 9 (100)

Note that the aggregated ART reported count at District 26 (600) is misreported at the
National Level (700).
* NA = This site not randomly selected
Two-stage cluster sampling, as discussed above, resulted in three districts and a total of 10 ART
sites. In accordance with the GAVI approach, this strategy requires a set number of sites to be
selected per district. In this example, three sites are to be selected per district. The problem is that
since District #1 only has two ART sites it is not possible to select three.
Once an alternative to the sampling issue shown above is identified, then the Audit Team can begin
to complete the matrix required to calculate Verification Factors. The matrix can be illustrated as
below:
Annex 5, Table 2 illustrates the calculations derived from the calculation matrix.
i (district)   j (site)   x (verified)   y (reported)   x/y
1              1          145            150            0.96
1              2          130            150            0.86
District 1 total (2 sites):  275         300            0.91
16             3          100            100            1.00
16             4          355            350            1.01
16             5          45             50             0.90
District 16 total (3 sites): 500         500            1.00
26             6          100            200            0.50
26             7          50             100            0.50
26             8          75             100            0.75
26             9          40             100            0.40
District 26 total (4 sites): 265         500            0.53
One of the rows in the matrix is highlighted for the purpose of further understanding how the
Verification Factor is derived. The row is associated with District 26 (i=26) and Site number 7
(j=7). The third column in the matrix shows (x), or the verified count of ART patients that the
auditors came up with at the site (50). The fourth column in the matrix shows (y), or the reported
count of ART patients at this site (100). This part of the Verification Factor is derived by simply
dividing the verified count (50) by the reported count (100) = 0.50.
The matrix illustrates how sites are clustered together within districts, because the verification
factors are calculated at the district level by pooling the audit results from each selected site within
a district. Thus the Verification Factor for District 1 in the matrix is 0.91, which is derived by
pooling the [x/y] results from the two sites in District 1.
Judging from these verification factors (based on hypothetical values typed into the x column), the
matrix suggests that District 26 over-reported the number of ART patients served in its sites. Here,
the total number of reported ART patients was 500, while the total verified count that was derived
by the Data Quality Audit Team examining source documents at the four selected sites was 265;
265 divided by 500 equals 0.53, which implies that the auditors were able to verify only about half
of all the ART patients that were reported in this district.
The final two steps in deriving a national Verification Factor are to (1) calculate the adjustment
factor [Rdi/Rni] for each cluster; and (2) multiply each district-level Verification Factor by its
adjustment factor and take the weighted average of the results.
This fact was uncovered by a member of the Data Quality Audit Team who was tracing the district
level results to what could be observed at the national level. As a result of this work by the Data
Quality Audit Team that occurs in levels of aggregation higher than the site (namely intermediate
and final levels of aggregation), we now have what we need to calculate the Adjustment Factor.
In our example, the adjustment factors for each district would be:
• District 1: 300/300 = 1.0
• District 16: 500/500 = 1.0
• District 26: 600/700 = 0.86
The next step in the calculation is to weight the adjusted district Verification Factors by the verified
counts at district level. We weight the adjusted district Verification Factors because we want to
assign more importance to a Verification Factor that represents a large number of clients, and
proportionately less importance to a Verification Factor that represents a small number of clients.
In other words, based on our hypothetical example of the three districts, it looks like District 16
has the highest volume of ART patient services and that District 26 has the smallest volume of
ART patient services during this time period. When we construct an average Verification Factor
for all of the three districts, we ideally would like to assign proportionately more weight to the
verification results from District 16, proportionately less weight to District 26, and so on.
The matrix below shows the intermediate and final calculations that are required to construct a
weighted average of all the District Verification Factors.
The District Average is calculated by summing the three District Verification Factors
(0.91 + 1.00 + 0.53 = 2.44) and then dividing by three (2.44/3 = 0.813).
Weighted District Average is calculated by first multiplying each of the three adjusted District
Verification Factors by the district-level weight that has been assigned. In this example, the weight
is equal to the district-level verified count (x). In the matrix, this value is shown in the row labeled
Based on the calculations shown in Annex 5, Table 3, the simple arithmetic average of the combined
Verification Factors across all three districts is 0.813, while the weighted average is 0.840. The
weighted average is higher because its calculation took into account the fact that District 16 had
more ART patients than the other districts. Since the Verification Factor for District 16 was 1.00,
this (perfect) Verification Factor was applicable to more ART patients and thus it had more influence
on the overall average.
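The whole Annex 5 calculation can be reproduced with a short script. This is an illustrative sketch using the hypothetical audit values from Tables 1 and 2; carrying full precision through the calculation gives a national Verification Factor of about 0.839, matching the text’s 0.840 up to rounding.

```python
# Reproducing the Annex 5 national Verification Factor.
# Sites are (verified x, reported y) pairs; Rd = district-level reported
# count; Rn = count for the district observed at the national level.
districts = {
    1:  {"sites": [(145, 150), (130, 150)], "Rd": 300, "Rn": 300},
    16: {"sites": [(100, 100), (355, 350), (45, 50)], "Rd": 500, "Rn": 500},
    26: {"sites": [(100, 200), (50, 100), (75, 100), (40, 100)],
         "Rd": 600, "Rn": 700},
}

weighted_sum = weight_total = 0.0
for d in districts.values():
    x = sum(s[0] for s in d["sites"])   # district verified count
    y = sum(s[1] for s in d["sites"])   # district reported count
    vf = x / y                          # district Verification Factor
    adj = d["Rd"] / d["Rn"]             # Adjustment Factor
    weighted_sum += x * vf * adj        # weight w_i = verified count x
    weight_total += x

national_vf = weighted_sum / weight_total
print(round(national_vf, 3))            # 0.839 (0.840 in the text, rounded)
```

District 16 contributes the largest weight (verified count 500), so its perfect Verification Factor pulls the weighted average above the simple average, exactly as described above.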