
Article in press - uncorrected proof

Clin Chem Lab Med 2010;48(1):31–40 © 2010 by Walter de Gruyter • Berlin • New York. DOI 10.1515/CCLM.2010.024

Opinion Paper

Managing quality vs. measuring uncertainty in the medical laboratory

James O. Westgard*
Department of Pathology and Laboratory Medicine, University of Wisconsin Medical School, and Westgard QC, Inc., Madison, WI, USA

Abstract

ISO 15189's particular requirements for quality management in medical laboratories provide guidance for (a) relating performance specifications to the intended use of a test or examination procedure, (b) designing internal quality control (IQC) procedures to verify the attainment of the intended quality of test results, as well as (c) determining the uncertainty of results, where relevant and possible. This guidance has particular implications for analytical quality management, specifically for validating method performance relative to quality goals or requirements (intended use), designing statistical quality control procedures on the basis of the quality required for a test and the precision and bias observed for a method, and characterizing the quality achieved in practice by calculating measurement uncertainty. There already exists an error framework that provides practical tools and guidance for managing analytical quality, along with an existing concept of total error that can be used to characterize the quality of laboratory tests; thus there is considerable concern and debate on the merits and usefulness of measurement uncertainty. This paper argues that total error provides a practical top-down estimate of measurement uncertainty in the laboratory, and that the ISO/GUM model should be primarily directed to and applied by manufacturers.
Clin Chem Lab Med 2010;48:31–40.

Keywords: analytical quality management; measurement uncertainty; quality; total error.

*Corresponding author: James O. Westgard, 7614 Gray Fox Trail, Madison, WI 53717, USA. Phone: +1-608-833-4718, Fax: +1-608-833-0640, E-mail: [email protected]. Received July 30, 2009; accepted July 30, 2009; previously published online November 18, 2009.

Introduction

Quality management guidelines and practices keep evolving in the medical laboratory. To some, it may seem like we are always trying to catch the next wave, whether it be quality control, quality assurance, total quality management, continuous quality improvement, Six Sigma, lean, patient safety, quality indicators, or risk management. In addition, we have national regulations [such as Clinical Laboratory Improvement Amendments (CLIA) in the US] and global guidelines (such as ISO) for inspection and accreditation. Keeping up with the management trends and good practice guidelines may even complicate the management of quality, because we often think the newest recommendations and approaches should replace the older ones, rather than recognizing that there are new tools and techniques available to help us manage quality.

Actually, all these different quality programs fit into the overall process for managing quality, as shown in Figure 1. This quality management process represents the basic scientific method, which has been described as the Plan-Do-Check-Act, or PDCA, cycle. In this illustration, the basic components of scientific management are quality planning (QP, the "Plan"), quality laboratory processes (QLP, the "Do"), quality control and quality assessment (QC and QA, the "Check"), and quality improvement (QI, the "Act") (1). The entire process is centered on quality goals, requirements, and objectives, i.e., what we are trying to achieve.

Quality management begins by understanding what we must achieve, i.e., the meaning of quality itself. Quality is such a generic term that we often do not bother to define it carefully. That omission can become a serious limitation, because we believe that others have the same idea about quality. Most of us have experienced a difference of opinion about quality when dealing with complaints from our laboratory customers. Quite often there is a difference about which performance characteristics are important, as well as whether or not we satisfy the customer's need for a particular characteristic. To discuss quality, we need a common starting point from which to build a management framework, particularly an agreed upon definition of quality itself, such as:
• ANSI/ASQC A3-1978: Quality – the totality of features and characteristics of a product or service that bear on its ability to satisfy given needs (2).
• ISO 9000:2005: Quality – degree to which a set of inherent characteristics fulfills requirements (3).
• IOM 2000: Quality of care – degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge (4).


Figure 1 Quality management process and related tools and techniques.

It is clear that these definitions build upon each other, yet when we get to the Institute of Medicine (IOM) definition for healthcare, quality has become more complex and difficult to understand. Note that the IOM definition comes from the landmark publication "To Err is Human" (4), which led to the patient safety movement in the US. In fact, patient safety has also become a practical definition of quality in many healthcare organizations.

Unfortunately, these definitions make quality in healthcare more complex and less understandable, particularly for analysts in the medical laboratory. Simpler and more practical definitions are possible based on guidance from industrial leaders in quality, such as:
• Juran – quality is fitness for use (5).
• Crosby – quality is conformance to requirements (6).
• Deming – quality should be aimed at the needs of the customer (7).

Based on these ideas, the following definition was recommended by a Centers for Disease Control (CDC) Institute on Critical Issues in Health Laboratory Practice (8):
• CDC 1986 – The quality of a laboratory testing service depends on providing the totality of features and characteristics that conform to the stated or implied needs of its users or customers.

This definition provides a clear focus on users or customers, recognizes that their needs may be stated explicitly or implied by intended use, and that these needs may involve many different characteristics (such as specimen collection, turnaround time, correct test results, clear reports, etc.). The key to making quality measurable and manageable is the idea of conformance to needs. Non-conformance means something is defective, e.g., the turnaround time was longer than the required 60 min. For analytical quality, non-conformance means that the result is in error by an amount greater than allowable for the stated or implied use of the test.

This thinking is familiar to laboratory scientists, who know that the measure of precision is imprecision, the measure of accuracy is inaccuracy, and the measure of quality is unquality, i.e., defects or errors. Nonetheless, laboratory scientists need help and guidance to understand customer needs and to explicitly define the particular requirements for the tests and services they provide.

Particular requirements for quality

Many different requirements are identified in ISO 15189 "Medical laboratories – particular requirements for quality and competence" (9). Of particular interest for this discussion of analytical quality management are the following:
• 5.5.4 Performance specifications for each procedure used in an examination shall relate to the intended use of that procedure.
• 5.6.1 The laboratory shall design internal quality control (IQC) systems that verify the attainment of the intended quality of results.
• 5.6.2 The laboratory shall determine the uncertainty of results, where relevant and possible.
• 5.6.3 A program for calibration of measuring systems and verification of trueness shall be designed and performed so as to ensure that results are traceable to SI units or by reference to a natural constant or other stated references.
• 5.6.4 The laboratory should participate in inter-laboratory comparisons, such as those organized by external quality assessment (EQA) schemes. Laboratory management shall monitor the results of EQA and participate in the implementation of corrective actions when control criteria are not fulfilled.

Item 5.5.4 identifies the importance of validating measurement performance for "intended use", which is the ISO language for stated or implied needs, or the goals or requirements for quality. Items 5.6.1 and 5.6.4 emphasize that IQC and EQA should be related to the "intended quality of results" and "control criteria", i.e., goals for quality. Items 5.6.2 and 5.6.3 can be implemented without any goals for quality, i.e., these are requirements to document certain characteristics (uncertainty and traceability) that are related to the performance of an examination procedure (e.g., precision, trueness).

Measurement uncertainty, trueness, and traceability are new to many medical laboratories. Trueness is most easily understood and accepted because it is mainly a new name for the old concept of systematic error, accuracy, or bias. Traceability is generally accepted as necessary in the pursuit of trueness or accuracy, though there are many difficulties in establishing traceability chains for many of our measurands. There is less agreement on the value and usefulness of measurement uncertainty. Given the difficulties in implementing the original recommendations in the Guide to the Expression of Uncertainty in Measurement (GUM) (10), more guidance is clearly needed if laboratories are to characterize the uncertainty of their many different measurement procedures.

The system for traceability makes use of a hierarchical structure of reference methods and materials that can be used to establish the accuracy or trueness of analytical measurement procedures. At the highest level, there are definitive methods and primary standard reference materials, the next

level provides reference methods and certified reference materials, and finally, there are routine methods and calibration materials. The traceability chain provides a series of links between different reference methods and materials to document how the results of a routine method are connected to the truth.

For glycated hemoglobin, for example, there is an IFCC definitive method, three national reference methods (Sweden, Japan, US), and a network of reference laboratories that validate reference materials and monitor the performance of the national reference methods (11). These national reference methods in turn are used to maintain and monitor routine service methods and calibration materials. In the US, this function is performed by the National Glycohemoglobin Standardization Program (NGSP), which certifies the traceability of routine methods to assure their comparability (www.ngsp.org). Thus, every routine method available in the US has been compared and evaluated against the US national reference method, and is calibrated to provide results that are comparable to the US national reference method.

Why measure uncertainty?

The intent of ISO is to make measurements transferable, or comparable, on a global basis. This requires eliminating or correcting biases or systematic errors between measurement systems, and then, according to ISO, reporting any remaining variance of a test result (uncertainty) to inform the user of its quality. These are good intentions, but they are certainly not new or unique to ISO, nor are these objectives achievable using the ISO proposed concepts for measurement trueness and uncertainty alone.

Dybkaer, a medical laboratory scientist who is one of the leading advocates of the ISO/GUM approach, argues for trueness and measurement uncertainty and against total error, because the latter allows systematic errors to exist rather than requiring their elimination (12):

"When describing the performance of procedures and the reliability of their results, ISO terminology should be used. Results should be universally comparable and this requires metrological traceability; the concomitant uncertainty (inversely) indicating reliability should be obtained in a universal and transparent fashion, and should be combinable. Therefore, the approach of (GUM), leading to a result with known bias and a combined standard uncertainty has advantages over the allowable total error concept, incorporating procedural bias".

The phrase "incorporating procedural bias" is important here. It implies that the inclusion of "procedural bias" or systematic error in the concept of total error allows manufacturers to avoid the goal of globally consistent test results:

"The allowable total error (which for practical purposes could also have been termed 'allowable deviation') is set for a given type of quantity and purpose. The distribution between constant and random contributions may then be chosen freely within the total sum, which may include a known procedural bias. This is one reason for the outcome of EQA where results are clustered in method-dependent groups when measuring systems are precise rather than true. Consequently,
• Results from different measurement procedures are not directly comparable;
• Biological reference intervals will depend on the procedure or have to be widened to accommodate results from all procedures, leading to a loss of diagnostic capability;
• Classification of biological states by comparing a given result with common limits becomes hazardous;
• Equations between different types of quantities cannot work across procedures;
• Movement of patients between health services requires repeat measurements.
• Such conditions do not seem acceptable in terms of health and resources, and may lead to complaints and loss of business".

The remedy proposed is to avoid defining a quality requirement that takes bias into account and, instead, to correct for bias and then estimate the uncertainty of the test result:

"Rather than defining an allowable total error with estimated elements of all types of systematic and random error (hitherto often called inaccuracy and imprecision, respectively), any result should be corrected for known significant biases and should have a measure of uncertainty attached giving an interval comprising a large fraction of the reasonably possible values of the measurand with a given level of confidence".

The call for correction of any known biases is true to the principles of metrology, but it is a risky business in medical laboratories because there are relatively few reference methods and materials. Thus, it is difficult to know what correction is actually correct. Even after correction, there will likely be some residual biases remaining, as well as differences in the specificity of some measurement principles. Bias is not as simple as a correction factor or a conversion algorithm (or a HbA1c "master equation" for that matter).

For example, an EQA survey of HbA1c methods in the US (13) showed that some method subgroups were biased as much as –0.3% HbA1c low and +0.4% HbA1c high for a sample having a value of 6.8% HbA1c according to the NGSP reference method. The means and 95% ranges of results for 19 different method subgroups are shown in Figure 2 for some 2676 laboratories surveyed by the College of American Pathologists (CAP) in 2008. Trueness, measured by a bias of 0.06, is excellent. However, the survey sample was reported in individual laboratories at values as low as 5.98% HbA1c and as high as 7.86% HbA1c, i.e., with errors as large as –0.82% and +1.06% HbA1c. If a value of 6.5% HbA1c were used for diagnosis of diabetes, there

care. All sources of errors are important and analytical qual-


ity is still an issue in medical laboratories today.
Medical laboratories today do focus on errors and have
many tools and techniques for evaluating, measuring, mon-
itoring, detecting, correcting, and improving quality. This
begins with method validation studies to characterize report-
able range, precision, and trueness or bias, and reference
intervals. These studies might focus on verifying the manu-
facturer’s performance claims, or the laboratory may do more
extensive studies to validate that performance is satisfactory
for the intended clinical use of the test. In managing analyt-
ical quality during routine operation, laboratories analyze
internal control materials to detect medically important errors
and to assure that the desired test quality is achieved in rou-
tine operation. Periodically, external control materials are
analyzed to provide a comparison against other laboratories
and other methods and, hopefully, comparison with values
Figure 2 HbA1c survey results from 2676 laboratories for CAP
assigned by reference methods. For example, the US CAP
sample GH2-05 with target value 6.8 established by NGSP reference
method (posted on NGSP.org 12/08; accessed 4/23/2009). PT survey for HbA1c establishes the true value for survey
samples by analysis with the US national reference method.
Many of these activities for managing the quality of rou-
will be many patients misclassified and/or re-classified with tine methods have developed as part of the traditional error
sequential testing; if a difference between 7.0% and 7.5% framework. The traceability system and the error framework
HbA1c were considered significant for treatment, patients provide complementary activities, all of which are essential
will be at risk of mistreatment. The quality of laboratory for achieving quality in a medical laboratory. However, the
testing by current analytical methods is clearly inadequate two parts have developed from different directions and are
for the intended clinical uses – in spite of the fact that all based on different professional interests. Traceability has
these methods are certified by NGSP as providing equivalent been driven by metrologists and national calibration and
results. standardization organizations; the error framework has
There is no doubt that the NGSP certification process has evolved from clinical laboratory scientists, or what today is
improved the performance of HbA1c methods in the US over the field of laboratory medicine. Given the different driving
the past several years, but it is also clear that biases still exist forces, it might be expected that there are some differences
and that the differences in results from laboratory to labo- in principles and approaches.
ratory may still be quite large, as described by the observed
total errors. Thus, bias cannot simply be ignored nor can the
existence of total error be rejected to make way for meas-
How manage errors?
urement uncertainty. Even more important, the existing error
framework and its associated practical tools should not be
discarded! The new ISO concepts will not replace our exist- The traditional error framework focuses on different types of
ing method validation protocols, IQC design tools, and EQA analytical errors such as, random error (or imprecision), sys-
and proficiency testing (PT) monitoring programs that make tematic error (or bias or inaccuracy), and the net effect of
use of total error to manage analytical quality. those errors on the overall quality of a test result, i.e., the
total analytical error. The framework guides the implemen-
tation and management of a new measurement procedure, as
Why focus on errors? shown in Figure 3, which identifies four major management
responsibilities for defining quality goals, validating method
Plebani and associates have established benchmarks for performance, designing QC procedures, and surveying per-
measuring errors in the laboratory to provide guidance for formance through EQA and PT.
quality improvement (14). Many laboratory scientists have Expert groups traditionally specify desirable performance
focused on the reported distribution of errors – 60% pre- in terms of the maximum allowable imprecision and the
analytical, 15% analytical, and 25% post-analytical – and maximum allowable bias and sometimes also the allowable
concluded that it is more important to focus on pre-analytic total error. The US National Cholesterol Education Program
and post-analytic errors, rather than analytical errors! How- was one of the early models for national standardization of
ever, that same report notes that of the 51,746 tests reviewed, laboratory testing methods (15). Today’s global program for
there were 393 questionable results, 160 confirmed labora- glycated hemoglobin standardization and traceability is
tory errors, 46 of which caused inappropriate patient care. another example (11). Thus, the classical concepts of pre-
Of those 46, 24 were analytical errors, i.e., analytical errors cision and accuracy are the starting point for the error
were still the major cause (52%) of inappropriate patient framework.
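The conformance-to-needs idea behind the error framework can be sketched numerically using the HbA1c survey figures quoted earlier (target 6.8% HbA1c, extreme reported values 5.98 and 7.86% HbA1c). The allowable total error used here is an assumed, illustrative value; the paper does not state one:

```python
TARGET = 6.8  # % HbA1c, NGSP-assigned value for the CAP survey sample
TEA = 0.6     # assumed allowable total error (% HbA1c); illustrative only

def observed_error(result, target=TARGET):
    """Error of a single reported result relative to the reference value."""
    return result - target

def conforms(result, target=TARGET, tea=TEA):
    """A result conforms when its error does not exceed the allowable total error."""
    return abs(observed_error(result, target)) <= tea

# Lowest and highest values reported in the survey
extremes = [5.98, 7.86]
errors = [round(observed_error(x), 2) for x in extremes]  # [-0.82, 1.06]
flags = [conforms(x) for x in extremes]                   # [False, False]
```

Both extreme results exceed the assumed limit, which illustrates the sense in which non-conformance is judged from the net effect of bias and imprecision, i.e., the total error.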

a specified level of quality which can be defined in terms of


an allowable total error (22).
EQA and PT can be scored or evaluated with use of the
same allowable total error. It is common for EQC and PT
schemes to specify the allowable total errors to guide labo-
ratories towards stated quality goals. It is also common that
peer-comparison programs characterize the precision and
bias of individual laboratory methods and report method
biases, CVs, and observed total errors.
This traditional error framework supports the management
of analytical quality from the time of specification of desir-
able method performance characteristics, development of
new methods by manufacturers, validation of new methods
in the laboratory, design of IQC procedures, and the moni-
toring of routine performance via EQC, PT, and peer-com-
parison programs. A quality goal in the form of allowable
total error can provide guidance for managing analytical
quality throughout the lifetime of an analytical measurement
procedure. There exists a large body of work concerned with
defining analytical quality goals in the form of allowable
errors (23), including a database of goals based on the bio-
logic variability of human subject, both intra-individual and
between-individuals or group variation for 300–400 quanti-
ties (24).

Figure 3 Established practices for managing analytical quality on


basis of error concepts and framework. Why preserve total error?

The estimation of the total error of a laboratory test provides


The validation of methods or measurement procedures can
a practical measure of the variation that can be expected in
be organized around errors (16). The different experiments,
routine laboratory service. It is intended to describe the
such as replication, recovery, interference, and comparison
worst-case variation that might be experienced for a labo-
of methods are used to estimate random, proportional sys- ratory test. In that context, total error is a practical and estab-
tematic, constant systematic, and the overall systematic lished estimate of measurement uncertainty. The name may
errors. The different statistics in regression analysis are use- be objectionable to some, even though the intention is the
ful for estimating different components of error, e.g., the same as that of measurement uncertainty.
slope, y-intercept, and standard deviation (SD) about the It may be helpful to understand the history of the error
regression line provide estimates of proportional error, con- framework and its current usefulness for analytical quality
stant systematic error, and the random error between the test management. Having trained in analytical chemistry, I
and comparison methods, respectively (17). The decision on encountered the clash of metrological principles with pro-
acceptability can be based on the magnitude of the observed duction laboratory practices when I became a clinical chem-
errors in comparison to the amount of error that is allowable ist. I still remember my first experience in a medical
(18). laboratory and the realization that only a single measurement
Manufacturers commonly employ process capability indi- was involved in generating a test result, rather than the mul-
ces to characterize performance of production processes, tiple measurements that were typical of most classical ana-
including measurement processes. Such indices can be relat- lytical laboratories.
ed directly to total error criteria for analytical measurement One of my first work assignments was to evaluate the
procedures (19). Furthermore, total error criteria have been performance of a new automated multi-test chemistry ana-
shown to be consistent with the Six Sigma concept and met- lyzer. Method validation protocols were being recommended
rics, thus providing industrial benchmarks for world class in the clinical pathology literature, but there was little guid-
quality (20). ance on how to interpret the statistical results from the eval-
IQC is an extension of these concepts and involves the uation experiments, particularly the comparison of methods
selection of control rules based on their capability of detect- experiment that seemed critical to the evaluation effort. Tests
ing analytical errors. A multi-rule QC procedure that is com- of significance, as well as regression and correlation statis-
monly used in medical laboratories has been structured to tics, were often calculated, then something magic seemed to
include certain control rules that are sensitive to random happen that led to a decision that performance was either
errors and others that are sensitive to systematic errors (21). acceptable or not. The process lacked the objectivity and
IQC procedures can be designed to verify the attainment of scientific rigor that would be expected in analytical chem-
Article in press - uncorrected proof
36 Westgard: Managing quality vs. measuring uncertainty

istry laboratories, though it still employed the classical con- In 1992, the US CLIA regulations (27) defined criteria for
cepts of precision and accuracy and their separate effects on acceptable performance for PT, which were, in effect, state-
test results. There was no guidance on how to judge the ments of allowable total errors because laboratories had to
acceptability of performance when a single measurement led test PT samples in the same manner they tested patient sam-
to the reported test result. ples, i.e., a single measurement. In 2002, CLSI published a
In 1974, working with R. Neill Carey and Svante Wold, consensus guideline on ‘‘Estimation of Total Analytical
we published a paper on ‘‘Criteria for judging precision and Error’’ (28) and in 2006 a guideline for IQC that outlined a
accuracy in method development and evaluation’’ (18). To step-by-step process for selecting QC rules and numbers of
my knowledge, that paper was the first publication in the control measurements based on the allowable total error
clinical chemistry literature to propose the use of total ana- for the test and the observed imprecision and bias of the
lytic error for the purpose of characterizing the quality of a measurement procedure (29). In 2008, FDA provided manu-
measurement procedure. We argued as follows: facturers with guidance on ‘‘waiver applications for manu-
facturers of in vitro diagnostic devices’’ (30), recommending
‘‘To the analyst, precision means random analytic error. that manufacturers establish performance criteria for an
Accuracy, on the other hand is commonly thought to mean allowable total error together with error grids to demonstrate
systematic analytic error. Analysts sometimes find it useful and document that test performance satisfies a defined level
to divide systematic error into constant and proportional of quality. Thus, total analytical error and its corresponding
components, hence, they may speak of constant error or pro- target in the form of an allowable total error are well accept-
portional error. None of this terminology is familiar to the ed as part of good laboratory practices, as well as the US
physician who uses the test values, therefore, he is seldom regulatory process for demonstrating performance for waived
able to communicate with the analyst in these terms. The tests.
physician thinks rather in terms of the total analytic error, Today we have well-established method validation proto-
which includes both random and systematic components. cols, yet they do not require the ISO/GUM measurement
From his point of view, all types of analytic error are accept- uncertainty. Today we have quality-planning processes and
able as long as the total analytic error is less than a specified tools that support the design of IQC procedures to verify the
amount. This total analytic error « is more useful; after all attainment of the intended quality of test results, yet they do
it makes little difference to the patient whether a laboratory not require the ISO/GUM measurement uncertainty. Today
value is in error because of random or systematic analytic we have PT and EQA schemes that monitor test quality and
error, and ultimately he is the one who must live with the assist laboratories in monitoring long-term bias, yet they do
error’’. not require the ISO/GUM measurement uncertainty. Only
accreditation under ISO 15189 requires measurement uncer-
Indeed, our purpose in introducing this concept was to tainty! The actual management of analytical quality in a
characterize the uncertainty of the measurement procedure in the context of the reported result, which we did using a linear combination of bias plus two times the method SD. At that time, the standard practice in metrology laboratories and standardization organizations, such as the US National Bureau of Standards (NBS, the precursor to today's NIST), was to judge method performance based on separate assessment of precision and accuracy and to consider their combined effects by a classification scheme: (case 1) systematic error and imprecision both negligible; (case 2) systematic error not negligible, imprecision negligible; (case 3) neither systematic error nor imprecision negligible; (case 4) systematic error negligible, imprecision not negligible – a schema recommended by Churchill Eisenhart from NBS (25).

The total error concept took time to gain acceptance from clinical chemists and clinical pathologists. In the landmark "1976 Aspen Conference on Analytical Goals in Clinical Chemistry", leading laboratory professionals voiced their arguments against such a combination of errors (26). Nonetheless, the concept survived and within the next decade became well accepted in the laboratory community. Acceptance came because of its usefulness in evaluating the performance of new measurement procedures, characterizing measurement performance in peer-comparison and EQA schemes, and planning and designing internal QC procedures.

…medical laboratory does not!

How to measure uncertainty?

Technically, one of the differences between total error and measurement uncertainty is the mathematical formula for combining components of bias and imprecision. Total error makes use of a linear combination of bias plus a certain multiple of the SD, e.g., bias + 2 SD. Measurement uncertainty requires that all error components be squared to provide variances that can be added. Then the square root is extracted and multiplied by an appropriate coverage factor, which is usually 2 or 3. Macdonald (31) has proposed the use of a "root mean square measurement deviation" (RMSD) to combine imprecision and bias, arguing that this estimation would be consistent with the ISO/GUM principles:

"The quantity … is approximately (assuming that n is large enough) the square root of the variance of the combined distribution resulting from the convolution of these two distributions. In fact, this kind of symmetric interpretation of random and systematic deviations of measurement is exactly the way measurement uncertainty is considered in the GUM, which is internationally accepted today in the field of metrology".
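The two combination rules contrasted here can be compared numerically. The following sketch (a minimal illustration in Python; the bias and SD values are hypothetical, not taken from the article) shows that the linear total-error combination and the squared, root-sum-of-squares combination generally yield different intervals for the same method performance:

```python
import math

def total_error(bias, sd, multiple=2.0):
    """Linear total-error combination: TE = |bias| + k * SD."""
    return abs(bias) + multiple * sd

def rmsd_uncertainty(bias, sd, coverage=2.0):
    """RMSD/GUM-style combination: square the components, add the
    variances, take the square root, then apply a coverage factor."""
    return coverage * math.sqrt(bias**2 + sd**2)

# Hypothetical method performance: bias of 1.5 units, SD of 2.0 units
bias, sd = 1.5, 2.0
te = total_error(bias, sd)       # 1.5 + 2*2.0 = 5.5
u = rmsd_uncertainty(bias, sd)   # 2*sqrt(1.5**2 + 2.0**2) = 5.0

print(f"Linear total error (bias + 2 SD):  {te:.2f}")
print(f"Expanded RMSD uncertainty (k = 2): {u:.2f}")
```

Which rule gives the wider interval depends on the relative size of bias and SD; the two coincide only when bias is zero, which is the point at issue in the discussion that follows.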
This RMSD approach seems to have been adopted in the 2008 German regulations for laboratories, where error margins can be calculated by the laboratory in this manner (32).

Thus, there exists a total error model with a linear combination of bias and imprecision, an RMSD model for combining the squares of bias and imprecision, and the detailed GUM model, which involves many different components of variation, along with a host of rules and recommendations for estimating and combining variances. The GUM approach is often described as bottom-up because of its emphasis on characterizing all individual components of variation, then adding them together to predict their total effect. Total error, in contrast, is top-down because it is estimated from the total systematic error and total random error, which include all the effects of the many individual sources of variation.

Top-down estimates typically make use of long-term imprecision, often determined over several months that include changes in reagent lots, calibrator lots, operators, operating conditions, etc. Such estimates are practical in a laboratory because they utilize routine QC data. One professional guideline by White and Farrance (33) recommends this approach, as well as pointing out that long-term estimates of bias may be determined from EQA surveys:

• "Record mean long-term imprecision of QC as estimate of uncertainty of measurement (±1.96 SD or ±1.96 CV%). The uncertainty of the value assigned to a calibrator(s) should be included if available".
• "For some methods, test results are interpreted against reference or clinical decision values that have been determined by a different method. In this situation, the uncertainty of the results includes not only the analytical imprecision of the method, but also any systematic error (method bias). For such methods the long-term bias should be recorded, ideally as full calibrator traceability and uncertainty data from the commercial supplier, or in its absence, from proficiency testing (external quality assurance) reports".

While this guideline does not specifically recommend how the information on imprecision and bias should be combined, it does make recommendations for assessing fitness for purpose by comparison with analytical goals for imprecision, bias, and total error based on biologic variation, following the guidance of Fraser (34, 35), where bias and imprecision are combined using a linear model. Thus, one might justify determining the expected total error as a practical estimate of measurement uncertainty in a service laboratory.

Why NOT measure uncertainty?

Advocates of measurement uncertainty often justify their position by emphasizing that laboratory customers need this information. For example, the Tietz Textbook of Clinical Chemistry makes the statement (36):

"The uncertainty concept is directed towards the end user (clinician) of the result, who is concerned about the total error possible, and who is not particularly interested in the question whether the errors are systematic or random…"

Note that this is the same as our original rationale justifying the use of total error (18). And observe that the language is almost the same – total error, not whether the errors are systematic or random. But while the reasoning here looks to be the same, our original objective was to use this information to manage quality to meet the physician's needs, not to report such information to physicians. I have yet to hear any customers ask for this information, and regardless of whether it is an implied or unspoken need, I disagree with the idea that reporting measurement uncertainty will improve the use and interpretation of laboratory tests. Our physician customers have limited time and patience to deal with measurement uncertainty. Along with our patient customers, they expect us to provide them with test results whose quality is managed and assured to satisfy their intended use.

To do so, we must define the quality for intended use, validate that our measurement procedures can produce that quality, control our measurement procedures to verify the attainment of that quality every day in routine laboratory operation, and report test results in a manner that provides guidance for interpretation. Fraser (34) has demonstrated a practical reporting system that flags results as follows: >, higher than reference limit; <, lower than reference limit; >>, higher than reference limit and likely clinically important; <<, lower than reference limit and likely clinically important; *, significant change (95% confidence level); **, highly significant change (99% confidence level). These latter two flags relate to the calculation of a reference change value that takes into account both analytical variation and within-subject biological variation. In effect, measurement uncertainty is being reported here without actually providing any numerical values for uncertainty.

Other advocates emphasize the importance of estimating measurement uncertainty to identify the need for improvements in analytical methodology (37):

"…focusing on traceability and uncertainty has the potential to increase pressure on manufacturers of assays so that they increase their efforts to improve the quality of their products. This drive for quality will include both analytical quality, i.e., the specificity of the assay, and the metrological quality of calibrators…"

Of course, manufacturers should already be doing this if they are following the ISO guidance! Laboratories do not actually need the information required by the ISO/GUM estimation of measurement uncertainty, except for a few cases where measurements are combined to provide a calculated parameter and error propagation rules must be applied to estimate uncertainty, rather than being able to measure imprecision directly. The strongest argument is actually for traceability of calibration, where measurement uncertainty plays a secondary role.

These difficulties in adapting the ISO/GUM approach to medical laboratories seem to be due to an attempt to force-fit the concepts and principles from metrology laboratories without carefully considering the practical applications that are necessary to measure and manage analytical quality in a high-production testing laboratory. The earlier warning by Horwitz and Albert (38) seems to apply again:

"Without a refinement of concept, the metrologists risk losing a large part of their chemical constituency. The presentations of the metrologists suffer from a lack of clarity and transparency to a chemical audience…"

"We suggest that instead of trying to disentangle the various threads involved in the error budget approach to uncertainty, let the measurements speak for themselves. Why bother with the various individual sources of bias and imprecision when what the chemist wants to know is merely the final integrated result…"

"The major advantage of the top-down approach is that it randomizes the locally constant individual laboratory biases into interpretable critical limits that include the major sources of chemical deviations. These are laboratories, analysts, methods, and time-factors that are left out of the uncertainty error budget calculations…"

Horwitz put it more succinctly in a later paper (39): "The absurd and budget-busting approach (for analytical chemistry) arose from metrological chemists taking over in entirety the concepts developed by metrologists for physical processes measured with 5–9 significant figures (gravitational constant, speed of light, etc.) and applying them to analytical chemistry measurements with 2 or 3 significant figures".

The theory from metrology may be pure, but the applications in medical laboratories are messy, limited, and often impractical. Unless these shortcomings are overcome, the ISO/GUM push for measurement uncertainty may cause more harm than good, violating a fundamental tenet of medical practice to "do no harm". Such efforts may be harmful because the need to characterize measurement uncertainty will consume time and resources that might be better spent managing analytical quality. They may be harmful because laboratories will have the false comfort that characterizing measurement uncertainty somehow assures analytical quality, whereas it only provides a measure of how good or bad the results can be. And unless there is a clear rationale for defining goals or targets for acceptable measurement uncertainty, laboratories may not recognize whether the observed uncertainty is good or bad and whether improvement is needed or not.

What to do?

In discussing the continued need to improve analytical quality, Plebani (40) has pointed out that the "… solution to this is to use more stringent metrics to define and monitor analytical tolerance limits, on the one hand, and on the other to meet the need for more effective communication of laboratory results to clinicians, specifying their uncertainty and offering advice for improving the interpretation and utilization of the data provided". Effective communication, as noted here, involves more than just reporting the uncertainty of results. Such efforts should build on the existing error framework and Fraser's system for identifying clinically significant results in laboratory reports (34), taking into account biologic variation as well as analytical variation. Laboratories should assure the analytical quality of their test results and provide aid and support to improve the interpretation of test results, without burdening our physician customers and patient consumers with metrological uncertainty.

In managing quality internally, medical laboratories should utilize the existing concept of total error as a practical top-down estimate of measurement uncertainty. Estimates of within-laboratory precision can be provided by internal QC data and estimates of bias from EQA and PT data, as suggested by White and Farrance (33). To provide estimates for conditions of reproducibility (different laboratories with different analysts, reagent lots, calibrators, etc.), EQA and PT data can provide estimates of the bias and variation of method subgroups (41), which can be used to characterize quality on the sigma-scale (41). It is also important to have an estimate of error that incorporates the directional effect of bias, to recognize when methods produce consistently high or low results (as illustrated in Figure 3). The ISO/GUM construct of measurement uncertainty assumes symmetry, i.e., expected value ± measurement uncertainty, whereas total error calculations of upper and lower limits will reveal any asymmetry due to bias (42). An important drawback of the uncertainty concept is the assumption that bias is completely eliminated via traceability, standardization, and correction, which is not true in the real world of medical laboratories. As long as bias exists in the real world, the ISO/GUM methodology is flawed and combinable variances will not provide reliable and realistic estimates of the quality of measurements and test results! At such time that bias is truly eliminated, the estimate of total error will converge with that of measurement uncertainty, will be a function of only random errors, and the variances will be combinable.

Measurement uncertainty should therefore be preserved for use in the ISO/GUM tradition of a bottom-up estimate based on detailed modeling and combination of individual components of variation. Any application where traceability is important should require that this estimate of measurement uncertainty be used. Those applications should be of importance particularly to manufacturers, regulators, and perhaps academic laboratory scientists. In medical laboratories, the ISO/GUM approach will mainly be of use for those relatively few measurands where Certified Reference Materials are available to characterize the trueness and uncertainty of the calibration of routine methods. And even then, many laboratories may find it difficult to make these estimates unless practical calculator tools become available.

Concluding comments

ISO 15189 makes measurement uncertainty a certainty, where relevant and possible! I have long argued that laboratories need to consider the total effect of all sources of
variation, or error, or uncertainty, in order to determine whether or not their measurement procedures meet the intended clinical use of laboratory tests. My experience leads me to believe that concepts and ideas must be relevant and practical if they are to advance and improve laboratory practices. The rigid ISO/GUM approach does not meet those needs and, furthermore, is not essential for managing analytical quality in the medical laboratory.

The error framework is relevant in the medical laboratory. Error is not a bad word; rather, it is particularly relevant because it draws attention to a serious issue that can cause serious problems. I want people who work in the laboratory to think about errors, worry about errors, measure errors, monitor errors, and manage analytical testing processes to be sure that the total effect of all errors is small enough that the test results can be properly used and interpreted.

The error framework, along with its established concept of total error, is also practical! We already have practical estimates of method variability, as well as test variability (which includes within-subject biologic variation), that are useful in the laboratory for managing analytical quality. Practical here means the use of top-down estimates rather than the ISO/GUM bottom-up approach. Practical means taking advantage of a long history and extensive literature on targets in the form of error goals. Practical means having established protocols for validating method performance, established tools for designing internal QC to verify the attainment of the intended quality of results, and available result reporting formats to communicate the effects of uncertainty in a manner that is useful to physicians.

It should not be necessary to destroy the past in order to proceed into the future. The future can evolve in an orderly manner from the past, and in this case, measurement uncertainty needs to evolve as an addition to our error framework while maintaining existing tools and practices for managing analytical quality. Unless that happens, measurement uncertainty is likely to become a meaningless calculation that is performed in laboratories to show to inspectors, who then certify that the laboratory has calculated a meaningless parameter, in accordance with global standards, which likewise will lose meaning and value if they are not both correct in principle and capable of practice.

That is a harsh conclusion and many will disagree with this opinion! However, I hope it will stimulate further discussion about the best practices for managing quality. Laboratories need guidance and direction to make quality manageable and to guarantee the attainment of the intended quality of test results.

Acknowledgements

My interest in this issue has been stimulated by discussions with many people, particularly Paulo Pereira, David Burnett, Michael Nobel, Paul De Biévre, Anders Kallner, Per Hyltoft Petersen, Dietmar Stöckl, Sten Westgard, and Greg Cooper. Bio-Rad Laboratories supported my participation in the Sitges conference.

Conflict of interest statement

Authors' conflict of interest disclosure: The author stated that there are no conflicts of interest regarding the publication of this article. The support from Bio-Rad played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.

References

1. Westgard JO, Burnett RW, Bowers GN. Quality management science in clinical chemistry: a dynamic framework for continuous improvement of quality. Clin Chem 1990;36:1712–6.
2. ANSI/ASQC A3 Quality Systems Terminology. Milwaukee, WI: ASQC, 1978.
3. ISO 9000:2005. Quality management systems – fundamentals and vocabulary. Geneva: ISO, 2005.
4. Institute of Medicine Committee on Quality of Health Care in America. To err is human: building a safer health system. Washington, DC: National Academy Press, 2001.
5. Juran JM. The quality trilogy. Qual Prog 1986:19–24.
6. Crosby PB. Quality is free. New York: New American Library, 1979.
7. Deming WE. Out of the crisis. Cambridge, MA: MIT Center for Advanced Engineering Study, 1986.
8. Centers for Disease Control. Proceedings of the 1986 Institute – Managing the quality of laboratory test results in a changing health care environment. DuPont Company, 1987.
9. ISO 15189:2007. Medical laboratories – particular requirements for quality and competence.
10. GUM. Guide to expression of uncertainty in measurement. Geneva: ISO, 1995.
11. The IFCC Reference Measurement System for HbA1c: a 6-year progress report. Clin Chem 2008;54:240–8.
12. Dybkaer R. Setting quality specifications for the future with newer approaches to defining uncertainty in laboratory medicine. Scand J Clin Lab Invest 1999;59:579–84.
13. College of American Pathologists (CAP) Survey Data (12/8). Accessed at www.ngsp.org, April 2009.
14. Carraro P, Plebani M. Errors in a stat laboratory: changes in type and frequency since 1996. Clin Chem 2007;53:1338–42.
15. National Cholesterol Education Standardization Panel. Current status of blood cholesterol measurements in clinical laboratories in the United States. Clin Chem 1988;34:193–201.
16. Westgard JO. Basic method validation, 3rd ed. Madison, WI: Westgard QC, Inc, 2008.
17. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:49–57.
18. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825–33.
19. Westgard JO, Burnett RW. Precision requirements for cost-effective operation of analytical processes. Clin Chem 1990;36:1629–32.
20. Westgard JO. Six Sigma quality design and control: desirable precision and requisite QC for laboratory measurement processes, 2nd ed. Madison, WI: Westgard QC, 2006.
21. Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493–501.
22. Westgard JO. Internal quality control: planning and implementation strategies. Ann Clin Biochem 2003;40:593–611.
23. Petersen PH, Fraser CG, Kallner A, Kenny D, editors. Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 1999;57:475–585.
24. Ricos C, Alvarez V, Cava F, Garcia-Lario JV, Hernandez A, Jimenez CV, et al. Current databases on biologic variation: pros, cons and progress. Scand J Clin Lab Invest 1999;59:491–500.
25. Eisenhart C. Expression of the uncertainties in final results. Science 1968;160:1201–4. Reprinted in: Ku HH, editor. Precision measurement and calibration. NBS Special Publication 300, vol 1. Washington, DC: US Government Printing Office, 1969.
26. Westgard JO. Development of performance standards and criteria for testing the precision and accuracy of laboratory methods. In: Proceedings of the 1976 Aspen Conference on Analytical Goals in Clinical Chemistry. Chicago, IL: College of American Pathologists, 1977:105–14.
27. US Department of Health and Human Services. Medicare, Medicaid and CLIA Programs: regulations implementing the Clinical Laboratory Improvement Amendments of 1988 (CLIA). Final rule. Fed Regist 1992;57:7002–186.
28. CLSI EP21-A. Estimation of total analytical error for clinical laboratory methods; approved guideline. Clinical and Laboratory Standards Institute, 940 West Valley Road, Wayne, PA, 2003.
29. CLSI C24-A3. Statistical quality control for quantitative measurement procedures: principles and definitions; approved guideline – 3rd ed. Clinical and Laboratory Standards Institute, 940 West Valley Road, Wayne, PA, 2006.
30. Guidance for Industry and FDA Staff: Recommendations for Clinical Laboratory Improvement Amendments of 1988 (CLIA) Waiver Applications for Manufacturers of In Vitro Diagnostic Devices. Food and Drug Administration, Center for Devices and Radiological Health, Office of In Vitro Diagnostic Device Evaluation and Safety, January 30, 2008.
31. Macdonald R. Quality assessment of quantitative analytical results in laboratory medicine by root mean square of measurement deviation. J Lab Med 2006;30:111–7.
32. German "RiliBÄK" regulations. www.bundesaerztekammer.de/page.asp?his=1.120.121.1047.6009. Accessed May 5, 2009.
33. White GH, Farrance I. Uncertainty of measurement in quantitative medical testing – a laboratory implementation guide. Clin Biochem Rev 2004;25:S1–24.
34. Fraser CG. Biological variation: from principles to practice. Washington, DC: AACC Press, 2001.
35. Fraser CG, Petersen PH, Libeer J-C, Ricos C. Proposals for setting generally applicable quality goals solely based on biology. Ann Clin Biochem 1997;34:8–12.
36. Linnet K, Boyd J. Selection and analytical evaluation of methods – with statistical techniques. In: Burtis CA, Ashwood ER, Bruns DE, editors. Tietz textbook of clinical chemistry and molecular diagnostics, 4th ed. St. Louis, MO: Elsevier Saunders, 2006 (chapter 14).
37. Kristiansen J. The guide to expression of uncertainty in measurement approach for estimating uncertainty: an appraisal. Clin Chem 2003;49:1822–9.
38. Horwitz W, Albert R. The concept of uncertainty as applied to chemical measurements. Analyst 1997;122:615–7.
39. Horwitz W. The certainty of uncertainty. J AOAC Int 2003;86:109–11.
40. Plebani M. Errors in laboratory medicine and patient safety: the road ahead. Clin Chem Lab Med 2007;45:700–7.
41. Westgard JO, Westgard SA. The quality of laboratory testing today: an assessment of sigma metrics for analytic quality using performance data from proficiency testing surveys and the CLIA criteria for acceptable performance. Am J Clin Pathol 2006;125:343–54.
42. Petersen PH, Stockl D, Westgard JO, Sandberg S, Linnet K, Thienpont L. Models for combining random and systematic errors. Assumptions and consequences for different models. Clin Chem Lab Med 2001;39:589–95.