Appendix K: Guidelines For Dietary Supplements and Botanicals
This appendix contains three complementary documents for the validation of methods for dietary supplements and botanicals:
Part I: AOAC Guidelines for Single-Laboratory Validation of Chemical Methods for Dietary Supplements and Botanicals
Part II: AOAC Guidelines for Validation of Botanical Identification Methods
Part III: Probability of Identification: A Statistical Model for the Validation of Qualitative Botanical Identification Methods
PART I
AOAC Guidelines for Single-Laboratory Validation of Chemical Methods for Dietary Supplements and Botanicals
Contents
1 Introduction
1.1 Definitions
1.1.1 Validation
1.1.2 Method of Analysis
1.1.3 Performance Characteristics of a Method of Analysis
2 Single-Laboratory Validation Work
2.1 Preparation of the Laboratory Sample
2.2 Identification
2.3 Method of Analysis or Protocol
2.3.1 Optimization
2.3.2 Reference Standard
2.3.3 Ruggedness Trial
2.3.4 Specific Variables
a. Analyte Addition
b. Reextraction of the Extracted Residue
c. Comparison with Different Solvents
d. Comparison with Results from a Different Procedure
e. System Suitability Checks
3 Performance Characteristics
3.1 Applicability (Scope)
3.2 Selectivity
3.3 Calibration
3.3.1 External Standard Method
3.3.2 Internal Standard Method
3.3.3 Standard Addition Method
3.4 Reliability Characteristics
3.4.1 Accuracy
3.4.2 Repeatability Precision (sr, RSDr)
3.4.3 Measurement Uncertainty
3.4.4 Reproducibility Precision (sR, RSDR)
3.4.5 Intermediate Precision
3.4.6 Limit of Determination
3.4.7 Reporting Low-Level Values
3.4.8 Dichotomous Reporting
3.5 Controls
3.5.1 Control Charts
3.5.2 Injection Controls
3.5.3 Duplicate Controls
3.6 Confirmation of Analyte
3.7 Stability of the Analyte
4 Report (as applicable)
4.1 Title
4.2 Applicability (Scope)
4.3 Principle
4.4 Reagents
4.5 Apparatus
4.6 Calibration
4.7 Procedure
4.8 Calculations
4.9 Controls
4.10 Results of Validation
4.10.1 Identification Data
4.10.2 Performance Data
Under a 5-year contract (2003–2008) with the National Institutes of Health-Office of Dietary Supplements, through the U.S. Food and Drug Administration, AOAC undertook an effort to validate methods for dietary supplement ingredients of interest. As part of the initiative, AOAC adapted and revised the traditional Official Methods℠ process to include single-laboratory validation (SLV). Methods were first validated within a single laboratory to test their suitability and ruggedness without the complications of a multilaboratory collaborative study. SLVs proved to be an excellent debugging tool for complex methods: problems found within one laboratory could be dealt with so that a stronger method went on to the collaborative study. The SLV process thus became a preparatory step for the collaborative study.
The SLV guidelines were approved by the AOAC Official Methods Board and Board of Directors in December 2002.
be determined, not merely verified, a whole new dimension is added to the problem. This involves bringing in a laboratory or an individual with skill in determining chemical structure, which is a highly specialized, expensive, and time-consuming exercise.
It is often found during the initial experience with application or validation of a method that deficiencies appear, unexpected interferences emerge, reagents and equipment are no longer available, instruments must be modified, and other unanticipated problems require returning the method to a development phase. Frequently a method that functions satisfactorily in one laboratory fails to operate in the same manner in another. Often there is no clear-cut differentiation between development and validation, and the two procedures constitute an iterative process. For that reason some aspects of method development that provide an insight into method performance, such as ruggedness, are included in this document.
In some cases it is impossible to set specific requirements because of unknown factors or incomplete knowledge. In such cases it is best to accept whatever information is generated during development and validation and rely upon the "improvements" that are usually forthcoming to asymptotically approach performance parameters developed for other analytes in the same or in a similar class.
1.1 Definitions
1.1.1 Validation
Validation is the process of demonstrating or confirming the performance characteristics of a method of analysis.
This process of validation is separate from the question of acceptability or the magnitude of the limits of the characteristics examined, which are determined by the purpose of the application. Validation applies to a specific operator, laboratory, and equipment utilizing the method over a reasonable concentration range and period of time.
Typically the validation of a chemical method of analysis results in the specification of various aspects of reliability and applicability. Validation is a time-consuming process and should be performed only after the method has been optimized and stabilized, because subsequent changes will require revalidation. The stability of the validation must also be verified by periodic examination of a stable reference material.
1.1.2 Method of Analysis
The method of analysis is the detailed set of directions, from the preparation of the test sample to the reporting of the results, that must be followed exactly for the results to be accepted for the stated purpose.
The term "method of analysis" is sometimes assigned to the technique, e.g., liquid chromatography or atomic absorption spectrometry, in which case the set of specific directions is referred to as the "protocol."
1.1.3 Performance Characteristics of a Method of Analysis
The performance characteristics of a method of analysis are the functional qualities and the statistical measures of the degree of reliability exhibited by the method under specified operating conditions.
The functional qualities are the selectivity (specificity), as the ability to distinguish the analyte from other substances; applicability, as the matrices and concentration range of acceptable operation; and degree of reliability, usually expressed in terms of bias as recovery, and variability as the standard deviation or equivalent terms (relative standard deviation and variance).
Measurements are never exact, and the "performance characteristics of a method of analysis" usually reflect the degree to which replicate measurements made under the same or different conditions can be expected or required to approach the "true" or assigned values of the items or parameters being measured. For analytical chemistry, the item being measured is usually the concentration, with a statement of its uncertainty, and sometimes the identity of an analyte.
For abbreviations and symbols used in this guideline, see Annex A.
2 Single-Laboratory Validation Work
2.1 Preparation of the Laboratory Sample
Product and laboratory sampling are frequently overlooked aspects of analytical work because very often product sampling is not under the control of the laboratory; instead, the sample is supplied by the customer. In this case, the customer assumes the responsibility for extrapolating from the analytical result to the original lot. If the laboratory is requested to sample the lot, then it must determine the purpose of the analysis and provide for random or directed sampling accordingly.
The laboratory is responsible for handling the sample in the laboratory to assure proper preparation with respect to composition and homogeneity and to assure a suitable analytical sample. The laboratory sample is the material received by the laboratory, and it usually must be reduced in bulk and fineness to an analytical sample from which the test portions are removed for analysis. Excellent instructions for this purpose can be found in the "Guidelines for Preparing Laboratory Samples" prepared by the American Association of Feed Control Officials, Laboratory Methods and Service Committee, Sample Preparation Working Group (2000) (AAFCO, Oxford, IN), which cover the preparation of particularly difficult mineral and biological material. Improper or incomplete preparation of the analytical sample is an often overlooked reason for the nonreproducibility of analytical results.
If a laboratory prepares test samples for the purpose of validating a method, it should take precautions that the analyst who will be doing the validation is not aware of the composition of the test samples. Analysts have a bias, conscious or unconscious, toward permitting knowledge of the identity or composition of a test sample to influence the result [J. AOAC Int. 83, 399–406 (2000)].
2.2 Identification
Identification is the characterization of the substance being analyzed, including its chemical, mineral, or biological classification, as applicable. In many investigations the identity of the analyte is assumed and the correctness of the assumption is merely confirmed. With some products of natural origin, complete identification and characterization are not possible. In these cases identification often may be fixed by chemical, chromatographic, or spectrophotometric fingerprinting, i.e., producing a reproducible pattern of reactions or characteristic output signals (peaks) with respect to position and intensity.
For botanical products, provide:
• Common or usual name of the item
• Synonyms by which it is known
• Botanical classification (variety, species, genus, family)
• Active or characteristic ingredient(s) (name and Chemical Abstracts Registry number or Merck Index number) and its chemical class. If the activity is ascribable to a mixture, provide the spectral or chromatographic fingerprint and the identity of the identifiable signals.
2.3 Method of Analysis or Protocol
The protocol or method of analysis is the set of permanent instructions for the conduct of the method of analysis. The method of analysis that is finally used should be the same as the one that was studied and revised as a result of research, optimization, and ruggedness trials and edited to conform with principles and practices for the production of Official Methods of Analysis of AOAC INTERNATIONAL (OMA). At this point the text is regarded as fixed. Substantive changes (those other than typographical and editorial) can only be made by formal public announcement and approval.
This text should be in ISO-compatible format, where the major heads follow in a logical progression [e.g., Title, Applicability (Scope), Equipment, Reagents, Text, Calculations, with the addition of any special sections required by the technique, e.g., chromatography, spectroscopy]. Conventions with respect to reagents and laboratory operations should follow those given in the section "Definition of Terms and Explanatory Notes," which explains that "water is distilled water," that reagents are of a purity and strength defined by the American Chemical Society (note that these may differ from standards set in other parts of the world), that alcohol is the 95% aqueous mixture, and similar frequently used working definitions.
AOAC-approved methods may be considered as "well-recognized test methods" as used by ISO 17025. ISO 17025 requires that those method properties that may be major sources of measurement uncertainty be identified and controlled. In AOAC methods the following operations or conditions, which may be major contributors to uncertainty, should be understood to be within the following limits, unless otherwise specified more strictly or more loosely:
• Weights: Within ±10% (but use actual weight for calculations)
• Volumes: Volumetric flasks, graduates, and transfer pipets (stated capacity with negligible uncertainty)
• Burets: Stated capacity except in titrations
• Graduated pipets: Use volumes >10% of capacity
• Temperatures: Set to within ±2°
• pH: Within ±0.05 unit
• Time: Within ±5%
If the operational settings are within these specifications (together with any others derived from the supporting studies), the standard deviation obtained from these supporting studies, expressed in the same units as the reported result and with the proper number of significant figures (usually 2 or 3), may be used as the standard measurement uncertainty.
2.3.1 Optimization
Prior to determining the performance parameters, the method should be optimized so that it is fairly certain that the properties of the "final method" are being tested. Validation is not a substitute for method development or for method optimization. If, however, some of the validation requirements have already been performed during the development phase, there is no need to repeat them for the validation phase. A helpful introduction is the AOAC publication "Use of Statistics to Develop and Evaluate Analytical Methods" by Grant T. Wernimont. This volume has only three major chapters: the measurement process, intralaboratory studies, and interlaboratory studies. No simpler explanation of the analysis of variance in understandable chemical terms exists than the one given on pages 28–31. It supplements, explaining in greater detail, the concepts exemplified in the popular "Statistical Manual of AOAC" by W.J. Youden. Other useful references are Appendices D and E of OMA.
2.3.2 Reference Standard
All chemical measurements require a reference point. Classical gravimetric methods depend on standard weights and measures, which are eventually traceable to internationally recognized (SI) units. But modern analytical chemistry depends on other physical properties in addition to mass and length, usually optical or electrical, and their magnitude is based upon an instrumental comparison to a corresponding physical signal produced from a known mass or concentration of the "pure" analyte. If the analyte is a mixture, the signals or components must be separated and the signal from each compound either compared to the signal from a known mass or concentration of the pure material or expressed in terms of a single reference compound of constant composition.
All instrumental methods require a reference material, even those that measure an empirical analyte. An "empirical analyte" is an analyte or property whose value is not fixed as in stoichiometric chemical compounds but is the result of the application of the procedure used to determine it; examples are moisture, ash, fat, carbohydrate (by difference), and fiber. It is a "method-dependent analyte." Reference materials or "standards," which are specific chemical compounds, can usually be purchased from a supplier of chemicals and occasionally from a national metrology institute. When used for reference purposes, a statement should accompany the material certifying the identity, the purity and its uncertainty, how this was measured (usually by spectroscopy or chromatography), and its stability and storage conditions. If no reference material is available, as with many isolates from botanical specimens, an available compound with similar properties may serve as a surrogate standard, i.e., a compound that is stable and that behaves like the analyte but is well resolved from it. Sometimes an impure specimen of the analyte must serve temporarily as the reference material until a purer specimen becomes available. The measured values assigned to empirical analytes are determined by strict adherence to all the details of the method of analysis. Even so, their bias and variability are usually larger (poorer) than those of chemically specified analytes. In some cases, as in determining the composition of milk by instrumental methods, the reference values for fat, protein, and lactose are established by use of reference methods. In routine operation, the bias and uncertainty of the final values combine the uncertainty and bias correction arising from the routine operation with those of the reference values used for the calibration.
Modern instrumentation is complicated, and its operation requires training and experience not only to recognize acceptable performance but also to distinguish unacceptable performance, drift, and deterioration of the components. Continuous instruction and testing of the instruments and operators with in-house and external standards and proficiency exercises are necessary.
The records and report must describe the reference material, its source, and the basis for the purity statement (certification by the supplier is often satisfactory). If the reference material is hygroscopic, it should be dried before use either in a 100°C oven, if stable, or over a drying agent in a desiccator if not. The conversion factor of the analyte to the reference material, if different, and its
uncertainty must be established, often through spectrophotometric or chromatographic properties such as absorptivity or peak height or area ratios.
For recovery experiments the reference standard should be of the highest purity available. In the macro concentration range (defined as about 0.1–100%) the standard ordinarily approaches 100%; in the micro or trace range (defined as µg/g to 0.1%) and the ultramicro or ultratrace range (µg/g and below) the standard should be at least 95% pure. The purity of rare or expensive standards is often established, referenced, and transferred through an absorptivity measurement in a specific solvent. The impurities present should not interfere significantly with the assay.
2.3.3 Ruggedness Trial
Although the major factors contributing to variability of a method may be explored by the classical, one-variable-at-a-time procedure, the effect of less important factors can be examined with a simpler Youden Ruggedness Trial [Youden, W.J., & Steiner, E.H. (1975) Statistical Manual of the Association of Official Analytical Chemists, pp 50–55]. This design permits exploring the effect of seven factors in a single experiment requiring only eight determinations. It also permits an approximation of the expected standard deviation from the variability of those factors that are "in control." An example of exploring the extraction step of the determination of the active ingredient in a botanical is detailed in Annex B.
2.3.4 Specific Variables
If a variable is found to have an influence on the results, further method development is required to overcome the deficiency. For example, extraction of botanicals is likely to be incomplete, and there are no reference materials available to serve as a standard for complete extraction. Therefore various techniques must be applied to determine when extraction is complete; reextraction with fresh solvent is the most common. Considerable experimentation also may be necessary to find the optimum conditions, column, and solvents for chromatographic isolation of the active ingredient(s).
(a) Analyte addition. Adding a solution of the active ingredient to the test sample and conducting the analysis is generally uninformative because the added analyte is already in an easily extractable form. The same is true for varying the volume of the extracting solvent. These procedures do not test the extractability of the analyte embedded in the cell structure. For this purpose, other variables must be tried, such as changing the solvent polarity or the extraction temperature.
(b) Reextraction of the extracted residue. Reextraction after an original extraction will test for complete extraction by the original procedure. It will not test for complete extraction from intractable (unextractable) plant material. For this purpose a reagent that will destroy fibrous cellular material without damaging the active ingredient is required. If the analytes will not be destroyed or interfered with by cell wall disrupting or crude fiber reagents (1.25% H2SO4 and 1.25% NaOH) and are water soluble, use these solutions as extractants. But since the active ingredients are likely to contain compounds hydrolyzable by these reagents, mechanical grinding to a very fine mesh will be the more likely choice.
The efficiency of extraction is checked by applying the extract to TLC, GLC, or HPLC. Higher total extractables are not necessarily an indicator of better extraction; the quantity of the active ingredient(s) is the indicator of extraction. Many natural compounds are sensitive to light, and a decrease in a component suggests that the effect of this variable should be investigated.
(c) Comparison with different solvents. Solvents with different polarities and boiling points will extract different amounts of extractives, but the amount of active ingredient(s) must be pursued by chromatographic separation or by specific reactions.
(d) Comparison with results from a different procedure. A number of analyte groups, e.g., pesticide residues, have several different standard methods available, based on different principles, that provide targets for comparison.
(e) System suitability checks. Chromatographic systems of columns, solvents (particularly gradients), and detectors are extremely sensitive to changes in conditions. Chromatographic properties of columns change as columns age, and changes in solvent polarity or temperature must be made to compensate. Therefore the specified properties of chromatographic systems in standard methods, such as column temperatures and solvent compositions, are permitted to be altered in order to optimize and stabilize the chromatographic output (peak height or area, peak resolution, and peak shape). Similarly, optical filters, electrical components of circuits, and mechanical components of instruments deteriorate with age, and adjustments must be made to compensate. Specifications for instruments, and for their calibration and operation, must be sufficiently broad to accommodate these variations.
3 Performance Characteristics
The performance characteristics are required to determine if the method can be used for its intended purpose. The number of significant figures attached to the value of the characteristic generally indicates the reliability of these indices. They are generally limited by the repeatability standard deviation, sr. In most analytical work requiring calibration, the best relative sr that can be achieved is about 1%. This is equivalent to the use of 2 significant figures. However, in order to avoid loss of "accuracy" in averaging operations, carry one additional figure with all reported values, i.e., use at most 3 significant figures in reporting. This statement, however, does not apply to recorded raw data, such as weighing or instrument readings, calibration, and standardization, which should utilize the full reading capacity of the measurement scales. This exception is limited by the measurement scale with the least reading capacity.
The purpose of the analysis determines which attributes are important and which may be less so.
3.1 Applicability (Scope)
A method must demonstrate acceptable recovery and repeatability with representative matrices and concentrations to which it is intended to be applied. For single materials, use at least three typical specimens, at least in duplicate, with different attributes (appearance, maturity, varieties, age). Repeat the analyses at least one day later. The means should not differ significantly, and the repeatability should approximate the values listed in Section 3.4.2 for the appropriate concentration. If the method is intended to be applied to a single commodity, e.g., fruits, cereals, fats, use several representative items of the commodity with a range of expected analyte concentrations. If the method is intended to apply to "foods" in general, select representative items from the food triangle [Sullivan, D.M., & Carpenter, D.E. (1993) "Methods of Analysis for Nutrition Labeling," AOAC INTERNATIONAL, Rockville, MD, USA, pp 115–120]. In the case of residues, the matrices are generalized into categories such as "fatty foods" and "nonfatty foods" that require different preliminary treatments
to remove the bulk of the "inert" carrier. In all cases, select test materials that will fairly represent the range of composition and attributes that will be encountered in actual practice. Applicability may be inferred for products included within the tested extremes but cannot be extrapolated to products outside the tested limits. Similarly, the range of expected concentrations should be tested in a number of typical matrices, spiking if necessary, to ensure that there is no interaction of analyte with matrix.
Semipermanent "house standards" for nutrients often can be prepared from a homogeneous breakfast cereal for polar analytes and from a liquid monounsaturated oil such as olive oil for nonpolar analytes, for use as concurrent controls or for fortification.
The authority for the authenticity of botanical specimens, their source, and the origin or history of the test materials must be given.
Freedom from the effects of interfering materials is tested under selectivity, Section 3.2, and properties related to the range of quantification of the target analyte are tested under the reliability characteristics, Section 3.4.
3.2 Selectivity
The term selectivity is now generally preferred by IUPAC over specificity.
Selectivity is the degree to which the method can quantify the target analyte in the presence of other analytes, matrices, or other potentially interfering materials. This is usually achieved by isolation of the analyte through selective solvent extraction, chromatographic or other phase separations, or by application of analyte-specific techniques such as biochemical reactions (enzymes, antibodies) or instrumentation [nuclear magnetic resonance (NMR), infrared, or mass spectrometry (MS)].
Methods must be tested in the presence of accompanying analytes or matrices most likely to interfere. Matrix interference is usually eliminated by extraction procedures, and the desired analyte is then separated from other extractives by chromatography or solid-phase extraction. Nevertheless, many methods for low-level analytes still require a matrix blank because of the presence of persistent, nonselective background.
The most useful separation technique is chromatography, and the most important requirement is resolution of the desired peak from accompanying peaks. Resolution, Rs, is expressed as a function of both the absolute separation distance, expressed as the retention times (minutes) of the two peaks, t1 and t2, and the baseline widths, W1 and W2, of the analyte and nearest peak, also expressed in units of time, as
$$R_s = \frac{2(t_2 - t_1)}{W_1 + W_2}$$
Baseline widths are measured by constructing tangents to the two sides of the peak band and measuring the distance between the intersections of these tangents with the baseline, or at another convenient position such as half-height. A resolution of at least 1.5 is usually sought, and one of 1.0 is the minimum usable separation.
The U.S. Food and Drug Administration (FDA) suggests an Rs of at least 2 for all compounds accompanying active drug dosage forms, including hydrolytic, photolytic, and oxidative degradation products. In addition, the isolated analyte should show no evidence of other compounds when chromatographed on other systems consisting of different columns and solvents, or when examined by techniques utilized for specificity (infrared, NMR, or MS). These requirements were developed for synthetic drug substances and must be relaxed, for the families of compounds commonly encountered in foods and botanical specimens, to a resolution of 1.5 from adjacent nontarget peaks.
If the product is mixed with other substances, the added substances must be tested to ensure that they do not contain any material that will interfere with the identification and determination of the analyte sought. If the active constituent is a mixture, the necessity for separation of the ingredients is a decision related to the complexity of the potential separation, the constancy of the relationship of the components, and the relative biological activity of the constituents.
3.3 Calibration
Modern instrumental methods depend upon the comparison of a signal from the unknown concentration of an analyte to that from a known concentration of the same or a similar analyte. This requires the availability of a reference standard, Section 2.3.2. The simplest calibration procedure requires preparation of a series of standard solutions from the reference material, by dilution of a stock solution, covering a reasonable range of signal response from the instrument. Six to eight points, approximately equally spaced over the concentration range of interest, prepared in duplicate but measured in random order (to avoid confusing nonlinearity with drift), is a suitable calibration pattern. Fit the calibration line (manually, or with one of the numerous statistical and spreadsheet programs available) and plot the residuals (the differences of the experimental points from the fitted line) as a function of concentration. An acceptable fit produces a random pattern of residuals with a 0 mean. For checking linearity, prepare the individual solutions by dilution from a common stock solution to avoid the random errors likely to be introduced from weighing small (mg) quantities for individual standards.
As long as the purity of the reference material is 95% or greater, as determined by evaluating secondary peaks or spots in gas, liquid, or thin-layer chromatography or by another quantitative technique, the impurities contribute little to the final variance at micro- and ultramicro concentrations and may be neglected. (Recovery trials, however, require greater purity or correction for the impurities.) The identity of the material used as the reference material, however, is critical. Any suggestion of nonhomogeneity, such as multiple or distorted peaks or spots, insoluble residue, or appearance of new peaks on standing, requires further investigation of the identity of the standard.
Similarly, certified volumetric glassware may be used after initial verification of its stated capacity by weighing the indicated volume of water for flasks, or the delivered volume for pipets and burets, and converting the weight to the volume delivered. Do not use serological pipets at less than 10% of their graduated capacity. Check the stability of the stock and initial diluted solutions, stored at room or lower temperatures, by repeating their measurements several days or weeks later. In most cases, prepare the most dilute solutions fresh as needed from more concentrated, stable solutions. Bring solutions stored at refrigerator or lower temperatures to room temperature before opening and using them.
Plot the signal response against the concentration. A linear response is desirable as it simplifies the calculations, but it is not necessary, nor should it be regarded as a required performance characteristic. If the curve covers several orders of magnitude, weighted regression, easily handled by computer programs, may be useful. Responses from electrochemical and immunological methods are exponential functions, which often may be linearized by using logarithms. Some instruments perform signal-to-concentration calculations automatically using disclosed or undisclosed algorithms. If the method is not used routinely, several standards should accompany
Visual examination is usually sufficient to indicate linearity or nonlinearity; otherwise, use the residual test, Section 3.3.
If a single (parent or associated) compound is used as the reference material for a series of related compounds, give their relationship in structure and response factors.
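Response factors of related compounds are often reported relative to the single reference compound. The relative response factor (RRF) is the signal per unit concentration of the related compound divided by that of the reference compound, with both measured as pure standards under the same conditions. The sketch below makes that arithmetic concrete. It assumes a linear, zero-intercept response for every compound. The function names and the peak-area values are hypothetical and are not taken from the guideline.

```python
def response_factor(signal, conc):
    """Signal per unit concentration for a pure standard (zero-intercept assumption)."""
    if conc <= 0:
        raise ValueError("concentration must be positive")
    return signal / conc


def relative_response_factor(signal_cmpd, conc_cmpd, signal_ref, conc_ref):
    """RRF of a related compound relative to the single reference compound."""
    return response_factor(signal_cmpd, conc_cmpd) / response_factor(signal_ref, conc_ref)


def conc_from_reference(signal_sample, rrf, ref_factor):
    """Concentration of the related compound in a test solution, calibrated
    only against the reference compound's response factor."""
    return signal_sample / (rrf * ref_factor)


# Hypothetical peak areas for 10 ug/mL standards of the reference compound
# and of one related compound.
ref_rf = response_factor(1000.0, 10.0)  # area per ug/mL
rrf = relative_response_factor(800.0, 10.0, 1000.0, 10.0)
print(round(rrf, 3), round(conc_from_reference(400.0, rrf, ref_rf), 3))
```

The zero-intercept assumption reduces the arithmetic to a single ratio. If the calibration has a significant intercept, the full calibration line would be needed instead.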
Note that the calibration is performed directly with the analyte -0.2
reference solutions. If these reference solutions are carried through
CONCENTRATION
the entire procedure, losses in various steps of the procedure
cannot be explored but are automatically compensated for. Some
Figure 1
procedures require correction of the final result for recovery. When
this is necessary, use a certified reference material, a “house” standard, or analyte added to a blank matrix and conducted through the entire method for this purpose. If several values are available from different runs, the average is usually the best estimate of recovery. Differences of calibration curves from day to day may be confused with matrix effects because they are often of the same magnitude.

3.3.1 External Standard Method

The most common calibration procedure uses a separately prepared calibration curve because of its simplicity. If there is a constant loss in the procedure, this is handled by a correction factor, as determined by conducting a known amount of analyte through the entire procedure. The calculation is based on the ratio of the response of equal amounts of the standard or reference compound to the test analyte. This correction procedure is time consuming and is used as a last resort since it only improves accuracy at the expense of precision. Alternatives are the internal standard procedure, the blank matrix process, and the method of standard addition.

If the method is intended to cover a substantial range of concentrations, prepare the curve from a blank and five or seven approximately equally spaced concentration levels and repeat on a second day. Repeat occasionally as a check for drift. If an analyte is examined at substantially different concentration levels, such as pesticide residues and formulations, prepare separate calibration curves covering the appropriate range to avoid excessive dilutions. In such cases, take care to avoid cross contamination. However, if the analyte always occurs at or near a single level, as in a pharmaceutical, a 2-point curve may be used to bracket the expected level, or even a single standard point, if the response over the range of interest is approximately linear. By substituting an analyte-free matrix preparation for the blank, as might be available from pesticide or veterinary drug residue studies or the excipients from a pharmaceutical, a calibration curve that automatically compensates for matrix interferences can be prepared.

3.3.2 Internal Standard Method

The internal standard method requires the addition of a known amount of a compound that is easily distinguished from the analyte but which exhibits similar chemical properties. The response ratio of the internal standard to a known amount of the reference standard of the analyte of interest is determined beforehand. An amount of internal standard similar to that expected for the analyte is added at an early stage of the method. This method is particularly useful for addition to the eluate from an HPLC separation when the fractions are held in an autosampler that is run overnight, where it compensates for any losses of solvent by evaporation. An internal standard is also frequently used in GLC residue methods where many analytes with similar properties are frequently encountered.

3.3.3 Standard Addition Method

When the matrix effect on an analyte is unknown or variable, the method of standard additions is useful. Make measurements on the isolated analyte solution, then add known amounts of the standard analyte at the same level as the original and at two or three times (or at known fractions of) the original level. Plot the signal against the concentration with the initial unknown concentration set at 0. Extrapolate the line connecting the measured responses back to 0 response and read the concentration value off the (negative) x-axis. The main assumption is that the response is linear in the working region. This method is used most frequently with emission spectroscopy, electrochemistry, and radiolabeled isotopes in mass spectrometric methods.

See Figure 1 for example [from Rubinson, K.A. (1987) “Chemical Analysis,” Little, Brown and Co., Boston, MA, USA, p. 205].

Concn Cu added, µg    Instrument response
0.0                   0.200
0.10                  0.320
0.20                  0.440
Concn Cu found by extrapolation to 0.00 response: (–)0.18

3.4 Reliability Characteristics

These are the statistical measures of how good the method is. Different organizations use different terms for the same concept. The important questions are:
• How close is the reported value to the true, reference, or accepted value?
• How close are repeated values to each other as determined in the same or different laboratories?
• What is the smallest amount or concentration that can be recognized or measured?

Recently accreditation organizations have been requesting the calculation of the parameter “Measurement Uncertainty” (MU).
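The MU calculation referred to here is defined in the paragraph that follows and in Section 3.4.3. The standard uncertainty is the standard deviation of the measurement series, and the expanded uncertainty is twice that. When the series is too short, the within-laboratory sr from the validation study is substituted. A minimal Python sketch of that arithmetic may help. The function name, the `min_n` cutoff, and the example values are hypothetical: the guideline does not say how many values make a "stable" MU.

```python
import statistics
from typing import Optional, Sequence, Tuple


def measurement_uncertainty(
    values: Sequence[float],
    validation_sr: Optional[float] = None,
    min_n: int = 6,  # hypothetical cutoff; the guideline gives no number
) -> Tuple[float, float]:
    """Return (standard uncertainty, expanded uncertainty) for a series.

    Standard uncertainty: sample standard deviation of the series.
    Expanded uncertainty: 2 x standard uncertainty (about 95% coverage).
    If the series holds fewer than min_n values, the repeatability
    standard deviation sr from the validation study is used instead.
    """
    if len(values) >= max(min_n, 2):
        u = statistics.stdev(values)  # sample SD, n - 1 denominator
    elif validation_sr is not None:
        u = validation_sr  # fall back to sr from the validation study
    else:
        raise ValueError("too few values and no validation sr supplied")
    return u, 2.0 * u


# Hypothetical series of six results for one analyte/matrix combination.
u, U = measurement_uncertainty([10.1, 9.8, 10.3, 10.0, 9.9, 10.2])
print(f"standard u = {u:.3f}, expanded U = {U:.3f}")  # u ≈ 0.187, U ≈ 0.374

# A two-value series falls back to a (hypothetical) validation sr of 0.2.
print(measurement_uncertainty([10.1, 9.9], validation_sr=0.2))
```

The sample (n − 1) standard deviation is a choice made for this sketch. The guideline only says "standard deviation of the series."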
MU is a term indicative of the reliability of the particular series of measurements being reported. The standard uncertainty is equal to the standard deviation of the series of measurements of the analyte. The expanded uncertainty is two times the standard uncertainty and is expected to encompass about 95% of similar future measurements. If too few values are available in a measurement series to calculate a stable MU, the standard deviation obtained from the validation study within the laboratory, sr, may be substituted, if it covered the same or similar analyte/matrix/concentration range. If a collaboratively studied method is being validated for use within a laboratory, the among-laboratory standard deviation, sR, reported for the method from the study should be used to determine if the anticipated measurement uncertainty will be satisfactory for the intended purpose, assuming satisfactory repeatability as demonstrated by control charts or proficiency testing. In fact, the determination of the reliability characteristics in the validation study should not be undertaken until the developmental work demonstrates that the data are repeatable and in statistical control.

The Codex Alimentarius, an international body organized by the Food and Agriculture Organization (FAO) and the World Health Organization (WHO) of the United Nations (UN) to recommend international food standards to governments, suggests the following for laboratories in its “Guidelines for the Assessment of the Competence of Testing Laboratories Involved in the Import and Export Control of Food” (FAO, Rome, Italy, CAC/GL 27-1997):
• Comply with the general competence criteria of ISO/IEC 17025
• Participate in proficiency testing schemes for food analysis
• Utilize validated methods
• Utilize internal quality control procedures

3.4.1 Accuracy

The term “accuracy” has been given so many meanings that it is better to use a more specific term. Ordinarily it means closeness of the test result to the “true” or accepted value. But the test result can be an individual value, the average of a set of values, or the average of many sets of values. Therefore, whenever the term is used, the number of values it represents and their relationship must always be stated, e.g., as an individual result, as the average of duplicates or n replicates, or as the average of a set of a number of trials. The difference of the reported value from the accepted value, whether it is an individual value, an average of a set of values, the average of a number of averages, or an assigned value, is the bias under the reported conditions. The term frequently used for bias or “accuracy” when the average of a set of values is reported is “trueness.”

The fraction or percentage of the analyte that is recovered when the test sample is conducted through the entire method is the recovery. The best reference materials for determining recovery are analyte-certified reference materials (CRMs) distributed by national metrological laboratories, but in most cases material certified by a commercial supplier must be accepted. Occasionally standards are available from a government agency, such as pesticides from the Environmental Protection Agency (EPA). They are rarely, if ever, available in the matrix of interest but rather as a solution in a convenient solvent with a stated concentration and uncertainty. Such reference materials must then be tested in the matrix of interest. Even rarer is an isotopically labeled analyte that can be easily followed by isotopic analytical techniques.

The available certified or commercial analyte standard, diluted if necessary, is added to typical analyte-free matrices at levels about 1x or 2x the expected concentration. Analyte-free matrices for residues are obtained from growers who certify that the chemical is not used in their cultivation, growth, or feeding, with its absence verified analytically. They may also be obtained from the residues of previously extracted materials or from test samples shown to be negative for the analyte. If an analyte-free matrix is not available, the analyte standard is added to separate test portions and the recovery is calculated from the base determined by the method of addition, Section 3.3.3. Run the set of such controls with each set of test samples. If a sufficient number of batches is expected to be run (at least 20–30), the % recovery can be plotted against the run number as the basis for a control chart. Recovery also can be obtained as a byproduct of the precision determinations, Sections 3.4.2 and 3.4.4.

Acceptable recovery is a function of the concentration and the purpose of the analysis. Some acceptable recovery requirements for individual assays are as follows:

Concentration       Recovery limits, %
100%                98–101
10%                 95–102
1%                  92–105
0.1%                90–108
0.01%               85–110
10 µg/g (ppm)       80–115
1 µg/g              75–120
10 µg/kg (ppb)      70–125

The Codex Alimentarius “Residues of Veterinary Drugs in Foods” [2nd Ed., Vol. 3 (1993) Joint FAO/WHO Food Standards Program, FAO, Rome, Italy, p. 59] suggests the following limits for residues of veterinary drugs in foods:

Concentration, µg/kg    Acceptable range, %
≤1                      50–120
≥1 to <10               60–120
≥10 to <100             70–110
≥100                    80–110

These limits may be modified as needed in view of the variability of individual results or of which set of regulatory requirements is referenced. (As a rough guide to typical performance, about 95% of normally distributed typical results in a single laboratory at 1 µg/g will fall within 80–120% of the mean.) In the examination of the general USDA pesticide residue proficiency study, limits of 50–150% were applied; the FDA acceptability criterion for recovery of drug residues at the 10 ppb level is 70–120%. Generally, however, recoveries less than 60–70% should be subject to investigations leading to improvement, and average recoveries greater than 110% suggest the need for better separations. Most important, recoveries greater than 100% must not be discarded as impossible. They are the expected positive side of a typical distribution of analytical results from analytes present at or near 100%, balanced by equivalent results on the negative side of the mean.

If an extraction of active ingredient from a matrix with a solvent is used, test extraction efficiency by reextracting the (air-dried) residue and determining the active ingredient(s) in the residue by the method.

The number of units to be used to establish bias is arbitrary, but the general rule is: the more independent “accuracy” trials, the better. The improvement, as measured by the width of the confidence interval for the mean, follows the square root of the number of trials. Once past 8–10 values, improvement comes slowly. To fully contribute, the determinations must be conducted independently, i.e., nonsimultaneously, throwing in as many environmental or spontaneous differences as possible, such as
different analysts, instruments, sources of reagents, time of day, temperature, barometric pressure, humidity, power supply voltage, etc. Each value also contributes to the within-laboratory precision. A reasonable compromise is to obtain 10 values from a reference material, a spiked matrix, or by the method of standard addition, scattered over several days or in different runs, as the basis for checking bias or recovery. By performing replicates, precision is obtained simultaneously. Precision obtained in such a manner is often termed “intermediate precision” because its value is between within-laboratory and among-laboratory precision. When it is reported, the conditions that were held constant and those that were varied must be reported as well.

Note that the series of determinations conducted for the method of addition is not independent, because the determinations are probably prepared from the same standard calibration solution with the same pipets and are usually conducted almost simultaneously. This is satisfactory for their intended purpose of providing an interrelated function, but it is not satisfactory for a precision function estimation intended for future use.

Related to recovery is the matter of reporting the mean corrected or not corrected for recovery. Unless the method specifically states whether or not to correct, this question is usually considered a “policy” matter and is settled administratively outside the laboratory by a regulatory pronouncement, informal or formal agreement, or by contract. If for some reason a value closest to theory is needed, correction is usually applied. If a limit or tolerance has been established on the basis of analytical work with the same method correlated with “no effect” levels, no correction should be applied because it has already been used in setting the specification. Corrections improve “accuracy” at the expense of impairing precision because the variability of both the determination and the recovery is involved.

When it is impossible to obtain an analyte-free matrix to serve as a base for reporting recovery, two ways of calculating recovery must be distinguished: (1) total recovery, based on recovery of the native plus added analyte, and (2) marginal recovery, based only on the added analyte (the native analyte is subtracted from both the numerator and denominator). Usually total recovery is used unless the native analyte is present in amounts greater than about 10% of the amount added, in which case use the method of addition, Section 3.3.3.

When the same analytical method is used to determine the concentration of both the fortified, Cf, and unfortified, Cu, test samples, the % recovery is calculated as

$$\text{Recovery}\ (\%) = \frac{(C_f - C_u) \times 100}{C_a}$$

where Ca is the calculated (not analyzed) concentration of analyte added to the test sample. The concentration of added analyte should be no less than the concentration initially present, and the response of the fortified test sample must not exceed the highest point of the calibration curve. Both fortified and unfortified test samples must be treated identically in the analysis.

3.4.2 Repeatability Precision (sr, RSDr)

Repeatability refers to the degree of agreement of results when conditions are maintained as constant as possible, with the same analyst, reagents, equipment, and instruments, within a short period of time. It usually refers to the standard deviation of simultaneous duplicates or replicates, sr. It is the best precision that will be exhibited by a laboratory, but it is not necessarily the laboratory’s typical precision. Theoretically the individual determinations should be independent, but this condition is practically impossible to maintain when determinations are conducted simultaneously, and therefore this requirement is generally ignored.

To obtain a more representative value for the repeatability precision, perform the simultaneous replicates at different times (but the same day), on different matrices, at different concentrations. Calculate the standard deviation of repeatability from at least five pairs of values obtained from at least one pair of replicates analyzed with each batch of analyses, for each pertinent concentration level that differs by approximately an order of magnitude, and conducted at different times. The object is to obtain representative values, not the “best value,” for how closely replicates will check each other in routine performance of the method. Therefore these sets of replicate analyses should be conducted at least in separate runs and preferably on different days. The repeatability standard deviation varies with concentration, C, expressed as a mass fraction. Acceptable values approximate the values in the following table or are calculated by the formula

$$\mathrm{RSD}_r\ (\%) = C^{-0.15}$$

unless there are reasons for using tighter requirements.

Concentration       Repeatability (RSDr), %
100%                1
10%                 1.5
1%                  2
0.1%                3
0.01%               4
10 µg/g (ppm)       6
1 µg/g              8
10 µg/kg (ppb)      15

Acceptable values for repeatability are between ½ and 2 times the calculated values. Alternatively, a ratio of the found value for RSDr to that calculated from the formula, designated HorRatr, can be calculated. Acceptable values for this ratio are typically 0.5 to 2:

$$\mathrm{HorRat}_r = \frac{\mathrm{RSD}_r\ (\text{found}, \%)}{\mathrm{RSD}_r\ (\text{calculated}, \%)}$$

The term “repeatability” is applied to parameters calculated from simultaneous replicates, and this term, representing minimum variability, is equated to the “within-laboratory” parameter (standard deviation, variance, coefficient of variation, relative standard deviation) of the precision model equation. It should be distinguished from a somewhat larger within-laboratory variability that would be induced by non-simultaneous replicates conducted in the same laboratory on identical test samples on different days, by different analysts, with different instruments and calibration curves, and with different sources of reagents, solvents, and columns. When such an “intermediate” within-laboratory precision (standard deviation, variance, coefficient of variation, relative standard deviation) is used, a statement of the conditions that were not constant must accompany it. These within-laboratory conditions have also been called within-laboratory reproducibility, an obvious misnomer.

3.4.3 Measurement Uncertainty

Accreditation organizations have been requesting laboratories to have a parameter designated as “measurement uncertainty” associated with methods that the laboratory utilizes. The official
metrological definition of measurement uncertainty is “a parameter associated with the result of a measurement that characterizes the dispersion of values that could reasonably be attributed to the measurand.” A note indicates, “the parameter may be, for example, a standard deviation (or a given multiple of it), or the width of a confidence interval.”

Of particular pertinence is the fact that the parameter applies to a measurement and not to a method (see Section 3.4). Therefore “standard” measurement uncertainty is the standard deviation or relative standard deviation from a series of simultaneous measurements. “Expanded” uncertainty is typically twice the standard uncertainty and is considered to encompass approximately 95% of future measurements. This is the value customarily used in determining if the method is satisfactory for its intended purpose, although it is only an approximation because theoretically it applies to the unknown “true” concentration.

Since the laboratory wants to know beforehand if the method will be satisfactory for the intended purpose, it must use the parameters gathered in the validation exercises for this purpose, substituting the measurement values for the method values after the fact. As pointed out by M. Thompson [Analyst 125, 2020–2025 (2000); see Inside Lab. Mgmt. 5(2), 5 (2001)], a ladder of errors exists for this purpose:
• Duplicate error (a pair of tests conducted simultaneously)
• Replicate or run error (a series of tests conducted in the same group)
• Within-laboratory error (all tests conducted by a laboratory)
• Between-laboratory error (all tests by all laboratories)

As we go down the series, the possibility of more errors being included increases until a maximum is reached with the all-inclusive reproducibility parameters. Thompson estimates the relative magnitude of the contribution of the primary sources of error as follows:

Level of variation    Separate    Cumulative
Repeatability         1.0         1.0
Runs                  0.8         1.3
Laboratories          1.0         1.6
Methods               1.5         2.2

Ordinarily only one method exists or is being validated, so we can ignore the last line. Equating duplicates to replicability, runs to within-laboratory repeatability, and laboratories to among-laboratories reproducibility, Thompson points out that the three sources of error are roughly equal and that not much improvement in uncertainty would result from improvement in any one of these sources. In any case, the last column gives the approximate relative size of the uncertainty estimate when the standard deviation at a given point of the ladder is used as its basis, prior to the actual analytical measurements. Each cumulative entry is approximately the square root of the sum of the squared separate contributions above it, e.g., √(1.0² + 0.8²) ≈ 1.3.

In the discussion of uncertainty it must be noted that bias as measured by recovery is not a component of uncertainty. Bias (a constant) should be removed by subtraction before calculating standard deviations. Differences in bias as exhibited by individual laboratories become a component of uncertainty through the among-laboratory reproducibility. The magnitude of the uncertainty depends on how it is used: comparisons within a laboratory, with other laboratories, and even with other methods. Each component adds uncertainty. Furthermore, uncertainty stops at the laboratory’s edge. If only a single laboratory sample has been submitted and analyzed, there is no basis for estimating sampling uncertainty. Multiple independent samples are required for this purpose.

3.4.4 Reproducibility Precision (sR, RSDR)

Reproducibility precision refers to the degree of agreement of results when operating conditions are as different as possible. It usually refers to the standard deviation (sR) or the relative standard deviation (RSDR) of results on the same test samples by different laboratories and therefore is often referred to as “between-laboratory precision” or the more grammatically correct “among-laboratory precision.” It is expected to involve different instruments, different analysts, different days, and different laboratory environments, and therefore it should reflect the maximum expected precision exhibited by a method. Theoretically it consists of two terms, combined as variances ($s_R^2 = s_r^2 + s_L^2$): the repeatability precision (within-laboratory precision, sr) and the “true” between-laboratory precision, sL. The “true” between-laboratory precision, sL, is actually the pooled constant bias of each individual laboratory, which when examined as a group is treated as a random variable. The between-laboratory precision too is a function of concentration and is approximated by the Horwitz equation, $s_R = 0.02C^{0.85}$. The AOAC/IUPAC protocol for interlaboratory studies requires the use of a minimum of eight laboratories examining at least five materials to obtain a reasonable estimate of this variability parameter, which has been shown to be more or less independent of analyte, method, and matrix.

By definition sR does not enter into single-laboratory validation. However, as soon as a second (or more) laboratory considers the data, the first question that arises involves reanalysis by that second laboratory: “If I had to examine this or similar materials, what would I get?” As a first approximation, in order to answer the fundamental question of validation (fit for the intended purpose), assume that the recovery and limit of determination are of the same magnitude as in the initial effort. But the variability, now involving more than one laboratory, should be doubled because variance, which is the square of differences, is involved, which magnifies the effect of this parameter. Therefore we have to anticipate what another laboratory would obtain if it had to validate the same method. If the second laboratory, on the basis of the doubled variance, concludes the method is not suitable for its intended purpose, it has saved itself the effort of revalidating the method.

In the absence of such an interlaboratory study, the interlaboratory precision may be estimated from the concentration as indicated in the following table or by the formula (unless there are reasons for using tighter requirements):

$$\mathrm{RSD}_R\ (\%) = 2C^{-0.15}$$

or

$$s_R = 0.02C^{0.85}$$

Concentration, C      Reproducibility (RSDR), %
100%                  2
10%                   3
1%                    4
0.1%                  6
0.01%                 8
10 µg/g (ppm)         11
1 µg/g                16
10 µg/kg (ppb)        32

Acceptable values for reproducibility are between ½ and 2 times the calculated values. Alternatively a ratio can be calculated
of the found value for RSDR to that calculated from the formula, designated as HorRatR. Acceptable values for this ratio are typically 0.5 to 2:

$$\mathrm{HorRat}_R = \frac{\mathrm{RSD}_R\ (\text{found}, \%)}{\mathrm{RSD}_R\ (\text{calculated}, \%)}$$

As stated by Thompson and Lowthian (“The Horwitz Function Revisited,” (1997) J. AOAC Int. 80, 676–679), “Indeed, a precision falling within this ‘Horwitz Band’ is now regarded as a criterion for a successful collaborative trial.”

The typical limits for HorRat values may not apply to indefinite analytes (enzymes, polymers), physical properties, or the results from empirical methods expressed in arbitrary units. Better-than-expected results are often reported at both the high (>10%) and low (<10⁻⁸) ends of the concentration scale. Better-than-predicted results can also be attained if extraordinary effort or resources are invested in education and training of analysts and in quality control.

3.4.5 Intermediate Precision

The precision determined from replicate determinations conducted within a single laboratory but not simultaneously, i.e., on different days, with different calibration curves, with different instruments, by different analysts, etc., is called intermediate precision. It lies between the within- and among-laboratory precision, depending on the conditions that are varied. If the analysis will be conducted by different analysts, on different days, on different instruments, conduct at least five sets of replicate analyses on the same test materials under these different conditions for each concentration level that differs by approximately an order of magnitude.

3.4.6 Limit of Determination

The limit of determination is a very simple concept: it is the smallest amount or concentration of an analyte that can be estimated with acceptable reliability. But this statement contains an inherent contradiction: the smaller the amount of analyte measured, the greater the unreliability of the estimate. As we go down the concentration scale, the standard deviation increases to the point where a substantial fraction of values of the distribution of results overlaps 0 and false negatives appear. Therefore the definition of the limit comes down to a question of what fraction of values we are willing to tolerate as false negatives.

Thompson and Lowthian (loc. cit.) consider the point defined by RSDR = 33% as the upper bound for useful data. This is derived from the fact that ±3 RSDR should contain essentially all of the data from a normal distribution, so at RSDR = 33% the lower edge of the distribution just reaches zero. This is equivalent to a concentration of about 8 × 10⁻⁹ (as a mass fraction) or 8 ng/g (ppb). Below this level false negatives appear and the data go “out of control.” From the formula, this value is also equivalent to an RSDr ≈ 20%. The penalty for operating below the equivalent concentration level is the generation of false negative values. Such signals are generally accepted as negative and are not repeated.

An alternative definition of the limit of detection and limit of determination is based upon the variability of the blank. The blank value, xBl, plus 3 times the standard deviation of the blank ($x_{Bl} + 3s_{Bl}$) is taken as the detection limit, and the blank value plus 10 times the standard deviation of the blank ($x_{Bl} + 10s_{Bl}$) is taken as the determination limit. The problem with this approach is that the blank is often difficult to measure or is highly variable. Furthermore, the value determined in this manner is independent of the analyte. If blank values are accumulated over a period of time, the average is likely to be fairly representative as a basis for the limits and will probably provide a value of the same magnitude as that derived from the relative standard deviation formulae.

The detection limit is only useful for control of undesirable impurities that are specified as “not more than” a specified low level and for low-level contaminants. Useful ingredients must be present at high enough concentrations to be functional. The specification level must be set high enough in the working range that acceptable materials do not produce more than 5% false-positive values, the default statistical acceptance level. Limits are often at the mercy of instrument performance, which can be checked by use of pure standard compounds. Limits of detection and determination are unnecessary for composition specifications, although the statistical problem of whether or not a limit is violated is the same near zero as it is at a finite value.

Blank values must be monitored continuously as a control of reagents, cleaning of glassware, and instrument operation. The necessity for a matrix blank would be characteristic of the matrix. Abrupt changes require investigation of the source and correction. Taylor [J.K. Taylor (1987) “Quality Assurance of Chemical Measurements,” Lewis Publishers, Chelsea, MI, p. 127] provides two empirical rules for applying a correction in trace analysis: (1) the blank should be no more than 10% of the “limit of error of the measurement,” and (2) it should not exceed the concentration level.

3.4.7 Reporting Low-Level Values

Although on an absolute scale low-level values are minuscule, they become important in three situations:
(1) When legislation or specifications decree the absence of an analyte (zero tolerance situation).
(2) When very low regulatory or guideline limits have been established in a region of high uncertainty (e.g., a tolerance of 0.005 µg/kg aflatoxin M1 in milk).
(3) When dietary intakes of low-level nutrients or contaminants must be determined to permit establishment of minimum recommended levels for nutrients and maximum limits for contaminants.

Analytical work in such situations strains not only the limits of instrumentation but also the ability of the analyst to interpret and report the findings. Consider a blank that is truly 0, and suppose that the 10% point of the calibration curve corresponds to a concentration of 1 µg/kg (10⁻⁹ as a mass fraction). By the Horwitz formula this leads to an expected RSDr in a single laboratory of about 23%. If we assume a normal distribution and we are willing to be wrong 5% of the time, what concentration levels would be expected to appear? From 2-tail normal distribution tables (the errant value could appear at either end), 2.5% of the values will be below 0.72 µg/kg and 2.5% will be above 1.6 µg/kg. Note the asymmetry of the potential results, from 0.7 to 1.6 µg/kg for a nominal 1.0 µg/kg value, which arises from the nature of the multiplicative scale when the RSD is relatively large.

But what does the distribution look like at zero? Mathematically it is intractable because it collapses to zero. Practically, we can assume the distribution looks like the previous one, but this time we will assume it is symmetrical to avoid complications. The point to be made will be the same. For a distribution to have a mean equal to 0, it must have negative as well as positive values. Negative concentration values per se are forbidden, but here they are merely an artifact of transforming measured signals. Negative signals are typical in electromotive force and absorbance measurements.

Analysts have an aversion to reporting a zero concentration value because of the possibility that the analyte might be present, but below the detection limit. Likewise, analysts avoid reporting
3.5 Controls
3.5.1 Control Charts
Control charts are only useful for large-volume or continuous work. They require starting with at least 20–30 values to calculate a mean and a standard deviation, which form the basis for control values equivalent to the mean ± 2 sr (warning limits) and the mean ± 3 sr (rejection limits). At a minimum, replicate test portions of a stable in-house reference material and a blank are run with every batch of multiple test samples, and the means and standard deviations (or ranges of replicates) of the controls and blank are plotted separately. The analytical process is “in control” if not more than 5% of the values fall in the warning zone. Any value falling beyond the rejection limits, or two consecutive values in the warning region, requires investigation and corrective action.
3.5.2 Injection Controls
A limit of 1 or 2% is often placed on the range of the peak heights, peak areas, or instrument responses of repeated injections of the final isolated analyte solution. Such controls are good for checking the stability of the instrument during the time of checking, but they give no information about the suitability of the isolation part of the method. Such a limit is sometimes erroneously quoted as a relative standard deviation when range is meant.
3.5.3 Duplicate Controls
Chemists will frequently perform their analyses in duplicate in the mistaken belief that if duplicates check, the analysis must have been conducted satisfactorily. ISO methods often require that the determinations be performed in duplicate. Simultaneous replicates are not independent; they are expected to check because the conditions are identical. The test portions are weighed out using the same weights, aliquots are taken with the same pipets, the same reagents are used, operations are performed within the same time frame, instruments are operated with the same parameters, and the same operations are performed identically. Under such constraints, duplicates that do not check would be considered outliers.
Nevertheless, the parameter calculated from duplicates within a laboratory is frequently quoted as the repeatability limit, r = 2*√2*sr: the absolute difference between two future results obtained similarly is expected to be no greater than r with a probability of 95%. The corresponding parameter comparing two values in different laboratories is the reproducibility limit, R = 2*√2*sR. This parameter is expected to reflect more independent operations. Note the considerable difference between the standard deviations, sr and sR, which are average-type parameters, and the repeatability and reproducibility limits, r and R, which are 2.8 times larger. If duplicates do not check within the r value, look for a problem that is methodological, laboratory, or sample in origin. Note that these limits (2*√2 = 2.8) are very close to the limits used for rejection in control charts (3*sr). Therefore they are most useful for large-volume routine work rather than for validation of methods.
3.6 Confirmation of Analyte
Because numerous chemical compounds exist, some with chemical properties very close to those of the analytes of interest (particularly in chromatographic separations) but with different biological, clinical, or toxicological properties, regulatory decisions require that the identity of the analyte of interest be confirmed by an independent procedure. This confirmation of chemical identity is in addition to a quantitative “check analysis,” often performed independently by a second analyst to confirm that the quantity of analyte found in both analyses exceeds the action limit.
Confirmation provides unequivocal evidence that the chemical structure of the analyte of interest is the same as that identified in the regulation. The most specific method for this purpose is mass spectrometry following a chromatographic separation, with a full mass scan, identification of three or four fragments that are characteristic of the analyte sought, or multiple mass spectrometric (MSn) examination. Characteristic bands in the infrared can also serve for identification, but this technique usually requires considerably more isolated analyte than is available from chromatographic separations unless special examination techniques are utilized. Visible and ultraviolet spectra are too subject to interferences to be useful, although characteristic peaks can suggest structural characteristics.
Other techniques that can be used for identification, particularly in combination, in approximate order of specificity, include:
(1) Co-chromatography, where the analyte, when mixed with a standard and then chromatographed by HPLC, GLC, or TLC, exhibits a single entity, a peak or spot with enhanced intensity.
(2) Characteristic fluorescence (absorption and emission) of the native compound or derivatives.
(3) Identical chromatographic and spectral properties after isolation from columns of different polarities or with different solvents.
(4) Identical full-scan visible or ultraviolet spectra, with matching peak(s).
Furthermore, no additional peaks should appear when chromatographic conditions are changed, e.g., different solvents, columns, gradients, temperature, etc.
3.7 Stability of the Analyte
The product should be held under typical or exaggerated storage conditions and the active ingredient(s) assayed periodically for a period of time judged to reasonably exceed the shelf life of the product. In addition, the appearance of new analytes from deterioration should be explored, most easily by a fingerprinting technique (Section 2.1).
4 Report (as applicable)
4.1 Title
• Single-Laboratory Validation of the Determination of [Analyte] in [Matrix] by [Nature of Determination]
• Author, Affiliation
• Other Participants
4.2 Applicability (Scope)
• Analytes (common and chemical name; CAS registry number or Merck index number)
• Matrices used
• In presence of
• In absence of
• Safety statements applicable to product
4.3 Principle
• Preparation of test portion
• Extraction
• Purification
• Separation
• Measurement
• Alternatives
• Interferences
4.4 Reagents
(Reagents usually present in a laboratory need not be listed.)
• Reference standards, identity, source, purity
• Calibration standard solutions, preparation, storage, stability
• Solvents (special requirements)
• Buffers
• Others
4.5 Apparatus
(Equipment usually present in a laboratory need not be listed; provide source, Web address, and catalog numbers of special items.)
• Chromatographic equipment (operating conditions; system suitability conditions; expected retention times, separation times, peak or area relations)
• Temperature-controlled equipment
• Separation equipment (centrifuges, filters)
• Measurement instruments
4.6 Calibration
• Range, number and distribution of standards, replication, stability
4.7 Procedure
• List all steps of method, including any preparation of the test sample.
• Critical points
• Stopping points
4.8 Calculations
• Formulae, symbols, significant figures
4.9 Controls
4.10 Results of Validation
4.10.1 Identification Data
• Analytes measured and properties utilized (matrices tested; reference standard, source, identity, purity)
4.10.2 Performance Data
• Recovery of control material
• Repeatability (by replication of entire procedure on same test sample)
• Limit of determination [concentration where RSDr = 20% or (blank + 10 * sblank)]
• Expanded measurement uncertainty 2*sr
4.10.3 Low-Level Data
Report the instrument reading converted to a concentration through the calibration curve: positive, negative, or zero. Do not equate to 0, truncate data, or report “less than.”
Interpretation: Concentrations less than 5 µg/kg may be reported as “zero” or “less than 5 µg/kg” with a 95% probability (5% chance of being incorrect).
4.10.4 Stability Data
ANNEX A
Abbreviations and Symbols Used
CAS Chemical Abstracts Service (Registry Number)
CRM Certified Reference Material
FDA U.S. Food and Drug Administration
EPA U.S. Environmental Protection Agency
GLC Gas-liquid chromatography
HPLC High-performance liquid chromatography
i (as a subscript) Intermediate in precision terms
ISO International Organization for Standardization
MU Measurement Uncertainty
MS Mass Spectrometry
MSn Multiple mass spectrometry
NMR Nuclear magnetic resonance
r, R Repeatability, reproducibility limits: the value less than or equal to which the absolute difference between two test results obtained under repeatability (reproducibility) conditions is expected to be with a probability of 95%; r = 2*√2*sr (R = 2*√2*sR)
RSDr Repeatability relative standard deviation = sr × 100/x̄
RSDR Reproducibility relative standard deviation = sR × 100/x̄
sr Repeatability standard deviation (within-laboratory)
sR Reproducibility standard deviation (among-laboratories)
x̄ Mean, average
ANNEX B
Example of a Ruggedness Trial
Choose seven factors that may affect the outcome of the extraction and assign reasonable high and low values to them as follows:
Factor                    High value         Low value
Weight of test portion    A = 1.00 g         a = 0.50 g
Extraction temperature    B = 30°            b = 20°
Volume of solvent         C = 100 mL         c = 50 mL
Solvent                   D = Alcohol        d = Ethyl acetate
Extraction time           E = 60 min         e = 30 min
Stirring                  F = Magnetically   f = Swirl at 10 min intervals
Irradiation               G = Light          g = Dark
Conduct eight runs (a single analysis that reflects a specified set of factor levels) utilizing the specific combinations of high and low values for the factors as follows, and record the result obtained for each combination. (It is essential that the factors be combined exactly as specified or erroneous conclusions will be drawn.)
Run No.   Factor combinations   Measurement obtained
1         A B C D E F G         x1
2         A B c D e f g         x2
3         A b C d E f g         x3
4         A b c d e F G         x4
5         a B C d e F g         x5
6         a B c d E f G         x6
7         a b C D e f G         x7
8         a b c D E F g         x8
To obtain the effect of each of the factors, set up the differences of the measurements containing the subgroups of the capital letters and the small letters from column 2 thus:
Effect of A and a
$J = \frac{x_1 + x_2 + x_3 + x_4}{4} - \frac{x_5 + x_6 + x_7 + x_8}{4} = \frac{\Sigma A}{4} - \frac{\Sigma a}{4}$
Note that the effect of each level of each chosen factor is the average of four values and that the effects of the seven other factors
Effect of B and b
$K = \frac{x_1 + x_2 + x_5 + x_6}{4} - \frac{x_3 + x_4 + x_7 + x_8}{4} = \frac{\Sigma B}{4} - \frac{\Sigma b}{4}$
Effect of C and c
$L = \frac{x_1 + x_3 + x_5 + x_7}{4} - \frac{x_2 + x_4 + x_6 + x_8}{4} = \frac{\Sigma C}{4} - \frac{\Sigma c}{4}$
Effect of D and d
$M = \frac{x_1 + x_2 + x_7 + x_8}{4} - \frac{x_3 + x_4 + x_5 + x_6}{4} = \frac{\Sigma D}{4} - \frac{\Sigma d}{4}$
Effect of E and e
$N = \frac{x_1 + x_3 + x_6 + x_8}{4} - \frac{x_2 + x_4 + x_5 + x_7}{4} = \frac{\Sigma E}{4} - \frac{\Sigma e}{4}$
Effect of F and f
$O = \frac{x_1 + x_4 + x_5 + x_8}{4} - \frac{x_2 + x_3 + x_6 + x_7}{4} = \frac{\Sigma F}{4} - \frac{\Sigma f}{4}$
Effect of G and g
$P = \frac{x_1 + x_4 + x_6 + x_7}{4} - \frac{x_2 + x_3 + x_5 + x_8}{4} = \frac{\Sigma G}{4} - \frac{\Sigma g}{4}$
Perform the eight determinations or runs carefully using the assigned factor-level combinations and tabulate the values found. Then unscramble the 7 factors and obtain the effect of the assigned factor as the last number. It is important to use the combination of subscripts as assigned for proper interpretation.
Expt.   Found, %
x1      1.03
x2      1.32
x3      1.29
x4      1.22
x5      1.27
x6      1.17
x7      1.27
x8      1.43
Factor   High-level sum – low-level sum
J (A)    4.86 – 5.14 = –0.28
K (B)    4.79 – 5.21 = –0.42
L (C)    4.86 – 5.14 = –0.28
M (D)    5.05 – 4.95 = +0.10
N (E)    4.92 – 5.08 = –0.16
O (F)    4.95 – 5.05 = –0.10
P (G)    4.69 – 5.31 = –0.62
The tabulated factor values are differences of the four-value sums; dividing each by 4 gives the effect as defined above (e.g., J = –0.28/4 = –0.07). The ranking of the factors is the same on either scale.
These values are plotted on a line. In this case they are more or less uniformly scattered along the line, but some attention should be paid to the extremes. Factor D, the highest positive value, represents a difference in solvent, as expected, and this factor has to be investigated further to determine whether the high values represent impurities or additional active ingredient. The extreme value of factor G suggests that the extraction should be conducted in the dark. As discussed by Youden, considerably more information can be obtained by utilizing several different materials and several independent replications in different laboratories, so as to obtain an estimate of the standard deviation to be expected between laboratories. Although the ruggedness trial is primarily a method development technique, validation of the application of a method to different matrices and related analytes can be explored simultaneously by this procedure.
Comments not used (may be added later):
3.3 Calibration: Run standards from low to high to compensate for any carryover. [Running standards in random order to compensate for drift is more important than allowing for carryover, which should not occur.]
Independently made standards result in considerable random error in the calibration curve and are in fact the major source of random error in spectrophotometry. [Therefore a common stock solution is the preferred way of preparing the individual standards.]
Version 54 contains revisions as a result of comments from [email protected] and McClure. Outline:
I. Types and benefits of each method validation study without reproducibility
II. Preparing for a Single-Laboratory Method Validation Study
III. Review of Performance Characteristics of a Method
IV. Errors
V. Calibration and Types
VI. Bias and Precision Estimations (no reference standard; no reproducibility)
VII. Detection and Quantification Limits
VIII. Ruggedness
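The unscrambling step of the Annex B ruggedness trial can be checked with a few lines of code. The sketch below is illustrative only: it assumes the eight-run design table of Annex B (capital letter = high level, small letter = low level) and the example results x1 to x8, and the function name `ruggedness_effects` is hypothetical. For each factor it returns both the difference of the four-value sums (the quantity tabulated in the example) and that difference divided by 4 (the effect as defined by the formulas J to P).

```python
"""Sketch: factor effects for the eight-run, seven-factor ruggedness trial of Annex B."""

# Each string lists the factor levels for one run (capital = high, small = low),
# copied from the run table in Annex B.
DESIGN = [
    "ABCDEFG",  # run 1
    "ABcDefg",  # run 2
    "AbCdEfg",  # run 3
    "AbcdeFG",  # run 4
    "aBCdeFg",  # run 5
    "aBcdEfG",  # run 6
    "abCDefG",  # run 7
    "abcDEFg",  # run 8
]
FACTORS = "ABCDEFG"


def ruggedness_effects(results):
    """Return {factor: (sum_high - sum_low, effect)} for results x1..x8.

    effect = sum_high/4 - sum_low/4, i.e. the J..P formulas of Annex B.
    """
    if len(results) != len(DESIGN):
        raise ValueError("expected exactly eight results, one per run")
    effects = {}
    for column, factor in enumerate(FACTORS):
        high = [x for run, x in zip(DESIGN, results) if run[column] == factor]
        low = [x for run, x in zip(DESIGN, results) if run[column] != factor]
        # The design is balanced: every factor is high in four runs and low in four.
        assert len(high) == len(low) == 4
        diff = sum(high) - sum(low)
        effects[factor] = (diff, diff / 4)
    return effects


if __name__ == "__main__":
    found = [1.03, 1.32, 1.29, 1.22, 1.27, 1.17, 1.27, 1.43]  # x1..x8, %
    effects = ruggedness_effects(found)
    # List factors from largest to smallest absolute effect.
    for factor, (diff, effect) in sorted(
        effects.items(), key=lambda item: abs(item[1][1]), reverse=True
    ):
        print(f"{factor}: sum difference {diff:+.2f}, effect {effect:+.3f}")
```

With the example data, factor G has the largest absolute effect and D is the only positive one, which matches the interpretation given for the trial.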
A subset of the ESF that is selected for the validation study. The identity of these materials should be verified by an appropriate method or process.
3.7 Identity Specification (IS)
The morphological, genetic, chemical, or other characteristics that define a target botanical material. Specifications may include, but are not limited to, data from macroscopic, microscopic, genetic (e.g., DNA sequencing), chromatographic fingerprinting (e.g., capillary electrophoresis, gas chromatography, liquid chromatography, or thin-layer chromatography), and spectral fingerprinting (e.g., infrared, near-infrared, nuclear magnetic resonance, ultraviolet/visible absorbance, or mass spectrometry) methods.
3.8 Inclusivity
Ability of a BIM to correctly identify variants of the target material that meet the identity specification.
3.9 Inclusivity Sampling Frame (ISF)
A list of practically obtainable botanical materials that are expected to give a positive result when tested by the BIM. The inclusivity frame should be sufficiently large that the botanical variation is adequately represented. Sources of variation may include, but are not limited to, species, subspecies, cultivar, growing location, growing conditions, growing season, and post-harvest processing.
3.10 Inclusivity Panel
A subset of the ISF that is selected for the validation study. These materials should be authenticated by an appropriate method.
3.11 Laboratory Sample
Sample as prepared for sending to the laboratory intended for inspection or testing.
3.12 Nontarget Botanical Material
Any botanical material that does not meet the identity specification.
3.13 Physical Form
Botanical materials exist in a number of physical forms. The form(s) will be specified by the Standard Method Performance Requirements (SMPRs®).
3.14 Probability of Identification (POI)
The expected or observed fraction of test portions at a given concentration that give a positive result when tested by the BIM. A general description is provided in Annex B.
3.15 Sample
A small portion or quantity, taken from a population or lot that is ideally a representative selection of the whole. Sample homogeneity is usually determined with multiple samples.
3.16 Specified Inferior Test Material (SITM)
A botanical material mixture that has the maximum concentration of target material that is considered unacceptable, as specified by the SMPRs. The BIM must reject this material with a specified minimum level of (1 – POI) with 95% confidence. The ideal BIM would reject the SITM 100% of the time (i.e., accept it 0% of the time). The SITM will typically be high-quality target material mixed with the worst-case (for identification) nontarget material.
3.17 Specified Superior Test Material (SSTM)
A botanical material mixture that has the minimum acceptable concentration of the target material, as specified by the SMPRs. The BIM must identify this material with a specified minimum level of POI with 95% confidence. The ideal BIM would accept the SSTM 100% of the time. The SSTM will typically be high-quality target material mixed with a small amount of worst-case (for identification) nontarget material.
3.18 Standard Method Performance Requirements
Performance requirements based on the fitness-for-purpose statement for each method. For BIMs, the SMPRs should include the physical form of the sample, the ISF, the ESF, the SSTM, the SITM, the number of samples for the inclusivity/exclusivity panels, and the desired probability and confidence limits for the method.
3.19 Target Botanical Material
The botanical material of interest as described in the identity specification.
3.20 Test Portion
The portion of the laboratory sample that is subjected to analysis by the method.
4 Validation Study Guidelines
A validated BIM requires a method validation study that demonstrates its acceptability according to the SMPRs. The guidelines presented here are intended to be applied to any qualitative BIM that returns a binary, YES/NO test result (Annex A). The guidelines provide technical guidance in validating the method based on the POI model (Annex B).
4.1 SMPRs
The SMPRs will be prepared by the appropriate AOAC body as per AOAC policy. The SMPRs will specify (1) the target botanical material, (2) the physical form of the material, (3) a list of botanical materials for the ISF/ESF, (4) the composition of the SSTM and SITM, (5) the maximum POI for the SITM and the minimum POI for the SSTM, and (6) the desired probability and confidence limits for the inclusivity/exclusivity and SSTM/SITM measurements.
The SMPRs will consider the nature of the material being tested and determine the necessary breadth and depth of the inclusivity and exclusivity panels. In some cases, a few very similar exclusivity panel materials may require in-depth testing (more test portions of a smaller group of materials). Conversely, the nature of the material may require greater breadth (fewer test portions of a greater number of materials).
The number of test portions needed should be determined on sound statistical grounds (Annex C) and subject matter expertise.
4.2 SLV Study
4.2.1 Scope
An SLV study is intended to determine the performance of a candidate method (Annex A). For validation purposes, the candidate BIM may be regarded as a black box providing a binary, YES/NO test result. The study is designed to evaluate performance parameters for the candidate method, including (1) inclusivity/exclusivity, (2) POI for the SSTM and the SITM, and (3) POI as a function of the concentration of the target material (analytical response curve). This last parameter may be optional, as specified by the SMPRs.
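Section 4.2.1 lists POI for the SSTM and the SITM among the study parameters, and the data-analysis sections of this guideline (4.2.3.3 and 4.2.4.3) ask for POI with 95% confidence intervals. The guideline does not state how the interval is computed. The sketch below assumes the Wilson score interval, one common choice for a binomial proportion, which is not necessarily the interval used in the POI model of Part III; the function name `poi_with_ci` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def poi_with_ci(positives, n, z=Z_95):
    """Observed POI and a Wilson score interval for `positives` YES results out of `n`.

    Assumption: the Wilson score interval is used; the guideline does not prescribe one.
    """
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    p = positives / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clip to [0, 1] to absorb floating-point round-off at the extremes.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Example: all 12 test portions of an SSTM identified (YES).
poi, lower, upper = poi_with_ci(12, 12)
print(f"POI = {poi:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Even a perfect 12 of 12 result leaves a lower confidence limit of about 0.76, which is why the SMPRs must state both the required POI and the confidence with which it must be demonstrated.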
4.2.2 Inclusivity/Exclusivity Study
The purpose of this study is to confirm the ability of the candidate method to provide positive results (YES answers) for botanical materials on the inclusivity panel and negative results (NO answers) for materials on the exclusivity panel.
4.2.2.1 Inclusivity/Exclusivity Panel Selection
Botanical materials selected from the ISF/ESF will comprise the inclusivity/exclusivity panels. If the ISF/ESF specified by the SMPRs are sufficiently large, a representative subgroup will be selected for the panels by the method validator. Primary requirements for the panel materials are their availability and identity verification by an appropriate method or process. All test portions should be as uniform and homogeneous as possible. The level of replication of the inclusivity/exclusivity panels will be specified in the SMPRs.
4.2.2.2 Study Design
Prepare the test samples in a form appropriate for the candidate method. All test samples will be blinded and randomized so that the analyst(s) cannot know the identity of the samples. Analyze the test samples following the instructions of the candidate method.
4.2.2.3 Data Analysis and Reporting
The data will be analyzed for positive and negative responses. Unexpected results will be investigated, evaluated, and resolved prior to continuing the validation. The data are reported for each inclusivity/exclusivity material as the number correctly identified. For example, “Of the 30 specific botanical materials of the inclusivity panel that were tested, 28 were identified correctly (gave a positive result) and two were not identified correctly (gave a negative result). Those materials not identified correctly were the following: …” or “Of the 30 specific botanical materials of the exclusivity panel that were tested, 27 were identified correctly (gave a negative result) and three were not identified correctly (gave a positive result). Those not identified correctly were the following: …” The study report should include a table titled “Inclusivity/Exclusivity Panel Results,” which lists all materials tested, their source, origin, essential characteristics, and testing outcome. The implications of each unexpected result should be discussed and evaluated.
4.2.3 SSTM/SITM Study
The purpose of this study is to demonstrate method performance at two concentrations, the SSTM and the SITM.
and SITM as specified by the SMPRs. The test materials may be prepared using individual botanical materials from the inclusivity/exclusivity panels or composites of materials from the two panels as specified by the SMPRs.
All test portions should be as uniform and homogeneous as possible. The level of replication of the SSTM and SITM will be specified in the SMPRs.
4.2.3.2 Study Design
Prepare the test samples in a form appropriate for the candidate method. All test samples will be blinded and randomized so that the analyst(s) cannot know the identity of the samples. Analyze the test samples following the instructions of the candidate method.
4.2.3.3 Data Analysis and Reporting
The data will be analyzed for positive and negative responses. For the SSTM and the SITM, report the POI results with 95% confidence intervals, the total number tested, and the total number correctly identified. Comparison to the SMPRs should be made and discussed.
4.2.4 Analytical Response Curve
This study will characterize the POI curve for mixtures of SSTM and SITM.
4.2.4.1 Test Samples
The appropriate amount of a target material is selected from the inclusivity panel and is mixed with an appropriate amount of a nontarget material from the exclusivity panel to produce mixtures with concentrations intermediate between the SSTM and SITM. The test materials shall be prepared using the same target and nontarget botanical material samples used in the SSTM and SITM study. The test materials may also be prepared by mixing appropriate ratios of the SSTM and SITM.
4.2.4.2 Study Design
Prepare the test samples in a form appropriate for the candidate method. All test samples will be blinded and randomized so that the analyst(s) cannot know the identity of the samples. Analyze the test samples following the instructions of the candidate method.
4.2.4.3 Data Analysis and Reporting
The data will be analyzed for positive and negative responses. For each mixture, report the POI results with 95% confidence intervals, the total number of samples tested, and the total number of positive responses. Plot the POI curve and confidence intervals.
4.3 Independent Validation Study
This study is identical to the SLV Study in Section 4.2.
4.4 Collaborative Study
The collaborative study is a route to an Official Method℠. The purpose of the collaborative study is to estimate the reproducibility and determine the performance of the candidate method among collaborators.
4.4.1 Number of Collaborators
Each collaborator receives 12 replicates of each material to be studied. At a minimum, these materials will include the SSTM and SITM. Prepare the test samples in a form appropriate for the candidate method. All test samples will be blinded and randomized so that the analyst(s) cannot know the identity of the samples. Analyze the test samples following the instructions of the candidate method.
4.4.3 Data Analysis and Reporting
The data will be analyzed by the laboratory for positive and negative responses. For the SSTM and the SITM, report the POI results with confidence intervals for each laboratory, and for the
combined results. Estimate reproducibility as in Annex C and evaluate it against the SMPRs.
ANNEX A
Candidate Method (or Prevalidation Study)
1 Scope
The candidate method must measure appropriate characteristics that are suitable to the question being asked and that will meet predetermined SMPRs. The method may be based on new principles or modifications of an existing method. The identity specifications will be based on morphological, genetic, and/or chemical characteristics, or any other defining feature of the botanical material. The candidate method may use visual inspection, DNA sequencing, instrumental analysis, or any other appropriate measurement. The measured characteristics will collectively provide a single analytical parameter that will be used to determine the final YES or NO result. The analytical parameter may be based on the degree of similarity or the degree of difference between the test sample and the reference material.
2 Inclusivity/Exclusivity Panel Selection
The method developer will select representative botanical materials from the ISF and ESF for use as target and nontarget botanical materials, respectively, in development of the method. These materials must be authenticated by an appropriate method.
3 Analytical Parameter
The method developer will prepare all the botanical samples in a form appropriate for the candidate method. The developer will analyze the target and nontarget botanical materials using the candidate method and develop an analytical parameter that is suitable for distinguishing between the two sets of materials.
4 Probability of Identification (POI)
Target materials will be mixed with systematically increasing amounts of nontarget materials to produce a series of target materials whose concentrations range from 100% to a concentration below the minimum acceptable concentration specified by the SMPRs. The developer will analyze the target and diluted target materials using the candidate method and determine the analytical parameter for each concentration.
5 Specific Superior/Inferior Test Materials
Based on the analytical parameters measured for the diluted target materials, a threshold value will be established that will permit positive identification of the minimum acceptable concentration of the target material with the specified confidence (e.g., 95%). The developer will use the threshold to determine a POI for each concentration (Annex B). The POIs measured for each concentration will be used to construct the POI curve.
6 Data Analysis and Reporting
The method developer will document the candidate method and the POI results.
ANNEX B
Understanding the POI Model
[See Official Methods of Analysis (2012) Appendix K, Part III, “Probability of Identification: A Statistical Model for the Validation of Qualitative Botanical Identification Methods,” by Robert LaBudde and James M. Harnly, J. AOAC Int. 95, 273–285 (2012). http://dx.doi.org/10.5740/jaoacint.11-266]
ANNEX C
Number of Test Portions
See Table C1.
Notes: (1) Enter the first column with the maximum error fraction tolerated by the SMPR, e.g., 10%.
(2) Select the sample size required by the number of misclassifications to be allowed, e.g., one erroneous result gives a sample size of n = 48 for a maximum error probability of 10%.
(3) Allowing more erroneous results increases the sample size required.
(4) The last (AOQL) column indicates the maximum error probability of a method that passes the SMPR for the test. For the example sampling plan indicated, this is 5.4%, approximately ½ of the maximum error probability in the SMPR. Typically the AOQL must be only 50–60% of the SMPR value to reliably pass the validation test. Method developers should take this into account.
Table C1
PART III
Probability of Identification:
A Statistical Model for the Validation of Qualitative
Botanical Identification Methods
features, genetic sequences, chromatographic patterns, spectral patterns, or any other metric appropriate for the target material.
Botanical.—Of or relating to plants or botany. May also include algae and fungi. May refer to the whole plant, a part of the plant (e.g., bark, wood, leaves, stems, roots, rhizomes, flowers, fruits, seeds, extracts, etc.), or an extract of the plant.
BIM.—A method that establishes identity specifications for a botanical material and determines, within a specified statistical limit, a binary result: yes, the test material is a true example of the target botanical material and meets the identity specifications; or no, it is not the target botanical. Thus, a BIM answers the question “Is the test material the same as the target material?” not “What is this material?” In most cases, the method will achieve this goal by comparison of the test material with materials from the inclusivity panel and will return a yes/no (or, in some cases, a consistent/nonconsistent) answer.
Candidate method.—The method to be validated.
Exclusivity.—Ability of a BIM to correctly reject nontarget botanical materials.
ESF.—A list of practically obtainable nontarget botanical materials that have similar taxonomic, physical, or chemical composition characteristics and that are expected to give a negative result when tested by the BIM.
Exclusivity panel.—A subset of the ESF that is selected for the validation study. These materials should be authenticated by an appropriate method.
False-negative fraction (FNF).—1–POI for 100% SSTM. Not defined for other concentrations.
False-positive fraction (FPF).—POI for 100% SITM. Not defined for other concentrations.
Identity specification.—The morphological, genetic, chemical, or other characteristics that define a target botanical material. Specifications may include, but are not limited to, data from macroscopic, microscopic, genetic (e.g., DNA sequencing, barcoding), chromatographic fingerprinting (e.g., CE, GC, LC, TLC), and spectral fingerprinting (e.g., IR, NIR, NMR, MS, UV-Vis) methods.
Inclusivity.—Ability of a BIM to correctly identify variants of the target material that meet the identity specification.
ISF.—A list of practically obtainable botanical materials that are expected to give a positive result when tested by the BIM. The inclusivity sampling frame should be sufficiently large that the botanical variation is adequately represented. Sources of variation may include, but are not limited to, species, subspecies, cultivar, growing location, growing conditions, growing season, and post-harvest processing.
Inclusivity panel.—A subset of the ISF that is selected for the validation study. These materials should be authenticated by an appropriate method.
Laboratory sample.—Sample as prepared for sending to the laboratory intended for inspection or testing.
MPRs.—Performance requirements based on the fitness-for-purpose statement for each method. For BIMs, the MPRs should minimally include the physical form of the sample, the ISF, the ESF, the SSTM, and the SITM.
POI.—The expected or the observed fraction of test portions that provide a positive result at a given concentration when tested by the BIM.
Sample.—A small quantity, taken from a population or lot that is a representative selection of the whole.
SITM.—A mixture of botanical materials that contains the maximum concentration of target material that is considered unacceptable, as specified by the MPRs. The BIM must reject this material with a specified minimum level of (1–POI) with 95% confidence. The ideal BIM would reject the SITM 100% of the time (i.e., identify it 0% of the time). The SITM will typically be high-quality target material mixed with worst-case (for identification) nontarget material.
SSTM.—A mixture of botanical material that contains the minimum acceptable concentration of the target material, as specified by the MPRs. The BIM must identify this material with a specified minimum level of POI with 95% confidence. The ideal BIM would identify the SSTM 100% of the time. The SSTM will typically be high-quality target material mixed with a small amount of worst-case (for identification) nontarget material.
Target botanical material.—The botanical material of interest as described in the identity specification.
Target material concentration.—The percentage, by weight, of the target botanical material in the sample.
Test portion.—The portion of the laboratory sample that is subjected to analysis by the method.
Inclusivity Panel
When a botanical material is identified for development of a BIM, a target material is usually specified. Biological materials, however, are complex. While the genotype of a species or subspecies may be relatively stable, the phenotype (metabolite composition) will vary with location, season, weather, and many other variables. Thus, “target material” becomes “target materials.” Ideally, the target materials will encompass the expected botanical variation.
An inclusive list of all the variations for a target material can be quite extensive and impractical. For example, the list for a specific botanical might ideally include samples from the last 10 years from eight international locations (80 samples). In reality, only 25 of the desired samples may be practically obtainable. These 25 obtainable samples comprise the ISF. Of these 25 samples, only 10 may be selected for method development/validation. These 10 samples comprise the inclusivity panel.
For each candidate BIM, the MPRs must provide a list of all necessary botanical variants that should provide a positive identification. This should include species, varieties, geographic or seasonal variants, and other variants that are believed possibly to affect BIM identification performance. The information tabulated should include species, variety or subclass, season, locality, the source from which the variant is obtainable, and whether or not it is essential that the variant be tested. The age of the plant may also be a factor of importance. The subset of this list that is practically obtainable for a validation study is the ISF.
The MPRs should identify the minimum number of materials in the ISF that must be tested to verify identifiability (inclusivity
Nontarget botanical material.—Any botanical material that panel), as well as the number of replicates needed. If at all possible,
does not meet the identity specification. any exchangeability (choice among variants which MPRs do not
Physical form.—Botanical materials exist in a number of discriminate) should result in random selection from the ISF.
physical forms. The form(s) to be analyzed by the method will be Generally, the inclusivity panel of target variants should include
specified by the MPRs. all of the ISF if the number of variants is small. Otherwise, all
necessary variants plus additional randomly selected ones should comprise the inclusivity panel. Including more randomly selected variants may allow a quantitative statistical inference concerning inclusivity. An inclusivity panel chosen by subjective selection alone, with no randomization, does not permit statistical inference with respect to inclusivity.
Exclusivity Panel
The list of nontarget materials can be quite extensive, theoretically including all the botanicals not on the inclusivity list. However, of prime interest are those materials that might accidentally or intentionally be used to replace or augment the target materials. The exclusivity list should include botanical materials that are closely related taxonomically, morphologically, or phenotypically. Again, this list may be extensive and impractical. The ESF will comprise those botanical materials that are practically obtainable. The exclusivity panel will comprise those samples used for method development and validation.
The MPRs must provide a list of all necessary or commonly encountered nontarget botanical materials and variants. This list should include botanical materials that are believed to accidentally or intentionally alter the composition of the target material. The information tabulated should include variety, season, locality, source from which the variant is obtainable, species, variety or subclass, and whether or not it is essential that the nontarget material be tested. The subset of this list that is practically obtainable for a validation study should then be identified as the ESF.
The MPRs should identify the minimum number of nontarget materials of the ESF that should be included on the exclusivity panel and be tested to verify non-identifiability, as well as the number of replicates needed. If at all possible, any exchangeability (a choice among variants that expertise does not distinguish) should result in random selection from the ESF.
Generally, the exclusivity panel of authentic variants should include all of the ESF if the number of variants is small. Otherwise, all necessary variants, plus optional ones randomly selected, should comprise a set as specified by the ERP. More replicates and randomization may allow a quantitative statistical inference concerning exclusivity.
Inclusivity and Exclusivity Testing
The purpose of inclusivity/exclusivity testing is to verify that the BIM correctly identifies all of the botanical materials listed in the ISF and correctly rejects all nontarget materials listed in the ESF. The BIM should clearly and unequivocally discriminate between the target and nontarget materials. Testing materials from the inclusivity/exclusivity panels should provide sufficient confidence that this is the case. The number of samples tested and the number of replicates are specified by the MPRs.
Typically, inclusivity/exclusivity panel results are verified during method development. Any unexpected results should be followed up with a minimum number of additional replications (determined by the MPRs) to characterize the POI on the variant quantitatively. If the variant fails to meet minimum acceptable performance requirements as set by the MPRs, the exception should be noted in the study report and reviewed for acceptability by the relevant method reviewers.
If the method development results are acceptable, inclusivity and exclusivity should be verified in an independent laboratory, although possibly on a less intensive basis (fewer replicates or randomly selected variants), as the objective is verification, not validation. If no randomization is used, all that can be reported are the actual results obtained, without suggestive quantitative statistics. For example, without randomization, the use of percentages or other quantitative measures is inappropriate.
Performance Requirements and the Specification and Preparation of the SITM and SSTM
After inclusivity and exclusivity studies have been completed, target and nontarget material(s) are chosen to verify that the method can discriminate between the SSTM and the SITM. Either the worst-case nontarget materials, or perhaps the most common nontarget materials, would typically be chosen. In addition, a combination of target and nontarget materials should be selected to challenge method performance (worst-case, most common, etc.). The number of samples tested and the number of replicates are specified by the MPRs.
The MPRs should identify the composition and the minimum POI acceptable (with 95% confidence) for the SSTM and SITM. The SSTM and SITM would be made of the target material(s) mixed with the combination of nontarget material(s).
Application of the POI to an Analytical Method
Analytically, a BIM will be based on a series of measured values. These values may be derived from morphological features, genetic sequences, chromatographic patterns, spectral patterns, or any other metric appropriate for the target material. These values will be combined to provide a single AP that will be used to determine whether the test sample does or does not match the materials from the inclusivity panel. This decision is made by comparing the AP of the test material to a threshold value that provides the level of identification specified by the MPRs.
The first step in the development of the method is the selection of the analytical approach and the analysis of samples from the ISF and ESF. Multiple replicates of multiple samples should, ideally, give results similar to those in Figure 2. Here, the AP, not the POI, is plotted on the vertical axis. The standard deviations (SDs) are shown as sample distribution functions rather than as error bars. Ideally, the separation of the ISF and ESF samples should be as large as possible. For the data in Figure 2, the threshold to distinguish between the ISF and ESF can be placed at almost any value of the AP.
The width of the sample distribution function will depend on the number of samples analyzed from the ISF and ESF. If replicates
Figure 2. Inclusivity/exclusivity and SSTM/SITM characterization.
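The panel-construction rule described above for both panels (use the whole sampling frame when it is small; otherwise take every necessary variant and fill the rest of the panel by random selection among exchangeable variants) can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure from the guideline: the function `build_panel`, its parameters, and the reading of "small" as "no larger than the panel size set by the MPRs" are all hypothetical.

```python
import random


def build_panel(frame, necessary, panel_size, seed=None):
    """Select a validation panel from a sampling frame (ISF or ESF).

    Hypothetical helper. `frame` lists the practically obtainable materials,
    `necessary` lists the materials the MPRs say must be tested, and
    `panel_size` is the minimum panel size set by the MPRs.
    """
    missing = [m for m in necessary if m not in frame]
    if missing:
        raise ValueError(f"necessary materials not in the frame: {missing}")
    # Assumed reading of "small": the frame is no larger than the panel.
    if len(frame) <= panel_size:
        return list(frame)
    if panel_size < len(necessary):
        raise ValueError("panel_size is smaller than the number of necessary materials")
    # Exchangeable (non-essential) variants are drawn at random, never hand-picked.
    rng = random.Random(seed)
    optional = [m for m in frame if m not in necessary]
    extra = rng.sample(optional, panel_size - len(necessary))
    return list(necessary) + extra


# Hypothetical ISF of 25 obtainable samples, 4 of which the MPRs mark as necessary.
isf = [f"ISF-{i:02d}" for i in range(1, 26)]
necessary = isf[:4]
panel = build_panel(isf, necessary, panel_size=10, seed=2024)
print(panel)
```

This mirrors the earlier example of a 25-sample ISF reduced to a 10-sample inclusivity panel. Recording the seed makes the random selection reproducible and auditable, which supports the kind of statistical statement that subjective selection does not.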
the SSTM and SITM must be prepared. In each case, the threshold
will intersect each peak and determine the POI. As the SSTM:SITM
values change from 1:0 to 3:1 to 1:1 to 1:3 to 0:1, the POI decreases
from 1.0 to 0.9 to 0.5 to 0.1 to 0.0.
The models in Figures 2 and 3 assume that the SITM and SSTM
have the same symmetrical distribution function and width. This is
not a reasonable assumption for real samples. However, the POI model
is valid regardless of the shape of the distribution functions involved.
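Because the POI is simply the proportion of test portions that fall on the "Identified" side of the threshold, it can be estimated from replicate AP values without assuming any distribution shape. The sketch below assumes, for illustration only, that a larger AP means a closer match to the inclusivity panel and that an AP at or above the threshold counts as Identified. The AP values and threshold are invented, not read from Figures 2 or 3, and the helper names `is_identified` and `poi_estimate` are hypothetical. It also applies the FNF and FPF definitions, which hold only for 100% SSTM and 100% SITM test portions.

```python
from typing import Sequence


def is_identified(ap: float, threshold: float) -> bool:
    # Assumed convention: AP at or above the threshold -> "Identified".
    return ap >= threshold


def poi_estimate(ap_values: Sequence[float], threshold: float) -> float:
    """Empirical POI: fraction of replicate test portions identified.

    No distribution shape is assumed; only the side of the threshold matters.
    """
    if len(ap_values) == 0:
        raise ValueError("need at least one replicate AP value")
    hits = sum(is_identified(ap, threshold) for ap in ap_values)
    return hits / len(ap_values)


# Hypothetical AP values for replicate test portions of 100% SSTM and 100% SITM.
sstm_ap = [0.91, 0.85, 0.78, 0.95, 0.66, 0.88, 0.72, 0.93, 0.81, 0.47]
sitm_ap = [0.12, 0.30, 0.05, 0.41, 0.22, 0.58, 0.18, 0.09, 0.27, 0.33]
threshold = 0.5

poi_sstm = poi_estimate(sstm_ap, threshold)  # 9 of 10 identified
poi_sitm = poi_estimate(sitm_ap, threshold)  # 1 of 10 identified
fnf = 1 - poi_sstm  # FNF = 1 - POI for 100% SSTM
fpf = poi_sitm      # FPF = POI for 100% SITM
print(f"POI(SSTM)={poi_sstm:.2f}  POI(SITM)={poi_sitm:.2f}  FNF={fnf:.2f}  FPF={fpf:.2f}")
```

Skewed or asymmetric AP distributions change which replicates cross the threshold but not how the POI is computed, which is the sense in which the model does not depend on distribution shape.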
A Specific Example: American Ginseng Mixed with Asian Ginseng
with 95% confidence. Table 2 shows that, for these performance requirements, 60 replicates must be tested at each level with no more than two failures. More stringent requirements (e.g., 0.95 and 0.05, with 95% confidence) would require more replicates and/or fewer failures. Conversely, less stringent requirements would require fewer replicates. Depending upon the desired performance requirement for the SSTM or SITM, alternative test plans (confidence levels) may be selected from Table 3. For more plans, see LaBudde (5).
Single-Laboratory Validation
Consider an example of a BIM being evaluated with respect to the performance requirements of Table 2. The internal operating methodology of the BIM may be a trade secret of the method developer and may not be known at the time of validation. All that is known for certain is that the method uses a test portion and returns a binary result of yes = Identified or no = Not Identified.
Consider testing in a single independent laboratory, or an SLV. With respect to the performance requirements of Table 2, the SITM and SSTM are used to prepare mixtures in the proportions 0:100%, 33:67%, 67:33%, and 100:0%. From each of these mixtures, 60
Table 3. Alternative test plans to obtain a 1-sided upper 95% modified Wilson confidence limit at or below a specified maximum value for FNF or FPF [a]
Specified maximum [b] | No. of replicates to be tested | No. of failures allowed [c] | 1-sided 95% UCL [d] | 2-sided 95% LCL [e] | 2-sided 95% UCL [e] | AOQL [f]
0.20 | 11 | 0 | 0.197 | 0.000 | 0.259 | 0.129
0.20 | 20 | 1 | 0.196 | 0.000 | 0.236 | 0.118
0.20 | 24 | 1 | 0.167 | 0.000 | 0.202 | 0.101
0.20 | 36 | 3 | 0.191 | 0.029 | 0.218 | 0.124
0.20 | 48 | 5 | 0.199 | 0.045 | 0.222 | 0.133
0.20 | 72 | 8 | 0.187 | 0.057 | 0.204 | 0.131
0.15 | 20 | 0 | 0.119 | 0.000 | 0.161 | 0.081
0.15 | 24 | 0 | 0.101 | 0.000 | 0.138 | 0.069
0.15 | 36 | 1 | 0.115 | 0.000 | 0.142 | 0.071
0.15 | 48 | 3 | 0.146 | 0.021 | 0.168 | 0.095
0.15 | 72 | 5 | 0.136 | 0.030 | 0.152 | 0.091
0.10 | 40 | 0 | 0.063 | 0.000 | 0.088 | 0.044
0.10 | 48 | 1 | 0.088 | 0.000 | 0.109 | 0.054
0.10 | 60 | 2 | 0.096 | 0.009 | 0.114 | 0.061
0.10 | 72 | 3 | 0.100 | 0.014 | 0.115 | 0.065
0.05 | 60 | 0 | 0.043 | 0.000 | 0.060 | 0.030
0.05 | 72 | 0 | 0.036 | 0.000 | 0.051 | 0.025
0.05 | 96 | 1 | 0.045 | 0.000 | 0.057 | 0.028
0.02 | 130 | 0 | 0.020 | 0.000 | 0.029 | 0.014
0.02 | 240 | 1 | 0.018 | 0.000 | 0.023 | 0.012
0.01 | 280 | 0 | 0.010 | 0.000 | 0.014 | 0.007
[a] Excerpted from LaBudde (5).
[b] Desired maximum level of FNF or FPF to attain with 95% confidence.
[c] Maximum number of failures that can occur in the replicates tested and still meet specification.
[d] Worst-case 1-sided 95% modified Wilson upper confidence limit on FNF or FPF if maximum failures are observed.
[e] 95% modified Wilson 2-sided confidence interval on FNF or FPF if maximum failures are observed.
[f] Observed FNF or FPF corresponding to maximum failures allowed.
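The confidence limits in Table 3 can be spot-checked with the ordinary Wilson score interval. The sketch below is a check under stated assumptions, not LaBudde's published procedure: it uses the unmodified Wilson formula, which matches the tabulated 1-sided UCLs (for example, 0.043 for 60 replicates with 0 failures and 0.096 for 60 replicates with 2 failures) and the 2-sided limits for zero or several failures. It does not match the 0.000 lower limits tabulated for a single failure, where the "modified" interval evidently departs from the ordinary one. The helper names `wilson_limits` and `plan_meets` are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_limits(failures: int, n: int, confidence: float = 0.95, sides: int = 2):
    """Ordinary (unmodified) Wilson score limits for the proportion failures/n.

    sides=2 gives a 2-sided interval; sides=1 puts all of alpha in one tail,
    so the upper value is a 1-sided upper confidence limit.
    """
    if n <= 0 or not 0 <= failures <= n:
        raise ValueError("need n > 0 and 0 <= failures <= n")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / sides)
    p = failures / n
    z2 = z * z
    center = p + z2 / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    denom = 1 + z2 / n
    return max(0.0, (center - half_width) / denom), min(1.0, (center + half_width) / denom)


def plan_meets(max_fraction: float, n: int, failures_allowed: int) -> bool:
    """True if the worst-case 1-sided 95% UCL on FNF or FPF is <= max_fraction."""
    _, upper = wilson_limits(failures_allowed, n, sides=1)
    return upper <= max_fraction


for n, c in [(60, 0), (60, 2), (20, 1), (36, 3)]:
    ucl1 = wilson_limits(c, n, sides=1)[1]
    lcl2, ucl2 = wilson_limits(c, n)
    print(f"n={n:3d} failures={c}: 1-sided UCL={ucl1:.3f}  2-sided=({lcl2:.3f}, {ucl2:.3f})")
```

With this formula, `plan_meets(0.10, 60, 2)` is true and `plan_meets(0.10, 60, 3)` is false, consistent with the statement that 60 replicates with no more than two failures satisfy a 0.10 requirement.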
of replicates (see section on SLV). All test portions for each collaborator would be randomly assigned IDs before distribution. The study is masked so that collaborators cannot visually identify the composition of the test portions. Additional unmasked test portions may be provided for proficiency training purposes. Each collaborator would use the BIM according to instructions to analyze each test portion provided, and report results by test portion number as 1 = Identified or 0 = Not Identified.
Suppose a collaborative study is to be evaluated with respect to the performance requirements of Table 2. The primary goal is to validate that performance is sufficiently homogeneous across collaborators and that the performance requirements are met. As mentioned before, the number of replicate test portions for each collaborator should be 12 or more to control the quantal repeatability error sufficiently to allow detection of an intercollaborator effect. Suppose the plan was to enroll 12 collaborators, with the expectation that one or two might have to be removed for cause (spoilage of test portions, failure to follow instructions, cross-contamination, etc.). Consequently, 144 test portions are prepared for each of the four SSTM values (0, 33.3, 66.7, and 100%).
After completion of the study, two collaborators are removed for cause, and the results shown in Table 6 are obtained. For the 0% SSTM concentration, the statistical analysis of the data gives the results in Table 7. There is no detected intercollaborator effect (P-value = 0.43, point estimate = 0.00, confidence interval includes 0.000 and has an upper limit of 0.040), and the upper 2-sided confidence limit for combined POI is 0.0457, well below the performance requirement of 0.10. There is little evidence that the method is irreproducible, and the method meets the POI (or FPF) performance requirement.
For the 33% SSTM concentration, the statistical analysis of the data gives the results in Table 8. Again, there is no detected intercollaborator effect (P-value = 0.66), so there is little evidence that the method is irreproducible.
For the 67% SSTM concentration, the statistical analysis of the data gives the results in Table 9. Once again, there is no detected intercollaborator effect (P-value = 0.18), so there is little evidence that the method is irreproducible.
Finally, for the 100% SSTM concentration, the statistical analysis of the data gives the results in Table 10. There is no detected intercollaborator effect (P-value = 0.25, point estimate = 0.027, confidence interval includes 0.000 and has an upper limit of 0.093), and the lower 2-sided confidence limit for combined POI is 0.917, well above the performance requirement of 0.90. There is little evidence that the method is irreproducible, and the method meets the POI (or FNF) performance requirement.
Table 6. Collaborative study results
SSTM, % | Collaborator | Replicates | No. identified
0 | 1 | 12 | 1
0 | 2 | 12 | 0
0 | 3 | 12 | 0
0 | 4 | 12 | 0
0 | 5 | 12 | 0
0 | 6 | 12 | 0
0 | 7 | 12 | 0
0 | 8 | 12 | 0
0 | 9 | 12 | 0
0 | 10 | 12 | 0
33.33 | 1 | 12 | 2
33.33 | 2 | 12 | 2
33.33 | 3 | 12 | 2
33.33 | 4 | 12 | 2
33.33 | 5 | 12 | 0
33.33 | 6 | 12 | 1
33.33 | 7 | 12 | 1
33.33 | 8 | 12 | 4
33.33 | 9 | 12 | 2
33.33 | 10 | 12 | 3
66.67 | 1 | 12 | 4
66.67 | 2 | 12 | 9
66.67 | 3 | 12 | 5
66.67 | 4 | 12 | 8
66.67 | 5 | 12 | 7
66.67 | 6 | 12 | 4
66.67 | 7 | 12 | 7
66.67 | 8 | 12 | 3
66.67 | 9 | 12 | 8
66.67 | 10 | 12 | 5
100 | 1 | 12 | 12
100 | 2 | 12 | 10
100 | 3 | 12 | 11
100 | 4 | 12 | 12
100 | 5 | 12 | 12
100 | 6 | 12 | 11
100 | 7 | 12 | 12
100 | 8 | 12 | 12
100 | 9 | 12 | 12
100 | 10 | 12 | 12
Lot-Lot Variability, Time Stability, and Robustness Studies
The SLV and collaborative studies discussed above do not represent worst-case, end-of-life conditions with respect to method materials and parameters. For this reason, it is customary to augment these studies with additional studies to verify proper results despite reasonable variations among method materials, equipment, and parameters.
A lot-lot variability study is meant to verify results across different lots of method materials (supplies used) and sets of equipment. Each lot would consist of a different manufactured or prepared batch of materials (reagents, supplies, etc.) and possibly a different set of measurement equipment. Date of manufacture is not an issue in this study, only variation among lots, so ideally the lots tested should have been produced at about the same time. Just as with collaborators in a collaborative study, estimation of the lot random effect requires that at least six different lots be involved in the study. Each lot should meet all BIM performance requirements, and the variation in performance among lots should be immaterial in size.
A time stability study is meant to verify that there is no material degradation in performance over the life of lots of materials and equipment. This may be accomplished by determining the parametric aging effect using time-staggered lots, or simply by verifying performance on end-of-life lots.
Note that the lot-lot variability and time stability studies cannot be merged into a single study unless there are sufficient replicate lots at or near the same time point(s) to allow separation of the lot-lot and time effects. If lot-lot and time effects are negatively correlated, one factor may mask the effect of the other in an inadequate combined study (e.g., a different single lot at each time point). Testing only end-of-life lots would be a satisfactory combined study, even though time and lot effects could not be resolved.
A robustness study (also called a sensitivity study) is meant to verify performance under worst-case variation of critical method parameters (e.g., times, temperatures, concentrations). Disturbances of method parameters should reflect the maximum excursions to be expected in practical use. Performance requirements should be met at each of these excursions. The statistical design should be capable of measuring at least main effects.
Conclusions
The purpose of a qualitative BIM is to discriminate between acceptable target material and target material with an unacceptable concentration of nontarget material. This concept was applied specifically to discrimination between the SSTM and SITM for the purpose of method validation. A general overview of the application of the POI model and analysis was given, which allows validation and/or characterization of qualitative BIMs. Examples are given for both SLV and collaborative studies with MPRs. The use of POI statistics harmonizes statistical concepts among botanical, microbiological, toxin, and other analyte identification or detection methods for which binary results are obtained. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements.
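As a minimal sketch of the response curve and descriptive statistics mentioned in the Conclusions, the Table 6 counts can be pooled across the 10 retained collaborators at each SSTM level to give a POI estimate and an ordinary 2-sided 95% Wilson score interval. With this formula, the pooled counts (1 of 120 at 0% SSTM and 116 of 120 at 100% SSTM) give an upper limit of about 0.0457 and a lower limit of about 0.917, the same values quoted for combined POI in the collaborative study. That agreement is only a numerical check. The guideline's own analysis, including the intercollaborator effect and its P-values, is not reproduced here, and the names `TABLE_6`, `wilson_interval`, and `pooled_poi_curve` are hypothetical.

```python
import math
from statistics import NormalDist

# Table 6: number identified out of 12 replicates for collaborators 1-10.
TABLE_6 = {
    0.0: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    33.33: [2, 2, 2, 2, 0, 1, 1, 4, 2, 3],
    66.67: [4, 9, 5, 8, 7, 4, 7, 3, 8, 5],
    100.0: [12, 10, 11, 12, 12, 11, 12, 12, 12, 12],
}
REPLICATES = 12


def wilson_interval(x: int, n: int, confidence: float = 0.95):
    """Ordinary 2-sided Wilson score interval for the proportion x/n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = x / n
    z2 = z * z
    center = p + z2 / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    denom = 1 + z2 / n
    return max(0.0, (center - half_width) / denom), min(1.0, (center + half_width) / denom)


def pooled_poi_curve(table, replicates):
    """Pool collaborators at each SSTM level: (conc, x, n, POI, LCL, UCL)."""
    rows = []
    for conc, counts in sorted(table.items()):
        x, n = sum(counts), replicates * len(counts)
        lcl, ucl = wilson_interval(x, n)
        rows.append((conc, x, n, x / n, lcl, ucl))
    return rows


for conc, x, n, poi, lcl, ucl in pooled_poi_curve(TABLE_6, REPLICATES):
    print(f"{conc:6.2f}% SSTM: {x:3d}/{n} identified, POI={poi:.3f}, 95% CI=({lcl:.4f}, {ucl:.4f})")
```

Plotting each POI estimate with its interval against SSTM concentration gives the kind of response curve the POI model is meant to support. The four levels correspond to the four SSTM mixtures prepared for the study.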
ANNEX A
SIMCA