Review of The Most Common Pre-Processing Techniques For Near-Infrared Spectra
Åsmund Rinnan*, Frans van den Berg, Søren Balling Engelsen
Quality and Technology, Department of Food Science, Faculty of Life Sciences, University of Copenhagen, Rolighedsvej 30, DK-1958 Frederiksberg C, Denmark
* Corresponding author. E-mail: [email protected]

Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometrics modeling. The objective of the pre-processing is to remove physical phenomena in the spectra in order to improve the subsequent multivariate regression, classification model or exploratory analysis. The most widely used pre-processing techniques can be divided into two categories: scatter-correction methods and spectral derivatives. This review describes and compares the theoretical and algorithmic foundations of current pre-processing methods plus the qualitative and quantitative consequences of their application. The aim is to provide NIR users with better end-models through fundamental knowledge on spectral pre-processing.
© 2009 Elsevier Ltd. All rights reserved.

Keywords: Multiplicative Scatter Correction; Near-infrared spectroscopy; Normalization; Norris-Williams derivation; Pre-processing; Savitzky-Golay derivation; Scatter correction; Spectral derivative; Standard Normal Variate; Review

1. Introduction

There is no substitute for optimal data collection, but, after proper data collection, pre-processing of spectral data is the most important step before chemometric bi-linear modeling [e.g., Principal Component Analysis (PCA) and Partial Least Squares (PLS)]. There is substantial literature on multivariate spectroscopic applications of food, feed and pharmaceutical analysis, in which comparative pre-processing studies are an integral part. Near-infrared reflectance/transmittance (NIR/NIT) spectroscopy is the spectroscopic technique that has led to by far the largest amount of and greatest diversity in pre-processing techniques, primarily because the spectra can be significantly influenced by non-linearities introduced by light scatter. Due to the comparable size of the wavelengths in NIR electromagnetic radiation and particle sizes in biological samples, NIR spectroscopy is a battleground for undesired scatter effects (both baseline shift and non-linearities) that will influence the recorded sample spectra. However, by applying suitable pre-processing, these effects can largely be eliminated.

In application studies, comparisons are almost exclusively of the relative performances in the calibration models developed (quantitative descriptor-response relations). Almost no evaluation of the differences and the similarities between the alternative techniques has been reported, and the implications of corrections (e.g., spectral descriptor data) are seldom discussed. This article aims to discuss the relations between the established pre-processing methods for NIR/NIT, more specifically those techniques that are independent of response variables, so we discuss only methods that do not require a response value. We focus on both the theoretical aspects of the pre-processing technique and the practical effect that the operation has on the NIR/NIT spectra.

For solid samples, undesired systematic variations are primarily caused by light scattering and differences in the effective path length. These undesired variations often constitute the major part of the total variation in the sample set, and can be observed as shifts in baseline (multiplicative effects) and other phenomena called non-linearities. In general, NIR-reflectance measurement of a sample will measure diffusively reflected and specular reflected radiation (mirror-like reflections). Specular reflections are normally minimized by instrument design and sampling geometry, as they do not contain any chemical information. The diffusively reflected light, which is reflected in a broad range of directions, is the primary source of information in the NIR spectra. However, the
0165-9936/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.trac.2009.07.007
Trends in Analytical Chemistry, Vol. 28, No. 10, 2009
diffusively reflected light will contain information on not only the chemical composition of the sample (absorption) but also the micro-structure (scattering). The primary forms of light scattering (that do not include energy transfer with the sample) are Rayleigh and Lorentz-Mie. Both are processes in which the electromagnetic radiation is scattered (e.g., by small particles, bubbles, surface roughness, droplets, crystalline defects, microorganelles, cells, fibers, and density fluctuations). Rayleigh scattering, which is strongly wavelength dependent ($1/\lambda^4$), occurs when the particles are much smaller in diameter than the wavelength of the electromagnetic radiation ($<\lambda/10$).

When the particle sizes are larger than the wavelength, as is generally the case for NIR spectroscopy, Lorentz-Mie scattering is predominant. In contrast to Rayleigh scattering, Lorentz-Mie scattering is anisotropic, dependent on the shapes of the scattering particles and not strongly wavelength dependent.

For biological samples, the scattering properties are excessively complex, so soft, or model-free, spectral pre-processing techniques of NIR spectra, as we discuss in this article, are demanded to remove the scatter from the pure, desirable absorbance spectra.

Obviously, pre-processing cannot correct for specular reflectance (direct scattering), since the spectra do not contain any fine structure. Spectra dominated by specular reflectance should always be removed as outliers prior to multivariate data analysis, since they will remain outliers, even after pre-processing. Fig. 1 shows a set of 13 good sucrose samples with different particle sizes plus one bad sucrose example showing how (extreme) specular reflectance manifests itself compared to normal spectra.

Fig. 1 also illustrates the general layout of most figures in this article. In the upper part of the figure, a bar graph shows PCA-score values on the first principal component (PC) for the sample set after data mean centering [1]. The lower part shows the effect the pre-processing has on the dataset (or, in this case, no pre-processing). The squared correlation coefficient $r^2$ between the bar values and a selected reference variable is included (in this case, known average particle sizes of 13 sucrose samples). For the sucrose dataset, this correlation should, e.g., be low when assuming particle-originating scatter is a hindrance; as little information as possible on the particle size should remain after the right pre-processing.

An example of the pre-processed sucrose data can be seen in Fig. 2, which also contains a standard-deviation
Figure 1. Near-infrared spectra of 13 sucrose samples with different particle sizes (smallest particles at the bottom, largest at the top; particle sizes are in the range 20–540 µm). The black spectrum shows a specular reflectance sucrose sample. Bars are the score values on the first Principal Component of the 13 sucrose samples in a Principal Component Analysis model on the full spectra.
1202 http://www.elsevier.com/locate/trac
Figure 2. Top: Sucrose data treated by a second-order Multiplicative Scatter Correction; Bottom: the corresponding standard deviations per
wavelength, dotted line is the raw/unprocessed data (see Fig. 1), solid is the pre-processed data.
plot, showing the effect that pre-processing has on the variation between samples for different wavelength regions. The selected pre-processing (detailed later on) removes some, but not all, of the undesired scatter or particle-size information in the spectra, as can be observed from, e.g., the first PC bars.

From now on in this article, we demonstrate the effect of different pre-processing techniques on a small pectin dataset containing only seven samples with very different degrees of esterification (%DE; in the range 0–93%) [2]. These samples were measured in NIR-reflectance mode in the spectral range 1100–2500 nm (collecting every 2-nm interval; Fig. 3). We present the corresponding first-factor PCA sample score after mean-centering as a bar graph, together with the centered absorbance value at wavelength 2244 nm. We selected this peak as it should, in theory, describe the %DE perfectly. For this article, we assume that the information in the spectra that is related to the pectin particle size and shape should be removed by the pre-processing technique, and that the bar graph should show a linear behavior correlated to %DE.

To illustrate the impact of pre-processing on quantification, we use data taken from Christensen et al. [3]. They studied a set of 32 marzipan mixtures, based on nine different recipes, with data available on the Internet (www.models.life.ku.dk). All marzipan samples were measured with six different NIR instruments and chemical-reference analyses for moisture and sugar content were made. Before building a quantitative regression model, it is important to clean the predictor data from unsystematic scatter variations, since they can have a significant impact on the predictive model performance and the model complexity or parsimony. In this article, we use PLS to predict this quantitative response information [4].

2. Pre-processing techniques

The most widely used pre-processing techniques in NIR spectroscopy (in both reflectance and transmittance mode) can be divided into two categories: scatter-correction methods and spectral derivatives.

The first group of scatter-corrective pre-processing methods includes Multiplicative Scatter Correction (MSC), Inverse MSC, Extended MSC (EMSC), Extended Inverse MSC, de-trending, Standard Normal Variate (SNV) and normalization.

The spectral derivation group is represented by two techniques in this article: Norris-Williams (NW)
Figure 3. Raw/unprocessed spectra of 7 pectin samples. Blue line is a pectin sample with 0% degree of esterification (DE), red line is a sample
with 93%DE. Open bars indicate the Principal Component Analysis (PCA) score values on the first PC for the full spectrum, after mean-centering,
closed bars the spectral value at wavelength 2244 nm.
Figure 4. The sample spectrum (blue dots) plotted against a selected reference spectrum. The scalar correction terms are found as the intercept
and the slope of the black line, which is found from the least-squares regression fit through all points.
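The regression described in the caption of Fig. 4 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `msc`, the toy spectra and the use of the standard library instead of a numerical package are assumptions made for the example. Each spectrum is regressed on the reference (by default the average spectrum of the set), and the fitted intercept and slope are then used to correct it.

```python
from statistics import mean

def msc(spectra, reference=None):
    """First-order MSC: regress each spectrum on the reference, then correct it."""
    if reference is None:
        # Assumed default: the average spectrum of the set.
        reference = [mean(column) for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected, coefficients = [], []
    for x in spectra:
        x_mean = mean(x)
        # Slope and intercept of the least-squares line x ~ b0 + b1 * reference.
        b1 = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x)) / ref_ss
        b0 = x_mean - b1 * ref_mean
        corrected.append([(v - b0) / b1 for v in x])
        coefficients.append((b0, b1))
    return corrected, coefficients

# Toy data (hypothetical): two copies of one shape with different offset and scale.
base = [0.10, 0.40, 0.90, 0.30, 0.20]
spectra = [[0.5 + 2.0 * v for v in base], [-0.1 + 0.5 * v for v in base]]
corrected, coefficients = msc(spectra, reference=base)
```

With these noise-free toy spectra both corrected spectra collapse onto `base`, and the recovered coefficients match the offsets and scales used to build them. Real spectra leave a residual, which is where the chemical information is expected to sit.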
3. Scatter corrections

Under scatter-correction methods, we consider three pre-processing concepts: MSC, SNV and normalization. These techniques are designed to reduce the (physical) variability between samples due to scatter. All three also adjust for baseline shifts between samples.

3.1. MSC

Multiplicative Scatter (or, in general, Signal) Correction (MSC) is probably the most widely used pre-processing technique for NIR (closely followed by SNV and derivation). MSC in its basic form was first introduced by Martens et al. in 1983 [6] and further elaborated on by Geladi et al. in 1985 [7]. The concept behind MSC is that artifacts or imperfections (e.g., undesirable scatter effect) will be removed from the data matrix prior to data modeling. MSC comprises two steps:

1. Estimation of the correction coefficients (additive and multiplicative contributions).

$$x_{org} = b_0 + b_{ref,1}\, x_{ref} + e \tag{2}$$

2. Correcting the recorded spectrum.

$$x_{corr} = \frac{x_{org} - b_0}{b_{ref,1}} = x_{ref} + \frac{e}{b_{ref,1}} \tag{3}$$

where $x_{org}$ is one original sample spectrum measured by the NIR instrument, $x_{ref}$ is a reference spectrum used for pre-processing of the entire dataset, $e$ is the un-modeled part of $x_{org}$, $x_{corr}$ is the corrected spectrum, and $b_0$ and $b_{ref,1}$ are scalar parameters, which differ for each sample. Fig. 4 illustrates the interpretation of the scalar parameters.

In most applications, the average spectrum of the calibration set is used as the reference spectrum. However, a generic reference spectrum can also be applied. In the original paper by Martens et al. [6], it was suggested to use only those parts of the spectral axis that do not include relevant information (baseline). While this makes good spectroscopic sense, it is difficult to determine such regions in practice, especially in NIR measurements, where the signals from different chemical components are strongly overlapping and correlated, and little or no true baseline is found. This is the reason why, in most cases, the entire spectrum is used to find the scalar correction parameters in MSC. Fig. 5 demonstrates the application of standard MSC to the pectin data. The spectral features of the pectin powder are conserved, while background offsets and slopes are largely removed (compare with Fig. 3). The linear relationship between the spectra and %DE is good, but not perfect.

The basic form of MSC has been expanded into more elaborate augmentations [8–12] commonly known as EMSC. This expansion includes both second-order
Figure 5. Data pre-processing by Multiplicative Scatter Correction using a first-order correction towards the average spectrum.
polynomial fitting to the reference spectrum, fitting of a baseline on the wavelength axis, and uses of a priori knowledge from the spectra of interest or spectral interferents. In this article, all these alternatives are called MSC for simplicity, as they can be summed up in one single equation:

$$x_{org} = [\,1 \;\; x_{ref} \;\; x_{ref}^2 \;\; \lambda \;\; \lambda^2 \;\; x_{known,1} \;\; x_{known,2} \;\; \ldots\,]\; b + e \tag{4}$$

where $\lambda$ is the correction vector for the wavelength-axis dependency, and $x_{known,i}$ is the inclusion of a priori knowledge for wanted/unwanted spectral information (e.g., the spectrum of a known interfering species). Equation (4) can readily be expanded to include any other appropriate corrections. $b$ is a set of scalars (correction coefficients) given by Equation (5).

$$b = [\,b_0 \;\; b_{ref,1} \;\; b_{ref,2} \;\; b_{\lambda,1} \;\; b_{\lambda,2} \;\; b_{known,1} \;\; b_{known,2} \;\; \ldots\,] \tag{5}$$

where $b_0$ is the offset correction, $b_{ref,i}$ is the correction according to the ith order of the reference, $b_{\lambda,i}$ is the correction of the ith order wavelength-axis dependency, and $b_{known,i}$ is the correction of the ith known information. By comparison with Equation (2), it can be observed that Equation (4) is just a higher order expansion of the original idea. In this article, the $x_{known}$ will not be discussed further, since, in many practical cases, reference spectra for wanted and unwanted constituents are not readily available.

The reference correction is most commonly performed with only a first-order polynomial. Even though there are no mathematical limitations to expand to higher order additions, there are normally no spectroscopic arguments for doing so (except perhaps if significant Rayleigh scattering is present in the short wavelength region).

Fig. 6 shows the result of a second-order polynomial correction to the pectin data. The correction terms used for the second-order polynomial reference correction are simply found by fitting a second-order (quadratic) polynomial to the points in Fig. 4. Only marginal improvements are achieved compared to the first-order correction in Fig. 5.

Wavelength-axis dependency is most often included as a second-order polynomial fitting on the wavelength axis to the spectra. When no reference correction is included, this simple wavelength fitting also goes under the name of spectral de-trending [13]; it can be viewed as a baseline correction. It is important to note that including the wavelength dependency in the full correction Equation (4) rather than having it as a separate step leads to a smaller corrective effect. This is due to a matrix-inversion operation performed simultaneously for all the
Figure 6. Multiplicative Scatter Correction of the spectra using a second-order reference correction towards the average spectrum.
correction parameters in MSC, where the different corrections will influence each other in the least squares fitting criterion. When a wavelength dependency is determined independently, only the wavelength axis (and not the reference spectrum) influences the correction, which will lead to a flattening of the processed spectrum. This effect can be seen by comparing Figs. 7 and 8.

As previously mentioned, more sophisticated corrections (e.g., higher order polynomials or other transformations of the wavelength dependency) can easily be incorporated in the MSC. Thennadil and Martin [12] suggested using the logarithmic values of the wavelengths, as this is judged more sound spectroscopically. However, the difference between using a logarithmic transformation of the wavelengths versus using a first-order polynomial correction is minimal, making these two approaches identical for all practical purposes.

As noted by Pedersen et al. [9], it is a quite simple procedure to apply the inverse version of MSC, called Inverse Signal Correction (ISC) [14]. The correction parameters, the b coefficients, are found in a similar fashion to regular MSC:

$$x_{ref} = [\,1 \;\; x_{org} \;\; x_{org}^2 \;\; \lambda \;\; \lambda^2 \;\; x_{known,1} \;\; x_{known,2} \;\; \ldots\,]\; b + e \tag{6}$$

Note that $x_{org}$ and $x_{ref}$ have swapped places compared to Equation (4). An advantage of (Extended) ISC (EISC) is the simplicity of the correction equation:

$$x_{corr} = [\,1 \;\; x_{org} \;\; x_{org}^2 \;\; \lambda \;\; \lambda^2 \;\; x_{known,1} \;\; x_{known,2} \;\; \ldots\,]\; b \tag{7}$$

In ISC and EISC, both the estimation of the correction coefficients and the correction itself are performed in what can be described as a forward manner, making it convenient to include additional terms and/or reference signals [9]. The previously mentioned matrix-inversion operation required for parameter estimation in MSC can easily become numerically ill-conditioned if it includes higher order polynomial reference corrections. This is an argument in favor of ISC. However, ISC assumes, in the least squares fitting, that the error in the recorded spectrum (to be corrected) is smaller than the error for the reference spectrum. In most practical applications, the reference is the average spectrum computed from n samples in the dataset (e.g., the calibration set). The expected noise level for this reference is of magnitude $\sqrt{n}$ smaller than the individual spectra (neglecting the bias due to scatter differences in the set). This is an argument against ISC, since a small error in the spectra will
Figure 7. Multiplicative Scatter Correction with first-order polynomial reference correction towards the average spectrum and second-order polynomial wavelength correction.
Figure 8. Multiplicative Scatter Correction (MSC) with a first-order polynomial reference correction towards the average spectrum, followed by a
separate MSC with a second-order polynomial wavelength correction (detrend).
influence the pre-processing to a greater degree than the original MSC.

The main challenge in MSC is to define an appropriate reference spectrum. As mentioned before, this is most often set to the average of the calibration spectra. Gallagher et al. [15] presented a natural variation to MSC by including a weighting scheme in the pre-processing step. Two alternatives were presented: the use of a pre-defined weight vector for the wavelength axis; and an iterative search for the optimal weighting vector. The iterative solution is found by giving lower weight to variables or wavelengths with high residual differences between the raw data and the corrected solution. The calculation of the weights continues until the difference between corrected spectra for two subsequent iterations is less than the assumed noise level in the data. Unfortunately, this fairly straightforward method does not always work well with NIR data, since the spread in the higher wavelength range typically indicates more scatter, and should be corrected for rather than given less weight. Fig. 9 shows that the weights used in the final correction give strong emphasis to the short-wavelength region, while the long-wavelength region does not contribute to the correction at all.

Another approach to finding the reference correction in MSC was proposed by Windig et al. as so-called Loopy MSC [16]. This method finds the average spectrum from the MSC-corrected dataset. Next, MSC is repeated multiple times, updating the reference spectrum as the average of the corrected dataset after each iteration step.

Fig. 10 shows the result of Loopy MSC applied to the pectin dataset; in this case the performance of Loopy MSC is very similar to that of the simple MSC. In Loopy MSC, it is possible to follow the increase in model statistics and then stop upon convergence (two iteration steps are typically sufficient). Superimposed in Fig. 10 is the change of reference spectrum from the average of the raw spectra.

3.2. Standard Normal Variate (SNV)

SNV pre-processing is probably the second most applied method for scatter correction of NIR/NIT data [13]. In this article, normalization (also called object-wise standardization) of the spectra will be examined in the same sub-section because of the obvious similarity between the two principles. The basic format for SNV and Normalization correction is the same as that for the traditional MSC:
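As a hedged sketch of that shared format: in SNV, as commonly defined, the offset is the mean of the individual spectrum and the scaling factor is its standard deviation, so no reference spectrum is needed. The function names below are illustrative, the toy spectra are hypothetical, and the unit-length (Euclidean) scaling shown for normalization is only one of several norms in use, chosen here as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own statistics."""
    a0 = mean(spectrum)    # offset: mean value of this spectrum
    a1 = stdev(spectrum)   # scale: sample standard deviation (n - 1 denominator)
    return [(v - a0) / a1 for v in spectrum]

def normalize(spectrum):
    """Scale one spectrum to unit Euclidean length (no offset is removed)."""
    length = sqrt(sum(v * v for v in spectrum))
    return [v / length for v in spectrum]

# Toy data (hypothetical): the same shape with a different offset and scale.
base = [0.10, 0.40, 0.90, 0.30, 0.20]
shifted = [0.5 + 2.0 * v for v in base]

# SNV removes both effects, so the two results coincide up to rounding.
snv_base, snv_shifted = snv(base), snv(shifted)
```

Because each spectrum is corrected using only its own mean and standard deviation, SNV needs no calibration set, unlike MSC with an average reference.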
Figure 9. Weighted Multiplicative Scatter Correction, based on iterative weight determination. Green line shows the final weight vector (arbitrary
scale, relative contributions).
Figure 10. Loopy Multiplicative Scatter Correction with a first-order reference correction. Green line shows the difference between the final and
the starting reference spectrum (arbitrary scale, relative contributions).
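The Loopy MSC iteration can be illustrated with a short plain-Python sketch. It rests on two assumptions that the text does not settle: each pass is applied to the raw spectra (only the reference changes), and the number of passes is fixed rather than chosen by following model statistics. The function names are hypothetical.

```python
from statistics import mean

def average_spectrum(spectra):
    """Point-wise mean over all spectra in the set."""
    return [mean(column) for column in zip(*spectra)]

def msc_once(spectra, reference):
    """One first-order MSC pass against the given reference."""
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x)) / ref_ss
        offset = x_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in x])
    return corrected

def loopy_msc(spectra, iterations=2):
    """Repeat MSC, using the mean of the last corrected set as the next reference."""
    reference = average_spectrum(spectra)        # start from the raw average
    corrected = spectra
    for _ in range(iterations):
        corrected = msc_once(spectra, reference)  # assumption: always correct the raw data
        reference = average_spectrum(corrected)   # update the reference
    return corrected, reference
```

A convergence check on the change in `reference` between passes could replace the fixed iteration count.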
Figure 13. The relationship between Standard Normal Variate and Multiplicative Scatter Correction. The blue and red lines are representative of
the trend line estimated in Fig. 4.
used in this article, the correlation of the pre-processed SNV data (Fig. 11) and basic MSC corrected data (Fig. 5) after mean-centering is 0.9995. In other words, MSC and SNV are the same for most practical applications.

where $x'_i$ denotes the first derivative and $x''_i$ the second derivative at point (wavelength) i. This method is extremely simple, but, unfortunately, it is not feasible for most real measurements due to noise inflation; it should almost always be avoided in practice.
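A minimal plain-Python sketch of finite differences shows why they are attractive despite the noise problem. The function names and the integer toy spectrum are hypothetical, and the index convention (backward difference for the first derivative, central form for the second) is an assumption.

```python
def first_difference(x):
    """First derivative estimate: difference between neighbouring points."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]

def second_difference(x):
    """Second derivative estimate: difference of two consecutive first differences."""
    return [x[i - 1] - 2 * x[i] + x[i + 1] for i in range(1, len(x) - 1)]

# Toy data (hypothetical), kept as integers so the comparisons below are exact.
spectrum = [1, 4, 9, 3, 2, 6]
with_offset = [v + 5 for v in spectrum]                        # additive baseline
with_slope = [v + 5 + 2 * i for i, v in enumerate(spectrum)]   # offset plus linear baseline

# The first difference removes the constant offset, the second also removes the slope.
same_first = first_difference(with_offset) == first_difference(spectrum)
same_second = second_difference(with_slope) == second_difference(spectrum)
```

The cost is the noise inflation mentioned above: for independent noise of equal variance at each point, the difference of two points has twice the variance of a single point, while the signal change between neighbouring wavelengths is usually small.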
Figure 14. The effect of derivation on additive (green) and additive plus multiplicative (red) effects. The blue spectrum is the spectrum without any offsets, and the black dotted line is the zero line.
Figure 15. Estimation of the first derivative by Norris-Williams. A 7-point window is used for smoothing, and a gap size of 3 is applied in derivation.
Figure 16. Norris-Williams second derivative using a 9-point smoothing and a gap size of 3.
Figure 17. Estimation of the first derivative by Savitzky-Golay. A 7-point window and a second-order polynomial is used for smoothing.
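For a symmetric window of 2m+1 equally spaced points, the least-squares slope at the centre of a fitted first- or second-degree polynomial has a simple closed form: the weight of the point at offset j is j divided by the sum of j² over the window. The sketch below uses that closed form for the setting in Fig. 17 (7 points, so m = 3). It is an illustration only; the function and parameter names are hypothetical, and points without a full window are simply dropped.

```python
def sg_first_derivative(x, half_width=3, spacing=1.0):
    """Savitzky-Golay first derivative for a linear or quadratic local fit."""
    offsets = range(-half_width, half_width + 1)
    norm = spacing * sum(j * j for j in offsets)
    weights = [j / norm for j in offsets]
    # Only points with a full symmetric window are estimated, so
    # 2 * half_width points (window size minus one) are lost in total.
    return [
        sum(w * x[i + j] for w, j in zip(weights, offsets))
        for i in range(half_width, len(x) - half_width)
    ]

# Toy check (hypothetical data): a pure quadratic, x_i = i^2.
quadratic = [i * i for i in range(10)]
slopes = sg_first_derivative(quadratic, half_width=3)
# slopes is close to [6.0, 8.0, 10.0, 12.0], the exact slope 2*i at i = 3..6
```

The same weights result from a first-degree and a second-degree fit, so the two polynomial orders give identical first-derivative estimates for a symmetric window.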
Figure 18. Savitzky-Golay estimate of the second derivative using 9 points and a second-order polynomial for the smoothing.
subsequent polynomial fits will give the same estimate of the coefficients. For the first derivative, a first-degree polynomial and second-degree polynomial will give the same answer (as will the third and fourth degrees). For the second derivative, a second and third-degree polynomial will give the same answer (as will the fourth and fifth degrees), etc. When this method was first introduced by Savitzky and Golay [21], it was still computationally cumbersome to calculate the parameters in estimating the derivative. For that reason, the authors reported a set of tabulated values for several different types of derivatives and polynomial combination. However, errors were introduced in their first article, and Steinier et al. [22] published a corrected and expanded version of the original tables. The tables were later even further expanded by Madden [23]. However, with modern computers, there is no longer any real need for these tables.

The original forms of NW and SG derivation use a symmetric window smoothing, requiring the number of data points on each side of the center point to be the same. As a consequence, the techniques neglect a number of points at each end of the spectrum during the pre-processing. For NW derivation, the number of points lost equals the number of points used for smoothing plus the size of the gap minus one. For SG derivation, the number of points lost equals the number of points used for smoothing minus one. NW derivation thus absorbs more points than SG derivation. If the spectral vector is long (i.e. more than 500 points), this issue is not important, but, for shorter spectra (e.g., diode-array instruments), this loss of wavelengths can be important.

Proctor and Sherwood in 1980 [24] and Gorry in 1990 [25] suggested a solution that involves using a fitted polynomial based on an asymmetric window for the end-points. In practice, this means that the m first points of the spectra are estimated from the 2m+1 first points in the spectra, and a similar estimate for the last m points. However, such a solution will evidently introduce artifacts, as the accuracy of the derivative decreases with the distance from the centre point (m+1). Furthermore, the estimation of the end-points does not possess the inherent redundancy mentioned for SG: no two subsequent polynomial order fittings will give the same estimates. In addition to this, the estimate of the dth derivative will be equal for all the end-points if the spectrum is smoothed by a dth-order polynomial.

NW derivation is similar to finite differences, but introduces smoothing and gap-size as counteractions in the estimated derivative spectra to preserve the signal-to-noise ratio. These two steps in NW derivation are more or less independent. However, SG derivation uses more common filtering techniques to estimate the derivative spectra, and, instead of using the finite-difference approach, fits a polynomial through a number of points to
maintain an acceptable signal-to-noise ratio. In general, the NW and SG derivations do not give the same estimates. The only pair of settings that gives identical results is three smoothing points for both, SG using a first-order polynomial fit, and the gap size in NW equal to 1. However, more complex (and realistic) settings for SG and/or NW automatically lead to (slightly) different derivation results.

5. Interval and combined versions

Of the pre-processing techniques mentioned thus far, only the estimation of the derivatives operates by a moving-window operation, where only a local part (window) of the spectra is used at any time to estimate the correction. However, all the other methods can equally well be performed in a window-wise fashion. Isaksson and Kowalski [26] suggested this for MSC, and named it piecewise MSC (PMSC). Andersson [27] compared alternative pre-processing methods with two versions of PMSC: moving-window or local pre-processing (dividing the wavelength axis into a few sections and performing pre-processing on each section separately).

The moving-window version of the pre-processing techniques has received little interest from the NIR community, probably because the right choice of window size is crucial and it is far from trivial to do this correctly. Too small a window will lead to the introduction of large artifacts in the corrected spectra and to a reduced signal-to-noise ratio. However, the larger the size of the window, the smaller the distinction between full and moving-window pre-processing (see Fig. 19). Local window pre-processing can be useful, especially in cases where the recorded spectra are measured all the way from the visible range or shortwave NIR up to the mid-IR range. In this wide spectral region, several different scatter issues coexist, and the spectra should be divided accordingly, performing separate scatter corrections on the different parts. However, since this is not essentially different from dividing the spectra in regions and applying the pre-processing methods independently, we do not discuss it further.

The use of combinations of pre-processing methods is abundant in literature, and, in principle, any sequence of pre-processing is possible. However, the following simple rules can serve as initial guidelines.

Scatter correction (with the exception of normalization) should always be performed prior to differentiation. These techniques are all designed for correction of raw spectra, and have never been thought of as corrections to a differentiated or baseline-corrected spectrum.
Figure 19. Moving window Standard Normal Variate using a 129-nm moving window (65 measurement points).
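One possible reading of a moving-window SNV, as in Fig. 19, is sketched below in plain Python: each point is centred and scaled by the mean and standard deviation of its own local window. This is an assumption about the procedure, not a description of the implementation behind the figure; the function name is hypothetical and end points without a full window are dropped. A 65-point window corresponds to `half_width=32`.

```python
from statistics import mean, stdev

def moving_window_snv(spectrum, half_width):
    """SNV applied locally: each point is scaled by the statistics of its window."""
    corrected = []
    for i in range(half_width, len(spectrum) - half_width):
        window = spectrum[i - half_width : i + half_width + 1]
        # Centre value minus local mean, divided by local sample standard deviation.
        corrected.append((spectrum[i] - mean(window)) / stdev(window))
    return corrected

# Toy data (hypothetical); a 5-point window is used only to keep the example short.
spectrum = [0.10, 0.40, 0.90, 0.30, 0.20, 0.70, 0.50, 0.80, 0.35, 0.60, 0.15, 0.45]
local = moving_window_snv(spectrum, half_width=2)
```

A window with no variation would give a zero standard deviation, so very small windows on flat spectral regions need special handling. This is one practical face of the artifacts that small windows introduce.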
Figure 20. 32 marzipan samples measured by Instrument 2, in the interval 1100–2500 nm.
Normalization can be used at both ends of the correction, although it is easier to assess the effect of normalization if it is done prior to any other operation. The basic difference between SNV with subsequent de-trending and MSC with reference and baseline correction is that, in MSC, both corrections are applied simultaneously, not consecutively. Thus, MSC will generally give a smaller baseline correction than SNV plus de-trending.

Performing de-trending followed by SNV was not recommended by Barnes et al. [13], and, for the reasons given above, it is not recommended to perform de-trending first.

6. A quantitative example

We will now apply all the pre-processing methods discussed to a quantitative spectroscopic task involving 32 marzipan samples measured on six very distinct spectrometers as predictor variables for two different response variables: moisture and sugar content. The data are taken from a study by Christensen et al. [3]. Fig. 20 shows one of the spectral sets. For a summary of the data, see Table 1. Here, we show the PLS-regression models, which were built for all the six NIR instruments, and responses separately (so-called PLS1 models [5]).

The marzipan NIR dataset was treated with the different pre-processing techniques described in this article. In addition to the settings used in the theoretical sections, some more extreme parameter settings were investigated to estimate piecewise MSC, to show the importance of using reasonable choices. No samples were removed as outliers, as all samples behaved well in the initial exploratory analysis. Bootstrapping-error estimates [28] were used as the validation method. A total of 1000 bootstrap drawings were performed for each combination of dataset, reference and pre-processing. The same set of drawings was used for all datasets, except for Instrument 1, where only 15 of 32 samples were measured. The 0.632-bootstrap estimate of the prediction error was calculated as shown in Equation (14), in accordance with Wehrens et al. [28].

$$\widehat{\mathrm{RMSE}}_f = 0.368\,\overline{\mathrm{RMSE}}_f + 0.632\,\overline{\mathrm{RMSEP}}_f \qquad (14)$$

where $\widehat{\mathrm{RMSE}}_f$ is the estimated prediction error, and $\overline{\mathrm{RMSE}}_f$ and $\overline{\mathrm{RMSEP}}_f$ are the average calibration (samples selected per one bootstrap draw) and prediction (samples not selected per one bootstrap draw) errors across all bootstrap drawings. The optimal number of factors, f, is determined based on the 0.632-bootstrap estimates, selecting the first minimum or the point where the $\widehat{\mathrm{RMSE}}_f$ curve as a function of factors flattens out (where the slope of the $\widehat{\mathrm{RMSE}}_f$ curve is constant).
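The validation scheme described above can be sketched as follows. This is a minimal illustration, assuming that the calibration and prediction errors have already been averaged over all bootstrap drawings for each number of factors; the function names and the numerical values are hypothetical and are not taken from the study. Only the "first minimum" rule is coded; judging where the curve flattens out is left to inspection.

```python
import random


def bootstrap_draw(n_samples, rng):
    """Draw n_samples indices with replacement; unselected samples are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def bootstrap_632(mean_cal_rmse, mean_pred_rmse):
    """Equation (14): weight the average calibration and prediction errors."""
    return [0.368 * c + 0.632 * p for c, p in zip(mean_cal_rmse, mean_pred_rmse)]


def first_minimum(errors):
    """Return the number of factors (1-based) at the first local minimum."""
    for f in range(len(errors) - 1):
        if errors[f + 1] >= errors[f]:
            return f + 1
    return len(errors)


# Hypothetical averaged errors for 1 to 5 PLS factors (illustration only).
cal = [1.00, 0.60, 0.40, 0.30, 0.25]
pred = [1.10, 0.80, 0.70, 0.75, 0.90]
estimate = bootstrap_632(cal, pred)
n_factors = first_minimum(estimate)
```

In a full analysis, each call to `bootstrap_draw` would supply the calibration samples (in-bag) and the prediction samples (out-of-bag) for one PLS model per number of factors; the regression step itself is omitted here.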
By applying all the pre-processing techniques to the same sample set recorded with six different instruments and/or optical measurement geometries (dispersive, interferometer, reflectance, transmission and fiber-probe) using two different responses (moisture and sugar), some general performance differences are revealed (see Tables 2 and 3).

As a very first observation, it is consoling that nearly all pre-processed models are simpler or more parsimonious (i.e. use fewer PLS factors) than the global model, independent of the spectrometer set-up and independent of the response variable.

A second general observation is that filter-based Instrument 1 in reflectance mode is not competitive in measuring the marzipan samples (RMSE_moisture = 0.75, 4 latent variables (LVs) and RMSE_sugar = 2.30, 3 LVs; but, we need to keep in mind that only 15 of 32 samples are measured) and pre-processing does not help to get it
Table 1. Overview of the instruments used for the measurements of 32 marzipan samples [3]
Table 2. Prediction results for the moisture content (w/w%) of the 32 marzipan samples on six different NIR instruments with different pre-processing, given as RMSE with number of latent variables in brackets. Moisture content range: Instrument 1 = 7.4–18.6 (w/w%), Instruments 2–6 = 6.8–18.6 (w/w%)
Method | Settings | Instrument 1 | Instrument 2 | Instrument 3 | Instrument 4 | Instrument 5 | Instrument 6
None | - | 0.75 (4) | 0.42 (7) | 0.42 (11) | 0.46 (6) | 0.48 (7) | 0.48 (4)
MSC | First-order reference correction | 0.64 (3) | 0.36 (6) | 0.37 (10) | 0.45 (4) | 0.35 (8) | 0.46 (4)
MSC | Second-order reference correction | 0.62 (4) | 0.41 (5) | 0.41 (7) | 0.50 (3) | 0.37 (5) | 0.45 (4)
MSC | First-order reference correction + second-order wavelength correction | 0.61 (4) | 0.39 (5) | 0.48 (9) | 0.43 (5) | 0.40 (5) | 0.50 (5)
SNV+Detrend | - | 0.61 (4) | 0.35 (5) | 0.49 (9) | 0.42 (4) | 0.40 (5) | 0.45 (3)
WMSC | First-order reference correction | 0.75 (3) | 0.39 (5) | 0.40 (10) | 0.43 (5) | 0.41 (5) | 0.44 (4)
LMSC | First-order reference correction | 0.64 (3) | 0.36 (5) | 0.37 (10) | 0.43 (5) | 0.41 (5) | 0.46 (4)
SNV | - | 0.65 (3) | 0.35 (6) | 0.37 (10) | 0.42 (5) | 0.41 (5) | 0.46 (4)
Norm | Euclidean | 0.70 (3) | 0.35 (6) | 0.39 (10) | 0.42 (6) | 0.52 (6) | 0.38 (3)
NW | Seven-point smoothing, gap size 3, first derivative | 0.66 (4) | 0.38 (5) | 0.40 (9) | 0.41 (5) | 0.43 (7) | 0.45 (3)
SG | Seven-point smoothing, second-order polynomial, first derivative | 0.58 (3) | 0.38 (5) | 0.41 (8) | 0.40 (5) | 0.47 (6) | 0.46 (3)
SG | 15-point smoothing, second-order polynomial, first derivative | 3.73 (4) | 0.41 (7) | 0.40 (10) | 0.46 (7) | 0.49 (7) | 0.49 (3)
Finite difference | First derivative | 0.71 (4) | 0.34 (5) | 0.39 (9) | 0.58 (4) | 0.73 (4) | 0.46 (3)
NW | Nine-point smoothing, gap size 3, second derivative | 0.88 (4) | 0.38 (5) | 0.38 (9) | 0.39 (7) | 0.51 (6) | 0.45 (3)
NW | Three-point smoothing, gap size 3, second derivative | 0.58 (4) | 0.35 (6) | 0.41 (9) | 0.53 (4) | 0.71 (6) | 0.46 (3)
SG | Nine-point smoothing, second-order polynomial, second derivative | 0.58 (4) | 0.34 (6) | 0.39 (9) | 0.51 (4) | 0.67 (5) | 0.46 (3)
Finite difference | Second derivative | 0.76 (4) | 0.72 (4) | 0.65 (5) | 0.88 (5) | 0.91 (5) | 0.42 (6)
PSNV | Window width 129 | - (-) | 0.30 (4) | 0.57 (6) | 0.38 (4) | 0.45 (5) | - (-)
PMSC | Window width 129, first-order reference correction | - (-) | 0.40 (6) | 0.62 (7) | 0.37 (5) | 0.42 (6) | - (-)
PMSC | Window width 65, first-order reference correction | - (-) | 0.33 (6) | 4.17 (1) | 0.35 (6) | 0.45 (6) | 0.47 (4)
PMSC | Window width 17, first-order reference correction | 0.64 (3) | 5.06 (1) | 7.54 (1) | 3.79 (1) | 3.83 (1) | 3.26 (11)
PMSC | Window width 129, second-order reference correction | - (-) | 0.38 (4) | 0.71 (4) | 0.40 (6) | 0.52 (4) | - (-)
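To make the difference between the "Finite difference" and "SG" first-derivative rows of Table 2 concrete, the two estimates can be sketched as below. This is a minimal illustration assuming equally spaced measurement points (the derivative is expressed per point, not per nm) and a first- or second-order polynomial, for which the centre-point slope of the least-squares fit reduces to a weighted sum with weights k / sum(k^2). The function names are hypothetical, and points closer than half a window to either end are simply dropped.

```python
def finite_difference(y):
    """First derivative as the difference between neighbouring points."""
    return [y[i + 1] - y[i] for i in range(len(y) - 1)]


def savgol_first_derivative(y, window=7):
    """Savitzky-Golay first derivative for a 1st- or 2nd-order polynomial fit.

    For a symmetric window the least-squares slope at the centre point is
    sum(k * y[i + k]) / sum(k * k), with k running over the window offsets.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number of points >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    norm = sum(k * k for k in offsets)  # 28 for a seven-point window
    result = []
    # Assumption: edge points without a full window are left out.
    for i in range(half, len(y) - half):
        result.append(sum(k * y[i + k] for k in offsets) / norm)
    return result
```

The finite difference uses only two points and therefore passes measurement noise almost unattenuated into the derivative, whereas the windowed fit averages over seven points; the price is the loss of three points at each end of the spectrum and some broadening of sharp features.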
Table 3. Sugar prediction results for the sugar content of the 32 marzipan samples on six different NIR instruments with different pre-processing,
given as RMSE with number of latent variables in brackets. Sugar content range 32.2–68.7 (w/w%)
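The "Norm (Euclidean)", "SNV" and "MSC, first-order reference correction" entries compared in Tables 2 and 3 can be sketched for a single spectrum as follows. This is a minimal illustration in standard-library Python with hypothetical function names. It assumes the sample standard deviation (n - 1) for SNV and, for MSC, a straight-line least-squares fit of the spectrum against a reference spectrum (commonly the mean of the calibration set) without any wavelength-dependent terms.

```python
from math import sqrt
from statistics import mean, stdev


def euclidean_norm(x):
    """Scale the spectrum to unit Euclidean length."""
    length = sqrt(sum(v * v for v in x))
    return [v / length for v in x]


def snv(x):
    """Standard Normal Variate: centre and scale by the spectrum's own mean and SD."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


def msc(x, ref):
    """First-order MSC: fit x = a + b * ref by least squares, return (x - a) / b."""
    mx, mr = mean(x), mean(ref)
    b = (sum((r - mr) * (v - mx) for r, v in zip(ref, x))
         / sum((r - mr) ** 2 for r in ref))
    a = mx - b * mr
    return [(v - a) / b for v in x]
```

SNV needs no reference and uses only the spectrum's own statistics, while MSC maps each spectrum onto the scale of the reference. A spectrum that is an exact offset-and-scale copy of the reference is returned by `msc` as the reference itself.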
down to the level of the other instruments. Filter-based instruments are not really compatible with spectral derivation techniques, but the other pre-processing techniques also fail to reach the desired performance.

A third general comment can be made on the holographic information content of NIR, in which information (the overtones) is repeated multiple times. The small spectral region 850–1050 nm (covered by Instrument 6) that contains the second overtone of the O-H and N-H stretches and the third overtone of the C-H stretches is fully competitive with the more sophisticated instruments covering the complete or traditional NIR region. Moreover, it is interesting to note that the models created from spectra from transmission-based Instrument 6 in general are the simplest, even before pre-processing. Apparently, the scatters from the density fluctuations in the samples measured in transmission mode are less demanding than the reflective scatters measured in reflection mode. When it comes to pre-processing, it is surprising that, in contrast to all other instruments, the Euclidean norm works very well and gives the best results for Instrument 6 (RMSE_moisture = 0.38, 3 LVs, and RMSE_sugar = 1.39, 5 LVs). The reason might be that this small NIR region contains all hydrogen-bond stretches from the sample and a normalized approach thus corresponds to integrating all the proton signals and setting the proton-density equal between samples. Besides the normalization approach, it would appear that derivatives are a good pre-processing strategy for this type of data, as they can consistently simplify the models, as is particularly obvious in the case of sugar prediction.

For the remaining full NIR-region Instruments 2–5, we find some interesting and strong differences dependent on the response variable, presumably because the moisture content is a low-resolution spectral task while sugar content is a high-resolution problem.

In the case of the moisture models, dispersive Instruments 2 and 3 are almost consistently better than the models based on Fourier-transform Instruments 4 and 5. The best overall model is found for Instrument 2 with PSNV (window width 129) pre-processing (RMSE = 0.30, 4 LVs) and the best Fourier-transform model is found for Instrument 5 with fiber optics using MSC with second-order reference correction (RMSE = 0.37, 5 LVs). When adding a fiber probe to Instrument 2 (= Instrument 3), the complexity of the models increases (on average by 3 LVs).
This large difference can be assigned to the more complex optical geometry of the latter system. Furthermore, the performance without pre-processing remains the same (RMSE = 0.42), but the pre-processed performance of Instrument 3 is inferior (RMSE = 0.37, 10 LVs using SNV and MSC) to that of the best model of Instrument 2.

In the case of sugar models, the situation is almost reversed. Here, interferometer-based Instrument 4 displays consistently the best models, albeit more complex, presumably due to the much better spectral resolution of this instrument. The best overall model is found for Instrument 4 with MSC (first-order reference correction, second-order wavelength correction) pre-processing (RMSE = 0.92, 9 LVs), which is far better than the best dispersive results (RMSE = 1.30, 4 LVs for Instrument 2) but also much more complex. Again, in the case of the sugar models, adding a fiber probe to Instrument 2 (= Instrument 3) makes the models inferior and much more complex (on average two more LVs and an increase in RMSE between the two best models from 1.22 for Instrument 2 to 1.81 for Instrument 3).

The moving-window versions of SNV and MSC show varying results. In general, moving-window versions give results similar to or better than the best remaining pre-processing method. However, the RMSE is at best 10% better than the best normal pre-processing technique, but window selection could easily become a critical parameter. For comparison, some sub-optimal moving-window approaches are included as the last three rows in Tables 2 and 3.

A discrepancy between the finite-difference approach to derivation and the more sophisticated methods is not apparent in the estimate of the first derivative for some measurements (Instruments 2, 3 and 6). This fits well with the smooth behavior of these systems, indicating that additional smoothing is not necessary. The interferometers (Instruments 4 and 5) have a better spectral resolution, giving rise to a higher degree of fine structure, which leads to a lower signal-to-noise ratio in the estimate of the first derivative by the finite-difference method. This, in the end, leads to inferior models. The finite-difference estimates of the second derivative are generally all inferior to the more sophisticated methods. This indicates that the signal-to-noise ratio in the spectra has been lowered significantly (as expected), letting noise influence the models to a greater degree. However, even though finite difference may lead to good models for the estimate of the first derivative, it is safer to always stick with smoothing derivation methods.

7. Concluding remarks

Obviously, our quantitative example does not give the authoritative answer about which pre-processing to use in any given case. However, it does appear sensible to use normalization for short-wave NIR-transmission spectra and to use MSC (with first-order reference correction) or standard SNV for most other cases.

While it is difficult to find the best pre-processing, it is indeed possible to use wrong pre-processing. This will primarily happen due to incorrect parameter settings of window size and/or smoothing functions in the estimate of the derivative and the moving-window techniques.

Finally, we emphasize that the maximum improvement of any pre-processed model when compared to the global model is approximately 25% in RMSE in our study. While a 25% reduction might be important for industrial applications [29], this is hardly what makes the difference in the many multivariate feasibility studies that flourish in the scientific literature, for which we could recommend selecting pre-processing so as to achieve parsimonious, interpretable models.

References
[1] S. Wold, K. Esbensen, P. Geladi, Chemom. Intell. Lab. Syst. 2 (1987) 37.
[2] S.B. Engelsen, E. Mikkelsen, L. Munck, Progr. Colloid Polym. Sci. 108 (1998) 166.
[3] J. Christensen, L. Nørgaard, H. Heimdal, J.G. Pedersen, S.B. Engelsen, J. Near Infrared Spectrosc. 12 (2004) 63.
[4] S. Wold, H. Martens, H. Wold, Lect. Notes Math. 973 (1983) 286.
[5] H. Martens, T. Næs, Multivariate Calibration, Wiley, New York, USA, 1989.
[6] H. Martens, S.A. Jensen, P. Geladi, Multivariate linearity transformations for near infrared reflectance spectroscopy, in: O.H.J. Christie (Editor), Proc. Nordic Symp. Applied Statistics, Stokkland Forlag, Stavanger, Norway, 1983, pp. 205–234.
[7] P. Geladi, D. MacDougal, H. Martens, Appl. Spectrosc. 39 (1985) 491.
[8] H. Martens, E. Stark, J. Pharm. Biomed. Anal. 9 (1991) 625.
[9] D.K. Pedersen, H. Martens, J.P. Nielsen, S.B. Engelsen, Appl. Spectrosc. 56 (2002) 1206–1214.
[10] H. Martens, J.P. Nielsen, S.B. Engelsen, Anal. Chem. 75 (2003) 394.
[11] M. Decker, P.V. Nielsen, H. Martens, Appl. Spectrosc. 59 (2005) 56.
[12] S.N. Thennadil, E.B. Martin, J. Chemom. 19 (2005) 77.
[13] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Appl. Spectrosc. 43 (1989) 772.
[14] I.S. Helland, T. Næs, T. Isaksson, Chemom. Intell. Lab. Syst. 29 (1995) 233.
[15] N.B. Gallagher, T.A. Blake, P.L. Gassman, J. Chemom. 19 (2006) 271.
[16] W. Windig, J. Shaver, R. Bro, Appl. Spectrosc. 62 (2008) 1153.
[17] Q. Guo, W. Wu, D.L. Massart, Anal. Chim. Acta 382 (1999) 87.
[18] M.S. Dhanoa, S.J. Lister, R. Sanderson, R.J. Barnes, J. Near Infrared Spectrosc. 2 (1994) 43.
[19] K.H. Norris, Extracting information from spectrophotometric curves - Predicting chemical composition from visible and near infrared spectra, in: H. Martens, H. Russwurm Jr. (Editors), Food Research and Data Analysis–Proc. IUFOST Symposium, Applied Science Publishers, London, UK, 1983, pp. 95–113.
[20] K.H. Norris, P.C. Williams, Cereal Chem. 61 (1984) 158.
[21] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627.
[22] J. Steinier, Y. Termonia, J. Deltour, Anal. Chem. 44 (1972) 1906.
[23] H.H. Madden, Anal. Chem. 50 (1978) 1383.
[24] A. Proctor, P.M.A. Sherwood, Anal. Chem. 52 (1980) 2315.
[25] P.A. Gorry, Anal. Chem. 62 (1990) 570.
[26] T. Isaksson, B.R. Kowalski, Appl. Spectrosc. 47 (1993) 702.
[27] C.A. Andersson, Chemom. Intell. Lab. Syst. 47 (1999) 51.
[28] R. Wehrens, H. Putter, L.M.C. Buydens, Chemom. Intell. Lab. Syst. 54 (2000) 35.
[29] C.B. Zachariassen, J. Larsen, F. van den Berg, S.B. Engelsen, Chemom. Intell. Lab. Syst. 76 (2005) 149.