Decision tree–based identification of Staphylococcus aureus


Analytical and Bioanalytical Chemistry (2022) 414:1049–1059

https://doi.org/10.1007/s00216-021-03729-2

RESEARCH PAPER

Decision tree–based identification of Staphylococcus aureus via infrared spectral analysis of ambient gas
Hidehiko Honda1 · Masato Yamamoto1 · Satoru Arata1 · Hirokazu Kobayashi1 · Masahiro Inagaki1

Received: 25 May 2021 / Revised: 10 September 2021 / Accepted: 11 October 2021 / Published online: 23 October 2021
© The Author(s) 2021

Abstract
In this study, eight types of bacteria were cultivated, including Staphylococcus aureus. The infrared absorption spectra of the gas surrounding cultured bacteria were recorded at a resolution of 0.5 cm−1 over the wavenumber range of 7500–500 cm−1. From these spectra, we searched for the infrared wavenumbers at which characteristic absorptions of the gas surrounding Staphylococcus aureus could be measured. This paper reports two wavenumber regions, 6516–6506 cm−1 and 2166–2158 cm−1. A decision tree–based machine learning algorithm was used to search for these wavenumber regions. The peak intensity or the absorbance difference was calculated for each region, and the ratio between them was obtained. When these ratios were used as training data, decision trees were created to classify the gas surrounding Staphylococcus aureus and the gas surrounding other bacteria into different groups. These decision trees show the potential effectiveness of using absorbance measurement at two wavenumber regions in finding Staphylococcus aureus.

Keywords Bacteria identification · Decision tree · Infrared absorption spectra · Machine learning · Staphylococcus aureus

* Hidehiko Honda
[email protected]

1 Faculty of Arts and Sciences at Fujiyoshida, Showa University, 4562, Kami‑yoshida, Fuji‑yoshida‑shi, Yamanashi 403‑0005, Japan

Introduction

The presence of the Staphylococcus (S.) aureus species in humans is associated with the occurrence of several health hazards, including bacteremia, infective endocarditis, osteomyelitis, pulmonary infections, gastroenteritis, meningitis, toxic shock syndrome, and urinary tract infections [1–4]. Annually, the USA alone records up to 20,000 deaths, which are partly attributable to the presence of S. aureus [5]. Accordingly, precise, effective prevention and treatment methods must be developed to neutralize the effects of such lethal bacteria. From a medical research perspective, this entails the development of effective procedures for monitoring the strain and reproduction rate of S. aureus.

Infrared laser radiations and optical parametric generations that oscillate at a specific wavenumber have been proposed as a means of facilitating easy and continuous analysis of the volatile organic compounds contained in human exhaled breath [6–8]. If the presence of bacteria could be confirmed by laser spectroscopy, reproduction monitoring could be realized by measuring the surrounding air without establishing contact with the concerned bacteria. However, this method has not been applied to the reproduction monitoring of S. aureus. A similar method to analyze the reproduction of S. aureus would facilitate the realization of healthy human life, but appropriate investigations in this regard are yet to be undertaken. This study aims to address this gap in the existing literature.

Bacterial species possess unique characteristic odors. These odors arise because of the release of volatile organic compounds characteristic of each species into the atmosphere. Previous studies have utilized gas chromatography and mass spectrometry to demonstrate that the S. aureus metabolites contain isovaleric acid and 2-methyl-butanal [9]. However, infrared spectroscopy fails to detect these volatile metabolites because their derived peak does not always appear solely at the position of the strong infrared absorption peak obtained from their stable molecular structure [10]. This might result in the aggregation of several characteristic molecules that absorb infrared rays. Furthermore, any overlap with the absorption peaks of other molecules can potentially impede metabolite quantification [11]. This necessitates the availability of the characteristic-spectrum


wavenumber information to realize accurate metabolite detection via infrared spectroscopy [12, 13].

The proposed research employs a decision tree–based machine learning algorithm to detect the wavenumber range across which the infrared absorption spectra peculiar to S. aureus can be recorded. Recent advances in machine learning technology have demonstrated its significant potential with regard to efficiently handling classification tasks involving large quantities of data that cannot be analyzed by humans; moreover, it can handle data that are too small in value to be distinguished using ordinary analysis methods [14–20]. This paper presents an approach to classify infrared absorption spectra via machine learning. A dataset comprising approximately 5 × 10^9 data points was used to this end. The results obtained in this study demonstrate the realization of accurate bacterium-propagation detection in the gas surrounding S. aureus via infrared irradiation of the odorous surrounding space.

Materials and methods

Sample collection

Eight common types of bacteria were cultivated in this study. Table 1 lists these bacteria types considered along with their specifications. The S. aureus species considered in this study include the standard type (Id: Sa) and its drug-resistant strain (mrSa). All bacteria types were cultivated in sheep-blood medium consisting of casein peptone 13.0 g L−1, soybean protein digest 5.0 g L−1, growth factors 2.2 g L−1, NaCl 5.0 g L−1, agar 13.0 g L−1, and 5% defibrinated sheep blood (Nissui Plate Sheep Blood Agar 51,001, Nissui Pharmaceutical Co., Ltd.). Only one type of bacteria was planted in each Petri dish. Two Petri dishes planted with the same type of bacteria were placed in an airtight container, and the bacteria were cultured in an incubator while inside the airtight container. The culturing was performed at a temperature of 37 °C over a period of approximately 40 h. Post-culturing, the gas surrounding the bacterium was aspirated into gas bags (smart bag PA, smart bag 2F, Tedlar bag or ANALYTIC-BARRIER Bag, GL Sciences) by the indirect sampling method using a dry pump, as shown in Fig. 1. Each gas bag was equipped with a standard sleeve (6–7 mm O.D.) connected to a silicone tube within 1 m. The other end of the silicone tube was placed within 5 cm of the object.

The number of cultures is shown in the “Number of samples” column in Table 1. One gas sample was obtained from one airtight container. Absorbance measurements were performed multiple times for each sample. The number of infrared absorption spectra obtained for each sample is shown in parentheses in the “Total number of measurements” column. For example, S. aureus was cultured four times, each gas sample obtained in each culture was measured four times, and a total of 16 spectra were obtained.

Table 1 Eight facultative anaerobic bacteria species cultured in this study in sheep-blood medium. Staphylococcus aureus and Pseudomonas aeruginosa were cultivated in two types, a standard bacterium and a drug-resistant bacterium. For the other six species of bacteria, either standard bacteria or drug-resistant bacteria were used as samples. “Number of samples” is equal to the number of cultures. For example, methicillin-resistant Staphylococcus aureus (ID: mrSa) was cultivated seven times, and a sample of the gas around the bacterium was collected in each culture. In the case of mrSa, the absorbance was measured once to four times for each sample. The number in parentheses in “Total number of measurements” is the number of absorption spectra measurements for each sample, and the total number of infrared absorption spectra (Total number of measurements) is 1 + 1 + 4 + 4 + 4 + 4 + 4 = 22 spectra

| Air around bacteria | Genus species | ID | Strain/origin | Number of samples | Total number of measurements | Incubation time (h) |
|---|---|---|---|---|---|---|
| Standard strains | Staphylococcus aureus | Sa | Type strain (NBRC 13,276) | 4 | 16 (4 + 4 + 4 + 4) | 39 |
| Hospital isolates | Methicillin-resistant Staphylococcus aureus | mrSa | Clinical isolate | 7 | 22 (1 + 1 + 4 + 4 + 4 + 4 + 4) | 44 |
| Standard strains | Escherichia coli | Ec | Type strain (NBRC 3972) | 2 | 8 (1 + 7) | 32 |
| Standard strains | Pseudomonas aeruginosa | Pa1 | Type strain (NBRC113275) | 2 | 11 (5 + 6) | 38.5 |
| Hospital isolates | Pseudomonas aeruginosa | Pa2 | Hospital environment | 4 | 20 (5 + 5 + 5 + 5) | 39 |
| Hospital isolates | Klebsiella pneumoniae | Kp | Clinical isolate | 2 | 8 (1 + 7) | 32 |
| Hospital isolates | Enterobacter cloacae | Enc | Clinical isolate | 2 | 8 (1 + 7) | 39 |
| Hospital isolates | Acinetobacter baumannii | Abb | Clinical isolate | 10 | 53 (5 + 6 + 5 + 5 + 5 + 5 + 5 + 5 + 6 + 6) | 38.5 |
| Hospital isolates | Streptococcus pneumoniae | Scp | Clinical isolate | 4 | 18 (4 + 6 + 4 + 4) | 37.5 |
| Hospital isolates | Enterobacter aerogenes | Ena | Clinical isolate | 2 | 10 (4 + 6) | 37.5 |
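The per-sample counts in Table 1 can be cross-checked against the totals quoted elsewhere in the paper (174 spectra overall, 38 for S. aureus and 136 for the other bacteria). The following is a minimal sketch of that arithmetic; the dictionary is transcribed from Table 1, while the names `measurements`, `S_AUREUS_IDS`, and `count_spectra` are hypothetical and introduced only for illustration.

```python
# Spectra recorded per gas sample, transcribed from the parentheses in Table 1.
measurements = {
    "Sa": [4, 4, 4, 4],
    "mrSa": [1, 1, 4, 4, 4, 4, 4],
    "Ec": [1, 7],
    "Pa1": [5, 6],
    "Pa2": [5, 5, 5, 5],
    "Kp": [1, 7],
    "Enc": [1, 7],
    "Abb": [5, 6, 5, 5, 5, 5, 5, 5, 6, 6],
    "Scp": [4, 6, 4, 4],
    "Ena": [4, 6],
}

# IDs that belong to the S. aureus class (standard and methicillin-resistant).
S_AUREUS_IDS = {"Sa", "mrSa"}


def count_spectra(table):
    """Return per-ID totals and the sizes of the S. aureus and 'others' classes."""
    totals = {bacterium: sum(counts) for bacterium, counts in table.items()}
    n_s_aureus = sum(n for bacterium, n in totals.items() if bacterium in S_AUREUS_IDS)
    n_others = sum(totals.values()) - n_s_aureus
    return totals, n_s_aureus, n_others


totals, n_s_aureus, n_others = count_spectra(measurements)
print(totals["mrSa"], n_s_aureus, n_others, n_s_aureus + n_others)
# prints: 22 38 136 174
```

The number of list entries per ID also equals the “Number of samples” column, so the same dictionary can be used to check both columns.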


Methods

Infrared spectroscopy

A gas cell characterized by a 10 m optical-path length was set in the sample compartment of the VERTEX 70v FT-IR spectrometer (Bruker Corporation; USA), as shown in Fig. 2. There was an intake port and an exhaust port at the top of the gas cell. The air in the cell was exhausted from the exhaust port with the PM28309-950.50 vacuum pump (KNF Neuberger GmbH; Germany), and background measurement was performed in a vacuum state. Subsequently, by closing the exhaust port and opening the intake port, the sample was introduced into the gas cell by suction until the pressure in the cell reached atmospheric pressure. The infrared absorption spectrum corresponding to the gas was recorded under atmospheric pressure. The measured wavenumber range was 500–7500 cm−1, while the corresponding spectral resolution and integration time were 0.5 cm−1 and 10 min, respectively.

[Fig. 1 diagram labels: container, Petri dish, gas bag, airtight case, vacuum pump]

Fig. 1 Collection of mixed gas. Petri dishes inoculated with bacteria were placed in a closed container and placed in an incubator. After the culture time shown in Table 1 had elapsed, a silicone tube was inserted through the insertion port of the closed container. Subsequently, the gas around the Petri dish was recovered in the gas bag, which was inside an airtight case that gave the gas bag a negative pressure via a vacuum pump

[Fig. 2 diagram labels: gas bag, sample, intake, exhaust, vacuum pump, gas cell, infrared ray, FT-IR spectrometer (VERTEX 70v)]

Fig. 2 Infrared absorption spectrum measurement. A vertical gas cell was installed in the sample compartment of the infrared spectrophotometer (VERTEX 70v). At the top of the gas cell were an intake and an exhaust. A gas bag was connected to the intake port, and a vacuum pump was connected to the exhaust port. Infrared rays were reflected multiple times in the gas cell as illustrated by the line with arrows

Machine learning

A decision tree algorithm was used to detect the absorption peaks specific to Staphylococcus aureus from the infrared absorption spectrum group of the gas around the bacteria. The classification and regression tree was used as the decision tree algorithm, and Scikit-learn was used as the library [21]. The details concerning the decision tree algorithm parameters are listed in Table 2. Training data were created by the following three methods. The learning was conducted 150 times, and the calculation was repeated under the condition that the data used to create one decision tree would not be used again. The learner was trained with data for all peak intensity ratios or all ratios of absorbance differences. Therefore, a combination of very small peak intensities or of very small absorbance differences was sometimes selected. Slight differences in absorbance cannot be measured without using a device with a high signal-to-noise absorbance ratio. To increase the versatility of the results, the decision tree with a relatively high peak intensity and which was judged to be useful was selected from the 150 decision trees; the results are discussed in the following section. Test data were not used in this study as our intention was to classify the training data. We used Microsoft Excel 2016 and 2019 and Visual Studio 2019 to create the training data. For machine learning, we used RAPIDS Docker (21.08-cuda11.0-runtime-ubuntu18.04-py3.8) and Scikit-learn (version 0.23.1).

Table 2 Decision tree parameters. Arguments specified by fit function of scikit-learn

| Parameter | Value |
|---|---|
| ccp_alpha | 0 |
| criterion | Gini |
| splitter | Best |
| class_weight | None |

Classification by peak intensity ratio. Numerous peaks were observed in the infrared absorption spectra obtained for each bacteria type. The wavenumber and absorbance values as well as minimum values on both sides of the absorption peaks were extracted. The points corresponding to the minimum absorbance values were considered to constitute the baseline. The difference between the peak and baseline absorbance values was calculated and used as the peak intensity. A maximum of 7667 peak intensities were detected


for each spectrum. The absolute peak intensity value was proportional to the sample gas concentration, i.e., the partial pressure of gas released by the bacteria. Consequently, the intensity values differed from one gas sample to another. To eliminate the influence of sample-gas concentration, the ratio of any two peak intensities was used to classify the infrared absorption spectra. In other words, the infrared absorption spectra were characterized using the mixing ratio of the two partial structures that absorbed infrared rays. The ratio of each peak intensity to every other peak intensity was calculated and used as learning data. For each spectrum, the peak intensity ratios formed an array of at most 2.94 × 10^7 real values. Because this study used 174 infrared absorption spectra, the training data had a structure of 2.94 × 10^7 columns × 174 rows. Each row of the training data was labeled either (i) “gas surrounding S. aureus (Sa and mrSa)” or (ii) “others”; these classes comprised 38 and 136 infrared absorption spectra, respectively.

[Fig. 3 decision tree]
Depth = 0: Peak intensity (6514.3 ± 1.3 cm−1) / Peak intensity (2163.4 ± 0.6 cm−1) <= 2.441; gini = 0.341; samples = 174; value = [38, 136]
  True: Depth = 1: Peak intensity (6893.6 ± 1.4 cm−1) / Peak intensity (2164.2 ± 0.2 cm−1) <= 829.091; gini = 0.05; samples = 39; value = [38, 1]
    First leaf: gini = 0.0; samples = 1; value = [0, 1]
    Second leaf: gini = 0.0; samples = 38; value = [38, 0]
  False: gini = 0.0; samples = 135; value = [0, 135]

Fig. 3 Decision tree generated by Method 1 (using peak intensity ratio as training data). The spectrum of S. aureus, Sa and mrSa (38 spectra), and the spectrum of other bacteria (136 spectra) were separated. The numbers in parentheses are the wavenumber ranges that include the peaks. At Depth 0, the spectra were divided into two groups depending on whether the peak intensity ratio was less than or exceeded 2.441. All Staphylococcus aureus spectra were classified in the group with a peak intensity ratio of 2.441 or less, and only one spectrum of other bacteria was included in this group

[Fig. 4 scatter plot: vertical axis, peak intensity (6514.3 ± 1.3 cm−1), 0 to 0.012; horizontal axis, peak intensity (2163.4 ± 0.6 cm−1), 0 to 0.006; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena]

Fig. 4 Peak intensity data used to generate the decision tree shown in Fig. 3. The vertical axis is the peak intensity value in the range of 6514.3 ± 1.3 cm−1, and the horizontal axis is the peak intensity value in the range of 2163.4 ± 0.6 cm−1. The dark black markers indicate results obtained for the gas surrounding S. aureus, and other marks are data for bacteria other than S. aureus. The slope of the straight line passing through each point and the origin is the peak intensity ratio. The slope of the solid line is 2.441, which is the boundary value obtained by the decision tree. All S. aureus data are plotted in the area to the right of the border. There is one □ plot in this area. This is Escherichia coli data

Classification based on absorbance difference at two wavenumbers. When generating training data in Method 1, the peak intensity (that is, the difference between the peak and baseline absorbance values) was determined from the infrared absorption spectra. Therefore, when calculating the peak intensity, it is necessary to determine the peak absorbance value along with the corresponding minimum values on both sides of the peak. The wavenumbers that determine the absorbance values at three points differ across spectra. Therefore, to calculate the peak intensity, it is necessary to measure the absorbance by changing the wavenumber in the wavenumber range where the peak is detected. When measuring infrared absorption outside laboratory settings and classifying the infrared absorption


spectra of mixed gases on the spot, it is desirable to fix the wavenumber when measuring the absorbance; for this, an infrared-light-emitting diode can be used.

Considering this application, we also formulated a procedure for classifying the difference between absorbance values at two points with constant wavenumbers. In this method, training data were created using the absorbance values in the wavenumber region where the peak specific to S. aureus was detected, which was found in Method 1. The wavenumber regions were 6515.6–6507.6 cm−1 and 2158.4–2166.3 cm−1. Each region contained 67 absorbance data. The difference between each absorbance value and another absorbance value was calculated for each of the two regions. The difference in absorbance was an array of 2211 values in each region. We calculated the ratio of the values in both arrays and created an array of 2211 × 2211 = 4.89 × 10^6 values. Subsequently, as in Method 1, the arrays corresponding to each of the 174 spectra were arranged in the row direction to be training data, and classes (i) and (ii) labels were assigned.

[Fig. 5 plots: absorbance versus wavenumber (cm−1). Panels (a)–(j) cover 6518–6506 cm−1 and panels (k)–(t) cover 2166–2158 cm−1, each in the order Sa, mrSa, Kp, Ec, Enc, Abb, Pa1, Pa2, Scp, Ena. Marked wavenumbers: 6515.6, 6515.3, 6514.2, and 6507.6 cm−1; 2166.3, 2163.4, 2158.8, and 2158.4 cm−1]

Fig. 5 Absorbance from 6518.7 to 6504.2 cm−1 (a–j) and that from 2167.0 to 2157.5 cm−1 (k–t). All 174 spectra used in this study are plotted. The colored curves are the spectrum of the gas around each bacterium, and the solid black curves are the curves obtained by averaging the absorbance. These wavenumber regions (from 6518.7 to 6504.2 cm−1, from 2167.0 to 2157.5 cm−1) include the regions (6514.3 ± 1.3 cm−1 and 2163.4 ± 0.6 cm−1) where the peaks selected by the decision tree shown in Fig. 3 were observed. The absorbances measured from 6515.6 to 6507.6 cm−1 and from 2158.4 to 2166.3 cm−1 were used when creating training data in Methods 2 and 3. Vertical axis values were adjusted to ensure the absorbance at 6515.3 and 2158.8 cm−1 equals zero

Classification performed using three absorbance values. The absorbance values measured at three points within each wavenumber domain (six points in total) were used to classify the infrared absorption spectra. This approach is similar to that considering peak intensity. However, the wavenumbers corresponding to the absorbance values used in this calculation were maintained constant. This method used the difference in the shape of the spectrum to classify the spectra. Similar to Method 2, the absorbance values at three points were selected for the wavenumber region where the peak intensity was detected. We calculated a straight line connecting the point with the lowest wavenumber and the point with the highest wavenumber. A vertical line was drawn from the remaining point to this straight line, and the difference in absorbance was calculated along the vertical line. Training data were created as in Method 2, except that the ratio of the absorbance differences was replaced by the ratio of the difference values along the vertical line.
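The feature counts quoted for Methods 1 and 2 follow from counting unordered pairs, and the Method 2 training matrix can be sketched in a few lines. The example below is an illustrative sketch only, not the authors' pipeline: it uses small random arrays in place of measured spectra, the helper names `pairwise_differences` and `difference_ratio_features` are hypothetical, and the `eps` guard and `random_state=0` are choices made here for the sketch. The Table 2 values are passed to the `DecisionTreeClassifier` constructor, which is where scikit-learn accepts these arguments.

```python
from itertools import combinations
from math import comb

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def pairwise_differences(absorbance):
    """Differences A[i] - A[j] for every pair i < j along the last axis."""
    idx_i, idx_j = zip(*combinations(range(absorbance.shape[-1]), 2))
    return absorbance[..., list(idx_i)] - absorbance[..., list(idx_j)]


def difference_ratio_features(region_a, region_b, eps=1e-12):
    """Ratio of each difference in region A to each difference in region B."""
    diff_a = pairwise_differences(region_a)   # shape (n_spectra, n_pairs_a)
    diff_b = pairwise_differences(region_b)   # shape (n_spectra, n_pairs_b)
    # Assumption for this sketch: replace near-zero denominators by eps.
    safe_b = np.where(np.abs(diff_b) < eps, eps, diff_b)
    ratios = diff_a[:, :, None] / safe_b[:, None, :]
    return ratios.reshape(len(region_a), -1)


# Feature counts quoted in the text.
n_method1 = comb(7667, 2)            # pairs of peak intensities, about 2.94e7
n_diffs = comb(67, 2)                # absorbance differences per region
n_method2 = n_diffs ** 2             # ratios per spectrum, about 4.89e6
print(n_method1, n_diffs, n_method2)  # prints: 29387611 2211 4888521

# Small synthetic stand-in: 20 "spectra" with 6 absorbance points per region.
rng = np.random.default_rng(0)
region_a = rng.random((20, 6))
region_b = rng.random((20, 6))
labels = np.array([1] * 5 + [0] * 15)   # 1 = class (i), 0 = class (ii)

X = difference_ratio_features(region_a, region_b)

# Parameter values from Table 2.
tree = DecisionTreeClassifier(
    criterion="gini", splitter="best", ccp_alpha=0.0,
    class_weight=None, random_state=0,
)
tree.fit(X, labels)
print(X.shape, tree.score(X, labels))
```

With 67 points per region and 174 spectra, the same function would produce a 174 × 4,888,521 matrix, so a full-size run needs several gigabytes of memory in double precision.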


Results and discussions

Classification by peak intensity ratio

Figure 3 illustrates the decision tree generated by Method 1. The spectral separation was performed in two stages (“Depths 0–1”). The training data comprised 174 spectra in two classes: peripheral gases obtained by culturing (i) S. aureus and (ii) other bacteria, containing 38 and 136 spectra, respectively. During Depth 0 classification, 99% separation was achieved, and the spectra were classified based on the ratio of peak intensities corresponding to the (6514.3 ± 1.3) and (2163.4 ± 0.6) cm−1 wavenumbers. The 174 spectra at Depth 0 were divided into two groups comprising 38 + 1 and 135 spectra, depending on whether the above-described peak ratio was less than or exceeded 2.441. The former group comprised 38 and 1 spectra corresponding to classes (i) and (ii), respectively. Accordingly, these 38 + 1 spectra were divided into the two classes at the next stage, referred to as “Depth 1.”

Figure 4 depicts the peak intensity distribution observed in the (6514.3 ± 1.3) and (2163.4 ± 0.6) cm−1 wavenumber ranges. The dark black markers indicate values obtained from the gas surrounding S. aureus. The gradient of the straight line depicted in the figure is 2.441, and it corresponds to the value on the right side of the classification relation for Depth 0 (Fig. 3). Most spectra with peak intensity ratios of 2.441 or less were included in the class containing S. aureus.

Figure 5 shows the spectrum at approximately 6514.3 and 2163.4 cm−1. All spectra were overlaid for each bacteria type. The black lines are the average curves of absorbance. (Hereinafter, the curves are used as curves representing the characteristics of the spectral shape for easy viewing.) Table 3 shows the values of the mean and the standard deviation of the absorbance at 6514.2 cm−1 and 2163.4 cm−1.

Table 3 Average and standard deviation of the absorbance at 6514.2 cm−1 and 2163.4 cm−1 of the gas around the bacteria. The average values are the points on the solid curves in Fig. 5

[Rotated table. Rows: 6514.2 cm−1 (average, standard deviation) and 2163.4 cm−1 (average, standard deviation). The four values for each ID are listed in the order in which they were extracted]

| ID | Values |
|---|---|
| Sa | 0.000735, 0.000602, 0.007842, 0.004632 |
| mrSa | 0.000576, 0.000292, 0.008933, 0.005409 |
| Kp | 0.000506, 0.000370, 0.007277, 0.003459 |
| Ec | 0.000282, 0.000272, 0.007461, 0.003905 |
| Enc | 0.000536, 0.000222, 0.007611, 0.003808 |
| Abb | 0.001085, 0.000531, 0.006627, 0.002959 |
| Pa1 | 0.000206, 0.000107, 0.007007, 0.003493 |
| Pa2 | 0.001330, 0.000722, 0.006105, 0.001986 |
| Scp | 0.001230, 0.000789, 0.005676, 0.002133 |
| Ena | 0.000807, 0.000410, 0.006046, 0.002878 |

Classification based on absorbance difference at two wavenumbers

Figure 6 shows a decision tree generated using the ratio of the absorbance differences as training data. The value selected at Depth 0 was

$$\frac{\text{Absorbance at } 6514.2\ \text{cm}^{-1} - \text{Absorbance at } 6509.8\ \text{cm}^{-1}}{\text{Absorbance at } 2163.1\ \text{cm}^{-1} - \text{Absorbance at } 2164.9\ \text{cm}^{-1}} \tag{1}$$

This value was divided into two groups with a threshold of 3.341. Thirty-eight spectra of class (i) and two spectra of class (ii) were mixed in the group with the value less than 3.341, but any spectra other than the two spectra


belonging to class (ii) were classified according to the label. Figure 7 shows the distribution of the absorbance difference. The slope of the straight line is 3.341, which is the threshold at Depth 0.

[Fig. 6 decision tree]
Depth = 0: Absorbance (6514.2 cm−1 – 6509.8 cm−1) / Absorbance (2163.1 cm−1 – 2164.9 cm−1) >= 3.341; gini = 0.341; samples = 174; value = [38, 136]
  True: gini = 0.0; samples = 134; value = [0, 134]
  False: Depth = 1: Absorbance (6515.3 cm−1 – 6507.6 cm−1) / Absorbance (2159.0 cm−1 – 2164.6 cm−1) >= −445.822; gini = 0.095; samples = 40; value = [38, 2]
    True: gini = 0.0; samples = 38; value = [38, 0]
    False: gini = 0.0; samples = 2; value = [0, 2]

Fig. 6 Decision tree generated by Method 2 (using the ratio of the difference in absorbance at two points as training data). “Absorbance (6514.2 cm−1 – 6509.8 cm−1)” shows the absorbance at 6514.2 cm−1 minus the absorbance at 6509.8 cm−1

[Fig. 7 scatter plot: vertical axis, Absorbance (6514.2 cm−1 – 6509.8 cm−1), 0 to 0.010; horizontal axis, Absorbance (2163.1 cm−1 – 2164.9 cm−1), 0 to 0.0035; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena]

Fig. 7 Difference in absorbance used to create the decision tree shown in Fig. 6. The slope of the solid line is the boundary value shown in Fig. 6, 3.341. The slope of the straight line connecting the origin and each point is the ratio of the difference in absorbance

Figure 8 shows the four wavenumbers (6514.2, 6509.8, 2163.1, and 2164.9 cm−1) extracted by machine learning. The spectra in this figure are curves of the average value of absorbance shown by the black line in Fig. 5. In Figs. 8a and b, the absorbance values are translated so that the absorbances at 6509.8 cm−1 and 2164.9 cm−1 are zero. Therefore, the value in Eq. (1) is the ratio of the value on the vertical axis at 6514.2 cm−1 to the value on the vertical axis at 2163.1 cm−1.

Classification performed using three absorbance values

Figure 9 shows a decision tree created using the absorbance at three points for each region. The wavenumbers selected by machine learning are denoted as follows:

$$w_1 = 6514.4\ \text{cm}^{-1},\quad w_2 = 6512.8\ \text{cm}^{-1},\quad w_3 = 6511.8\ \text{cm}^{-1},\quad w_4 = 2164.0\ \text{cm}^{-1},\quad w_5 = 2163.1\ \text{cm}^{-1},\quad w_6 = 2162.6\ \text{cm}^{-1} \tag{2}$$

The index selected by the machine to classify the spectra was the ratio of

$$\text{Absorbance at } w_2 - \left\{\frac{\text{Absorbance at } w_3 - \text{Absorbance at } w_1}{w_3 - w_1}\right\}(w_2 - w_1) - \text{Absorbance at } w_1 \tag{3}$$

to

$$\text{Absorbance at } w_5 - \left\{\frac{\text{Absorbance at } w_6 - \text{Absorbance at } w_4}{w_6 - w_4}\right\}(w_5 - w_4) - \text{Absorbance at } w_4 \tag{4}$$
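Expressions (3) and (4) measure how far the middle absorbance lies above or below the straight line through the two outer points, and the Method 3 index is their ratio. The sketch below writes this out under stated assumptions: the wavenumbers are those of Eq. (2), the absorbance values are invented for illustration, and the function names `chord_offset` and `three_point_index` are hypothetical.

```python
def chord_offset(w_first, w_mid, w_last, a_first, a_mid, a_last):
    """Absorbance at w_mid minus the line through the two outer points.

    Same form as Eqs. (3) and (4): a_mid - slope * (w_mid - w_first) - a_first.
    """
    slope = (a_last - a_first) / (w_last - w_first)
    return a_mid - slope * (w_mid - w_first) - a_first


def three_point_index(region_1, region_2):
    """Ratio of the offset in region 1 (Eq. 3) to the offset in region 2 (Eq. 4).

    Raises ZeroDivisionError if the three points of region 2 are collinear.
    """
    return chord_offset(*region_1) / chord_offset(*region_2)


# Wavenumbers (cm^-1) from Eq. (2); the absorbance values are made up.
region_1 = (6514.4, 6512.8, 6511.8, 0.010, 0.004, 0.010)  # w1, w2, w3, A1, A2, A3
region_2 = (2164.0, 2163.1, 2162.6, 0.002, 0.004, 0.002)  # w4, w5, w6, A4, A5, A6

index = three_point_index(region_1, region_2)

# Depth 0 condition as read from Fig. 9: index <= -3.189.
satisfies_depth0_condition = index <= -3.189
print(round(index, 6), satisfies_depth0_condition)
```

In this invented example the middle point of the first region lies below the line joining its neighbours, so the offset and the index are negative, which matches the sign of the boundary value.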


The boundary value for this indicator is −3.189. Figure 10 depicts the distribution obtained by this method. Figure 11 shows the wavenumbers used to create the decision tree. 6514.4 and 6511.8 cm−1 are the wavenumbers that give the peak absorbance, and 6513.2 cm−1 is the wavenumber near the bottom. It can be seen that the classification was performed using the absorbance of the two peaks on both sides of 6513.2 cm−1. By contrast, it can be seen that the wavenumbers of 2164.0, 2163.1, and 2162.6 cm−1 were selected to emphasize the absorbance change near the peak with 2163.4 cm−1.

[Fig. 8 plots: absorbance versus wavenumber (cm−1); (a) 6516–6508 cm−1 with 6514.2 and 6509.8 cm−1 marked; (b) 2166–2158 cm−1 with 2163.1 and 2164.9 cm−1 marked; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena]

Fig. 8 Relationship between 6514.2, 6509.8, 2163.1, and 2164.9 cm−1 selected in the decision tree. The absorbance at 6509.8 cm−1 and 2164.9 cm−1 was plotted to be zero. Therefore, the values on the vertical axis at 6514.2 cm−1 and 2163.1 cm−1 correspond to the values of the difference in absorbance

[Fig. 9 decision tree]
Depth = 0: Absorbance (6514.4 cm−1 – 6512.8 cm−1 – 6511.8 cm−1) / Absorbance (2164.0 cm−1 – 2163.1 cm−1 – 2162.6 cm−1) <= −3.189; gini = 0.341; samples = 174; value = [38, 136]
  True: gini = 0.0; samples = 130; value = [0, 130]
  False: Depth = 1: Absorbance (6514.4 cm−1 – 6513.2 cm−1 – 6512.0 cm−1) / Absorbance (2164.0 cm−1 – 2162.5 cm−1 – 2162.3 cm−1) <= 11.39; gini = 0.236; samples = 44; value = [38, 6]
    True: gini = 0.0; samples = 38; value = [38, 0]
    False: gini = 0.0; samples = 6; value = [0, 6]

Fig. 9 Decision tree generated by Method 3 (method of creating training data using the 3 values of absorbance). “Absorbance (6514.4 cm−1 – 6512.8 cm−1 – 6511.8 cm−1)” is the difference between the absorbance obtained at the wavenumber marked in the center, 6512.8 cm−1, and the line connected by the two points measured at the underlined wavenumbers, 6514.4 and 6511.8 cm−1

Relationship between infrared absorption spectrum and volatile organic compounds

Mass spectrometer studies have shown that S. aureus releases isovaleric acid and 2-methyl-butanal [9]. Therefore, the infrared absorption spectra of the two volatile organic compounds and the spectra of the gas around the S. aureus were compared. Isovaleric acid and 2-methyl-butanal were purchased from Tokyo Chemical Industry at purities > 99.0% (GC) and > 95.0% (GC), respectively. After filling the PA bag with nitrogen (purity > 99.99995%), each reagent was injected into the bag using a pipette. In addition, a sample containing pure water (filtered with Direct-Q 3UV, Merck Millipore S.A.S.) in a bag containing each reagent was also prepared. At the time of measurement by Fourier transform infrared (FTIR), the water vapor in the bag was saturated because the water remained as droplets on the inside surface of the bag.

The spectra of the gas around the bacteria were fitted with the spectra of the volatile organic compounds (VOCs). The fitting curves are shown


in Fig. 12. The spectra of isovaleric acid and 2-methyl-butanal are displayed superimposed on the spectra of the gas around the bacteria. The absorbances of isovaleric acid and 2-methyl-butanal were each multiplied by a certain magnification. The magnifications were determined by the least-squares method using the absorbances of Sa and mrSa. In regions I, III, and V, the wavenumbers that give peaks in the spectra of Sa and mrSa and the wavenumbers that give peaks in the spectra of isovaleric acid and 2-methyl-butanal were the same. However, regarding the absorbance of region II, the absorbance of the reagent was higher than that of Sa and mrSa. In addition, in regions IV and VI, the absorbance of isovaleric acid was low, and the absorbance of 2-methyl-butanal was higher than that of Sa and mrSa. The black solid lines in Fig. 12 (a) and (d) are curves obtained by adding the spectra of the mixed gas of isovaleric acid and water and that of the mixed gas of 2-methyl-butanal and water. The absorbance of isovaleric acid and water and the absorbance of 2-methyl-butanal and water were multiplied by different magnifications. All coefficients were calculated by the least-squares method. The shape of the black curve (isovaleric acid + 2-methyl-butanal + water) was closer to the shape of the spectrum of Sa and mrSa than the shape of the red curves (isovaleric acid) and blue curves (2-methyl-butanal).

[Fig. 10 scatter plot: vertical axis, Absorbance (6514.4 cm−1 – 6512.8 cm−1 – 6511.8 cm−1), −0.006 to 0; horizontal axis, Absorbance (2164.0 cm−1 – 2163.1 cm−1 – 2162.6 cm−1), 0 to 0.0020; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena]

Fig. 10 Data of absorbance difference used when creating the decision tree shown in Fig. 9. The slope of the solid line is −3.189, which is the boundary value at Depth 0 obtained by the decision tree

[Fig. 11 plots: absorbance versus wavenumber (cm−1); (a) 6516–6508 cm−1 with 6514.4, 6512.8, and 6511.8 cm−1 marked; (b) 2166–2158 cm−1 with 2164.0, 2163.4, 2163.1, and 2162.6 cm−1 marked; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena]

Fig. 11 Relationship between 6514.4, 6512.8, 6511.8, 2164.0, 2163.1, and 2162.6 cm−1 selected in the decision tree. The values on the vertical axis were calculated such that the values of absorbance at the underlined wavenumbers in Fig. 9 became zero. Therefore, the values on the vertical axis at 6512.8 and 2163.1 cm−1 correspond to the values of the difference in absorbance used when creating the training data. As shown in a, because the absorbance values at 6514.4 and 6511.8 cm−1 were larger than the absorbance value at 6512.8 cm−1, the difference was negative, and the boundary value was a negative value, −3.189

From the above, the following can be considered.

1. In the mixed gas around bacteria, it is highly possible that the VOCs detected by a mass spectrometer do not exist alone, but are bound to other molecules, including water.
2. By synthesizing the spectra of isovaleric acid and 2-methyl-butanal, each containing water, it was possible to create a curve with shapes close to the spectra of Sa and mrSa. Therefore, the partial pressure of both VOCs is thought to affect the absorbance in the wavenumber region extracted by machine learning.
3. The black line did not completely reproduce the Sa and mrSa spectra of regions II, IV, and V. In particular, region II is an important region for distinguishing S. aureus from other bacteria. We surmise it could not be reproduced because many molecules other than isovaleric acid, 2-methyl-butanal, and water were present in the mixed gas, and the molecules influenced each other. In other words, it can be said that it is impossible to distinguish bacteria by a deductive method that pre-

13
1058 Honda H. et al.

Ⅰ Ⅱ Ⅲ Ⅰ Ⅱ Ⅲ Ⅰ Ⅱ Ⅲ
0.015 Isovaleric acid + water (a) (b) (c)
2-Methylbutanal
utanall + wate
water Isovaleric acid 2-Methylbutanal
anal
Isovaleric acid + 2-Met
2-Methylbutanal
utanal + water
Sa Kp
mrSa Enc
0.010 Ec Abb
Pa1 Scp
bsorbance

6509.8 cm-1
Pa2 Ena

0.005

6518 6516 6514 6512 6510 6508 6506 6518 6516 6514 6512 6510 6508 6506 6518 6516 6514 6512 6510 6508 6506
Wavenumber /cm-1 Wavenumber /cm-1 Wavenumber /cm-1
Ⅳ Ⅴ Ⅵ Ⅳ Ⅴ Ⅵ Ⅳ Ⅴ Ⅵ
0.030
(d) (e) (f)
0.025 m-1
2164.9 cm

0.020
Absorbanc

0.015

0.010

0.005

2166 2164 2162 2160 2158 2166 2164 2162 2160 2158 2166 2164 2162 2160 2158
Wavenumber /cm-1 Wavenumber /cm-1 Wavenumber /cm-1

Fig. 12  Absorbance curves of isovaleric acid, 2-methyl-butanoic, curves. The red dotted curve is the average value of absorbance of
a mixture of isovaleric acid and water, and a mixture of 2-methyl- Sa, and the blue dotted curve is that of mrSa. The red solid line in b
butanoic and water. Three thick solid curves are drawn in a. The three is the absorbance of isovaleric acid, and the blue solid line in c is the
curves are the absorbance curve of isovaleric acid and water, that of absorbance of 2-methyl-butanoic
2-methyl-butanal and water, and that obtained by combining their two

dicts the spectrum of a mixed gas by superimposing the 1. Considering the peak absorbance value and correspond-
infrared absorption spectra of molecules detected by a ing minima on either side of this peak. The baseline
mass spectrometer. Therefore, an inductive method for (minimum) values corresponding to the two adjacent
measuring the spectrum of an actual sample and search- points were subtracted from the peak value to determine
ing for a characteristic wavenumber using machine the peak intensity.
learning is necessary for the classification of mixed 2. Considering the absorbance values at two points (cor-
gases. responding to fixed wavenumbers) and calculating the
difference between them.
3. Considering the absorbance values at three points with
Conclusions constant wavenumbers, and of these points, we calcu-
lated a straight line connecting the two points on both
This paper presented an approach for analyzing the infra- sides. Subsequently, a vertical line was drawn from the
red absorption spectra of gases surrounding bacteria using remaining points to a straight line, and the difference in
a decision tree–based machine learning algorithm. The absorbance was calculated on the vertical line.
proposed method offers an effective means to determine
the presence (or absence) of the S. aureus bacteria. The The results of this study reveal that all three methods are
wavenumber corresponding to the characteristic absorb- equally capable of identifying the gas produced by S. aureus.
ance value of a spectrum can be determined using the deci- Thus, this study is the first of its kind to confirm the feasi-
sion tree algorithm. In this study, spectral classification was bility of using infrared adsorption spectra to measure and
performed considering the differences between absorbance monitor the growth of S. aureus. The findings of this study
values corresponding to two or three points in the infrared are expected to afford humanity the realization of several
absorption spectra. These differences were calculated using health benefits arising from the use of such technologies in
the following methods. medical practice.
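The three difference calculations listed in the Conclusions can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the baseline in method 1 is assumed to be the mean of the two adjacent minima (the text does not say how the two baseline values are combined), and the ratio is assumed to be the difference from the 6516–6506 cm−1 region divided by the difference from the 2166–2158 cm−1 region, following the axes of Fig. 10. The absorbance values in the usage lines are made up.

```python
def peak_intensity(a_left_min, a_peak, a_right_min):
    """Method 1: peak absorbance minus a baseline from the two adjacent minima.

    Assumption: the baseline is the mean of the two minima.
    """
    baseline = 0.5 * (a_left_min + a_right_min)
    return a_peak - baseline


def two_point_difference(a_first, a_second):
    """Method 2: difference between absorbances at two fixed wavenumbers."""
    return a_first - a_second


def three_point_difference(w_left, a_left, w_mid, a_mid, w_right, a_right):
    """Method 3: vertical distance from the middle point to the straight
    line that connects the two outer points."""
    # Fractional position of the middle wavenumber between the outer ones.
    t = (w_mid - w_left) / (w_right - w_left)
    # Absorbance of the connecting line at the middle wavenumber.
    line_at_mid = a_left + t * (a_right - a_left)
    return a_mid - line_at_mid


def difference_ratio(diff_region_high, diff_region_low):
    """Ratio of the two regional differences (assumed orientation: high
    wavenumber region over low wavenumber region)."""
    if diff_region_low == 0:
        raise ValueError("difference in the low-wavenumber region is zero")
    return diff_region_high / diff_region_low


# Made-up absorbances at the wavenumbers named in Fig. 10 and Fig. 11.
d_high = three_point_difference(6514.4, 0.010, 6512.8, 0.004, 6511.8, 0.008)
d_low = three_point_difference(2164.0, 0.0100, 2163.1, 0.0115, 2162.6, 0.0100)
ratio = difference_ratio(d_high, d_low)
print(d_high, d_low, ratio)
```

A ratio computed this way would then be compared with a boundary value learned by the decision tree, such as the −3.189 reported at Depth 0.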


Acknowledgements We would like to thank Editage (www.editage.com) for their English language-editing support.

Funding This work was supported by the Showa University Fujiyoshida Fund for the Advancement of Research and Education (2019–2022).

Declarations

Ethics approval This study was approved by the research ethics committee of Showa University School of Medicine (Approval No. 371 and 2510).

Conflict of interest The authors declare no competing interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
