Decision tree–based identification of Staphylococcus aureus
Received: 25 May 2021 / Revised: 10 September 2021 / Accepted: 11 October 2021 / Published online: 23 October 2021
© The Author(s) 2021
In this study, eight types of bacteria were cultivated, including Staphylococcus aureus. The infrared absorption spectra
of the gas surrounding cultured bacteria were recorded at a resolution of 0.5 cm−1 over the wavenumber range of 7500–
500 cm−1. From these spectra, we searched for the infrared wavenumbers at which characteristic absorptions of the gas
surrounding Staphylococcus aureus could be measured. This paper reports two wavenumber regions, 6516–6506 cm−1 and
2166–2158 cm−1. A decision tree–based machine learning algorithm was used to search for these wavenumber regions. The
peak intensity or the absorbance difference was calculated for each region, and the ratio between them was obtained. When
these ratios were used as training data, decision trees were created to classify the gas surrounding Staphylococcus aureus
and the gas surrounding other bacteria into different groups. These decision trees show the potential effectiveness of using
absorbance measurement at two wavenumber regions in finding Staphylococcus aureus.
Keywords Bacteria identification · Decision tree · Infrared absorption spectra · Machine learning · Staphylococcus aureus
wavenumber information to realize accurate metabolite detection via infrared spectroscopy [12, 13].
The proposed research employs a decision tree–based machine learning algorithm to detect the wavenumber
range across which the infrared absorption spectra peculiar to S. aureus can be recorded. Recent advances in
machine learning technology have demonstrated its significant potential with regard to efficiently handling
classification tasks involving large quantities of data that cannot be analyzed by humans; moreover, it can
handle data that are too small in value to be distinguished using ordinary analysis methods [14–20]. This paper
presents an approach to classify infrared absorption spectra via machine learning. A dataset comprising
approximately 5 × 10^9 data points was used to this end. The results obtained in this study demonstrate the
realization of accurate bacterium-propagation detection in the gas surrounding S. aureus via infrared
irradiation of the odorous surrounding space.

Materials and methods

Sample collection

Eight common types of bacteria were cultivated in this study. Table 1 lists these bacteria types considered
along with their specifications. The S. aureus species considered in this study include the standard type
(ID: Sa) and its drug-resistant strain (mrSa). All bacteria types were cultivated in sheep-blood medium
consisting of casein peptone 13.0 g L−1, soybean protein digest 5.0 g L−1, growth factors 2.2 g L−1,
NaCl 5.0 g L−1, agar 13.0 g L−1, and 5% defibrinated sheep blood (Nissui Plate Sheep Blood Agar 51,001, Nissui
Pharmaceutical Co., Ltd.). Only one type of bacteria was planted in each Petri dish. Two Petri dishes planted
with the same type of bacteria were placed in an airtight container, and the bacteria were cultured in an
incubator while inside the airtight container. The culturing was performed at a temperature of 37 °C over a
period of approximately 40 h. Post-culturing, the gas surrounding the bacterium was aspirated into gas bags
(smart bag PA, smart bag 2F, Tedlar bag or ANALYTIC-BARRIER Bag, GL Sciences) by the indirect sampling method
using a dry pump, as shown in Fig. 1. Each gas bag was equipped with a standard sleeve (6–7 mm O.D.) connected
to a silicone tube within 1 m. The other end of the silicone tube was placed within 5 cm of the object.
The number of cultures is shown in the “Number of samples” column in Table 1. One gas sample was obtained from
one airtight container. Absorbance measurements were performed multiple times for each sample. The number of
infrared absorption spectra obtained for each sample is shown in parentheses in the “Total number of
measurements” column. For example, S. aureus was cultured four times, each gas sample obtained in each culture
was measured four times, and a total of 16 spectra were obtained.
Table 1 Eight facultative anaerobic bacteria species cultured in this study in sheep-blood medium.
Staphylococcus aureus and Pseudomonas aeruginosa were cultivated in two types, a standard bacterium and a
drug-resistant bacterium. For the other six species of bacteria, either standard bacteria or drug-resistant
bacteria were used as samples. “Number of samples” is equal to the number of cultures. For example,
methicillin-resistant Staphylococcus aureus (ID: mrSa) was cultivated seven times, and a sample of the gas
around the bacterium was collected in each culture. In the case of mrSa, the absorbance was measured once to
four times for each sample. The number in parentheses in “Total number of measurements” is the number of
absorption spectra measurements for each sample, and the total number of infrared absorption spectra (Total
number of measurements) is 1 + 1 + 4 + 4 + 4 + 4 + 4 = 22 spectra
[Table 1 columns: Air around bacteria | Genus species | ID | Strain/origin | Number of samples | Total number of measurements | Incubation time (h)]
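The spectrum counts quoted in the text can be cross-checked with a few lines of Python. This is only a bookkeeping sketch: the per-sample counts are taken from the text and the Table 1 caption, the total of 174 spectra is the figure quoted elsewhere in the paper, and the variable names are illustrative.

```python
# Spectra recorded per gas sample, as stated in the text and the Table 1 caption.
sa_measurements = [4, 4, 4, 4]             # Sa: four cultures, four spectra each
mrsa_measurements = [1, 1, 4, 4, 4, 4, 4]  # mrSa: seven cultures

sa_total = sum(sa_measurements)
mrsa_total = sum(mrsa_measurements)
s_aureus_total = sa_total + mrsa_total

ALL_SPECTRA = 174  # total number of spectra used in the study
other_total = ALL_SPECTRA - s_aureus_total

print(sa_total, mrsa_total, s_aureus_total, other_total)  # 16 22 38 136
```

The result, 38 S. aureus spectra against 136 spectra of other bacteria, matches the class sizes shown at the root of the decision tree in Fig. 3.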
[Fig. 3 decision tree nodes]
Depth 0 (root): Peak intensity (6514.3 ± 1.3 cm−1) / Peak intensity (2163.4 ± 0.6 cm−1) <= 2.441; gini = 0.341; samples = 174; value = [38, 136]
Depth 1, True branch: Peak intensity (6893.6 ± 1.4 cm−1) / Peak intensity (2164.2 ± 0.2 cm−1) <= 829.091; gini = 0.05; samples = 39; value = [38, 1]
Depth 1, False branch (leaf): gini = 0.0; samples = 135; value = [0, 135]
Fig. 3 Decision tree generated by Method 1 (using peak intensity ratio as training data). The spectra of
S. aureus, Sa and mrSa (38 spectra), and the spectra of other bacteria (136 spectra) were separated. The numbers
in parentheses are the wavenumber ranges that include the peaks. At Depth 0, the spectra were divided into two
groups depending on whether the peak intensity ratio was less than or exceeded 2.441. All Staphylococcus aureus
spectra were classified in the group with a peak intensity ratio of 2.441 or less, and only one spectrum of
other bacteria was included in this group
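The gini values printed in the Fig. 3 nodes can be reproduced from the class counts alone. The following is a minimal sketch, assuming that each `value` pair is ordered as [S. aureus, other bacteria] (consistent with the caption) and that the standard Gini impurity definition is used; the function names are illustrative and not taken from the paper.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Sample-weighted Gini impurity of the child nodes produced by a split."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)


# Class counts as printed in the Fig. 3 nodes (assumed order: S. aureus, others).
root = [38, 136]         # Depth 0: all 174 spectra
true_branch = [38, 1]    # peak intensity ratio <= 2.441
false_branch = [0, 135]  # peak intensity ratio > 2.441

print(round(gini(root), 3))          # 0.341
print(round(gini(true_branch), 3))   # 0.05
print(round(gini(false_branch), 3))  # 0.0
print(round(weighted_gini([true_branch, false_branch]), 4))  # 0.0112
```

The drop in weighted impurity from 0.341 to about 0.011 after a single split is what the caption describes: one threshold on the peak intensity ratio separates all but one spectrum.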
[Fig. 5 panels, 6518–6506 cm−1: (a) Sa, (b) mrSa, (c) Kp, (d) Ec, (e) Enc, (f) Abb, (g) Pa1, (h) Pa2, (i) Scp, (j) Ena; marked wavenumbers: 6515.6, 6515.3, 6514.2, and 6507.6 cm−1. Panels, 2166–2158 cm−1: (p) Abb, (q) Pa1, (r) Pa2, (s) Scp, (t) Ena; marked wavenumbers: 2166.3, 2158.8, and 2158.4 cm−1. Horizontal axis: Wavenumber /cm−1]
Fig. 5 Absorbance from 6518.7 to 6504.2 cm−1 (a–j) and that from 2167.0 to 2157.5 cm−1 (k–t). All 174 spectra
used in this study are plotted. The colored curves are the spectra of the gas around each bacterium, and the
solid black curves are the curves obtained by averaging the absorbance. These wavenumber regions (from 6518.7
to 6504.2 cm−1, from 2167.0 to 2157.5 cm−1) include the regions (6514.3 ± 1.3 cm−1 and 2163.4 ± 0.6 cm−1) where
the peaks selected by the decision tree shown in Fig. 3 were observed. The absorbances measured from 6515.6 to
6507.6 cm−1 and from 2166.3 to 2158.4 cm−1 were used when creating training data in Methods 2 and 3. Vertical
axis values were adjusted to ensure the absorbance at 6515.3 and 2158.8 cm−1 equals zero
spectra of mixed gases on the spot, it is desirable to fix the wavenumber when measuring the absorbance; for
this, an infrared-light-emitting diode can be used.
Considering this application, we also formulated a procedure for classifying the difference between absorbance
values at two points with constant wavenumbers. In this method, training data were created using the absorbance
values in the wavenumber regions where the peaks specific to S. aureus were detected in Method 1. The wavenumber
regions were 6515.6–6507.6 cm−1 and 2166.3–2158.4 cm−1. Each region contained 67 absorbance values. Within each
region, the difference between every pair of absorbance values was calculated, which gave an array of 2211
differences per region. We calculated the ratios between the values of the two arrays and created an array of
2211 × 2211 = 4.89 × 10^6 values. Subsequently, as in Method 1, the arrays corresponding to each of the 174
spectra were arranged in the row direction to form the training data, and the class (i) and class (ii) labels
were assigned.

Classification performed using three absorbance values
The absorbance values measured at three points within each wavenumber domain (six points in total) were used
to classify the infrared absorption spectra. This approach is similar to that considering peak intensity.
However, the wavenumbers corresponding to the absorbance values used in this calculation were maintained
constant. This method used the difference in the shape of the spectrum to classify the spectra. Similar to
Method 2, the absorbance values at three points were selected from the wavenumber region where the peak
intensity was detected. We calculated a straight line connecting the point with the lowest wavenumber and the
point with the highest wavenumber. A vertical line was drawn from the remaining point to this straight line,
and the difference in absorbance was calculated along the vertical line. Training data were created as in
Method 2, except that the ratio of the differences in absorbance was replaced with the ratio of the difference
values along the vertical line.
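The feature construction described for Methods 2 and 3 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, each difference is taken between two distinct points of one region (67 points give 67 × 66 / 2 = 2211 pairs), and a zero denominator is mapped to NaN because the paper does not say how that case was handled.

```python
import math
from itertools import combinations


def pairwise_differences(absorbances):
    """Method 2: differences A[i] - A[j] for every pair i < j in one region."""
    return [a - b for a, b in combinations(absorbances, 2)]


def ratio_features(region_a, region_b):
    """Method 2: lazily yield the ratio of every difference in region_a to
    every difference in region_b (2211 x 2211 values for 67-point regions)."""
    diffs_a = pairwise_differences(region_a)
    diffs_b = pairwise_differences(region_b)
    for da in diffs_a:
        for db in diffs_b:
            # Assumption: zero denominators become NaN (not specified in the paper).
            yield da / db if db != 0 else math.nan


def chord_offset(points):
    """Method 3: vertical distance of the middle point from the straight line
    joining the lowest- and highest-wavenumber points.
    `points` holds three (wavenumber, absorbance) pairs."""
    (x0, y0), (x1, y1), (x2, y2) = sorted(points)
    y_on_line = y0 + (y2 - y0) * (x1 - x0) / (x2 - x0)
    return y1 - y_on_line


# Feature counts for one spectrum with 67 absorbance values per region.
n_diffs = len(pairwise_differences(list(range(67))))
print(n_diffs, n_diffs ** 2)  # 2211 4888521

# Toy values (not measured data) to show the two kinds of feature.
print(list(ratio_features([3.0, 1.0], [4.0, 2.0])))        # [1.0]
print(chord_offset([(0.0, 1.0), (1.0, 0.0), (2.0, 3.0)]))  # -2.0
```

A generator is used for the ratios because materializing about 4.9 million values for each of the 174 spectra is memory-hungry. For Method 3, the ratio would be formed between the `chord_offset` values of the two wavenumber regions.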
Classification by peak intensity ratio
Table 3 Average and standard deviation of the absorbance at 6514.2 cm−1 and 2163.4 cm−1 of the gas around the bacteria. The average values are the points on the solid curves in Fig. 5
Figure 3 illustrates the decision tree generated by Method
1. The spectral separation was performed in two stages,
“Depths 0–1.” The training data comprised 174 spectra
in two classes: peripheral gases obtained by culturing
(i) S. aureus and (ii) other bacteria, containing 38
and 136 spectra, respectively. During Depth 0 classification,
99% separation was achieved, and the spectra were classi-
tion relation for Depth 0 (Fig. 3). Most spectra with peak
intensity ratios of 2.441 or less were included in the class
containing S. aureus.
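The Depth 0 rule can be written out as a short sketch. The peak intensity follows the description given in the Conclusions (peak value minus the baseline set by the minima on either side of the peak); using the mean of the two minima as the baseline is an assumption, and the function names and the toy spectrum are purely illustrative.

```python
def peak_intensity(absorbances, peak_index):
    """Peak height above the baseline set by the nearest local minimum on
    each side of the peak (assumption: baseline = mean of the two minima)."""
    left = peak_index
    while left > 0 and absorbances[left - 1] <= absorbances[left]:
        left -= 1
    right = peak_index
    last = len(absorbances) - 1
    while right < last and absorbances[right + 1] <= absorbances[right]:
        right += 1
    baseline = (absorbances[left] + absorbances[right]) / 2.0
    return absorbances[peak_index] - baseline


def in_s_aureus_group(ratio, threshold=2.441):
    """Depth 0 test of Fig. 3: True when the peak intensity ratio is at or
    below the threshold, i.e. the group containing all S. aureus spectra."""
    return ratio <= threshold


# Toy spectrum (not measured data): peak at index 3, minima at indices 1 and 4.
print(peak_intensity([1.0, 0.0, 2.0, 4.0, 1.0, 3.0], 3))  # 3.5

# The ratio would be peak intensity near 6514.3 cm-1 over that near 2163.4 cm-1.
print(in_s_aureus_group(1.2), in_s_aureus_group(5.0))  # True False
```

Only the Depth 0 split is sketched here; the Depth 1 split in Fig. 3 uses a different pair of peaks.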
Figure 5 shows the spectrum at approximately 6514.3
and 2163.4 cm−1. All spectra were overlaid for each bacte-
ria type. The black lines are the average curves of absorb-
2163.4 cm−1.
tra of class (ii) were mixed in the group with the value less than 3.341, but all spectra other than these two
class (ii) spectra were classified according to their labels. Figure 7 shows the distribution of the absorbance
difference. The slope of the straight line is 3.341, which is the threshold at Depth 0.
Figure 8 shows the four wavenumbers (6514.2, 6509.8, 2163.1, and 2164.9 cm−1) extracted by machine learning.
The spectra in this figure are curves of the average value of absorbance shown by the black line in Fig. 5. In
Figs. 8a and b, the absorbance values are translated so that the absorbances at 6509.8 cm−1 and 2164.9 cm−1 are
zero. Therefore, the value in Eq. (1) is the ratio of the value on the vertical axis at 6514.2 cm−1 to the value
on the vertical axis at 2163.1 cm−1.
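Eq. (1) itself does not appear in this part of the text. Based on the description above (absorbances shifted so that the values at 6509.8 and 2164.9 cm−1 are zero, then the ratio of the shifted values at 6514.2 and 2163.1 cm−1), the quantity presumably has the following form. The symbols R and A(·) are notation introduced here for illustration and may differ from the original equation.

```latex
R \;=\; \frac{A(6514.2\ \mathrm{cm^{-1}}) - A(6509.8\ \mathrm{cm^{-1}})}
             {A(2163.1\ \mathrm{cm^{-1}}) - A(2164.9\ \mathrm{cm^{-1}})},
\qquad \text{Depth 0 threshold: } R = 3.341
```

Under this reading, the straight line of slope 3.341 in Fig. 7 is the set of points at which the numerator equals 3.341 times the denominator.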
Classification performed using three absorbance values
The boundary value for this indicator is −3.189. Figure 10 depicts the distribution obtained by this method.
Figure 11 shows the wavenumbers used to create the decision tree. The wavenumbers 6514.4 and 6511.8 cm−1 give
the peak absorbance, and 6513.2 cm−1 is the wavenumber near the bottom. It can be seen that the classification
was performed using the absorbance of the two peaks on both sides of 6513.2 cm−1. By contrast, the wavenumbers
2164.0, 2163.1, and 2162.6 cm−1 were selected to emphasize the absorbance change near the peak at 2163.4 cm−1.
[Plot residue (Figs. 7–8): axis label “Absorbance (6514.2 cm−1 − 6509.8 cm−1)”; marked wavenumbers: 6514.2, 6509.8, 2163.1, and 2164.9 cm−1; horizontal axis: Wavenumber /cm−1; legend entries: Ec, Scp, Pa2, Ena]

Relationship between infrared absorption spectrum and volatile organic compounds
Mass spectrometer studies have shown that S. aureus releases isovaleric acid and 2-methyl-butanal [9].
Therefore, the infrared absorption spectra of the two volatile organic compounds and the spectra of the gas
around the S. aureus
in Fig. 12. The spectra of isovaleric acid and 2-methyl-butanal are displayed superimposed on the spectra of the
gas around the bacteria. The absorbances of isovaleric acid and 2-methyl-butanal were each multiplied by a
scaling factor. The scaling factors were determined by the least-squares method using the absorbances of Sa and
mrSa. In regions I, III, and V, the wavenumbers that give peaks in the spectra of Sa and mrSa and the
wavenumbers that give peaks in the spectra of isovaleric acid and 2-methyl-butanal were the same. However, in
region II, the absorbance of the reagent was higher than that of Sa and mrSa. In addition, in regions IV and VI,
the absorbance of isovaleric acid was low, and the absorbance of 2-methyl-butanal was higher than that of Sa and
mrSa. The black solid lines in Fig. 12 (a) and (d) are curves obtained by adding the spectrum of the mixed gas
of isovaleric acid and water and that of the mixed gas of 2-methyl-butanal and water. The absorbance of
isovaleric acid and water and the absorbance of 2-methyl-butanal and water were multiplied by different scaling
factors. All coefficients were calculated by the least-squares method. The shape of the black curve (isovaleric
acid + 2-methyl-
exist alone, but are bound to other molecules, including water.
2. By synthesizing the spectra of isovaleric acid and 2-methyl-butanal, each containing water, it was possible
to create a curve with shapes close to the spectra of Sa and mrSa. Therefore, the partial pressure of both VOCs
is thought to affect the absorbance in the wavenumber region extracted by machine learning.
3. The black line did not completely reproduce the Sa and mrSa spectra of regions II, IV, and V. In particular,
region II is an important region for distinguishing S. aureus from other bacteria. We surmise it could not be
reproduced because many molecules other than isovaleric acid, 2-methyl-butanal, and water were present in the
mixed gas, and the molecules influenced each other. In other words, it can be said that it is impossible to
distinguish bacteria by a deductive method that predicts the spectrum of a mixed gas by superimposing the
infrared absorption spectra of molecules detected by a mass spectrometer. Therefore, an inductive method for
measuring the spectrum of an actual sample and searching for a characteristic wavenumber using machine learning
is necessary for the classification of mixed gases.

[Fig. 11 plot residue: marked wavenumbers 6514.4, 6511.8, and 6512.8 cm−1, and 2163.1 and 2164.0 cm−1; legend entries: mrSa, Ec, Ena]
[Fig. 12 panels (a)–(c), 6518–6506 cm−1, and (d)–(f), 2166–2158 cm−1; curve labels: Isovaleric acid + water, 2-Methylbutanal + water, Isovaleric acid, 2-Methylbutanal, Isovaleric acid + 2-Methylbutanal + water; legend: Sa, mrSa, Ec, Pa1, Pa2, Kp, Enc, Abb, Scp, Ena; marked wavenumbers: 6509.8 and 2164.9 cm−1; horizontal axis: Wavenumber /cm−1]
Fig. 12 Absorbance curves of isovaleric acid, 2-methyl-butanal, a mixture of isovaleric acid and water, and a
mixture of 2-methyl-butanal and water. Three thick solid curves are drawn in a. The three curves are the
absorbance curve of isovaleric acid and water, that of 2-methyl-butanal and water, and that obtained by
combining their two curves. The red dotted curve is the average value of absorbance of Sa, and the blue dotted
curve is that of mrSa. The red solid line in b is the absorbance of isovaleric acid, and the blue solid line in
c is the absorbance of 2-methyl-butanal

Conclusions

This paper presented an approach for analyzing the infrared absorption spectra of gases surrounding bacteria
using a decision tree–based machine learning algorithm. The proposed method offers an effective means to
determine the presence (or absence) of the S. aureus bacteria. The wavenumber corresponding to the
characteristic absorbance value of a spectrum can be determined using the decision tree algorithm. In this
study, spectral classification was performed considering the differences between absorbance values
corresponding to two or three points in the infrared absorption spectra. These differences were calculated
using the following methods.
1. Considering the peak absorbance value and corresponding minima on either side of this peak. The baseline
(minimum) values corresponding to the two adjacent points were subtracted from the peak value to determine the
peak intensity.
2. Considering the absorbance values at two points (corresponding to fixed wavenumbers) and calculating the
difference between them.
3. Considering the absorbance values at three points with constant wavenumbers. A straight line connecting the
two outer points was calculated, a vertical line was drawn from the remaining point to this straight line, and
the difference in absorbance was calculated along the vertical line.

The results of this study reveal that all three methods are equally capable of identifying the gas produced by
S. aureus. Thus, this study is the first of its kind to confirm the feasibility of using infrared absorption
spectra to measure and monitor the growth of S. aureus. The findings of this study are expected to afford
humanity the realization of several health benefits arising from the use of such technologies in medical
practice.
