
Machine-learning-based diabetes classification method using blood flow oscillations and Pearson correlation analysis of feature importance


Published 25 October 2024 © 2024 The Author(s). Published by IOP Publishing Ltd
Citation: Hanbeen Jung et al 2024 Mach. Learn.: Sci. Technol. 5 045024. DOI 10.1088/2632-2153/ad861d

2632-2153/5/4/045024

Abstract

Diabetes is a global health issue affecting millions of people and is related to high morbidity and mortality rates. Current diagnostic methods are primarily invasive, involving blood sampling, which can lead to infection and increased patient stress. As a result, there is a growing need for noninvasive diabetes diagnostic methods that are both accurate and fast. High measurement accuracy and fast measurement time are essential for effective noninvasive diabetes diagnosis; these can be achieved using diffuse speckle contrast analysis (DSCA) systems and artificial intelligence algorithms. In this study, we use a machine learning algorithm to analyze rat blood flow signals measured using a DSCA system with simple operation, easy fabrication, and fast measurement for helping diagnose diabetes. The results confirmed that the machine learning algorithm for analyzing blood flow oscillation data shows good potential for diabetes classification. Furthermore, analyzing the blood flow reactivity test revealed that blood flow signals can be quickly measured for diabetes classification. Finally, we evaluated the influence of each blood flow oscillation data on diabetes classification through feature importance and Pearson correlation analysis. The results of this study should provide a basis for the future development of hemodynamic-based disease diagnostic methods.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Diabetes is a serious global health issue with high morbidity and mortality, and its incidence is increasing. Diabetes is characterized by hyperglycemia, and it can cause various complications such as diabetic neuropathy, diabetic kidney disease, and cardiovascular disease (CVD) [1–5]. CVD is the leading cause of death in diabetics, accounting for 44% of deaths among people with type 1 diabetes and 52% among people with type 2 diabetes [6]. Diabetics have a threefold increase in cardiovascular mortality compared to nondiabetics, and younger diabetics have an even higher risk of CVD [7, 8]. Early diagnosis and ongoing monitoring to prevent complications are crucial in diabetes management [9].

According to the American Diabetes Association, the following criteria are currently used to diagnose diabetes: A1C ≥ 6.5%, fasting plasma glucose ≥ 126 mg dl−1, 2 h plasma glucose during an oral glucose tolerance test ≥ 200 mg dl−1, and random plasma glucose ≥ 200 mg dl−1 [10]. Diagnostic methods test for blood components, making it essential to disinfect the measurement site with alcohol and collect blood. However, repeated blood collection can cause additional stress in diabetics.

To address these issues, noninvasive diabetes diagnostic methods have been developed recently to enable bloodless diabetes monitoring. These methods analyze biological materials such as patient secretions, including breath, urine, and tears, as well as blood glucose levels, all of which are changed in diabetics [11–13]. Diabetics typically have high blood glucose levels that cause changes in the electrical and optical properties of their bodies. Therefore, methods based on these properties are being developed for noninvasive diabetes monitoring [14–16]. However, these methods have limitations such as a low signal-to-noise ratio, low sensitivity, long measurement time, and discontinuity. Therefore, more precise and advanced analysis techniques are required.

To overcome the abovementioned limitations, rapidly advancing artificial intelligence (AI) algorithms have recently been applied to monitor diseases such as diabetes [17–19]. Most AI-based diabetes diagnosis studies use medical data, including age, pulse rate, blood pressure, body circumference, and cholesterol level, to classify the presence of diabetes [20–22]. Their high accuracy and area under the curve (AUC) scores have demonstrated the usefulness of these AI algorithms. However, most of these algorithms require long measurement times and many different types of instruments to obtain medical data.

Microcirculatory blood flow reflects physiologic and pathologic changes, and its signals can be measured noninvasively and easily for real-time monitoring and periodic measurements [23]. Thus, such signals can be beneficial for diagnosing diabetes because doing so requires long-term, rather than episodic, monitoring. A diffuse speckle contrast analysis (DSCA) system was developed to noninvasively monitor changes in microcirculatory blood flow in deep tissues by using near-infrared light to measure and analyze the fluctuations in speckles caused by the movement of red blood cells [24]. DSCA systems afford advantages such as low cost, straightforward analysis, and simple experimental setup, and they can be used to noninvasively measure and analyze blood flow changes in humans and animals [25–30]. DSCA systems can be further advanced in various ways to enhance their clinical applications and diagnostic capabilities [31–35]. The DSCA system shows great potential for clinical applications because it can be used to find the correlation between blood flow and metabolism in humans and animals.

The correlation between diabetes and microcirculatory blood flow has been studied in humans and animals. Some studies have demonstrated the potential for early diagnosis of diabetes by analyzing diabetes-induced changes in blood flow reactivity [36, 37]. Other studies have investigated whether diabetes affects specific oscillations in microcirculatory blood flow [38]. Microcirculatory blood flow signals have specific oscillations, and the correlation between these oscillations and metabolism has been studied in humans and animals. Human blood flow oscillations have five characteristic peaks between 0.01 and 2 Hz that are associated with heart rate, respiratory, myogenic, neurogenic, and metabolic activities [39]. Similarly, rats show five characteristic peaks associated with metabolic activity between 0.01 and 5 Hz [40, 41]. Although studies have actively investigated the correlation between diabetes and microcirculatory blood flow, studies of how blood flow signals can be used to diagnose diabetes remain lacking.

In this study, a machine learning approach was developed and validated to help classify and diagnose diabetes based on the blood flow signals measured using a DSCA system. This approach was applied to control and diabetes rats, and a diabetes classification AUC score of 0.853 was obtained by classifying blood flow oscillation data calculated using a wavelet transform with a machine learning algorithm. We further analyzed the reactive hyperemia test results to show that control and diabetes blood flow signals can be classified using machine learning algorithms without performing additional blood flow reactivity experiments. We used Pearson correlation analysis to confirm the correlation between diabetes classification performance and blood flow oscillation data of cardiac activity. These results suggest that machine learning analysis of blood flow signals measured over a short time could be a helpful tool for diabetes classification.

2. Materials and methods

2.1. Experimental animal preparation

The Animal Experiment Ethics Committee of Daegu Gyeongbuk Institute of Science and Technology approved this experimental protocol (approval no. DGIST-IACUC-21040201-0002). Twenty male Sprague-Dawley rats (7–8 weeks old, with an average weight of approximately 240 g at the beginning of the experiment) were divided into two groups of 10 animals each: a control group and a diabetes group. In the diabetes group, diabetes was induced by intraperitoneal injection of streptozotocin (STZ; 65 mg kg−1, Sigma-Aldrich Co.) dissolved in 0.01 M sodium citrate buffer (pH 4.5) after fasting for at least 12 h [42]. Rats in the control group were injected with only 0.01 M sodium citrate buffer (pH 4.5). Forty-eight hours after the STZ injection, blood was collected from the tail vein of rats in the diabetes group, and the blood glucose level was measured using a glucose meter (Accu-Chek Performa, Roche Diagnostic). A 270 mg dl−1 or higher blood glucose level indicated diabetes [43]. Then, the rats' blood glucose levels, body weight, and blood flow signals were measured weekly for ten weeks.

2.2. Blood flow measurement system and signal acquisition

The blood flow signals of rats in the control and diabetes groups were measured using our custom-built DSCA system. This system consisted of an 830 nm wavelength laser (DL-830-100SO, CrystaLaser, 830 nm, 100 mW) and a charge-coupled device camera (F-033B, Stingray, cell size: $9.9\,\,\mu\mathrm{m}$) [44]. The DSCA system probe was made of polydimethylsiloxane with a distance of 5 mm between the source and detector. It was placed on the rat's right hind paw (figure 1). The DSCA system measured blood flow signals from all rats weekly for ten weeks after anesthesia with isoflurane (2.0%; 1.5 L min−1). The DSCA signals were recorded at a sampling rate of 60 Hz using a fast pulsatile blood flow measurement method, and a reactive hyperemia test was performed to induce blood flow changes [44]. The reactive hyperemia test is used to noninvasively assess the peripheral microvascular function by briefly inducing ischemia through arterial occlusion [45]. In this study, an elastic band was used to briefly induce ischemia in the thigh of a rat during blood flow signal measurements (figure 1).


Figure 1. Measurement of a rat's right hind paw blood flow signal using a DSCA system. An elastic band was used to create blood flow signal changes.


The blood flow signals measured in the reactive hyperemia test were divided into three periods: baseline, occlusion, and release (figure 2). The rats' blood flow signals were measured for 1 min 40 s (baseline), 2 min 20 s (occlusion), and 6 min (release). The baseline is the usual period without stimulation; occlusion is the period in which the rat's thigh is compressed with an elastic band to cause ischemia; and release is the period after the end of ischemia in which blood flow temporarily increases significantly and then stabilizes.


Figure 2. Blood flow signal measurement in rats with reactive hyperemia. The measured blood flow signal can be divided into three periods: baseline, occlusion, and release.


2.3. Blood flow signals preprocessing

For the baseline, occlusion, and release periods, metabolism-related blood flow oscillations from 0.01 to 5 Hz were calculated from the blood flow signals by continuous wavelet transform using Morlet wavelets; these oscillation data were used as input data for the machine learning algorithms (figure 3).


Figure 3. Preprocessing to calculate relative magnitude from blood flow signals. Obtaining blood flow oscillation data by wavelet transform of the blood flow signal and calculating the average magnitude of the blood flow oscillation data over frequency ranges.


The blood flow oscillation data are expressed as relative magnitudes, that is, ratios of the average magnitudes in five frequency bands related to the rat's metabolism [46]. Rat blood flow oscillations of 2–5 Hz, 0.7–2 Hz, 0.2–0.74 Hz, 0.08–0.2 Hz, and 0.01–0.08 Hz are associated with cardiac activities, respiratory activities [40], myogenic activities, neurogenic activities, and endothelial-related metabolic activities [41], respectively. All preprocessing code was implemented using MATLAB.
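The preprocessing described above can be illustrated with a minimal Python sketch (the study itself used MATLAB, and its code is not reproduced here). The sketch makes several assumptions that are not stated in the text: an amplitude-normalized analytic Morlet wavelet with centre parameter `w0 = 6`, a log-spaced frequency grid, and a relative magnitude defined as each band's mean magnitude divided by the sum of the five band means. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

# Frequency bands (Hz) as listed in the text; the key names are labels chosen here.
BANDS = {
    "cardiac": (2.0, 5.0),
    "respiratory": (0.7, 2.0),
    "myogenic": (0.2, 0.74),
    "neurogenic": (0.08, 0.2),
    "metabolic": (0.01, 0.08),
}


def morlet_cwt_magnitude(signal, fs, freqs, w0=6.0):
    """Return |CWT| with shape (len(freqs), len(signal)), computed via the FFT.

    Assumption: amplitude-normalized analytic Morlet wavelet, so a sinusoid of
    amplitude A yields a magnitude of about A at its own frequency.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC offset
    n = signal.size
    sig_fft = np.fft.fft(signal)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # angular frequency, rad/s
    positive = omega > 0
    magnitude = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)  # scale whose centre frequency is f
        psi_hat = np.zeros(n)
        psi_hat[positive] = 2.0 * np.exp(-0.5 * (scale * omega[positive] - w0) ** 2)
        magnitude[i] = np.abs(np.fft.ifft(sig_fft * psi_hat))
    return magnitude


def relative_band_magnitudes(signal, fs, n_freqs=80):
    """Mean wavelet magnitude per band, normalized so the five values sum to 1."""
    freqs = np.logspace(np.log10(0.01), np.log10(5.0), n_freqs)
    mean_mag = morlet_cwt_magnitude(signal, fs, freqs).mean(axis=1)  # average over time
    band_means = {
        name: mean_mag[(freqs >= lo) & (freqs <= hi)].mean()
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(band_means.values())
    return {name: value / total for name, value in band_means.items()}


# Synthetic 100 s signal sampled at 60 Hz (NOT measured data):
# a strong 3 Hz component plus a weak 0.1 Hz component.
fs = 60.0
t = np.arange(0, 100.0, 1.0 / fs)
demo = np.sin(2 * np.pi * 3.0 * t) + 0.2 * np.sin(2 * np.pi * 0.1 * t)
features = relative_band_magnitudes(demo, fs)
```

Each call returns five numbers, one per band, which corresponds to one five-feature input sample for the classifiers. The FFT-based transform treats the signal as periodic, so edge effects are ignored in this sketch.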

2.4. Blood flow signal classification in machine learning algorithms

We analyzed and compared the performance of three machine learning algorithms, namely, gradient tree boosting, random forest, and Adaboost, to determine the classification potential of control and diabetes blood flow signals. The gradient tree boosting algorithm increases the model performance by sequentially combining weak learners to reduce the residual [47, 48]. It is prone to overfitting; however, it generally performs better than the random forest algorithm [49]. The random forest algorithm builds a model with better performance by constructing multiple decision trees. Each decision tree is trained with bootstrap samples extracted from the training set, and at each node the best split is searched for within a randomly selected subset of the input variables. Consequently, the generalization error converges to a limit as the number of decision trees increases [50]. The Adaboost algorithm was the first in which the boosting method was used to increase the model performance by adjusting the sample weights while combining weak learners [51]. Like the gradient tree boosting algorithm, the Adaboost algorithm uses a boosting method to create a strong learner by sequentially combining weak learners; however, it performs worse than the gradient tree boosting algorithm [52].

The 200 input samples (100 control, 100 diabetes) were randomly divided into 160 training samples (80 control, 80 diabetes) and 40 test samples (20 control, 20 diabetes) for use in the machine learning algorithms (figure 4). The data were split using the train_test_split function and then normalized using the MinMaxScaler function to ensure consistent data scaling across all models. The hyperparameters of each machine learning algorithm were tuned by applying fivefold cross-validation to the training data. For all three models (gradient tree boosting, random forest, and Adaboost), three hyperparameters were adjusted to improve the model's performance: min_samples_split (ranging from 2 to 101), max_features (ranging from 1 to 5), and n_estimators (ranging from 1 to 150). Additionally, for gradient tree boosting and Adaboost, the learning_rate (ranging from 0.01 to 1) was optimized. The hyperparameters that maximized the average validation accuracy were adopted. The validation accuracy, test accuracy, and AUC score were calculated to evaluate the classification performance of the three algorithms on the training and test data. Validation accuracy refers to the accuracy obtained by fivefold cross-validation using the training data, and test accuracy refers to the accuracy obtained on the test data using the trained model. The AUC score is an effective way to evaluate the overall classification performance of a model; it is calculated as the area under the receiver operating characteristic (ROC) curve and takes a value between 0 and 1 [53]. We repeated this procedure ten times to obtain ten validation accuracies, test accuracies, and AUC scores, and we evaluated the classification performance in terms of the average of these scores. The best machine learning algorithm was selected for feature importance and Pearson correlation analysis.
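The workflow in this paragraph can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the split is stratified to reproduce the 80/80 and 20/20 class counts, the scaler is fitted on the training data only, and an exhaustive `GridSearchCV` over a deliberately small grid stands in for whatever search strategy was actually used over the full hyperparameter ranges.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the 200 x 5 relative-magnitude features (NOT real data).
X = rng.random((200, 5))
y = np.repeat([0, 1], 100)   # 0 = control, 1 = diabetes
X[y == 1, 0] += 0.3          # make the first feature weakly informative

# 160 training and 40 test samples, balanced by class (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Min-max scaling; fitting on the training split only is an assumption made here.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Small illustrative grid inside the ranges quoted in the text.
param_grid = {
    "n_estimators": [50, 150],
    "max_features": [1, 5],
    "min_samples_split": [2, 20],
    "learning_rate": [0.01, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,                 # fivefold cross-validation on the training data
    scoring="accuracy",   # select by mean validation accuracy
)
search.fit(X_train_s, y_train)

validation_accuracy = search.best_score_
test_accuracy = accuracy_score(y_test, search.predict(X_test_s))
test_auc = roc_auc_score(y_test, search.predict_proba(X_test_s)[:, 1])
```

Repeating the block with ten different `random_state` values and averaging the three scores would mirror the ten-repetition evaluation described in the text.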


Figure 4. Flowchart showing the creation and machine learning analysis of a blood flow oscillation dataset using blood flow signals. Blood flow signals were obtained from 20 rats measured for 10 weeks.


Through feature importance analysis, we confirmed the change in importance according to the machine learning algorithm and input data period. We obtained the Pearson correlation coefficient between the importance of each feature and the diabetes classification performance. By determining the correlation between the importance of blood flow oscillation data and the diabetes classification performance, we investigated which features most impact the classification of control and diabetes blood flow signals. The feature importance was calculated based on the Gini impurity. All code was implemented using Python and MATLAB.
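The correlation step can be sketched in a few lines of Python. The arrays below are hypothetical placeholder values, not results from this study; `pearson_r` is a helper name introduced here for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical values, one pair per classification run (NOT the study's results).
cardiac_importance = np.array([0.10, 0.18, 0.25, 0.31, 0.40])
auc_score = np.array([0.55, 0.62, 0.70, 0.74, 0.85])

r = pearson_r(cardiac_importance, auc_score)
```

In the analysis described above, each pair would hold the importance of one feature and the classification score (test accuracy or AUC) from the same run.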

3. Results and discussion

3.1. Blood glucose level, body weight, and blood flow signal analysis result

This study measured blood glucose and body weight to confirm that diabetes was maintained during the experimental period. After STZ injection, the blood glucose levels of rats in the diabetes group increased significantly, whereas those of rats in the control group showed little change (figure 5(A)). For ten weeks after the STZ injection, the blood glucose level of rats in the control group remained between 100 and 200 mg dl−1, whereas that of rats in the diabetes group consistently remained above 500 mg dl−1 on average. In addition, the body weight of the rats in the control group continued to increase significantly, whereas that of the rats in the diabetes group increased very slowly (figure 5(B)). In a previous study in which STZ was used to induce diabetes in rats, the body weight of the diabetes group increased much more slowly than that of the control group, and the blood glucose level of the diabetes group remained above 400 mg dl−1 [42]. Observations of the changes in the blood glucose level and body weight before and after the STZ injection showed that diabetes was maintained in the rats in the diabetes group throughout the experiment.


Figure 5. Results of measuring blood glucose level and body weight of rats for 10 weeks. x-axis indicates the number of weeks after STZ injection. (A) Blood glucose level measurement result and (B) body weight measurement result.


When analyzing blood flow signals for diabetes classification, the anesthesia status can be considered a factor influencing the results. Humeau et al (2007) reported that deeper anesthesia significantly reduces myogenic and neurogenic activities compared to lighter anesthesia, suggesting that anesthesia could affect diabetes classification based on blood flow signals [54]. However, in our study, since the blood flow oscillations corresponding to cardiac activities primarily affected diabetes classification, we think that the impact of anesthesia on classification performance was minimal.

Additionally, during the reactive hyperemia test, we observed a temporary increase in blood flow at the end of the baseline period and the start of the occlusion period. According to Rosenberry et al (2020), when a specific area is compressed with a cuff or a device with similar functionality, the tissue experiences hypoxia, strongly inducing vasodilation [45]. This vasodilation leads to a transient increase in blood flow, followed by a rapid decrease when additional blood flow cannot be supplied, explaining the observed phenomenon.

Moreover, to determine the influence of body weight on blood flow signals, we analyzed the correlation between body weight and the mean values of raw blood flow signals and the five types of blood flow oscillations (cardiac, respiratory, myogenic, neurogenic, and metabolic). The Pearson correlation analysis revealed little to no correlation, indicating that body weight does not influence the raw blood flow signals or blood flow oscillations. Thus, body weight is not expected to impact the classification results in our study significantly.

3.2. Classification results of machine learning algorithms

In this study, our custom-built DSCA system was used to measure the blood flow signals of rats in the control and diabetes groups. The measured blood flow signals were calculated as the ratio data of blood flow oscillations through wavelet transform, and they were used as input data for machine learning algorithms. The performance of the machine learning algorithms was evaluated in terms of the validation accuracy, test accuracy, and AUC score. The importance of blood flow oscillations and the correlation between the classification performance of blood flow signals in the control and diabetes groups were determined using feature importance and Pearson correlation analysis.

3.2.1. Performance evaluation results of machine learning algorithms for diabetes classification

As mentioned earlier, the three machine learning algorithms were trained by randomly splitting the calculated input data into training and test data in an 8:2 ratio. The input data were calculated from the blood flow signal data of four periods (baseline, occlusion, release, and entire), where the entire period includes the blood flow oscillation data of the baseline, occlusion, and release periods. The classification results were validated in terms of the validation accuracy, test accuracy, and AUC score by repeating the above process ten times. Figures 6(A)–(C) respectively show the classification results of the gradient tree boosting, random forest, and Adaboost algorithms. The gradient tree boosting algorithm using the baseline period data showed the highest classification performance among the three machine learning algorithms, with validation accuracy, test accuracy, and AUC scores of 0.814 ± 0.012, 0.823 ± 0.025, and 0.853 ± 0.027, respectively. Figure 6(D) shows the ROC curves of the gradient tree boosting algorithm, which had the highest classification performance, for the four periods. AUC scores of approximately 0.5, 0.8–0.9, and more than 0.9 indicate poor, excellent, and outstanding classification performance, respectively [55]. The diabetes classification performance of the gradient tree boosting algorithm can therefore be considered excellent. A comparison of the classification results obtained using input data from the baseline, occlusion, release, and entire periods showed that the baseline period data gave the best performance. This is because each part of the blood flow signal obtained by the reactive hyperemia test has different characteristics [45]. The baseline period blood flow reflects stable metabolic information. The occlusion period blood flow contains very little metabolic information. The release period blood flow fluctuates over time, leading to considerable variations in the obtained results. Therefore, blood flow signals from the occlusion and release periods are unsuitable input data for machine learning algorithms to classify diseases. We performed a feature importance analysis to evaluate the impact of each feature in the input data on the obtained diabetes classification results.
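To make the AUC score concrete, the following sketch computes it directly from its probabilistic interpretation: the fraction of (diabetes, control) pairs in which the diabetes sample receives the higher classifier score, with ties counted as one half. The labels and scores are toy values chosen for illustration, and `roc_auc` is a helper name introduced here.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of class scores."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]    # scores of positive (diabetes) samples
    neg = scores[~y_true]   # scores of negative (control) samples
    # Compare every positive score with every negative score; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy example (NOT study data): 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 0.75
```

An AUC of 0.5 corresponds to random ordering of the two classes, which is why values near 0.5 are read as poor performance.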


Figure 6. Evaluation score changes over four periods for three machine learning algorithms: (A) gradient tree boosting, (B) random forest, and (C) Adaboost. (D) ROC curves for the gradient tree boosting algorithm across four periods: baseline, occlusion, release, and entire.


3.2.2. Feature importance analysis result

The impact of the five input features, namely, the blood flow oscillations of cardiac, respiratory, myogenic, neurogenic, and metabolic activities, on the diabetes classification results was examined by feature importance analysis. In feature importance analysis, a higher importance indicates that a feature plays a more critical role in the classification result [50, 56]. Figure 7(A) shows the feature importance results of the five blood flow oscillation features (cardiac, respiratory, myogenic, neurogenic, and endothelial-related metabolic activities) for the three machine learning algorithms (gradient tree boosting, random forest, and Adaboost) using baseline period data. For the three machine learning algorithms, the importance of cardiac activities is 0.386 ± 0.041, 0.310 ± 0.017, and 0.266 ± 0.049, respectively, and that of respiratory activities is 0.435 ± 0.104, 0.299 ± 0.030, and 0.333 ± 0.070, respectively. The combined importance of cardiac and respiratory activities was about 60% of the total in the random forest and Adaboost algorithms and more than 80% in the gradient tree boosting algorithm. These results indicate that the features corresponding to cardiac and respiratory activities contribute the most to the diabetes classification. The gradient tree boosting algorithm, which showed the best classification performance in figures 6(A)–(C), was adopted for additional feature importance analysis. Figure 7(B) shows the feature importance results of the gradient tree boosting algorithm for the three individual periods (baseline, occlusion, and release). With the baseline period data, the importance of cardiac and respiratory activities is much higher than that of the other features.


Figure 7. Feature importance analysis results. (A) Result of three machine learning algorithms using baseline period data and (B) result of the gradient tree boosting algorithm from data of three individual periods (baseline, occlusion, release).


With the occlusion period data, the average importance of the features did not differ markedly, with values between 0.15 and 0.25. With the release period data, the importance of respiratory activities accounted for more than 50% of the total. Figures 8(A) and (B) show the feature importance results of the gradient tree boosting algorithm using the entire period data. As mentioned earlier, the entire period refers to the combined blood flow oscillation data for the baseline, occlusion, and release periods. Figure 8(A) shows that the summed importance for the baseline, occlusion, and release periods is 0.721 ± 0.059, 0.119 ± 0.026, and 0.159 ± 0.050, respectively. The importance of the occlusion and release period data together is less than 30% of the total, and the performance increases when these data are not used. This means that even when the control and diabetes blood flow signals were classified using data from the baseline, occlusion, and release periods, the occlusion and release period data had little impact on the classification results, and the baseline period data played a vital role in the classification. In addition, within the total importance, the importance of cardiac and respiratory activities in the baseline period is 0.298 ± 0.019 and 0.327 ± 0.073, respectively (figure 8(B)). This means that these two features accounted for more than 60% of the total importance and had a greater impact on the classification of diabetes blood flow signals than the other features. The feature importance analysis results demonstrate that our proposed disease classification method does not require additional experiments to observe the blood flow reactivity, such as reactive hyperemia. Therefore, the time required for measuring blood flow signals for disease classification is significantly less than that in conventional methods. We performed Pearson correlation analysis to evaluate the impact of each feature in the input data on the diabetes classification performance.


Figure 8. Feature importance analysis results using the entire period data. (A) sum of the importance of the baseline, occlusion, and release periods for the gradient tree boosting algorithm and (B) importance of all features in the gradient tree boosting algorithm.


3.2.3. Pearson correlation coefficient and linear regression analysis

The Pearson correlation coefficient between the feature importance of the blood flow oscillation data and the diabetes classification performance was obtained. This coefficient takes a value between −1 and 1 [57]. Generally, an absolute value of around zero indicates no correlation between the variables, whereas an absolute value above 0.7 indicates strong correlation. We used 30 classification results from 10 baseline, 10 occlusion, and 10 release period data to calculate Pearson's correlation coefficient. The correlation values of the three machine learning algorithms using the 30 results are shown in table 1. As shown in table 1, Pearson's correlation coefficients between the feature importance corresponding to cardiac activities and the diabetes classification performance were above 0.8 for all three algorithms. Moreover, the gradient tree boosting algorithm showed Pearson's correlation coefficients above 0.9 as well as the best diabetes classification performance. In contrast, the absolute values of the correlation coefficients between the importance of features other than cardiac activities and the diabetes classification performance were all below 0.7. The higher the diabetes classification performance of an algorithm, the higher its Pearson correlation coefficient for cardiac activities. These results show that the blood flow oscillation data corresponding to cardiac activity in the blood flow signal were strongly correlated with the diabetes classification performance, and that the other blood flow oscillation data do not significantly impact the diabetes classification.

Table 1. Pearson correlation coefficient between test accuracy, AUC score, and feature importance calculated using the three machine learning algorithms: gradient tree boosting, random forest, and Adaboost.

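| Algorithm | Score | Cardiac activities | Respiratory activities | Myogenic activities | Neurogenic activities | Metabolic activities |
|---|---|---|---|---|---|---|
| Gradient tree boosting | Test accuracy | 0.933 | 0.349 | −0.629 | −0.519 | −0.547 |
| Gradient tree boosting | AUC score | 0.928 | 0.399 | −0.699 | −0.538 | −0.550 |
| Random forest | Test accuracy | 0.888 | 0.685 | −0.449 | −0.349 | −0.371 |
| Random forest | AUC score | 0.888 | 0.696 | −0.467 | −0.288 | −0.410 |
| Adaboost | Test accuracy | 0.858 | 0.287 | −0.653 | −0.436 | 0.378 |
| Adaboost | AUC score | 0.839 | 0.216 | −0.686 | −0.289 | 0.297 |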

Finally, we performed a linear regression analysis with the best-performing algorithm, gradient tree boosting. Figures 9(A)–(E) show the results of the linear regression analysis between the AUC score of the gradient tree boosting algorithm and the feature importance of the five blood flow oscillations. Each linear regression graph was plotted using the same data that were used to calculate the Pearson correlation coefficient. Figure 9(A) shows the results of the linear regression analysis between the AUC score and the feature importance of cardiac activities; the R² value is 0.8608. Figures 9(B)–(E) show the results of the linear regression analysis between the AUC score and the feature importance of the other activities; all R² values were below 0.5. Consistent with table 1, the linear regression analysis showed that changes in blood flow related to cardiac activity contain important information for classifying diabetes. Considering that CVD is the leading cause of death in diabetics, the findings on blood flow oscillations corresponding to cardiac activity are noteworthy. This is because the blood flow oscillation data reflect metabolic information, and diabetes has the most significant impact on metabolism related to cardiac activity.
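As a consistency check between table 1 and figure 9(A), and assuming that the regression is an ordinary least-squares fit with an intercept and a single predictor, the coefficient of determination equals the square of the Pearson correlation coefficient $r$. Using the gradient tree boosting value for the AUC score and cardiac activities from table 1:

$$
R^2 = r^2 = (0.928)^2 \approx 0.861 ,
$$

which agrees with the reported value of 0.8608 to within the rounding of $r$. By the same relation, an R² value below 0.5 corresponds to $|r| < \sqrt{0.5} \approx 0.707$, in line with the correlation coefficients below 0.7 in absolute value reported for the other four features.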


Figure 9. Results of linear regression analysis between AUC score and feature importance obtained by gradient tree boosting algorithm. (A) Cardiac feature importance, (B) respiratory feature importance, (C) myogenic feature importance, (D) neurogenic feature importance, and (E) metabolic feature importance.


4. Conclusion

In this study, a machine learning approach was developed to help diagnose diabetes using blood flow signals. We validated the diabetes classification potential of machine learning algorithms by using blood flow signals measured from rats. A comparison of the classification performance of three machine learning algorithms, namely, gradient tree boosting, random forest, and Adaboost, confirmed that the gradient tree boosting algorithm shows the best performance for diabetes classification. A comparison of the classification results using baseline, occlusion, and release blood flow signals obtained through reactive hyperemia tests showed that diabetes classification is possible with a shorter measurement time compared to that of conventional methods. The impact of blood flow oscillations (cardiac, respiratory, myogenic, neurogenic, and metabolic activities) on diabetes classification was investigated using feature importance and Pearson correlation analysis. The analysis indicated that blood flow oscillation data corresponding to cardiac activities played the most crucial role in diabetes classification. A previous study of rats with STZ-induced diabetes showed a significant decrease in heart rate within two weeks of STZ injection, followed by a gradual decrease in heart rate over the remaining ten weeks [58]. Such changes in heart rate in rats with STZ-induced diabetes support our finding that blood flow oscillation data corresponding to cardiac activity are highly influential in classifying diabetes. This study aims to develop a diabetes diagnosis and classification method based on machine learning algorithms using blood flow oscillation data. However, further testing is needed before it can be used in human experiments. This study was conducted on rats, and only 200 blood flow signals were acquired; therefore, the feasibility of the proposed method for classifying diabetes in humans with many blood flow signals needs further verification. The obtained results should provide a basis for the future development of hemodynamic-based disease diagnosis methods, in which the presence or absence of disease could be quickly and noninvasively confirmed without any preparation other than the measurement system.

Acknowledgments

This study was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (10171504, RS-2024-00333986) and the DGIST R&D Program of the Ministry of Science, ICT and Technology of Korea (24-N-HRHR-02).

Data availability statement

Another study uses the data. The data that support the findings of this study are available upon reasonable request from the authors.
