MZZ 135


International Journal for Quality in Health Care, 2020, 32(2), 99–112

doi: 10.1093/intqhc/mzz135
Advance Access Publication Date: 11 March 2020
Research Article


Downloaded from https://academic.oup.com/intqhc/article/32/2/99/5803041 by National Science & Technology Library user on 28 September 2023
Prediction of medical expenditures of diagnosed
diabetics and the assessment of its related
factors using a random forest model, MEPS
2000–2015
JING WANG1,2, LEIYU SHI2
1 Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Meishan Road, Shushan District, Hefei City, 230032, P.R. China
2 Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205-1999, USA

Address reprint requests to: Leiyu Shi, Tel:(410)614-6507, Fax:(410)614-9046; E-mail: [email protected]
Editorial Decision 20 November 2019; Accepted 18 December 2019

Abstract
Objective: To predict the medical expenditures of individual diabetics and to assess the factors related to them. Design: Cross-sectional study. Setting and participants: Data were collected from the US household component of the Medical Expenditure Panel Survey (MEPS), 2000–2015. Main outcome measure: A random forest (RF) model was built with the randomForest package in R. Spearman correlation coefficients (rs), mean absolute error (MAE) and mean related error (MRE) were computed to assess the predictions of all the models. Results: Total medical expenditure increased from $105 billion in 2000 to $318 billion in 2014, then declined to $303 billion in 2015. The rs, MAE and MRE between the predicted and actual values of medical expenditures in the RF model were 0.644, $0.363 and 0.043%, respectively. The top predictor was treatment with insulin, followed by type of insurance, employment status, age and economic level. The latter four variables had no impact on the prediction of medical expenditure by insulin treatment. Further, in the sub-analyses by gender and age group, the evaluation indicators of prediction were almost identical to each other. The top five variables for total medical expenditure among males were the same as those among all diabetics. Expenses for doctor visits, hospital stays and drugs were also predicted well with the RF model. Treatment with insulin was the top factor for total medical expenditure among females and in the 18-, 25- and 65- age groups. Additionally, the RF model was slightly superior to the traditional regression model. Conclusions: The RF model could be used to predict the medical expenditure of diabetics and to assess its related factors.

Key words: diabetics, medical expenditure, random forest, predict, assessment

Introduction
Diabetes mellitus (DM) is one of the most common and serious chronic diseases around the globe, and its global prevalence in adults has been increasing over recent decades [1,2]; it was estimated that the number of adults with diabetes would increase to 693 million by 2045 [3] due to an aging population, growth of population size, the high prevalence of obesity and sedentary lifestyle [1]. DM is associated with a number of comorbid conditions including cardiovascular disease, kidney disease and stroke [4], and it ranks as the seventh leading cause of death in the USA [5].
The growing prevalence of diabetes causes a heavy financial burden on the treatment of individuals [6,7]. Medical expenditures

© The Author(s) 2019. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved.
For permissions, please e-mail: [email protected]

in the USA would increase with the rising prevalence of type 2 diabetes in youth, from 0.27/1000 in 2010 to 0.58/1000 in 2050 [8].
There have been studies on the medical expenditures of diabetes in different countries [9–11], such as the US, Canada and Sweden. For example, Ozieh et al. [9] analyzed US adults aged more than 17 years using 10 years of MEPS data (2002–2011) and found that individuals with diabetes had $2558 in direct incremental expenditures. Bilandzic et al. predicted that 2 156 000 new cases of diabetes would develop in the Canadian population as a whole over the 10 years between 2011/2012 and 2021/2022, with an estimated total health care cost for these new cases of Can $15.36 billion [10]. Bolin et al. used a register-based approach to estimate the healthcare cost of diabetes in Sweden in 1987 and 2005; the estimated total cost for Sweden in 2005 was Euro 920 million [11]. At the same time, researchers explored the factors related to the medical expenditure of diabetes [12–18]. It was found that women paid more than $50 out-of-pocket for office-based visits and more than $55 in total expenditures for home healthcare compared with men after adjusting for covariates [12]; total healthcare expenditure or outpatient expenditure increased with age [13,14]. Outpatient expenditure for diabetes with diabetic foot was higher than that for diabetes with diabetic nephropathy or cardiac-cerebrovascular disease [13]; the most important diabetes-related chronic complication in Sweden was cardiovascular disease [11]. Additionally, health care behavior was related to medical expenditure [18].
If the medical expenditure of diabetes could be predicted from these potential factors, it would help in making policy on the allocation of health resources.
Because medical expenditure follows a positively skewed distribution with many zero values, studies from the USA [9,12,18–20] applied a two-part general linear model, which did not work at the individual level. Meanwhile, several researchers used decision tree models to predict the medical expenditures of diabetes or gastric cancer at the individual level [13,21]; but a decision tree has no advantage when applied to data with a large number of predictor variables and many missing values. On the other hand, random forest (RF), an ensemble learning algorithm, is a machine learning method that handles many missing values well and places no restrictions on the conditions of variables [22], and it has shown higher accuracy, sensitivity and specificity than a decision tree [23]. In addition, RF can be used to predict a continuous variable and obtains predictions with no significant bias [24].
So, in this study we aimed to predict individual medical expenditure and assess its related factors using the RF method, based on US Medical Expenditure Panel Survey (MEPS) data on diagnosed diabetics, 2000–2015. Prediction of the medical expenditure of adult diagnosed diabetics at the individual level would have policy implications, and our study could provide a new methodological idea for research on health policy and management.

Methods
Data source and sample
The US household component of MEPS (MEPS-HC) data was used in the prediction of medical expenditure. MEPS-HC is a nationally representative survey of the US civilian non-institutionalized population, which provides nationally representative estimates of health care use, expenditures, payment sources and health insurance coverage. We merged the full-year consolidated data of MEPS-HC from 2000 to 2015 (https://meps.ahrq.gov/survey_comp/household.jsp).
This sample was representative of the US population because the respondents were randomly selected from the households that responded to the National Health Interview Survey (NHIS) in each year, and the NHIS used a complex multi-stage sample design; meanwhile, the person-level weight variable "PERWT" provided in the data was used to estimate the prevalence of diagnosed diabetes and the average medical expenditures.
Definition of diagnosed diabetes
Diagnosed diabetes in adults (>17 years old) was defined by a self-reported answer of "yes" to the question "Have you ever been diagnosed with diabetes (excluding gestational diabetes)?"
Definition of medical expenditures
The total medical expenditure was the sum of direct payments for care provided during a year, which included expenses for doctor visits, hospital stays, drugs and others.
Definition of potential factors
Socioeconomic demographic information included: sex, age, marital status, years of education, race/ethnicity, region, metropolitan statistical area, whether born in the USA, years living in the USA, employment status, retirement status, body mass index, health insurance and income level. Details of their sub-categories are given in Table 2.
The course of diagnosed diabetes was categorized into two groups, "<9 years" and "no <9 years", by the median value of 9 years in our study.
Co-morbidities included high blood pressure, coronary heart disease, angina, heart failure, etc., which are listed in Table 2. The number of co-morbidities of diabetes was then derived from this information.
Whether diabetes had led to a kidney problem or an eye problem was recorded as yes or no. Diet, oral medicines and insulin injection therapy were each recorded as yes or no. Five ways of learning about the care of diabetes were each recorded as yes or no: primary care provider, other providers, communication with a provider by telephone, browsing the network, and taking part in a group class. Another item, confidence in the care of diabetes, had four levels: not at all, somewhat, moderate or high.
Access to health care and doctor's attitudes are listed in Table 2 with the levels "never/sometimes/usually/always."
Data preprocessing
Merged data were restricted to adults, and we obtained 35 174 adult diagnosed diabetics from the US MEPS data, 2000–2015.
To make expenditures in different years comparable, medical expenditures of diagnosed diabetics from 2000 to 2015 were all adjusted to 2015 dollar values using the consumer price index (CPI) provided by the US Bureau of Labor Statistics (http://data.bls.gov/cgi-bin/cpicalc.pl).
Medical expenditure was described with the median and quartiles because it followed a positively skewed distribution; additionally,
the mean and standard deviation were also provided. The rank-sum test was performed to analyze the association between the characteristics of patients and the expenditures.
Merging, simple description and the rank-sum test of the data were performed in Statistical Analysis System (SAS) software.
Random forest analysis
An RF is composed of numerous decision trees built with a stochastic method [24]. The trees in an RF model have no correlation with each other [25]. Each tree has three types of nodes: root, internal and leaf. Each tree begins with all observations forming the root node, and successive splits determine the order of importance of the predictor variables [24]. The Gini measure of impurity, which is a measure of the class label distribution in a node, is used in the RF model to select the split with the lowest impurity at every node [26]. The Gini index of a split is the weighted average of the Gini measure over the different values of the variable [26]. The splitting criterion is chosen as the lowest Gini impurity value computed among a certain number of variables. Each tree employs a different set of these variables to construct the splitting rules in the RF model [26].
Out-of-bag (OOB) error estimation was used to estimate the generalization error of the model accurately. Data were randomly divided into two parts, a training dataset and a test dataset, in proportions of approximately two-thirds and one-third, respectively. The training dataset was used to build the RF model with three parameters, namely the number of trees generated, the number of predictor variables used in each tree and the node size [23]; the test dataset was then used to validate this model. Predictions were made on the test dataset, where the final prediction was the average of the predicted values of the multiple trees [24].
Mean absolute error (MAE), mean related error (MRE) and Spearman correlation coefficients (rs) between the predicted and actual values were used to evaluate prediction performance. Mean decrease in accuracy was used to assess the relative importance of variables within the RF model [27]. Local importance [28] was calculated to reflect the impact of one variable on the prediction of medical expenditure by another variable.
Sub-analyses of RF were performed on the total medical expenditures of different gender and age groups; expenses for doctor visits, hospital stays and drugs were also predicted with the RF model.
The RF analysis was completed in R software.

Results
General description
There were 35 174 diagnosed diabetics, and the weighted prevalence of diagnosed diabetes among adults in the USA was 8.31% for 2000–2015. The prevalence of diagnosed diabetes generally increased from 2000 to 2015, with a peak in 2014 (10.02%) (Table 1a).
Total medical expenditure increased continuously from $105 billion in 2000 to $240 billion in 2008, decreased to $221 billion in 2012, increased to $318 billion in 2014 and then declined to $303 billion in 2015 (Table 1a). Additionally, expenses for doctor visits, hospital stays and drugs all increased over the 16 years. Expenses for hospital stays and drugs were highest ($107.85 billion and $130.66 billion) in 2015, while expenses for doctor visits were highest ($73.91 billion) in 2014. Detailed information is listed in Table 1b.
Univariate analyses between the total medical expenditure and potential variables
Associations between the total medical expenditure and potential variables are shown in Table 2.
Total medical expenditures differed by sex, age, region, marital status, education, BMI, economic level, whether born in the USA, co-morbidities or complications, treatment and access to health care. A detailed illustration can be seen in Appendix 1 below.
Prediction of the medical expenditure of diagnosed diabetics with RF model
Individual medical expenditure was the output of the RF model. Medical expenditures (defined as Y) were transformed into new values (defined as Y') with the transformation Y' = log10(Y + 10). The potential predictor variables mentioned before were the inputs of the RF model.
Data were randomly divided into a training dataset with 23 450 individuals and a test dataset with 11 724 individuals, in proportions of approximately two-thirds and one-third. The training dataset was used to build the RF model, in which the number of trees was chosen from the trend of the absolute error across different numbers of trees. The number of trees was set to about 200 because the absolute error was almost constant once the number of trees exceeded 200 (Figure 1, 1-1); the RF model was then built.
Next, the log-transformed medical expenditures of diabetics in the dataset were predicted with the RF model built. The Spearman correlation coefficient, MAE and MRE between the predicted and actual values in the dataset were 0.644, $0.363 and 0.043%, respectively.
Moreover, the mean decrease in accuracy of the top five variables in the prediction of total medical expenditure is shown in Figure 1 (1-2); the top five predictors were: being treated with insulin > type of insurance > employment status > age > economic level.
Additionally, boxplots of the local importance are displayed in Figure 1 (1-3) to (1-6). We found that type of insurance, employment status, age and economic level had no impact on the prediction of medical expenditure by insulin treatment.
Then, total medical expenditures were predicted by gender and by age group separately. Table 3 shows that the prediction for adult diabetics aged 65 years or older was better than for the other groups: its MAE and MRE were the smallest ($0.341, 0.016%) and its rs was nearly the highest (0.644). Generally speaking, the MAEs, MREs and rs among the different gender or age groups were nearly stable, with little variation.
The top five factors in the predictions of the subgroup analyses were also explored (Figure 2 and Table 4). We found that the top five factors in the prediction among males were the same as those among all diabetics. The top four factors among females were nearly the same as those among males and all diabetics, but being diagnosed with arthritis was the fifth factor in the prediction. The top five factors among the different age groups differed slightly, with either being treated with insulin or type of insurance as the top factor. At the same time, being diagnosed with joint pain, with high cholesterol and with arthritis was the fifth factor in the 18-, 25- and 45- age groups, respectively. And being diagnosed with any

Table 1a Prevalence and medical expenditure of diagnosed diabetes in adults among 16 years, MEPS 2000–2015 (weighted)

Year  n            x           P (%)  Sum of TOTEXP ($ billion)  Mean TOTEXP ($)
2000  202 407 999  12 428 695  6.14   105.32                     8474.35
2001  207 971 832  12 943 274  6.22   131.17                     10134.17
2002  211 900 593  13 782 796  6.50   156.06                     11322.83
2003  213 972 622  14 130 693  6.60   162.02                     11465.90
2004  216 389 053  15 616 974  7.22   182.29                     11672.92
2005  218 635 384  16 637 943  7.61   199.01                     11961.03
2006  220 965 081  17 540 195  7.94   206.45                     11770.07
2007  224 137 788  19 033 781  8.49   225.72                     11858.91
2008  227 338 272  21 974 026  9.67   240.46                     10942.76
2009  229 227 585  19 428 623  8.48   231.13                     11896.52
2010  231 107 305  20 619 152  8.92   224.92                     10908.50
2011  234 530 873  22 178 815  9.46   222.50                     10032.01
2012  237 079 103  21 824 906  9.21   221.69                     10157.75
2013  239 213 986  22 652 710  9.47   254.98                     11256.13
2014  242 325 304  24 272 685  10.02  318.07                     13103.98
2015  244 437 386  24 206 082  9.90   303.19                     12525.39

n: number of sample; x: number of diabetes patients.
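
The columns of Table 1a are linked by simple arithmetic: the weighted prevalence P is x divided by n, and the mean expenditure is the weighted sum divided by x. The short Python check below recomputes the 2000 and 2015 rows from the tabulated n, x and sum; the function names are illustrative and are not taken from the paper. Because the sums are rounded to two decimals, the recomputed means agree with the table only to within about a dollar.

```python
def prevalence_percent(n_adults, n_diabetics):
    """Weighted prevalence P (%) = x / n * 100."""
    return 100.0 * n_diabetics / n_adults


def mean_expenditure(total_billion, n_diabetics):
    """Mean expenditure per diabetic ($) = weighted sum / x."""
    return total_billion * 1e9 / n_diabetics


# (n, x, sum of TOTEXP in $ billion), copied from Table 1a
table_1a = {
    2000: (202_407_999, 12_428_695, 105.32),
    2015: (244_437_386, 24_206_082, 303.19),
}

for year, (n, x, total) in table_1a.items():
    p = prevalence_percent(n, x)
    m = mean_expenditure(total, x)
    print(f"{year}: P = {p:.2f}%, mean = ${m:.2f}")
```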

Table 1b Medical expenditure of diagnosed diabetes in adults by sub-categories, MEPS 2000–2015 (weighted)

Year  Expense for doctor visits  Expense for hospital stay  Expense for drugs
      Sum ($ billion)  Mean ($)  Sum ($ billion)  Mean ($)  Sum ($ billion)  Mean ($)
2000  20.74            1666.43   37.97            3050.05   29.14            2340.81
2001  27.00            2086.26   42.48            3281.73   36.67            2833.44
2002  31.47            2283.55   55.23            4006.88   41.39            3002.69
2003  30.50            2157.09   55.69            3938.32   49.20            3479.41
2004  42.16            2699.66   56.07            3590.31   54.50            3489.79
2005  48.89            2935.94   61.92            3718.52   57.46            3450.58
2006  48.32            2755.10   65.05            3708.44   64.88            3698.89
2007  48.12            2527.34   76.54            4020.14   68.18            3581.39
2008  60.68            2761.65   79.51            3618.19   79.38            3612.63
2009  54.04            2781.67   74.16            3816.89   69.87            3596.46
2010  52.09            2526.22   81.06            3931.28   73.10            3545.21
2011  62.10            2799.89   74.05            3338.89   77.99            3516.33
2012  53.21            2437.85   86.69            3971.19   76.00            3482.26
2013  71.40            3151.97   83.24            3674.67   89.95            3970.83
2014  73.91            3044.96   91.79            3781.51   109.86           4525.91
2015  69.33            2864.36   107.85           4455.59   130.66           5397.87

other kind of heart disease or condition was the second factor in the 65- age group.
At the same time, expenses for doctor visits, hospital stays and drugs were also predicted with the RF model; the MAE, MRE and rs are shown in Table 3. However, the top five factors in these predictions differed: type of insurance, having been diagnosed with a stroke or transient ischemic attack, and being treated with insulin were the top factors for the three expense categories, respectively (Figure 3 and Table 4).
Comparison of the prediction results between the RF and linear regression
All the predictions of medical expenditure, in total and in the sub-analyses, were also performed with a linear regression model. The Spearman correlation coefficients, MAEs and MREs are shown in Table 5. The MAEs and MREs were all higher, and the rs generally smaller, than those of the RF model.
The main R syntax is given in Appendix 2.

Discussion
Trends in total medical expenditures and in expenses for doctor visits, hospital stays and drugs from 2000 to 2015 among adult diagnosed diabetics in the USA were increasing and basically consistent with each other. After comparing the prediction results of the RF model and the linear regression model, in total and in the sub-analyses, we found that the former was better than the latter, with a higher correlation coefficient (0.644 > 0.524), lower MAE ($0.363 < $0.478) and lower MRE (0.043% < 0.147%). We therefore considered that the RF model could be used to predict the individual medical expenditure of adult diagnosed diabetics in the USA. Results of our study

Table 2 Comparison of total medical expenditures among different information in adult diabetes

Socioeconomic demographic information n (%) Mean ± SD M (Quartiles) P

Sex
Male 15 796 (44.91%) 10954.5 ± 20952.0 4421 (1595.6, 10940.4) <0.001
Female 19 378 (55.09%) 11616.3 ± 20664.0 5260 (2032.8, 12740.5)
Age

18- 359 (1.02%) 6170.0 ± 19871.9 2028.6 (568.6, 5864.1) <0.001
25- 4789 (13.62%) 7212.27 ± 18371.5 2493.6 (691.0, 6877.0)
45- 16 393 (46.61%) 10611.4 ± 20984.9 4380.5 (1567.7, 10797.2)
65- 13 633 (38.76%) 13749.7 ± 21078.5 6716.5 (2952.2, 15587.6)
Race
Hispanic 8855 (25.17%) 8771.6 ± 19625.7 3095.1 (945.5, 8577.8) <0.001
Not Hispanic 26 319 (74.83%) 12176.2 ± 21106.8 5527.5 (2223.3, 13253.6)
Region
Northeast 5438 (15.46%) 12974.9 ± 22248.3 5845.9 (2415.2, 14135.3) <0.001
Midwest 6461 (18.37%) 12909.5 ± 22621.2 5585.7 (2273.6, 14094.2)
South 14 966 (42.55%) 10977.5 ± 19792.6 4725 (1715.6, 11832.2)
West 8309 (23.62%) 9614.2 ± 19915.2 3891.7 (1401, 9541.5)
Metropolitan statistical area
Urban 5018 (16.96%) 11718.4 ± 21829.7 4828.6 (1806.2, 11901.4) 0.880
Rural 24 570 (83.04%) 11440.2 ± 21206.2 4908.7 (1787, 12227.8)
Marital status
Married 4076 (11.59%) 10420.3 ± 22156.8 4036.6 (1223.6, 10500.7) <0.001
Widowed, divorced and separated 19 132 (54.39%) 10365.8 ± 19841.9 4438.5 (1686.3, 10586.7)
Never married 11 965 (34.02%) 13150.7 ± 21669.6 5996.4 (2349.6, 14778.4)
Years of education
0 415 (1.55%) 11077.7 ± 24766.3 3940.9 (1378.6, 11337.6) <0.001
1–8 4530 (16.95%) 11128.0 ± 19729.7 4364.2 (1630.9, 11943.7)
9–11 4425 (16.56%) 11826.3 ± 20779.3 5124 (1977.6, 12733.2)
12 8623 (32.26%) 10950.2 ± 20898.3 4887.8 (1875.4, 11454)
13- 8736 (32.69%) 11102.0 ± 18272.3 5192.4 (2139.8, 11978.9)
Employment status
Currently employed 12 901 (36.8%) 6699.4 ± 13704.4 3026.0 (1108.0, 7088.7) <0.001
Has a job to return to 62 (0.2%) 10234.9 ± 21174.8 3454.4 (1614.4, 11653.3)
Employed during the reference period 892 (2.5%) 9873.3 ± 19026.5 3620.4 (1126.6, 10118.6)
Not employed with no job to return to 21 212 (60.5%) 14220.1 ± 23740.3 6582.5 (2688.6, 15910.2)
Retired or not
Yes 8295 (44.69%) 13495.3 ± 20485.7 6771.6 (3130.9, 14883.7) <0.001
No 10 267 (55.31%) 12325.3 ± 21791.3 5436.7 (2153.4, 13251.5)
Born in USA
Yes 4895 (71.56%) 13814.3 ± 25554.8 6123.6 (2168.7, 15401.5) <0.001
No 1945 (28.44%) 9357.3 ± 23274.4 2957.3 (775.7, 8702.2)
Years of living in USA
<1 yr 4 (0.21%) 3614.0 ± 6405.6 637.5 (7.5, 4244) <0.001
1- 41 (2.14%) 2918.5 ± 3922.5 1492.8 (300, 3556.2)
5- 97 (5.05%) 6589.9 ± 14323.1 2168.5 (548, 6036)
10- 166 (8.65%) 7157.9 ± 15635.7 2020.5 (489.1, 6082.1)
15- 1612 (83.96%) 10022.0 ± 24747.4 3147 (844.1, 9252.8)
Body mass index
<18.5 1328 (3.91%) 11321.1 ± 16659.3 4477.8 (1047.9, 14329.2) <0.001
18.5- 5063 (14.91%) 11145.9 ± 18379.2 4458.1 (1599.9, 11576.6)
25- 10 207 (30.05%) 10992.4 ± 15671.9 4477.1 (1691.0, 10789.0)
30- 17 364 (51.13%) 12501.7 ± 13155.0 5401.3 (2066.3, 12980.2)
Economic level
Poor 7334 (20.85%) 12528.9 ± 21826.4 5302.9 (1815.5, 14096.5) <0.001
Near poor 2688 (7.64%) 13634.3 ± 26047.9 5348.1 (1860.4, 14930.8)
Low income 6442 (18.31%) 11118.2 ± 21682.8 4500.0 (1557.5, 11457.2)
Middle income 10 210 (29.03%) 10566.9 ± 19523.6 4498.6 (1709.9, 11152.8)
High income 8500 (24.17%) 10599.1 ± 18599.6 5155.9 (2104.6, 11302.5)


Types of insurance
Private insurance 7860 (47.18%) 11840.6 ± 20808.0 5758.5 (2503.6, 12501.7) <0.001
Medicare 3806 (22.85%) 15617.8 ± 23741.3 7815.6 (3469.8, 17906.0)
Medicaid 2991 (17.95%) 15847.2 ± 24819.4 7715.4 (3055.8, 18647.0)

The others 2002 (12.02%) 12016.6 ± 22951.5 5389.9 (2302.1, 13570.6)
Co-morbidities n (%) Mean ± SD M (Quartiles) P
High blood pressure
Yes 25 707 (73.24%) 12670.9 ± 22282.6 5723.7 (2322.3, 13670.3) <0.001
No 9393 (26.76%) 7639.6 ± 15476.8 2951.2 (923.3, 7650.7)
Had been told on two or more different visits that he had high blood pressure
Yes 23 207 (91.10%) 12980.5 ± 22203.4 6039.8 (2505.1, 14219.6) <0.001
No 2268 (8.90%) 9858.7 ± 22399.3 3473.4 (1223, 9046.7)
Coronary heart disease
Yes 5652 (16.13%) 19797.9 ± 29630.0 9903.1 (4356.4, 23581.9) <0.001
No 29 395 (83.87%) 9668.9 ± 18135.0 4240.9 (1570.4, 10143.8)
Angina
Yes 2861 (8.16%) 19922.3 ± 26490.3 10736.1 (4812.7, 24576.8) <0.001
No 32 182 (91.84%) 10532.1 ± 19961.5 4524.5 (1690.8, 11000.5)
Heart failure
Yes 4247 (12.10%) 20828.8 ± 30734.1 10123.7 (4278.4, 24940.3) <0.001
No 30 851 (87.90%) 9998.4 ± 18490.2 4425.8 (1638.5, 10596.4)
Other heart diseases
Yes 6407 (18.27%) 18938.3 ± 28311.8 9812.9 (4364.5, 22542.4) <0.001
No 28 652 (81.73%) 9573.3 ± 18007.7 4150.3 (1540.5, 9938)
Stroke
Yes 3801 (10.82%) 20854.7 ± 31322.9 10750.8 (4481.4, 24805.5) <0.001
No 31 318 (89.18%) 10156.8 ± 18778.9 4454.1 (1664.5, 10649.6)
Emphysema
Yes 1509 (4.30%) 20777.4 ± 27795.7 11593.8 (5239.9, 26315.6) <0.001
No 33 616 (95.70%) 10890.7 ± 20302.3 4683.7 (1749.8, 11477.8)
High cholesterol
Yes 18 325 (69.01%) 12431.5 ± 22037.5 5628 (2258.8, 13380.1) <0.001
No 8230 (30.99%) 9414.9 ± 20034.6 3153.7 (877.7, 9406.1)
Cancers
Yes 3080 (15.09%) 17031.4 ± 26104.8 8621.1 (3548.4, 20 029) <0.001
No 17 328 (84.91%) 10371.6 ± 20444.0 4174.8 (1352.9, 10653.2)
Joint pain
Yes 18 668 (53.65%) 13672.6 ± 21782.7 6668.7 (2861.4, 15421.7) <0.001
No 16 126 (46.35%) 8287.2 ± 18531.3 3225 (1112, 8052.9)
Arthritis
Yes 16 372 (48.35%) 14431.0 ± 22706.2 7182.1 (3159.2, 16539.2) <0.001
No 17 489 (51.65%) 8599.9 ± 18800.4 3292.9 (1143.3, 8311)
Asthma
Yes 4823 (13.73%) 15588.7 ± 23980.6 8162.7 (3412.8, 18134.1) <0.001
No 30 317 (86.27%) 10645.0 ± 20167.6 4478.4 (1668.5, 10953.6)
Number of chronic diseases of diabetes
0 2972 (8.45%) 3065.2 ± 6943.6 1401.2 (363.1, 3818.3) <0.001
1 2984 (8.48%) 7395.6 ± 14276.8 2236.6 (741.1, 5427.4)
2 4573 (13.00%) 8396.2 ± 17701.5 3041.8 (1150.4, 7124.1)
3 5673 (16.13%) 8841.9 ± 10252.1 3840 (1536.8, 8783.6)
4 5854 (16.64%) 11413.7 ± 18638.5 5296.2 (2301, 11625.5)
5 5260 (14.95%) 10276.6 ± 18562.7 6608.5 (3106.2, 14721.7)
6 7861 (22.35%) 21554.0 ± 23910.5 10517.7 (4791.1, 23550.7)
Complication, care, nursing et al. n (%) Mean ± SD M (Quartiles) P
If the diabetes led to kidney problem
Yes 3578 (12.50%) 21402.1 ± 32557.5 10536.6 (4257, 25143.2) <0.001
No 25 050 (87.50%) 10113.4 ± 17443.9 4755.8 (1910.7, 11000.5)


If the diabetes led to eye problem


Yes 6913 (24.14%) 16085.4 ± 26302.5 7814.2 (3297.7, 18163.2) <0.001
No 21 719 (75.86%) 10064.2 ± 17704.0 4568.8 (1814.7, 10687.9)
If the diabetes was treated with diet

Yes 22 714 (79.16%) 11848.0 ± 20706.8 5375.9 (2144.6, 12886.6) <0.001
No 5981 (20.84%) 10240.5 ± 18565.4 4612.7 (1800.4, 10661.6)
If the diabetes was treated with oral medicines
Yes 22 419 (77.85%) 10905.9 ± 18576.0 5149.4 (2144.7, 11911.8) 0.444
No 6380 (22.15%) 13508.4 ± 23918.9 5481.7 (1713.5, 14654.1)
If the diabetes was treated with insulin injection
Yes 8240 (28.64%) 17489.1 ± 27124.3 8993.6 (4001.6, 19926.5) <0.001
No 20 526 (71.36%) 9149.5 ± 16223.4 4185.1 (1653.4, 9702)
If you learned the care of diabetes from primary care provider
Yes 11 506 (90.02%) 11935.6 ± 21740.5 5252 (2019.5, 12967.2) <0.001
No 1275 (9.98%) 10454.9 ± 19598.5 4008 (1060.9, 11328.9)
If you learned the care of diabetes from the other provider
Yes 2193 (17.16%) 16190.3 ± 25910.1 7404.7 (2893.9, 17884.5) <0.001
No 10 588 (82.84%) 10876.1 ± 20402.1 4765.2 (1768.2, 11851.6)
If you learned the care of diabetes from communication with provider through telephone
Yes 1526 (11.94%) 13870.6 ± 22408.2 7004.1 (2676.1, 16533.7) <0.001
No 11 256 (88.06%) 11504.9 ± 21404.4 4926.3 (1834.7, 12272.8)
If you learned the care of diabetes from browsing the network
Yes 3510 (27.46%) 11284.4 ± 22352.8 4927.4 (1808, 12127.1) 0.011
No 9272 (72.54%) 11977.7 ± 21221.6 5234.1 (1954.8, 13119.3)
If you learned the care of diabetes from taking part in a group class
Yes 2140 (16.74%) 12272.4 ± 19448.9 5873.2 (2478.5, 13810.5) <0.001
No 10 642 (83.26%) 11689.7 ± 21935.4 4996.3 (1821.9, 12630.8)
If you have confidence in the care of diabetes
Not confident at all 563 (3.61%) 13752.3 ± 25376.6 5064.1 (1708, 14684.3) 0.500
Somewhat confident 3784 (24.26%) 11781.9 ± 19925.4 5090.2 (1890.3, 12970.4)
Confident 6059 (38.85%) 11523.1 ± 21612.6 5149 (1932.7, 12480.9)
Very confident 5190 (33.28%) 11322.4 ± 20753.2 4918.1 (1859.9, 12 066)
Access to health care n (%) Mean ± SD M (Quartiles) P
Got care needed when ill/injury
Never 308 (2.44%) 10300.4 ± 21449.2 3492.7 (1061.2, 11687.5) <0.001
Sometimes 1400 (11.07%) 13747.9 ± 25853.0 6008.3 (2165.7, 15427.5)
Usually 3222 (25.48%) 15707.3 ± 24453.4 7667 (3016.1, 18189.9)
Always 7714 (61.01%) 18585.6 ± 27068.3 9357.1 (3966.1, 22224.8)
Got medical appointment when wanted
Never 560 (2.26%) 9238.8 ± 19846.9 3481.8 (1384.9, 8353.3) <0.001
Sometimes 2849 (11.49%) 10911.9 ± 20413.0 4526.4 (1739.8, 11457.6)
Usually 7686 (31.00%) 12853.3 ± 22198.3 6021.3 (2620.7, 14007.5)
Always 13 697 (55.25%) 13053.3 ± 21287.5 6211.9 (2711.9, 14346.9)
Easy getting needed medical care
Never 167 (1.32%) 11194.4 ± 23314.4 3555.2 (1152.3, 9524) <0.001
Sometimes 1036 (8.21%) 11931.0 ± 23138.9 4800.8 (1737, 11308.1)
Usually 3368 (26.68%) 13869.5 ± 24859.0 6370.1 (2491.6, 15270.2)
Always 8052 (63.79%) 13994.3 ± 22250.4 6704.7 (2907.5, 15544.9)
Doctor listened to you
Never 333 (1.21%) 9195.7 ± 20830.8 3204.4 (988.7, 8885) <0.001
Sometimes 2122 (7.71%) 13261.1 ± 22061.9 5753.8 (2211.3, 14201.4)
Usually 7669 (27.85%) 12668.0 ± 21447.5 6072 (2532.3, 13945.1)
Always 17 416 (63.24%) 12239.5 ± 21130.4 5640.1 (2439.7, 13 150)
Doctor explained so understood
Never 371 (1.34%) 12354.1 ± 33646.0 3785 (1299.8, 9644.7) <0.001
Sometimes 2187 (7.90%) 13306.8 ± 21241.3 6064.8 (2349.7, 14 502)
Usually 8014 (28.95%) 13031.2 ± 21469.3 6203.7 (2605.7, 14511.6)
Always 17 114 (61.81%) 11991.3 ± 20824.3 5521.1 (2377.6, 12872.9)


Doctor showed respect


Never 340 (1.23%) 9487.7 ± 19795.9 3767.2 (1168.8, 9287) <0.001
Sometimes 1896 (6.85%) 13670.2 ± 21288.6 6365.8 (2381, 15498.2)
Usually 7300 (26.39%) 12647.2 ± 20774.8 6136.3 (2611.2, 14074.3)
Always 18 129 (65.53%) 12233.1 ± 21481.9 5560.8 (2394.5, 13030.6)

Doctor spent enough time
Never 518 (1.88%) 10419.2 ± 21550.1 4215.1 (1552.5, 9891.1) <0.001
Sometimes 2835 (10.27%) 12644.0 ± 21814.3 5571.9 (2119.7, 13 763)
Usually 9471 (34.30%) 12889.0 ± 21911.4 6010.5 (2548.9, 13757.2)
Always 14 789 (53.56%) 12106.4 ± 20632.1 5659.4 (2439.6, 13223.1)
Doctor gave instructions that were easy to understand
Never 44 (0.52%) 8550.9 ± 13444.3 4551.2 (1832.6, 7007.8) 0.001
Sometimes 542 (6.44%) 15870.7 ± 31035.9 5824.1 (2342.1, 15462.2)
Usually 2622 (31.16%) 14199.6 ± 23917.4 6700.1 (2457, 16305.6)
Always 5207 (61.88%) 12752.4 ± 22469.8 5782.2 (2272.3, 13 811)
Doctor asked respondent to describe how to follow instructions
Never 1606 (19.10%) 12244.9 ± 19654.9 6253.5 (2648.6, 13401.2) 0.007
Sometimes 1457 (17.33%) 14971.2 ± 28781.5 6576.1 (2517.9, 16707.4)
Usually 2378 (28.28%) 12910.3 ± 22209.2 5913.6 (2203.9, 14739.4)
Always 2968 (35.30%) 13345.6 ± 23001.7 5655 (2210.5, 14560.9)

Table 3a MAEs, MREs and rs of prediction of medical expenditures with RF model on total and sub-analyses

Total and subgroup analyses    MAE ($)    MRE (%)    rs

Total                          0.363      0.043      0.644
Gender         Male            0.380      0.052      0.653
               Female          0.351      0.037      0.627
Age-groups     18-             0.488      0.108      0.509
               25-             0.473      0.101      0.599
               45-             0.372      0.049      0.643
               65-             0.314      0.016      0.644
Doctor visits                  0.479      0.125      0.538
Hospital stay                  0.920      0.412      0.427
Drugs                          0.425      0.092      0.605

Table 3b MAEs, MREs and rs of prediction of medical expenditures with linear regression model on total and sub-analyses

Total and subgroup analyses    MAE ($)    MRE (%)    rs

Total                          0.478      0.147      0.524
Gender         Male            0.496      0.156      0.534
               Female          0.462      0.139      0.513
Age-groups     18-             0.536      0.192      0.529
               25-             0.589      0.206      0.482
               45-             0.484      0.151      0.53
               65-             0.417      0.117      0.437
Doctor visits                  0.592      0.229      0.413
Hospital stay                  1.020      0.515      0.337
Drugs                          0.547      0.200      0.476
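
The three evaluation indexes reported in Tables 3a and 3b can be sketched in base R as follows. This is only an illustration on made-up numbers, not the authors' code: the function names are hypothetical, and "mean-related error" is assumed here to mean the mean relative error, which may differ from the exact formula used in the study.

```r
# Evaluation metrics for predicted vs. actual expenditures (base R only).
# ASSUMPTION: "mean-related error" is read as the mean relative error,
# mean(|pred - actual| / |actual|); the paper's exact formula may differ.
mae <- function(actual, pred) mean(abs(pred - actual))
mre <- function(actual, pred) mean(abs(pred - actual) / abs(actual))
spearman_rs <- function(actual, pred) cor(actual, pred, method = "spearman")

# Hypothetical toy data, not MEPS values.
actual <- c(100, 200, 400, 800)
pred   <- c(110, 180, 400, 900)

metrics <- c(MAE = mae(actual, pred),
             MRE = mre(actual, pred),
             rs  = spearman_rs(actual, pred))
print(metrics)
```

A rank-based correlation such as Spearman's rs is a natural choice for expenditure data because it is insensitive to the heavy right skew visible in the means and quartiles of Table 2.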

were somewhat different from those of other studies [9, 12, 13, 16–18, 20], which might be due to the different data sources, analytic methods and predictors. In the sub-analyses by gender and age group, the evaluation indexes of prediction were nearly the same as each other. Additionally, expenses for doctor visits, hospital stay and drugs were predicted with high correlation and low MAE or MRE. If the RF model were implemented in software, health care professionals could predict the total medical expenditure of individual diabetics directly and take measures in advance.

In addition to prediction, RF can also be used to assess the influential factors in prediction [28]. We found that the most important influential factor in the prediction of the total medical expenditure of diagnosed diabetics in the USA was being treated with insulin.
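
The idea behind ranking influential factors can be illustrated with permutation importance: shuffle one predictor and measure how much the prediction error grows. The sketch below is not the study's analysis. It uses hypothetical synthetic data, and a linear model stands in for the forest so that it runs without extra packages; the permutation-based importance measure of the randomForest package applies the same idea to out-of-bag data.

```r
# Permutation importance: increase in mean squared error after shuffling
# one predictor. A linear model stands in for the random forest here
# (ASSUMPTION made only to keep the sketch dependency-free).
set.seed(1031)
n_obs <- 500
# Hypothetical synthetic data: x1 drives y strongly, x2 weakly, x3 not at all.
dat <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs), x3 = rnorm(n_obs))
dat$y <- 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n_obs)

fit <- lm(y ~ x1 + x2 + x3, data = dat)
mse <- function(model, newdata) mean((newdata$y - predict(model, newdata))^2)
baseline <- mse(fit, dat)

perm_importance <- sapply(c("x1", "x2", "x3"), function(v) {
  shuffled <- dat
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  mse(fit, shuffled) - baseline
})

# Rank predictors from most to least important.
ranking <- names(sort(perm_importance, decreasing = TRUE))
print(ranking)
```

A predictor whose shuffling barely changes the error contributes little to prediction, which is how a ranking such as the one in Table 4 should be read.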
Figure 1 (1-1) Trend of the absolute error as the number of trees grows in the RF model. (1-2) Top 10 important variables for total medical expenditure in the RF model. (1-3) to (1-6) Local importance plots by DSINSU53 on the prediction of medical expenditure for the variables PMEDPY53 (1-3), EMPST53H (1-4), AGE (1-5) and POVCAT (1-6).

Diabetics treated with insulin injection had more medical expenditure than those without it ($8993.6 > $4185.1), and 28.64% of diabetics had been treated with insulin in 1 year. Another study also found that insulin users’ 1-year type 2 diabetes healthcare expenditures were at least double those of non-users [29], which was similar to our result, although those figures were for diabetics with overweight and obesity. The second important influential factor was type of insurance: diabetics with Medicare and Medicaid had more medical expenditure than those with private and other insurance ($7815, $7715.4 > $5758.5, $5389.9); 40.8% of diabetics had Medicare or Medicaid. Hu et al. found that participants with private insurance or Medicare or Medicaid coverage were more likely to receive quality diabetes care than other individuals [30], which was a little different from our result. The third factor was employment status: the 62.3% of diabetics who were not employed with no job to return to had more medical expenditure than those currently employed, those having a job to return to, and those employed during the reference period ($6582.5 > $3026.0, $3454.4 and $3620.4) in our study. The American Diabetes Association adopted the position on employment that “Any person with diabetes, whether insulin or non-insulin, should be eligible for any employment for which he/she is otherwise qualified”; although not every individual with diabetes will be qualified for, nor can perform, every available job, reasonable accommodations can readily be made that allow the vast majority of people with diabetes to effectively perform the vast majority of jobs [31]. This situation could lead to heavy burden of
Figure 2 Top 10 important variables for the medical expenditure among gender and age-groups in RF model.

Table 4 Top five factors in the RF models for the total and sub-analyses

Total and subgroup analyses    Rank 1      Rank 2      Rank 3      Rank 4      Rank 5

Total                          DSINSU53    PMEDPY53    EMPST53H    AGE         POVCAT
Gender         Male            DSINSU53    PMEDPY53    EMPST53H    AGE         POVCAT
               Female          DSINSU53    PMEDPY53    AGE         EMPST53H    ARTHDX
Age-groups     18-             DSINSU53    PMEDPY53    HISPANX     REGION      JTPAIN53
               25-             DSINSU53    PMEDPY53    EMPST53H    HISPANX     CHOLDX
               45-             PMEDPY53    DSINSU53    EMPST53H    POVCAT      ARTHDX
               65-             DSINSU53    OHRTDX      EMPST53H    number6     DSKIDN53
Doctor visits                  PMEDPY53    POVCAT      EMPST53H    ARTHDX      AGE
Hospital stay                  STRKDX      MIDX        OHRTDX      EMPST53H    CHDDX
Drugs                          DSINSU53    PMEDPY53    CHOLDX      EMPST53H    AGE

Abbreviations: see the “List of abbreviations” in the text.


Figure 3 Top 10 important variables for the doctor visits, hospital stay and drugs in RF model of age-groups.

healthcare on them. The fourth factor was age: medical expenditure increased with age, which was similar to the result of a study among adults with diabetes aged 18 years or older [17], in which ages were also categorized into similar age groups. In another study, however, expenditure increased by US$27 for each additional year of age, without a significant difference [16], which could be explained by age being treated as a continuous variable in a population of people with and without diabetes aged 20 years or older. The economical level was the fifth important influential factor. Our study showed that near-poor or poor diabetics had higher total medical expenditure than those at other, higher income levels ($5348.1, $5302.9 > $4500.0, $4498.6, $5155.9). A study found that rural American Indians in the USA were more likely to be poor and typically carried a great chronic disease burden [32], which might explain our result.

We found that the top five factors in the prediction of medical expenditures among males were the same as those among all diabetics. Those among females were a little different: being diagnosed with arthritis was the fifth factor, which could be explained by arthritis being a disease with female preponderance [33] that would lead to more medical expenditures. On the other hand, being treated with insulin or type of insurance was the top factor in the different age groups, which was similar to the result among all diabetics. In particular, being diagnosed as having other heart diseases or conditions was the second factor in the prediction of medical expenditure among diabetics in the 65- age group, possibly because the coexistence of metabolic and cardiovascular conditions is frequent and known to have an over-proportional impact on health outcomes [34] or quality of life [35], which might bring about a heavy healthcare burden for older people. We also found that type of insurance, being diagnosed as having had a stroke or transient ischemic attack, and being treated with insulin were the top factors in the prediction of expenses for doctor visits, hospital stay and drugs, respectively. Stroke or transient ischemic attack has been studied among diabetics before: total medical expenditure of stroke with diabetes was reported to be $23283 [36], and the total health care cost of diabetics suffering a first stroke and a repeat stroke increased by 6.5 and 6.4 during the year of the event [37]. Although these two studies [36, 37] focused on total healthcare cost, their results reflect the importance of stroke for medical cost.

To our knowledge, this study was the first application of the RF method to predict the medical expenditure of diagnosed diabetics, identify its important related factors and assess the interaction between some variables in the prediction of medical expenditure. We found that the RF model was an excellent tool, and our findings provide a key step toward making decisions in health policy and management for diabetes.

Acknowledgements
This work was partly supported by the project of visiting study overseas of the excellent youth backbone in the universities of the Education Department of Anhui Province in 2017 (code: gxfx2017008), the Data Science Center of the School of Public Health in Anhui Medical University, the Key Project of the Education Department of Anhui Province Natural Science Research (code: KJ2017A164) and the Anhui provincial laboratory of population health and major disease screening and diagnosis.

Authors’ contributions
Prof. Leiyu Shi designed the study and revised the text. Dr Jing Wang analyzed the data and wrote the manuscript.

Authors’ information
Dr Jing Wang is the first author. Her address is the Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Anhui, P.R. China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
Prof. Leiyu Shi is the corresponding author and his address is the Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.

Funding
Not applicable.

List of abbreviations
MEPS: Medical Expenditure Panel Survey
HC-MEPS: household component of MEPS
CPI: consumer price index
SAS: statistical analysis system
CART: classification and regression tree
OOB: out-of-bag
MAE: mean absolute error
MRE: mean-related error
DSINSU53: being treated for his/her diabetes by the insulin
PMEDPY53: type of insurance
EMPST53H: employment status
POVCAT: economical level
HISPANX: if the diabetic was Hispanic
JTPAIN53: if the person (aged 18 or older) had experienced pain, swelling, or stiffness around a joint in the last 12 months
CHOLDX: whether the person had ever been diagnosed as having high cholesterol
ARTHDX: if the person (aged 18 or older) had ever been diagnosed with arthritis
OHRTDX: if the person (aged 18 or older) had ever been diagnosed with any other kind of heart disease or condition
number6: the number(s) of co-morbidities of diabetes
DSKIDN53: if diabetes had caused kidney problems
STRKDX: if the person (aged 18 or older) had ever been diagnosed as having had a stroke or transient ischemic attack
MIDX: if the person (aged 18 or older) had ever been diagnosed as having a heart attack, or myocardial infarction
CHDDX: if the person (aged 18 or older) had ever been diagnosed as having coronary heart disease

References
1. Shaw JE, Sicree RA, Zimmet PZ. Global estimates of the prevalence of diabetes for 2010 and 2030. Diabetes Res Clin Pract 2010;87:4–14.
2. Ogurtsova K, da Rocha Fernandes JD, Huang Y et al. IDF diabetes atlas: global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract 2017;128:40–50.
3. Cho NH, Shaw JE, Karuranga S et al. IDF diabetes atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res Clin Pract 2017;138:271–81.
4. Center for Disease Control and Prevention. National Diabetes Statistics Report, 2014. Atlanta: US Department of Health and Human Services, Center for Disease Control and Prevention 2014.
5. Hoyert DL, Xu J. Deaths: preliminary data for 2011. Natl Vital Stat Rep 2012;61:1–51.
6. Menzin J, Korn JR, Cohen J et al. Relationship between glycemic control and diabetes-related hospital costs in patients with type 1 or type 2 diabetes mellitus. J Manag Care Pharm 2010;16:264–75.
7. American Diabetes Association. Economic costs of diabetes in the US in 2012. Diabetes Care 2013;36:1033–46.
8. Imperatore G, Boyle JP, Thompson TJ et al. Projections of type 1 and type 2 diabetes burden in the U.S. population aged < 20 years through 2050. Diabetes Care 2012;35:2515–20.
9. Ozieh MN, Bishu KG, Dismuke CE, Egede LE. Trends in health care expenditure in US adults with diabetes: 2002-2011. Diabetes Care 2015;38:1844–51.
10. Bilandzic A, Rosella L. The cost of diabetes in Canada over 10 years: applying attributable health care costs to a diabetes incidence prediction model. Health Promot Chronic Dis Prev Can 2017;37:49–53.
11. Bolin K, Gip C, Mörk AC, Lindgren B. Diabetes, healthcare cost and loss of productivity in Sweden 1987 and 2005 - a register-based approach. Diabet Med 2009;26:928–34.
12. Williams JS, Bishu K, Dismuke CE, Egede LE. Sex differences in healthcare expenditures among adults with diabetes: evidence from the medical expenditure panel survey, 2002-2011. BMC Health Serv Res 2017;17:259.
13. Xu GC, Luo Y, Li Q et al. Standardization of type 2 diabetes outpatient expenditure with bundled payment method in China. Chin Med J (Engl) 2016;129:953–9.
14. Leung MY, Carlsson NP, Colditz GA, Chang SH. The burden of obesity on diabetes in the United States: medical expenditure panel survey, 2008 to 2012. Value Health 2017;20:77–84.
15. Pohar SL, Majumdar SR, Johnson JA. Health care costs and mortality for Canadian urban and rural patients with diabetes: population-based trends from 1993−2001. Clin Ther 2007;29:1316–24.
16. Ahmed N, Choe Y, Mustad VA et al. Impact of malnutrition on survival and healthcare utilization in Medicare beneficiaries with diabetes: a retrospective cohort analysis. BMJ Open Diabetes Res Care 2018;6:e000471.
17. Egede LE, Walker RJ, Bishu KJ, Dismuke CE. Trends in costs of depression in adults with diabetes in the United States: medical expenditure panel survey, 2004-2011. J Gen Intern Med 2016;31:615–22.
18. Egede LE, Gebregziabher M, Zhao Y et al. Impact of mental health visits on healthcare cost in patients with diabetes and comorbid mental health disorders. PloS One 2014;9:e103804.
19. Pantalone KM, Hobbs TM, Wells BJ et al. Clinical characteristics, complications, comorbidities and treatment patterns among patients with type 2 diabetes mellitus in a large integrated health system. BMJ Open Diabetes Res Care 2015;3:e000093.
20. Campbell JA, Bishu KJ, Walker RJ, Egede LE. Trends of medical expenditures and quality of life in US adults with diabetes: the medical expenditure panel survey, 2002-2011. Health Qual Life Outcomes 2017;15:70.

21. Wang J, Li M, Hu YT, Zhu Y. Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models. BMC Health Serv Res 2009;9:161.
22. Breiman L. Random forests. Machine Learning 2001;1:5–32.
23. Esmaily H, Tayefi M, Doosti H et al. Comparison between decision tree and random forest in determining the risk factors associated with type 2 diabetes. J Res Health Sci 2018;18:e00412.
24. Ellis K, Kerr J, Godbole S et al. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas 2014;35:2191–203.
25. Lebedev AV, Westman E, Van Westen GJ et al. Random forest ensembles for detection and prediction of Alzheimer’s disease with a good between-cohort robustness. Neuroimage Clin 2014;6:115–25.
26. Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inform Decis Mak 2011;11:51.
27. Boulesteix AL, Janitza S, Kruppa J, König IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. WIREs Data Mining Knowl Discov 2012;6:493–507.
28. Takeuchi M, Inuzuka R, Hayashi T et al. Novel risk assessment tool for immunoglobulin resistance in Kawasaki disease: application using a random forest classifier. The Pediatr Infect Dis J 2017;36:821–6.
29. Johnston SS, Ammann EM, Kashyap SR et al. Body mass index and insulin use as identifiers of high-cost patients with type 2 diabetes: a retrospective analysis of electronic health records linked to insurance claims data. Diabetes Obes Metab 2019;21:1419–28.
30. Hu R, Shi L, Rane S et al. Insurance, racial/ethnic, SES-related disparities in quality of care among US adults with diabetes. J Immigr Minor Health 2014;16:565–75.
31. American Diabetes Association, Anderson JE, Greene MA et al. Diabetes and employment. Diabetes Care 2014;37:S112–7.
32. Nicklett EJ, Omidpanah A, Whitener R et al. Access to care and diabetes management among older American Indians with type 2 diabetes. J Aging Health 2017;29:206–21.
33. Mavrogeni S, Dimitroulas T, Bucciarelli-Ducci C et al. Rheumatoid arthritis: an autoimmune disease with female preponderance and cardiovascular risk equivalent to diabetes mellitus: role of cardiovascular magnetic resonance. Inflamm Allergy Drug Targets 2014;13:81–93 (abstract).
34. Haffner SM, Lehto S, Rönnemaa T et al. Mortality from coronary heart disease in subjects with type 2 diabetes and in nondiabetic subjects with and without prior myocardial infarction. New Engl J Med 1998;339:229–34.
35. Laxy M, Hunger M, Stark R et al. The burden of diabetes mellitus in patients with coronary heart disease: a methodological approach to assess quality-adjusted life-years based on individual-level longitudinal survey data. Value Health 2015;18:969–76.
36. Zhou X, Shrestha SS, Luman E et al. Medical expenditures associated with diabetes in myocardial infarction and ischemic stroke patients. Am J Prev Med 2017;53:S190–6 (abstract).
37. Ringborg A, Yin DD, Martinell M et al. The impact of acute myocardial infarction and stroke on health care costs in patients with type 2 diabetes in Sweden. Eur J Cardiovasc Prev Rehabil 2009;16:576–82.

Appendix 1
This appendix gives a detailed explanation of the results of the ‘univariate analyses between the total medical expenditure and potential variables.’
The medical expenditure of female diabetics was more than that of males ($5260 > $4421), and that of non-Hispanics more than that of Hispanics ($5527.5 > $3095.1). Expenditures increased with the age of adult diabetics. Expenditures of diabetics who resided in the Midwest region were the highest ($5855.7) and those in the West were the lowest ($3891.7). Never-married diabetics had the highest expenditure, followed by widowed/divorced/separated and married diabetics ($5996.4 > $4438.5 > $4036.6). Diabetics with more than 13 years of education had the highest expenditure ($5192.4); those with no education had the lowest ($3940.9). The larger the BMI, the higher the expenditure ($4408.8 < $4512.2 < $4938.9 < $6118.2). The relationship between expenditure and economical level was U-shaped: expenditures of the poor or near poor and of the high-income group were higher than those of the low- or middle-income groups. Diabetics born in the USA had more expenditure than those not born in the USA, and expenditure approximately increased with the years lived in the USA. Diabetics with Medicare and Medicaid had more medical expenditure than those with private insurance and other insurance.
Diabetics with co-morbidities had higher expenditure than those without (P < 0.05); the difference between the two groups (“had a certain comorbidity” vs. “did not have that comorbidity”) was more than $5000 when the co-morbidity was coronary heart disease, other heart diseases, angina, heart failure, stroke or emphysema. Diabetics with kidney or eye complications had higher expenditure than those without complications (P < 0.05); diabetics treated with diet or with insulin injection had higher expenditure than those not so treated (P < 0.05). Diabetics who learned about care in nearly any way except browsing the internet had more expenditure than those who did not (P < 0.05). We further analyzed the medical expenditures of diabetics without any chronic diseases and with different numbers of chronic diseases and found that medical expenditures increased with the number of chronic diseases ($1401.2 < $2236.6 < $3041.8 < $3840 < $5296.2 < $6608.5 < $10517.7).
Medical expenditure of diabetics became larger as the frequency of getting health care increased. However, when the frequency of doctors’ attitude and behavior was “usually” or “sometimes,” the expenditure of diabetics was the highest.

Appendix 2
This appendix gives the main R code of the RF models.
### rf models
library(randomForest)
# n is not defined in this excerpt; presumably a string of predictor names joined by "+"
fomu <- as.formula(paste0("TOTEXP~", n))
# total sample, default number of trees
set.seed(1031)
rf_500 <- randomForest(fomu, data = a_nomiss, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE)
# total sample, 200 trees
set.seed(1031)
rf0 <- randomForest(fomu, data = a_nomiss, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
# total sample, local (casewise) importance
set.seed(1031)
rf0_local <- randomForest(fomu, data = a_nomiss, localImp = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
# sub-analyses by gender
set.seed(1031)
rf_male0 <- randomForest(fomu, data = a_male, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
set.seed(1031)
rf_female0 <- randomForest(fomu, data = a_female, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
# sub-analyses by age group
set.seed(1031)
rf_age1_0 <- randomForest(fomu, data = a_age1, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
set.seed(1031)
rf_age2_0 <- randomForest(fomu, data = a_age2, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
set.seed(1031)
rf_age3_0 <- randomForest(fomu, data = a_age3, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
set.seed(1031)
rf_age4_0 <- randomForest(fomu, data = a_age4, importance = TRUE, keep.forest = TRUE, keep.inbag = TRUE, ntree = 200)
# sub-analyses by type of expense
set.seed(1031)
fomu <- as.formula(paste0("doctorvisitexp~", n))
rf_doctorvisitexp <- randomForest(fomu, data = a_nomiss, importance = TRUE, ntree = 200)
set.seed(1031)
fomu <- as.formula(paste0("hospitalstayexp~", n))
rf_hospitalstayexp <- randomForest(fomu, data = a_nomiss, importance = TRUE, ntree = 200)
set.seed(1031)
fomu <- as.formula(paste0("drugexp~", n))
rf_drugexp <- randomForest(fomu, data = a_nomiss, importance = TRUE, ntree = 200)
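
The appendix code never shows how the predictor string n is built, and it repeats the same fitting call for each subgroup. The following is a minimal sketch of one way this could be organized, under stated assumptions: that n is a "+"-separated string of predictor names (the five names below are taken from Table 4 purely for illustration), and that a helper such as fit_by_group, which is hypothetical and not part of the original code, is acceptable.

```r
# ASSUMPTION: `n` in the appendix is a "+"-separated string of predictor names.
# The names below come from Table 4 and are used only for illustration.
predictors <- c("DSINSU53", "PMEDPY53", "EMPST53H", "AGE", "POVCAT")
n <- paste(predictors, collapse = "+")

# One formula per outcome, built the same way as in the appendix.
outcomes <- c("TOTEXP", "doctorvisitexp", "hospitalstayexp", "drugexp")
formulas <- lapply(outcomes, function(y) as.formula(paste0(y, "~", n)))
names(formulas) <- outcomes
print(formulas$TOTEXP)

# Hypothetical helper: fit the same formula on several data subsets.
# `fit_fun` could be randomForest::randomForest (with ntree = 200 and
# importance = TRUE passed through `...`) if that package is installed.
# Reseeding before each fit mirrors the set.seed(1031) calls in the appendix.
fit_by_group <- function(formula, groups, fit_fun, seed = 1031, ...) {
  lapply(groups, function(d) {
    set.seed(seed)
    fit_fun(formula, data = d, ...)
  })
}
```

With a named list of data frames such as list(male = a_male, female = a_female), the helper would return a named list of fitted models, one per subgroup.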
