- Churpek, Matthew;
- Gupta, Shruti;
- Spicer, Alexandra;
- Hayek, Salim;
- Srivastava, Anand;
- Chan, Lili;
- Melamed, Michal;
- Brenner, Samantha;
- Radbel, Jared;
- Madhani-Lovely, Farah;
- Bhatraju, Pavan;
- Bansal, Anip;
- Green, Adam;
- Goyal, Nitender;
- Shaefi, Shahzad;
- Parikh, Chirag;
- Semler, Matthew;
- Leaf, David
OBJECTIVES: Critically ill patients with coronavirus disease 2019 have variable mortality. Risk scores could improve care and be used for prognostic enrichment in trials. We aimed to compare machine learning algorithms and develop a simple tool for predicting 28-day mortality in ICU patients with coronavirus disease 2019.

DESIGN: This was an observational study of adult patients with coronavirus disease 2019. The primary outcome was 28-day in-hospital mortality. Machine learning models and a simple tool were derived using variables from the first 48 hours of ICU admission and validated externally in independent sites and temporally with more recent admissions. Models were compared with a modified Sequential Organ Failure Assessment score, National Early Warning Score, and CURB-65 using the area under the receiver operating characteristic curve and calibration.

SETTING: Sixty-eight U.S. ICUs.

PATIENTS: Adults with coronavirus disease 2019 admitted to 68 ICUs in the United States between March 4, 2020, and June 29, 2020.

INTERVENTIONS: None.

MEASUREMENTS AND MAIN RESULTS: The study included 5,075 patients, 1,846 (36.4%) of whom died by day 28. eXtreme Gradient Boosting had the highest area under the receiver operating characteristic curve in external validation (0.81) and was well calibrated, while k-nearest neighbors was the lowest-performing machine learning algorithm (area under the receiver operating characteristic curve 0.69). Findings were similar with temporal validation. The simple tool, which was created using the most important features from the eXtreme Gradient Boosting model, had a significantly higher area under the receiver operating characteristic curve in external validation (0.78) than the modified Sequential Organ Failure Assessment score (0.69), National Early Warning Score (0.60), and CURB-65 (0.65; p < 0.05 for all comparisons). Age, number of ICU beds, creatinine, lactate, arterial pH, and Pao2/Fio2 ratio were the most important predictors in the eXtreme Gradient Boosting model.

CONCLUSIONS: eXtreme Gradient Boosting had the highest discrimination overall, and our simple tool had higher discrimination than a modified Sequential Organ Failure Assessment score, National Early Warning Score, and CURB-65 on external validation. These models could be used to improve triage decisions and clinical trial enrichment.
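
For readers unfamiliar with the discrimination metric reported above, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed from predicted risks and observed outcomes, using the pairwise (Mann-Whitney) formulation. It is not the authors' code: the function name `auroc` and the toy values are hypothetical, chosen only for illustration, and are not study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 outcomes (here, 1 stands for death by day 28)
    scores: sequence of predicted risks, where higher means higher risk
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Count (positive, negative) pairs in which the positive case received
    # the higher score; tied pairs count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (not study data).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
toy_auc = auroc(toy_labels, toy_scores)
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients who died from those who survived. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for a cohort of several thousand.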