
JAMA Forum

Garbage in, Garbage out—Words of Caution on Big Data and Machine Learning
in Medical Practice
Joan M. Teno, MD, MS

A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.

Alan Turing

Big data are here. Increasingly, health care professionals will make clinical decisions using prediction
rules based on large data sets that use artificial intelligence or machine learning. For example,
machine learning and data mining have been used in large administrative databases to predict
numerous outcomes, including which patients are at risk for adverse events from opiates. A recent
article1 used a Canadian medication prescribing database involving 853 324 participants to predict
30-day opioid-related adverse events. The reported C statistic of 0.82 indicated good discrimination.
In addition, the key finding was that the top 0.1 percentile of estimated risk had a positive likelihood ratio of 28.1, which translates to a posttest probability of 43.1%. The question now is whether or how these types of findings should be used, not only in opioid prescribing but in other clinical matters as well.
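For readers less familiar with likelihood ratios, the 43.1% figure follows from the odds form of Bayes' theorem: posttest odds equal pretest odds multiplied by the likelihood ratio. The short sketch below reproduces the arithmetic; the roughly 2.6% pretest probability is an assumption back-calculated from the reported numbers, not a value restated in this article.

```python
# Odds form of Bayes' theorem: posttest odds = pretest odds * likelihood ratio.
# The 2.6% pretest probability is an assumption back-calculated from the
# reported LR+ of 28.1 and posttest probability of 43.1%.

def posttest_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability and a likelihood ratio to a posttest probability."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

print(posttest_probability(0.0263, 28.1))  # ~0.431, the 43.1% quoted above
```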
As the Canadian study’s authors pointed out, a prediction rule is only as good as the
administrative data used in its development. The pharmaceutical prescribing database in their study
lacked important clinical information such as medical diagnoses including cancer or enrollment in
hospice, raising key concerns in applying this prediction rule. So, the words of caution are that this
proposed decision rule is not ready for routine use.
The overarching worry is best illustrated in the 2015 case of Google’s recognition software that
used a prediction rule to label photos for users of its Photos app. The software correctly identified a
photo of someone in a cap and gown as “graduation.” However, the software categorized a photo of
web developer Jacky Alciné, a Haitian American man, and his Black friend as “gorillas.” The experience of Alciné demonstrates the potential hazard of not having the proper data to develop a
prediction rule. Google subsequently apologized for the error.
In medical practice, key articles2 can help3 guide clinicians to evaluate prediction rules based on
machine learning. The US Food and Drug Administration (FDA) also issued guidance in January 2021 on artificial intelligence- and machine learning-based software as a medical device. Easily applied standards such as the C statistic, which ranges from 0.5 (a prediction no better than a coin flip) to 1.0 (a perfect prediction), can give false reassurance, especially when overfitting lets a model fit its development data closely yet predict poorly when it receives new information. If the input data lack sufficient clinical information, important concerns arise: risk of bias,4 the adequacy and accuracy of the candidate prediction variables, and generalizability.
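The overfitting caution can be made concrete with a synthetic illustration, entirely separate from the study discussed above: a flexible model can post a near-perfect C statistic on the data used to develop it while discriminating far less well on held-out cases.

```python
# Synthetic sketch of a C statistic (AUC) giving false reassurance: a flexible
# model nearly memorizes its training data but performs far worse on new cases.
# All data are simulated; nothing here is drawn from the opioid study above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))               # 50 mostly uninformative features
y = (X[:, 0] + rng.normal(scale=2.0, size=2000) > 0).astype(int)  # weak signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print("C statistic on training data:", roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1]))
print("C statistic on new data:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
# Expect the first value near 1.0 and the second much lower.
```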
So, how should the FDA or medical journal editors decide whether an inadequate data set means that a prediction rule should not be used at all, or used only with caution? The review guidelines from
the FDA and medical journals should lay out an oversight process that ensures the safe use of
prediction rules without hindering innovations in machine learning. The oversight needs to be
stratified by risk based on the intended use. Journal editors and FDA officials need to clearly outline
their plans for the development and ongoing reassessment of prediction rules developed with
machine learning. Without the right data set, stated bluntly, there is the potential for garbage in,
garbage out.


To guard against this concern, professionals who develop a prediction rule to guide health care
decisions with machine learning or artificial intelligence should select and review the candidate
variables based on systematic literature reviews and expert clinician insight. The interpretation of
race, ethnicity, age, and sex and their potential to propagate bias in medical practice require careful
consideration.5
Professionals developing these prediction rules also must recognize the limitations of using
administrative data for clinical decision-making. For example, machine learning and artificial
intelligence often use Medicare billing data. However, Medicare billing data may reflect health professionals’ attempts to maximize reimbursement rather than the patient’s actual disease severity. As data sets for machine learning change over time, developers must monitor the data’s accuracy and consider how proposed changes to data-collection rules could affect the prediction rule’s validity. For example,
the Centers for Medicare & Medicaid Services’ payment incentives resulted in excess documentation
of comorbid illness in Medicare Advantage plan beneficiaries. For a prediction rule used in high-risk
situations, the developer and reviewer need to consider whether administrative data are sufficient.
They may need to validate the rule by using disease registry data or perhaps through a randomized
clinical trial that examines potential risks and benefits. Even electronic medical record data from a
single health care facility included in a large data set can lead to false conclusions.
Model validation also depends in part on examining errors in prediction because potential bias
within the data set results in inaccuracy that limits a prediction rule’s clinical use. For prognostication,
it is important to determine what knowledge an expert clinician adds beyond the prediction model, and whether the model’s lack of that knowledge might limit its clinical usefulness. One potential strategy is having
expert clinicians review patients’ individual characteristics that influenced predictions for them.
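One hedged sketch of what such a review could draw on appears below; the model and feature names are hypothetical illustrations, not the method of any study cited here. With a logistic regression, each characteristic's contribution to a patient's predicted log-odds is simply its coefficient multiplied by the patient's value, so the influences are straightforward to list for clinician review.

```python
# Hypothetical sketch: surface, for one patient, which characteristics drove a
# model's predicted risk so an expert clinician can review them. Feature names
# are invented for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["daily_morphine_mg_z", "benzodiazepine_rx", "prior_overdose", "age_z"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                       # standardized patient features
y = (X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

patient = X[0]                                      # one patient's record
contributions = model.coef_[0] * patient            # per-feature log-odds terms
for name, value in sorted(zip(feature_names, contributions), key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.2f} log-odds")
```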
Because medical knowledge is ever evolving, professionals need to continually monitor
prediction rules developed with machine learning. In the span of a few decades, a diagnosis of HIV
infection evolved from a death sentence to a manageable chronic illness. It is important that real-world monitoring for certain prediction rules become mandatory.
Although the technology needed to analyze complex arrays of data with sophisticated tools has
arrived, human oversight is a must. As medical journals and the FDA wrestle with this issue, we
should not be fooled into thinking that machine learning algorithms do not need people to vet them.
Computers may be learning how to imitate human behavior but, indeed, they are not human.

ARTICLE INFORMATION
Published: February 16, 2023. doi:10.1001/jamahealthforum.2023.0397
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2023 Teno JM.
JAMA Health Forum.
Corresponding Author: Joan M. Teno, MD, MS, School of Public Health, Brown University, 121 S Main St,
Providence, RI 02903 ([email protected]).
Author Affiliations: Department of Health Services, Policy, and Practice, School of Public Health, Brown
University, Providence, Rhode Island; Behavioral and Policy Sciences Department, RAND Corporation, Arlington,
Virginia.
Conflict of Interest Disclosures: Dr Teno reported receiving funding from the Centers for Medicare & Medicaid
Innovation as an investigator on the evaluation of the Value-Based Insurance Design and for the national
implementation of the CAHPS Hospice Survey, but this JAMA Forum post is independent of that work.

REFERENCES
1. Sharma V, Kulkarni V, Jess E, et al. Development and validation of a machine learning model to estimate risk of adverse outcomes within 30 days of opioid dispensation. JAMA Netw Open. 2022;5(12):e2248559. doi:10.1001/jamanetworkopen.2022.48559
2. Yusuf M, Atal I, Li J, et al. Reporting quality of studies using machine learning models for medical diagnosis:
a systematic review. BMJ Open. 2020;10(3):e034568. doi:10.1136/bmjopen-2019-034568


3. Liu Y, Chen PC, Krause J, Peng L. How to read articles that use machine learning: Users’ Guides to the Medical
Literature. JAMA. 2019;322(18):1806-1816. doi:10.1001/jama.2019.16489
4. Andaur Navarro CL, Damen JAA, Takada T, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ. 2021;375:n2281. doi:10.1136/bmj.n2281
5. Ibrahim SA, Pronovost PJ. Diagnostic errors, health disparities, and artificial intelligence: a combination for
health or harm? JAMA Health Forum. 2021;2(9):e212430. doi:10.1001/jamahealthforum.2021.2430
