
This PDF is available at http://nap.nationalacademies.org/26540

Artificial Intelligence Tools and Open Data Practices for EPA Chemical Hazard Assessments: Proceedings of a Workshop—in Brief (2022)
DETAILS
12 pages | 8.5 x 11 | PDF
ISBN 978-0-309-68672-3 | DOI 10.17226/26540

CONTRIBUTORS
Jeanette Beebe, Raymond Wassel, Kaley Beins, and Kathryn Z. Guyton, Rapporteurs; Board on Environmental Studies and Toxicology; Division on Earth and Life Studies; National Academies of Sciences, Engineering, and Medicine

SUGGESTED CITATION


National Academies of Sciences, Engineering, and Medicine. 2022. Artificial Intelligence Tools and Open Data Practices for EPA Chemical Hazard Assessments: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. https://doi.org/10.17226/26540.


This PDF is protected by copyright and owned by the National Academy of Sciences; unless otherwise
indicated, the National Academy of Sciences retains copyright to all materials in this PDF with all rights
reserved.
Artificial Intelligence Tools and Open Data Practices for EPA Chemical Hazard Assessments

Proceedings of a Workshop—in Brief

The U.S. Environmental Protection Agency's (EPA's) Integrated Risk Information System (IRIS) Program identifies and characterizes the human health hazards of chemicals found in the environment. Human health risk assessments cover hazard identification as well as dose-response analyses for cancer and noncancer outcomes that are obtained from IRIS assessments. Human health risk assessments are highly important, as they are used to inform a broad range of risk-related decisions across the agency. These assessments involve systematic reviews (SRs) of the scientific literature, which obtain, evaluate, and summarize information to answer a research question in a transparent manner.

SR methods use rigorous, pre-established protocols in which trained professionals strive to find all relevant studies, extract information concerning the reported methods and findings, critically analyze the information, and summarize the information in a reusable manner. Because SRs for human health questions can involve consideration of a vast number of studies, they tend to be labor intensive and costly. These constraints apply not only to EPA but also to other government and nongovernmental organizations in the United States and abroad that conduct SRs to address environmental health questions.

Although various software is now commonly used to help screen studies for inclusion in SRs, the process is still time intensive. Additionally, subsequent steps, such as the extraction of data from the literature, continue to be done manually.

Recent advances in AI, machine learning, and data science hold the promise of using computer-assisted tools to increase the efficiency of SR methods. These tools could enable more rapid screening of the literature and may allow for automated data extraction. However, potential concerns about the reliability and reproducibility of the results from the application of these tools, as well as limitations of natural language processing (NLP), could complicate efforts to apply the tools more widely.

The Center for Public Health and Environmental Assessment (CPHEA) within EPA's Office of Research and Development (ORD) requested that the National Academies of Sciences, Engineering, and Medicine convene a workshop (see Box 1) to explore opportunities and challenges in using advances in artificial intelligence (AI) and data science to enhance human health risk assessments. The workshop was held virtually on May 25 and 26, 2022.


BOX 1
Workshop Presenters and Discussants

Day 1

Topic: Promises and Prospects of AI and Data Science Applications
Moderator: Chirag Patel, Harvard Medical School
• Daniel Ho, Stanford University
• Christopher Mungall, Lawrence Berkeley National Laboratory
• Nicole Kleinstreuer, National Institute of Environmental Health Sciences (NIEHS)

Topic: Challenges for Applying AI to SR Methods
Moderator: Scott Auerbach, NIEHS
• Malcolm Macleod, University of Edinburgh
• Rens van de Schoot, Utrecht University
• Karen A. Robinson, Johns Hopkins University
• Olwenn Martin, University College London

Topic: Optimizing Data Extraction for Evidence Synthesis
Moderator: Byron Wallace, Northeastern University
• Weida Tong, U.S. Food and Drug Administration
• Jason Fries, Stanford University
• Daniel Sanders, IBM
• Marianthi-Anna Kioumourtzoglou, Columbia University

Day 2

Topic: AI Tools and Resources in SR
Moderator: David Reif, North Carolina State University
• Andrew Rooney, NIEHS
• Vickie Walker, NIEHS
• Brian Howard, Sciome, LLC
• Nancy Baker, Leidos
• Karen Ross, Georgetown University
• Mark Musen, Stanford University
• Julie McMurry, University of Colorado School of Medicine

Topic: SR Tools
Moderator: Joyce Tsuji, Exponent, Inc.
• Ryan Jones, U.S. Environmental Protection Agency (EPA)
• Sean Watford, EPA
• Derek Lord, Evidence Partners
• Eitan Agai, PICO Portal
• Artur Nowak, Evidence Prime
• Iain Marshall, King's College London

Topic: Ensuring Rigor and Reproducibility
Moderator: David Reif, North Carolina State University
• Marzyeh Ghassemi, Massachusetts Institute of Technology
• Chirag Patel, Harvard Medical School
• John Absher, Squarespace, Inc.

In setting the stage for the workshop, Kristina Thayer (Chemical and Pollutant Assessment Division, CPHEA/ORD, EPA) indicated that over the past several years there has been substantial onboarding of machine learning applications to help with the process of screening studies for further consideration. There is also interest in using AI tools to facilitate the process of extracting data on study design and results from the scientific literature.

Workshop presentations and panel discussions are summarized in this document. Posters presented during the workshop are available on the National Academies website.[1]

[1] See https://www.nationalacademies.org/event/05-25-2022/workshops-to-support-epas-development-of-human-health-assessments-artificial-intelligence-and-open-data-practices-in-chemical-hazard-assessment.


ARTIFICIAL INTELLIGENCE AND DATA SCIENCE APPLICATIONS: PROMISES AND PROSPECTS

Daniel Ho (Stanford University) began the session by providing an overview of the use of innovative AI tools in the federal government. He noted that 45% of federal agencies are experimenting with AI but that federal officials have faced challenges of sophistication, accountability, and explainability. As an example, he cited the inability to identify the source of errors in the results of applying biometric scanning technology. He also noted capacity issues, including the importance of devoting adequate human capital to a project. He highlighted blended expertise, both technical knowledge of AI and the ability to draw on recent advances as well as subject-matter expertise relevant to the problems being solved.

Christopher Mungall (Lawrence Berkeley National Laboratory) highlighted the challenge that arises when the knowledge landscape is fragmented, existing as natural language text in the literature. He discussed how ontologies (formally defined vocabularies) can help to organize and annotate the information for the application of AI tools. Mungall described the gene ontology project as an example of the use of ontology for gene function.

He also described an upsurge of interest in the use of knowledge graphs for integrating data and applying machine learning in the biosciences. However, the knowledge graphs are not compatible with one another. To address this, Mungall is working on a biomedical translator project with the National Center for Advancing Translational Sciences at the National Institutes of Health to develop a standard data model, referred to as Biolink Model, for knowledge graphs. The project seeks to bring together a number of different automated systems, knowledge sources, and knowledge providers, so that a user can ask questions such as what chemicals or drugs may be used to treat certain neurological disorders that are associated with particular gene variants.

Nicole Kleinstreuer (National Toxicology Program [NTP]/National Institute of Environmental Health Sciences [NIEHS]) proposed to define AI as "augmented intelligence" rather than "artificial intelligence" to avoid the notion that the goal of AI is to replace human intelligence. The concept of augmented intelligence, she said, encompasses the use of data science and computational tools to enhance and support the human intellect in generating insights into human disease processes and their susceptibility to environmental perturbation.

Kleinstreuer discussed the use of machine learning approaches to train quantitative structure-activity relationship (QSAR) models. QSAR models are used to find complex relationships between chemical features, such as molecular structures and physicochemical properties, and toxicity values. She indicated that the NTP open-access tool OPERA (Open Quantitative Structure-activity/property Relationship App, or OPEn (q)saR App) is a free resource that provides toxicity predictions based on chemical structures. The downloadable, open-access modeling suite predicts the toxicity of agents, such as flame retardants, pesticides, and air pollutants, revealing what factors are likely to contribute to or worsen adverse health outcomes. OPERA contains predictions for nearly 1 million chemical structures, training videos, testing and assessment strategies, computational models, and workflows to analyze chemical data.
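To make the descriptor-based QSAR idea concrete, the minimal sketch below trains a classifier on computed molecular properties. It is an illustration only, not OPERA's actual code; it assumes RDKit and scikit-learn are installed, and the SMILES strings and toxicity labels are invented placeholders.

```python
# Minimal QSAR-style sketch (not OPERA itself): predict a binary toxicity
# label from computed molecular descriptors. SMILES and labels below are
# illustrative placeholders, not real training data.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestClassifier

def featurize(smiles: str) -> list[float]:
    """Compute a few physicochemical descriptors from a structure."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolWt(mol),       # molecular weight
        Descriptors.MolLogP(mol),     # lipophilicity
        Descriptors.TPSA(mol),        # topological polar surface area
        Descriptors.NumHDonors(mol),
        Descriptors.NumHAcceptors(mol),
    ]

train_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC"]  # placeholders
train_labels = [0, 1, 0, 1]                                 # 1 = "toxic"

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit([featurize(s) for s in train_smiles], train_labels)

# Predict a probability of toxicity for a new, untested structure.
print(model.predict_proba([featurize("CCCl")])[0])
```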

Kleinstreuer highlighted the need for high-quality, openly available, well-annotated datasets; computing resources; and data infrastructure to foster the use of AI tools for addressing environmental health questions.

Chirag Patel (Harvard Medical School) served as the moderator for a panel discussion among the presenters. The topics discussed included how to weigh the evidence to be used for taxonomizing; recruiting individuals with the expertise needed to develop and apply AI methods; obtaining training data for AI tools; transparency in the use of complex models; and various uses of crowdsourcing approaches. Patel also asked the panelists for their opinions about transformative opportunities over the next few years.

Mungall mentioned hybrid systems that combine deep learning with older AI methods and better enforcement of standards and good data practice for making data findable, accessible, interoperable, and reusable.


Kleinstreuer listed increased dialogue with end users of AI tools, regulatory decision-makers, and the interested public. She also mentioned systems models, citing genome-scale metabolic models and agent-based modeling as examples. Another promising area is the development of compute-optimal large language models, she said, such as the DeepMind model named Chinchilla, which significantly outperforms the existing models.

Ho mentioned advances in neuromorphic computing that may radically improve the capabilities of ordinary researchers, the social gains possible from the application of AI tools, and new kinds of government–academia partnerships for better information exchange concerning AI tool development and applications. He noted the promising potential for a national AI research resource.

ADDRESSING CHALLENGES FOR APPLYING SYSTEMATIC REVIEW METHODS USING ARTIFICIAL INTELLIGENCE

In the next session of the workshop, Malcolm Macleod (University of Edinburgh) discussed the limitations of human screening for SRs that can become incorporated into computer-assisted screening tools. Macleod stated that the gold-standard approach, defined as using two human reviewers to perform title and abstract screening, is in need of improvement. He said that approach can miss many of the relevant results, especially when multiple experiments are included in one publication. Macleod discussed the development of tools to address this challenge by providing automatic PICO (population, intervention, comparison, and outcomes) extraction from abstracts and semi-automated data extraction via graphs, which save time without sacrificing accuracy.

Macleod raised a question concerning the amount of data loss that should be considered acceptable. He noted that only a limited amount of work has been done to assess the impact that failing to identify information sources has on SR conclusions. However, Macleod added, it is clear that a certain amount of missing data would not make a material impact on the conclusions one would draw.

Macleod indicated that he imagines a central data store in which an annotated label on a citation source from one SR is suitable and accessible for reuse by others. He added that this could be supplemented by the automated annotation of relevant literature.

The problem of data overwhelm was described by Rens van de Schoot (Utrecht University). When a person searches a database to answer a question, two problems come up, he said: (1) there are too many papers to read and (2) there are too few relevant papers. With his open-source project, ASReview, van de Schoot focuses on active learning, machine learning that can be deployed to present the most relevant paper next, he said. Though there are differences across datasets and machine learning models, he said that "all of the simulation studies show that active learning outperforms human reading by far."
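The schematic below shows the basic active-learning loop behind screening prioritization of this kind. It is a bare-bones illustration assuming scikit-learn (ASReview's real implementation is considerably more sophisticated); the abstracts and seed labels are placeholders.

```python
# Schematic active-learning screening loop: fit a model on the labels so
# far, then show the reviewer the most-likely-relevant unread record next.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

abstracts = [
    "Chronic toluene exposure and liver effects in rats ...",
    "A history of the agency's founding ...",
    "Dose-response modeling of inhaled benzene ...",
    "Museum exhibit review ...",
]
labels = {0: 1, 1: 0}  # record index -> relevant? (two hand-labeled seeds)

X = TfidfVectorizer().fit_transform(abstracts)

while len(labels) < len(abstracts):
    idx = sorted(labels)
    clf = LogisticRegression().fit(X[idx], [labels[i] for i in idx])
    scores = clf.predict_proba(X)[:, 1]  # predicted relevance
    nxt = max((i for i in range(len(abstracts)) if i not in labels),
              key=lambda i: scores[i])
    # input() stands in for the human reviewer's screening decision.
    labels[nxt] = int(input(f"Relevant? {abstracts[nxt][:50]} (0/1): "))
```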
van de Schoot presented several principles for using any software package that implements AI:

• humans being in control,
• a completely open and transparent application,
• application using an unbiased estimate,
• being aware of when an AI-aided interface is used, and
• garbage in, garbage out.

Scott Auerbach (NIEHS) moderated a panel discussion that included the two presenters as well as Karen A. Robinson (Johns Hopkins University School of Medicine) and Olwenn Martin (University College London).

Robinson noted that a fundamental challenge is the lack of standards that govern SRs. Martin stated that in chemical risk assessment, researchers might struggle with identifying chemicals properly, depending on how they are labeled in the literature. With toxicology, the guideline studies are relatively uniform in their reporting, Auerbach added, but traditional manuscripts remain non-standardized.

Martin expressed hesitation about fully automated data extraction. van de Schoot asked about trust in AI such that a researcher could set rules and let the machine do the work. Robinson said she believes that all parts of SR could be assisted by AI tools. van de Schoot underscored the aspect of AI assistance rather than full automation.


Macleod offered what he characterized as a practical response. "I think that having some measure of the provenance of the claim that you're making as a regulator is important," he said, adding that this approach asks the regulator to acknowledge the percentage of data (typically up to 10%) that is missing from the analysis.

OPTIMIZING DATA EXTRACTION FOR EVIDENCE SYNTHESIS AND HIGH-LEVEL DECISION-MAKING

Weida Tong (U.S. Food and Drug Administration [FDA]) discussed AI4TOX, an FDA program that focuses on applying new AI methods in the field of toxicology to inform regulatory decision-making. Tong explained that as part of the program they are developing a model called AnimalGAN that will use generative adversarial networks (GANs) to learn from past animal studies in such a way that it can generate animal study results for new and untested compounds without conducting new in vivo animal studies. They developed another model called Tox-GAN to determine the underlying mechanism of toxicity using gene expression data. He indicated that AnimalGAN and Tox-GAN can be extremely helpful in generating toxicological information on chemicals on the basis of toxicological information on other chemicals.

Jason Fries (Stanford University) discussed programmatic labeling for data-centric NLP. He indicated that for traditional supervised learning, experts observe and label examples from some data distribution and use those labels to train some sort of model. This method is expensive and slow, and it is difficult to change or revisit decisions that were made in generating the data. For the past several years, he has been exploring the use of programmatic labeling, or weak supervision. Instead of having domain experts manually label single data points, the focus is on designing labelers as a general concept, considering aspects such as rules and interactions with knowledge bases or ontologies. The objective is to automatically generate training labels on a fundamentally unlabeled set of data, which are then used to train a machine learning model.

According to Fries, programmatic labeling is a consistent and reproducible approach. It breaks down a problem, which an expert might encode in a single label, into a system of modular components. That offers opportunities to audit training data and interrogate assumptions used to generate training data.

Fries said a potential direction for programmatic labeling is natural language prompting, in which people write insights as natural language instructions instead of using a programming language. This could make the labeling process more accessible to domain experts who are not trained in writing code.
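A toy version of the idea follows: crude heuristic "labeling functions" vote on unlabeled text, and the aggregated votes become training labels. The rules here are invented for illustration; real weak-supervision systems (e.g., Snorkel, which Fries has worked with) learn to weight and denoise such functions rather than taking a simple majority vote.

```python
# Toy programmatic labeling: heuristic labeling functions vote on each
# record; the aggregate becomes a (noisy) training label. Rules are crude
# substring checks purely for illustration.
RELEVANT, IRRELEVANT, ABSTAIN = 1, 0, -1

def lf_mentions_rodent(text):
    return RELEVANT if "rat" in text or "mouse" in text else ABSTAIN

def lf_mentions_dose(text):
    return RELEVANT if "mg/kg" in text else ABSTAIN

def lf_is_editorial(text):
    return IRRELEVANT if "editorial" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_rodent, lf_mentions_dose, lf_is_editorial]

def weak_label(text):
    """Majority vote over non-abstaining labeling functions."""
    votes = [lf(text.lower()) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return RELEVANT if sum(votes) >= len(votes) / 2 else IRRELEVANT

# The resulting weak labels can then train an ordinary classifier.
print(weak_label("Rats received 50 mg/kg of the test compound."))  # -> 1
```

Because each rule is a named, inspectable function, the training data can be audited and an assumption revised by editing one component, which is the modularity Fries emphasized.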
Daniel Sanders (IBM) spoke about IBM's use of AI tools to help in discovering safer chemicals and materials that are used in the production of computer chips. The work involves incorporating chemical hazard assessments into AI-based design workflows for new hypothetical chemicals. Sanders provided a use case example of photoacid generators, which are a class of molecules used to pattern semiconductor devices. To develop alternatives with better environmental health and safety characteristics, Sanders's team trained AI models to produce tens of thousands of chemical candidates, then screened them for further evaluation. He indicated that as AI models get more sophisticated, there is a need for a platform that allows multiple domain experts to interact with AI specialists.

Sanders said there is a need for improved data standards for publication and patenting such that existing and future NLP tools can more readily integrate information.

Marianthi-Anna Kioumourtzoglou (Columbia University) discussed how machine learning can assist epidemiological studies of chemical mixtures. A well-defined research question is crucial, she said, because it enables the identification of the best methods for finding answers—and the question should then be stated explicitly in the paper so an AI tool can readily capture that information and classify the paper for SR.


Kioumourtzoglou also discussed random sampling in machine learning and the effect of seed selection. All of the methods she examined exhibited some seed-dependent variability in the results. She indicated that the degree of variability differed across methods and the kinds of organic congeners examined, and would likely vary with sample size. Kioumourtzoglou stated that seed sensitivity analysis can help evaluate robustness and interpretability. If the results are highly variable across seeds, presenting the distributions of the estimated effects across seeds would help researchers extract information for SRs.
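A seed-sensitivity check of this kind is simple to run: refit a stochastic learner under many seeds and report the distribution of the estimate rather than a single value. The sketch below assumes scikit-learn and uses synthetic placeholder data.

```python
# Compact seed-sensitivity sketch: rerun a stochastic model across many
# random seeds and summarize the spread of an estimated quantity.
# Data are synthetic placeholders standing in for a chemical mixture.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # five co-occurring exposures
y = 0.8 * X[:, 0] + rng.normal(size=200)   # outcome driven by exposure 0

importances = []
for seed in range(50):
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances.append(model.feature_importances_[0])

print(f"exposure 0 importance: mean={np.mean(importances):.3f}, "
      f"sd={np.std(importances):.3f}, "
      f"range=({min(importances):.3f}, {max(importances):.3f})")
```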
In a panel moderated by Byron Wallace (Northeastern University), participants discussed various topics, including practical limitations and needs related to building and deploying AI solutions. Fries and Wallace noted an institutional issue of making a case to justify the financial resources needed to obtain, apply, and maintain AI systems.

Sanders indicated that data access is a barrier because of the reluctance to share data and analysis plans. That makes industry-wide collaboration difficult. He pointed to the need for advancements in federated learning and encryption, and to find a way for AI models to learn in the absence of data sharing. Kioumourtzoglou said that training people can be an issue, and Tong added that reliability, interpretability, applicability, and reproducibility are also pressing concerns.

USING ARTIFICIAL INTELLIGENCE TOOLS AND RESOURCES IN SYSTEMATIC REVIEW

The second day of the workshop covered how machine learning tools are being used—and may be used in the future—with a focus on SRs. Andrew Rooney (NTP/NIEHS) opened the day with an overview. He noted that a key challenge for SRs of environmental health hazards is their broad scope. For instance, all health effects for an individual chemical may be examined in a series of SRs. Other environmental questions examined through SR concern all of the exposures that might be associated with a particular disease or health effect, mixtures, or chemical classes (e.g., the organohalogen flame retardants, the per- and polyfluoroalkyl substances,[2] etc.). The SRs consider multiple evidence streams, including human and animal studies as well as a diversity of other studies comprising pharmacokinetics, mechanistic evidence, and exposure information. Additionally, the relevant data are published in diverse forms and the studies are heterogeneous in design. He noted the diversity of study types and endpoints relevant for all evidence streams. This complexity presents a challenge for identifying and extracting data as well as for developing models, he said. Rooney also highlighted the challenge of annotating studies. He noted the need for annotated datasets "from a really diverse spectrum, so that the models can be applied to the diverse and heterogeneous data we need to make our decisions."

[2] See https://pubmed.ncbi.nlm.nih.gov/31436945.

The next section of the workshop, moderated by David Reif (North Carolina State University), provided an overview of a set of tool demonstrations that were pre-recorded.

Vickie Walker (NTP/NIEHS) presented Dextr, a tool that supports semi-automated data extraction using machine learning models. Dextr processes full-text PDFs and enables researchers to share data in a machine-readable format that can be used by model developers. Users can upload several PDFs at one time, import references, and use a data-cleaning module. The tool currently focuses on extracting metadata to support evidence maps, but future plans call for the extraction of study results, Walker said.

Brian Howard (Sciome, LLC) presented a platform called Swift AI, which combines three tools: Swift Review (evidence mapping, exploratory analysis), Active Screener (prioritizing studies), and FIDDLE 2.0 (extracting data). Swift Review and Active Screener focus on describing the problem and choosing the studies that will work best with machine learning. FIDDLE 2.0 extracts text from PDFs. Howard noted that this is not straightforward because the PDF is an image-driven format and was not developed to preserve the underlying text flow. FIDDLE 2.0 uses an algorithm to retain word order and placement across columns and tables. Overall, these tools aim to introduce efficiencies, such as by pre-populating forms and making inferences about the user's intent based on interaction and context, Howard said.
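The reading-order problem Howard described can be seen in a few lines of code. The sketch below is a crude illustration of the general problem FIDDLE 2.0 addresses with a far more sophisticated algorithm; it assumes the PyMuPDF package, and the page-midpoint heuristic is invented and only plausible for a simple two-column layout.

```python
# PDF text blocks carry coordinates, not reading order, so a naive dump
# interleaves columns. Sorting blocks column-first is a crude fix.
import fitz  # PyMuPDF (pip install pymupdf)

def two_column_text(path: str) -> str:
    doc = fitz.open(path)
    out = []
    for page in doc:
        mid = page.rect.width / 2
        blocks = page.get_text("blocks")  # (x0, y0, x1, y1, text, ...)
        # Left-column blocks first (top to bottom), then right-column.
        blocks.sort(key=lambda b: (b[0] >= mid, b[1]))
        out.extend(b[4] for b in blocks)
    return "\n".join(out)

print(two_column_text("paper.pdf")[:500])  # "paper.pdf" is a placeholder
```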


An Excel-based tool called Abstract Sifter was presented by Nancy Baker (Leidos). Abstract Sifter enhances the search capabilities of PubMed by making searches effective, triaging results, and tracking articles of interest. The tool also provides an overview of the literature for a set of chemicals or genes. The next version (7.1) will be enabled with an application programming interface to allow the user to obtain the DSSTox ID[3] for a list of chemicals in a PDF table or from medical subject headings (MeSH) terms. With the DSSTox ID, the chemical structure, CAS number, and toxicology profile, including whether the chemical has been tested, are all available in one tool, Baker said.

[3] See https://www.epa.gov/chemical-research/distributed-structure-searchable-toxicity-dsstox-database.
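The kind of programmatic PubMed access that such tools build on is exposed by NCBI's public E-utilities. A minimal sketch follows; the query term is a placeholder, and heavy use requires an API key and rate limiting under NCBI's usage policy.

```python
# Minimal PubMed search via NCBI E-utilities (esearch): return the PubMed
# IDs matching a query, which downstream tools can triage and annotate.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search(term: str, retmax: int = 20) -> list[str]:
    r = requests.get(f"{EUTILS}/esearch.fcgi",
                     params={"db": "pubmed", "term": term,
                             "retmax": retmax, "retmode": "json"})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

pmids = pubmed_search("perfluorooctanoic acid AND liver toxicity")
print(len(pmids), pmids[:5])
```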
Karen Ross (Georgetown University) discussed her work with UniProt, a hub of protein sequence and function data. She said that a major challenge is the annotation of protein sequences. "We believe that the solution is to enhance our ability to automatically extract protein information from the scientific literature," Ross said, as well as a tool that would "automatically predict annotations for poorly understood proteins using information from the manually curated entries." Her team has been seeking out text-mining ontologies and functional prediction tools, and it has developed an ontology project called PRO, which is the reference ontology for proteins and protein forms in the Open Biological and Biomedical Ontology (OBO) Foundry. Because PRO is distributed in standard formats like OWL, it can be searched with the database query language SPARQL. This "makes it easier to integrate PRO information with information from other ontologies on the semantic web."
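Because the ontology is plain RDF/OWL, a generic library can query it. The sketch below assumes Python's rdflib and a hypothetical local slice of PRO (full releases are large, so a triple store is more practical than in-memory parsing).

```python
# Sketch of querying an OWL ontology such as PRO with SPARQL via rdflib.
# The filename is a hypothetical local subset, not an official release.
from rdflib import Graph

g = Graph()
g.parse("pro_subset.owl")

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label WHERE {
    ?term rdfs:label ?label .
    FILTER regex(str(?label), "kinase", "i")
} LIMIT 10
"""
for term, label in g.query(query):
    print(term, label)
```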
Mark Musen (Stanford University) described a system called BioPortal, which is an open repository for biomedical ontologies. He described Annotator, a tool that allows users to "relate journal articles to standardized terms in a way which allows us to cluster articles [and] to count articles." This offers users the ability to identify the ontologies that might work well with the text they are annotating. Researchers can also save their own ontologies. Musen also shared details about the Center for Expanded Data Annotation and Retrieval (CEDAR), which aims to "create standards-compliant metadata that would enhance the FAIRness of datasets that experimenters are putting into online repositories." With CEDAR, researchers can write a template that will follow a standard reporting guideline for discussing a given domain of science. Musen said that this helps users feel "confident about the metadata, and the fact that the metadata are saying what they need to say about experiments, so that those third parties can actually understand what was done in those various investigations."

Panel discussant Julie McMurry (University of Colorado School of Medicine) began with a few notes about data for decision-making. She emphasized that it is vital for AI tools to be assistive but not prescriptive, with a feedback loop to steer the AI in the right direction to improve the algorithm itself. Musen noted that investment in high-quality metadata is essential and will enhance understanding of what the datasets represent and how the experiments were done. In the future, there might be a big cultural shift, he added, whereby researchers will view "the primary output of the search as being the data, rather than the publication."

The panel touched on the issues of science literacy, public engagement, and metadata as well as crowdsourcing. Rooney noted that a focus on reporting objective measures is key to increasing public confidence, as quality judgments and synthesis decisions can be sources of disagreement. New ways of crediting contributions to publications could incentivize the availability of metadata, noted Reif. Community curation is possible, and UniProt's interface allows anyone to suggest a relevant paper for a protein entry and also propose annotations for the protein from that paper, Ross said. "It's our experiment with trying to get more crowdsourcing of protein annotations."


Elaine Faustman (University of Washington) noted that the World Data System's Scientific Committee (which she co-chaired), a subsection of the International Science Council, endorsed the idea of professional code developers receiving more credit as a means to encourage future developments in this area.

SYSTEMATIC REVIEW TOOLS

Ryan Jones (EPA) introduced a database system resource called HERO (Health and Environmental Research Online). The system was developed to support EPA's integrated science assessments, which have grown substantially over the past three decades. HERO is a repository of citations that can be screened, categorized, and shared. "If something gets cited, we have a copy of it," Jones said, which enables the agency to be transparent with the public about the cited literature. HERO's screening tools enable search results to be viewed by field of expertise, which is efficient for the multidisciplinary teams processing literature searches. HERO also offers a keyword-free search called citation mapping, which Jones said has been demonstrated to be "three or four times more likely to return results that we actually need than the traditional keyword search." In addition, Jones said HERO supports third-party tools such as Distiller and Swift.

Sean Watford (EPA) demonstrated an SR tool called Health Assessment Workspace Collaborative (HAWC) that was developed by EPA's Andy Shapiro. HAWC is used in assessment programs across EPA and tackles interoperability, which Watford described as a major challenge. Watford shared an example of using the Environmental Health Vocabulary, a controlled vocabulary that applies to animal health outcomes but may be expanded in the future to try to bring more structure to the extracted data. This opens the opportunity to reuse data, including as a training dataset for model development.

The next tool, DistillerSR, was presented by Derek Lord (Evidence Partners). A Software-as-a-Service-based platform, DistillerSR aims to make literature reviews faster and easier, Lord said, throughout the entire review lifecycle, beginning with a search. DistillerSR tracks and manages reviews and published references in one place. Users can also "extend DistillerSR into different data platforms, such as safety databases, predictive analytics, or AI-based tools," he said. Lord shared an example use case of DistillerSR's intelligent workflows and AI to stay updated with COVID-19 data. "They were able to streamline the process using AI classifiers to help screen and complete the review," Lord said. DistillerSR cut the literature review screening time in half. DistillerSR is able to de-duplicate the results of searches, check for errors during the screening process, and also support data extraction, classifying and organizing the data with premade templates and, finally, reporting the data and pulling it out of the system.

Eitan Agai (PICO Portal) shared his work using automation (machine learning and NLP algorithms) to expedite SRs. PICO Portal, an online SR platform, is designed to serve evidence synthesis projects of any size or scale, Agai said. Agai placed PICO Portal in the company of other SR software programs, including abstrackr, DistillerSR, and Rayyan. Agai shared how his prior experience in the mortgage and financial industry translated to his current project. "I found out that systematic review has similar pain points as mortgage automation," he said, outlining the process of classifying documents, extracting data from documents (bank statements, paystubs, W-2s, etc.), and following guidelines and regulation. PICO Portal uses machine learning to help classify documents and prioritize articles during the screening process and provides analytics such as user accuracy and speed. His team received an NIEHS grant to continue the work of pairing NLP with ontology.

The next tool was Laser AI, demonstrated by Artur Nowak (Evidence Prime), who indicated that the company's approach is largely one of augmented intelligence, an idea Kleinstreuer introduced earlier. Rather than trying to replace human decisions and human intelligence, they are trying to enhance them using AI tools, Nowak said. "I hope that this mix of human and artificial intelligence will save us from the information overload," he added. Nowak shared case studies from the clinical space, including an SR for the World Health Organization in which Laser AI acted as a third reviewer.


The role of the SR tool is to connect publications, entities, and ontologies. "As AI, we provide some sort of a link that is then validated by people as part of their systematic review work," he said. He also indicated that it would be desirable to enable querying of knowledge graphs through links curated by humans and those predicted by AI.

Iain Marshall (King's College London) presented on RobotReviewer. Marshall spoke about RobotReviewerLive and his team's research to learn "whether we can use a machine learning system which is augmented with some human experts to try to keep systematic reviews up to date with low latency." With a database called Trialstreamer, his team is automatically collecting publications on randomized controlled trials daily. "We automatically extract structured data from them to identify the characteristics," Marshall explained. The tool is useful for maintaining and updating an existing SR, using automated models and rules to screen and scan new trials with the aim of identifying and collecting relevant studies for human review. Another project in an early stage piloted an SR of COVID vaccination candidates: the work identified 100% of the relevant studies, with a precision of about 40% (i.e., about 40% of the articles flagged for human screening were relevant), he said. "It's quite a substantial improvement compared with conventional search," Marshall said. Another experimental path is using neural networks to generate narrative summaries automatically from trials. His team is exploring how to include humans in the loop to validate and edit these summaries to improve reliability.
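Recall and precision, the two figures in Marshall's example, are straightforward to compute from screening decisions. The helper below is a small illustration; the counts are arithmetic consistent with "100% recall, about 40% precision," not the pilot study's raw data.

```python
# Screening metrics: recall (share of truly relevant studies found) and
# precision (share of flagged studies that were truly relevant).
def screening_metrics(n_relevant_found, n_relevant_total, n_flagged):
    recall = n_relevant_found / n_relevant_total
    precision = n_relevant_found / n_flagged
    return recall, precision

# e.g., 40 truly relevant studies, all found, among 100 flagged for review:
recall, precision = screening_metrics(40, 40, 100)
print(f"recall={recall:.0%}, precision={precision:.0%}")  # 100%, 40%
```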
Session moderator Joyce Tsuji (Exponent, Inc.) reflected that tolerance for AI errors is lower than for human error. She cited the example of car accidents caused by self-driving cars compared with those caused by human drivers. If AI tools can expedite regulatory assessments and fill important gaps by supporting health-protective limits, what is the tolerance for errors? In this vein, Tsuji then asked about the opportunities for AI in supporting SR workflows and methodologies, and about the potential barriers to advancement. Jones noted that the HERO project has developed tools to help screen and focus on the likelihood of relevance, but opportunities remain to develop augmented human intelligence tools to tackle the most time-consuming aspect of science assessments: extracting information from articles. Watford, Lord, and Agai cited data extraction as a barrier.

There are also challenges with the PDF format. The vast majority of the literature is still unstructured data in the form of PDFs, noted Agai, adding that researchers and academics are not under a lot of pressure to adapt and make the data extraction process more efficient. Tsuji agreed, highlighting the need for a breakthrough in how data are structured, to make them readily usable. Watford noted that, ideally, the basis for decisions would be provided along with the metadata published with the article or otherwise available in a structured repository of all of the supplemental information and data. Marshall noted that agreement on a shareable format is essential to solve the problem of data shareability. He added that collaborations between academia and industry might be helpful to promote standards of software development and usability. Watford added that standardizing units for this task can be extremely difficult and encouraged progress in this area.

Regarding pathology, Tsuji noted the challenges of interpretation by a machine and the benefits brought through the experience of a seasoned pathologist. At the same time, she asked if pathology's basis in pattern recognition provides an opportunity for toxicology or clinical studies. Watford mentioned a company called PathAI, which provides tools for machine-assisted pathology assessment, noting that it was not possible to exclude humans from the decision-making. Rooney added that NTP is actively pursuing AI technologies for histology because its histology slides contain an extensive amount of data. The project asks experts how much of the data they can actually use to reach reliable conclusions.

ENSURING RIGOR AND REPRODUCIBILITY IN ARTIFICIAL INTELLIGENCE APPLICATIONS

Marzyeh Ghassemi (Massachusetts Institute of Technology) discussed her work in designing machine learning processes for equitable health systems. She explained that when considering reproducibility, it is important to ask what it means for a model to perform


well in different settings, with different subgroups, and with different data. It is also important to understand the process that generated the model and provide caveats. Big-picture tools that accomplish this in a transparent way include data sheets for datasets and model cards for model reporting, she said.
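A model card of the kind Ghassemi mentioned is essentially structured metadata. The snippet below is a minimal, hypothetical example whose field names follow the general model-card idea rather than any specific standard; all values are invented.

```python
# A minimal, hypothetical model card as structured metadata.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    intended_use: str
    training_data: str
    evaluation: dict = field(default_factory=dict)  # metric -> subgroup results
    caveats: list = field(default_factory=list)

card = ModelCard(
    name="abstract-screener-v1",
    intended_use="Prioritize titles/abstracts for human SR screening only.",
    training_data="Hypothetical: 50k labeled environmental-health abstracts.",
    evaluation={"recall": {"overall": 0.98, "non-English": 0.91}},
    caveats=["Not validated for clinical literature.",
             "Performance drifts as terminology changes."],
)
print(card.name, card.evaluation)
```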
Ghassemi said there are likely no simple fixes for ethical issues related to AI applications in health as well as other spaces. This is an ongoing process that needs diverse data and diverse teams. It also calls for considering the sources of bias in data, evaluating them comprehensively, and recognizing that not all gaps can be corrected.

Patel discussed probing the robustness of exposome-phenotype associations with what are called multiverse approaches. He said, "we can implicate many genetic variants all simultaneously, and say that they are robust and reproducible." He characterized genome-wide association studies as a prime example of reproducible observational science, and said that one area of improvement for the environmental health sciences is measuring exposures in human populations.

To explore the idea of reproducibility, Patel referenced the work of Steve Goodman and his colleagues at the Meta-Research Innovation Center at Stanford, who found that reproducible research is defined by its methods (is there enough detail to repeat the study?), its inferences (will users arrive at the same qualitative conclusion?), its results, and its robustness. Patel described robustness in terms of:

• How stable are results to variations in, and assumptions about, the study design, the modeling approaches, or, for example, the seeds that are applied to stochastic algorithms for learning and for AI?
• The degree of determinism.
• Signal-to-measurement errors.
• Statistical criteria for validity claims.

Reif moderated the panel discussion, which was joined by John Absher (Squarespace, Inc.). Absher began by exploring the idea of rigor. He noted that SR requires "high precision and high recall," and one strategy is to "think about letting the machines do what they are good at—pulling out everything vaguely relevant—and then letting humans do that last mile where the precision becomes super important."

Ghassemi reflected on best practices for algorithm-assisted (machine) learning, including multiple data sources and results that can be reproduced and replicated. Reif highlighted that the challenge of measuring data varies widely across fields and applications, from tracking user behavior in an app to Patel's exposomics work, where "you are asking to measure everything forever for all people."

Ensuring rigor and reproducibility depends on solid source data, Reif said. Absher stressed the importance of collecting a representative sample that is as accurate as possible, because "the thing that haunts us all, no matter what context, are the things that we are not seeing," such as nonresponse bias.

Reif asked the panel about desired AI innovations for SRs. Patel said that his team is gathering summary statistics from papers and it is a struggle to extract data from the PDFs. Open datasets can inform how these summary statistics are produced and provide a way of filtering them for SRs. Absher added that attempting to extract data from PDFs is remarkably difficult, and that structured data are more valuable and useful than unstructured data. Ghassemi underscored the importance of public, open datasets that are heterogeneous, diverse, and representative.

Reif closed the panel by reflecting on the idea that data availability does not solve all problems, but it provides the information needed to address them. In the case of SR, it must be systematic, and it is not a system if it is not reproducible. Data are becoming available that can help address questions about rigor and reproducibility.

WORKSHOP SUMMARY AND FINAL THOUGHTS

In the final session of the workshop, members of the planning committee summarized topics that were explored during the workshop and shared examples of key takeaways (including opportunities) about the use of AI tools and open data practices in chemical hazard assessments.


Faustman highlighted the need for blended expertise (i.e., expertise in computer-assisted tools and relevant subject matter). She framed the concept of explainability as a challenge in the application of AI tools because some of the algorithms and search functions are not completely clear. Faustman also mentioned the prospect of expanding the application of ontologies and knowledge graphs as a way of acting on the taxonomization of environmental health entities for making SRs more efficient.

Auerbach reflected on the challenges that researchers face when using AI with SRs, especially the themes of reproducibility and transparency. Though AI tools can work quite well in filtering publications, it can be more challenging to extract data. He listed several potential steps: creating pre-structured data sources; systematic online living evidence summaries or repositories; metadata pre-annotation by authors and publishers; guidance for end users; curated resources; and open-source tools.

When considering how to optimize data extraction for evidence synthesis and decision-making, Wallace emphasized a need for more training data and labeled data. He mentioned the potential benefit of using alternative supervision strategies to extract data, such as a rule-based method or distant supervision. He underscored the importance of transparency, which can help inform decision-making because it enables an assessment of the provenance of the prediction or decision.

Reif indicated that, because bias can be amplified by the way in which input data are collected, it is important to inspect input data purposefully and carefully. He added that data representation and structure are key for all AI and machine learning applications. Reif indicated that open-source tools and data are expected to be important for promoting rigor, reproducibility, and public acceptance of the results.

Faustman asked the members to consider examples of key opportunities for AI to advance the SR workflow and methodology to enhance efficiencies. Auerbach underscored the use of crowdsourcing approaches as a theme related to the application of AI tools for SR. Faustman suggested the use of knowledge maps to indicate what different programs do for SR. Tsuji mentioned that it would be useful to compare all of the different AI tools on the basis of what they can do and how they go about it. That would help users decide what tool to select for a particular SR application. Wallace cautioned not to lose sight of the importance of building prototype AI tools that can eventually become mature enough for common usage.

Tsuji outlined several barriers, including annotation and the need to go beyond the PDF to allow for data extraction in a more accurate and efficient manner. Wallace indicated that tools are available that expedite working with PDFs. Tsuji also mentioned that, because of the competitive environment they work in, some scientists can be reluctant to make all of their data available. Faustman mentioned possible funding incentives offered by agencies for sharing datasets. Tsuji underscored giving recognition to those who generated the data in the first place.

Wallace commented that a useful project would be to train an AI tool to convert PDFs in the open-access subset of PubMed Central into XML. He added that there is a sufficiently sized training dataset to do this.

As the conversation moved on to opportunities for implementation in the short and long term, Wallace highlighted the need for more annotated data that are readily available.

Nicholas Chartres (University of California, San Francisco) noted that tool performance will vary depending on the dataset. Although many tools are available, there is no one provider with a suite of tools that can be used throughout the SR process—which makes integrable tools invaluable, he noted. Chartres also emphasized the importance of validation being conducted by the researcher who implements AI tools within the SR process. Chartres added that the researcher can also explain why a given tool was chosen and the approach for attaining the results.


Regarding publications, Reif mentioned that there are many venues for research dissemination and it is important for SR methods and results to be tied to something citable. Faustman agreed and emphasized the need for code to be citable.

Chartres emphasized the need for original data to be made openly available so that data extraction does not become a bottleneck in the SR process. He also mentioned that standardized reporting is an important objective.

Wallace envisioned the next decade as an opportunity to be more audacious. Characterizing the traditional SR process as artificially constrained, he suggested the use of machine learning to monitor the entire pool of literature, instead of using Boolean queries to retrieve a subset of articles that are eligible for screening. Another idea he offered is to attempt to "curate and construct summaries of evidence on the fly," noting that the results would be imperfect.

The panel discussion ended after exploring various notions of bias in the context of computer-assisted data extraction for chemical hazard assessments.

DISCLAIMER: This Proceedings of a Workshop—in Brief was prepared by JEANETTE BEEBE, KALEY BEINS, KATHRYN Z. GUYTON, and RAYMOND WASSEL as a factual summary of what occurred at the workshop. The statements recorded here are those of the individual workshop participants and do not necessarily represent the views of all participants; the workshop planning committee; the workshop participants' institutions; or the National Academies of Sciences, Engineering, and Medicine.

WORKSHOP PLANNING COMMITTEE MEMBERS: ELAINE FAUSTMAN (Chair), University of Washington; SCOTT AUERBACH, National Institute of Environmental Health Sciences; LAURA BEANE FREEMAN, National Cancer Institute; NICHOLAS CHARTRES, University of California, San Francisco; AISHA DICKERSON, Johns Hopkins Bloomberg School of Public Health; CHIRAG PATEL, Harvard Medical School; DAVID REIF, North Carolina State University; DAVID RICHARDSON, University of California, Irvine; JOYCE TSUJI, Exponent, Inc.; CHERYL WALKER, Baylor College of Medicine; BYRON WALLACE, Northeastern University.

REVIEWERS: To ensure that this Proceedings of a Workshop—in Brief meets institutional standards of quality and objectivity, it was reviewed in draft form by DAVID REIF, North Carolina State University, and QINGSHENG WANG, Texas A&M University. The review comments and draft manuscript remain confidential to protect the integrity of the process.

SPONSORS: This workshop was supported by the U.S. Environmental Protection Agency. Contract No. 68HERC19D0011.

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2022. Artificial Intelligence Tools and Open Data Practices for EPA Chemical Hazard Assessments: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. https://doi.org/10.17226/26540.

Division on Earth and Life Studies

Copyright 2022 by the National Academy of Sciences. All rights reserved.
