Translational Bioinformatics: Past, Present, and Future
REVIEW
KEYWORDS
Translational bioinformatics;
Biomarkers;
Genomics;
Precision medicine;
Personalized medicine
Abstract Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline's brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.
Introduction
Though a relatively young field, translational bioinformatics has become an important discipline in the era of personalized and precision medicine. Advances in biological methods and technologies have opened up a new realm of possible observations. The invention of the microscope enabled doctors and researchers to make observations at the cellular level. The advent of the X-ray, and later of magnetic resonance and other imaging technologies, enabled visualization of tissues and organs never before possible. Each of these technological advances necessitates a companion advance in the methods and tools used to analyze and interpret the results.
* Corresponding author.
E-mail: [email protected] (Tenenbaum JD).
ORCID: 0000-0003-3532-565X.
Peer review under responsibility of Beijing Institute of Genomics,
Chinese Academy of Sciences and Genetics Society of China.
Translational bioinformatics
Defining translational bioinformatics
According to the American Medical Informatics Association
(AMIA), translational bioinformatics (hereafter TBI) is
"the development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data, and genomic data, into proactive, predictive, preventive, and participatory health" (http://www.amia.org/applications-informatics/translational-bioinformatics).
http://dx.doi.org/10.1016/j.gpb.2016.01.003
1672-0229 © 2016 The Author. Production and hosting by Elsevier B.V. on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Put
more simply, it is the development of methods to transform
massive amounts of data into health. Dr. Russ Altman from
Stanford University delivers a year-in-review talk at AMIA's summit on TBI. In his 2014 presentation he provided the following definition for TBI: "informatics methods that link biological entities (genes, proteins, and small molecules) to clinical entities (diseases, symptoms, and drugs), or vice versa" (https://dl.dropboxusercontent.com/u/2734365/amia-tbi-14-final.pdf). Figure 1 gives a visual depiction of the way in which TBI fits within the bigger picture of biomedical informatics and
transforming data into knowledge [1]. Along the X axis is
the translational spectrum of bench-to-bedside, while the Y
axis from top to bottom represents the central dogma of informatics, transforming data to information and information to
knowledge. Toward the discovery end of the spectrum (the
bench) is bioinformatics, which includes storage, management,
analysis, retrieval, and visualization of biological data, often in
model systems. The discovery end of the spectrum has some
overlap with computational biology, particularly in the context
of systems biology methods. Toward the clinical end of the
spectrum (bedside) is health informatics. TBI ts in the middle
of this space. On the data-to-knowledge spectrum, data collection and storage are the beginning steps. After that comes data
processing, analysis, and then interpretation, thereby transforming the information that has been gleaned from the data
into actual knowledge, useful in the context of clinical care,
or for further research. In that way the data go from just being
bits1s and 0sto new knowledge and actionable
insights.
Where do we come from? A relatively short past
TBI as a field has a relatively short history. In the year 2000, the initial drafts of the human genome were released, arguably necessitating this new field of study (http://web.ornl.gov/sci/techresources/Human_Genome/project/clinton1.shtml). In 2002, AMIA held its annual symposium under the name "Bio*medical Informatics: One Discipline," meant to recognize and emphasize the spectrum of subdisciplines. In 2006, the term itself was coined by Atul Butte and Rong Chen at the AMIA annual symposium in a paper entitled "Finding disease-related genomic experiments within an international repository: first steps in translational bioinformatics" [2]. In 2008, AMIA held its first annual summit on TBI, chaired by Dr. Butte. The year 2011 saw the first annual TBI Conference in Asia, held in Seoul, Korea. Finally, an online textbook on TBI was published in 2012 by PLoS Computational Biology, edited by Maricel Kann. Initially intended to be a traditional print textbook, this resource was published using an open source model, making it freely available on the Internet (http://collections.plos.org/translational-bioinformatics).
What are we? TBI today
A 2014 review article [3] grouped recent themes in the field of TBI into four major categories: (1) clinical big data, or the use of electronic health record (EHR) data for discovery (genomic and otherwise); (2) genomics and pharmacogenomics in routine clinical care; (3) omics for drug discovery and repurposing; and (4) personal genomic testing, including a number of ethical, legal, and social issues that arise from such services.
Big data and biomedicine
As technology enables us to take an increasingly comprehensive look across the genome, transcriptome, proteome, etc., the resulting datasets are increasingly high-dimensional. This in turn requires a larger number of samples in order to achieve the statistical power needed to detect the true signal. The past decade or so has seen an increasing number of large-scale biorepositories intended for clinical and translational research all over the world. These projects comprise both information and biospecimens from individual patients, enabling researchers to reclassify diseases based on underlying molecular pathways, instead of the macroscopic symptoms that have been relied on for centuries in defining disease. Examples are listed in Table 1. These various projects involve different models of participation, ranging from explicit informed consent to use of de-identified biospecimens and their associated clinical information from EHRs (also de-identified). Informed consent is the most ethically rigorous model, but also the most expensive. The use of de-identified specimens and data is more scalable and financially feasible. However, as complete genomic data are increasingly used, it is impossible to truly de-identify these data [4]. This raises ethical issues regarding
patient privacy and data sharing. In the United States, legislation known as the Common Rule addresses these issues. In 2015, a notice of proposed rule-making (NPRM, http://www.hhs.gov/ohrp/humansubjects/regulations/nprmhome.html) was released to solicit feedback on some major revisions to the law, which was originally passed in 1991. Much has changed within biomedical research in the intervening years [5].
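The earlier point, that high-dimensional datasets demand larger samples, can be made concrete with a multiple-testing sketch. This is a generic statistical illustration, not drawn from the review itself: testing a million SNPs at the conventional 0.05 level all but guarantees false positives, so genome-scale studies must shrink the per-test threshold, which in turn drives up the sample sizes that biorepositories need to supply.

```python
# Sketch: why high-dimensional data need large cohorts. With m independent
# tests at per-test level alpha, the chance of at least one false positive
# is 1 - (1 - alpha)^m, so genome-scale analyses divide alpha by the number
# of tests (the Bonferroni correction).

def family_wise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate <= alpha."""
    return alpha / m

# One million SNPs tested naively at 0.05: a false positive is a certainty.
print(family_wise_error(0.05, 1_000_000))
# Corrected per-test threshold: 0.05 / 1e6 = 5e-8, the familiar GWAS cutoff.
print(bonferroni_threshold(0.05, 1_000_000))
```

The stricter the threshold, the larger the cohort needed to detect a true effect at genome-wide significance, which is one motivation behind the biorepository projects listed in Table 1.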
In order to accrue the numbers of samples required for the big data discipline that biomedical research is becoming, the ability to use patient data and samples in research would be of significant benefit.
Table 1  Large-scale biorepository projects for clinical and translational research

Name                                          Link
Million Veteran Program                       http://www.research.va.gov/mvp/veterans.cfm
Personal Genome Project                       http://www.personalgenomes.org/
MURDOCK Study                                 http://www.murdock-study.com
UK Biobank                                    http://www.ukbiobank.ac.uk
Genomics England                              http://www.genomicsengland.co.uk/
Framingham Heart Study                        https://www.framinghamheartstudy.org/
China Kadoorie Biobank                        http://www.ckbiobank.org/site/
Kaiser Permanente RPGEH                       https://www.dor.kaiser.org/external/DORExternal/rpgeh/index.aspx
Google Baseline                               http://www.wsj.com/articles/google-to-collect-data-to-define-healthy-human-1406246214 *
Precision Medicine Initiative Cohort Program  http://www.nih.gov/precision-medicine-initiative-cohort-program
eMERGE Network                                https://emerge.mc.vanderbilt.edu
Korea Biobank                                 http://www.nih.go.kr/NIH/eng/contents/NihEngContentView.jsp?cid=17881
Estonian Biobank                              http://www.geenivaramu.ee/en/accessbiobank

Note: * a database weblink for Google Baseline is not available; the link to a news report about the project is provided instead. eMERGE, Electronic Medical Records and Genomics; EHR, electronic health record.
Secondary use
Secondary use of data refers to the use of data that were created or collected through clinical care. In addition to their use in caring for the patient, these data may also be crucial for operations, quality improvement, and comparative effectiveness research. Some assert that the term "secondary use" should give way to the term "continuous use." They argue against the notion that data collected at the point of care are solely for clinical use, and everything else is secondary: we should be maximally leveraging this valuable information. Nonetheless, there is a legitimate concern about data quality. Data in the EHR are often sparse, incomplete, and even inaccurate [10]. This makes these data wholly unsuitable for certain purposes, but still sufficient for others. For instance, Frankovich et al. described a case in which an adolescent lupus patient was admitted with a number of complicating factors that put her at risk for thrombosis [11]. The medical team considered anticoagulation, but were concerned about the patient's risk of bleeding. No guideline was available for this specific case, and a survey of colleagues was inconclusive. Through the institution's electronic medical record data warehouse, Frankovich and colleagues were able to look at an electronic cohort of pediatric lupus patients who had been seen over a 5-year period. Of the 98 patients in the cohort, 10 had developed clots, with higher prevalence in patients with complications similar to those of the patient in question. Using this real-time analysis based on evidence generated in the course of clinical care, Frankovich and colleagues were able to make an evidence-based decision to administer anti-coagulants [11]. Subsequently, researchers at Stanford University have proposed a "Green Button" approach to formalize this model of real-time decision support derived from aggregate patient data, and data capture to help inform future research and clinical decisions [12].
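The workflow Frankovich et al. describe amounts to filtering an electronic cohort and comparing event rates between subgroups. A minimal sketch of that idea, using entirely hypothetical records (the complication flags, field names, and counts below are invented for illustration, not the Stanford data):

```python
# Hypothetical electronic cohort of pediatric lupus patients. Every flag
# and outcome here is invented; a real query would run against a
# de-identified EHR data warehouse.
cohort = [
    {"id": 1, "nephrotic": True,  "pancreatitis": True,  "clot": True},
    {"id": 2, "nephrotic": True,  "pancreatitis": False, "clot": True},
    {"id": 3, "nephrotic": False, "pancreatitis": False, "clot": False},
    {"id": 4, "nephrotic": True,  "pancreatitis": True,  "clot": True},
    {"id": 5, "nephrotic": False, "pancreatitis": False, "clot": False},
    {"id": 6, "nephrotic": False, "pancreatitis": True,  "clot": False},
]

def clot_rate(patients) -> float:
    """Fraction of the given patients who developed a clot."""
    return sum(p["clot"] for p in patients) / len(patients) if patients else 0.0

# Restrict to patients whose complications resemble the index case.
similar = [p for p in cohort if p["nephrotic"] and p["pancreatitis"]]
print(f"overall clot rate: {clot_rate(cohort):.2f}")
print(f"clot rate among similar patients: {clot_rate(similar):.2f}")
```

In the real case, a query of this kind over 98 patients (10 with clots) informed the decision to anticoagulate; the "Green Button" proposal [12] would make such aggregate queries a routine part of point-of-care decision support.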
TBI tends to focus on molecules, newly accessible in high dimensions thanks to novel high-throughput technologies. Phenotyping is a closely-related challenge, and one more complex than it might seem. Disease is not binary: even within a very specific type of cancer, a tumor's genomic profile may differ considerably depending on the precise location and size of the sample [13]. A number of groups are focusing on this problem: the Electronic Medical Records and Genomics (eMERGE) Network (http://www.genome.gov/27540473), the NIH Collaboratory (https://www.nihcollaboratory.org/), PCORnet (http://www.pcornet.org/), and the MURDOCK Study (http://murdock-study.com/), among others [14-17]. The Phenotype KnowledgeBase website (https://phekb.org/) is a knowledge base of phenotypes, offering a collaborative environment to build and validate phenotype definitions. The phenotypes are not (yet) computable, but the site serves as a resource for defining patient cohorts in specific disease areas [18]. Richesson et al. looked at type 2 diabetes, a phenotype that one might expect to be fairly straightforward [19]. But defining type 2 diabetes mellitus (T2DM) using International Classification of Diseases version 9 (ICD9) codes, diabetes-related medications, the presence of abnormal labs, or a combination of those factors resulted in very different counts of people diagnosed with T2DM in Duke's data warehouse [19]. Using only ICD9 codes gave 18,980 patients, while using medications yielded 11,800. Using ICD9 codes, medications, and labs all together yielded 9441 patients. Note that the issue is not just a matter of semantics and terminology, as if agreeing on a single definition and a single code would make the counts uniform. For different purposes, different definitions of diabetes may be needed, depending on whether the use case involves cohort identification or retrospective analysis. One might care more about minimizing false positives (e.g., retrospective analysis) or maximizing true positives (e.g., surveillance or prospective recruitment).
Thousands of papers have been published describing genome-wide association studies (GWAS), in which researchers look across the entire genome to find SNPs that are statistically enriched for a given phenotype (usually a disease) compared with healthy controls [20]. Researchers at Vanderbilt University turned this approach on its head, developing a method known as phenome-wide association studies (PheWAS, https://phewas.mc.vanderbilt.edu/). Instead of looking at the entire genome, PheWAS evaluates the association between a set of genetic variants and a wide and diverse range of phenotypes, diagnoses, traits, and/or outcomes [21]. This analytic approach asks, for a given variant, do we see an enrichment of a specific genotype in any of these phenotypes? Figure 2 illustrates results using this approach [22]. In standard GWAS analyses, the different color bands at the bottom represent the different chromosomes. In the case of PheWAS, they represent different disease areas, e.g., neurologic, cardiovascular, digestive, and skin. Pendergrass et al. [23] used a PheWAS approach for the detection of pleiotropic effects, in which one gene affects multiple different phenotypes. They were able to replicate 52 known associations and 26 closely-related ones. They also found 33 potentially-novel genotype-phenotype associations with pleiotropic effects, for example a GALNT2 SNP that had previously been associated with HDL levels among European Americans. Here they detected an association between GALNT2 and hypertension phenotypes in African Americans, as well as serum calcium levels and coronary heart disease phenotypes in European Americans.
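Conceptually, a PheWAS loops a single variant over many phenotypes and tests each association in turn. A simplified sketch using a 2x2 chi-square test on toy counts is below; real PheWAS implementations use regression models with clinical covariates, and every number and phenotype label here is invented:

```python
import math

def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Toy counts for one variant, per phenotype: (carriers with phenotype,
# carriers without, non-carriers with phenotype, non-carriers without).
tables = {
    "hypothyroidism": (30, 70, 10, 90),
    "hypertension": (40, 60, 38, 62),
    "dermatitis": (12, 88, 11, 89),
}
for phenotype, (a, b, c, d) in tables.items():
    stat = chi2_2x2(a, b, c, d)
    print(f"{phenotype}: chi2={stat:.2f}, p={p_value_1df(stat):.3g}")
```

The output is a p-value per phenotype for the same variant, which is exactly what a PheWAS Manhattan-style plot such as Figure 2 displays, with phenotypes grouped by disease area along the horizontal axis.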
Another aspect of big data in biomedicine is the use of non-traditional data sources. These were well illustrated, both literally and figuratively, in a 2012 paper by Eric Schadt [24]. A complex and detailed figure (Figure 3) showed various data types that could be mined for their effects on human health: weather, air traffic, security, cell phones, and social media, among others. But strikingly to those reading the paper just a few years later, the list did not include personal activity trackers, e.g., Fitbit, Jawbone, or even the Apple Watch. The omission of such a popular technology today is indicative of what a fast-moving field this is.
Genomics in clinical care
One sees a number of examples of how genomic data are used in clinical care in the context of pharmacogenomics [25]. But molecular data, and genomic data derived from next-generation sequencing (NGS) in particular, have been used in a number of other contexts as well. One example took place at Stanford's Lucile Packard Children's Hospital, where a newborn presented with a condition known as long QT syndrome (http://scopeblog.stanford.edu/2014/06/30/when-ten-days-a-lifetime-rapid-whole-genome-sequencing-helps-criti-
Conclusion
In summary, we are entering a new era in data-driven health care. Translational bioinformatics methods continue to make a real difference in patients' lives. The infrastructure, information technology, policy, and culture need to catch up with some of the technological advances. For researchers working at the cutting edge of translational bioinformatics, opportunities abound, and the future looks bright.
Competing interests
The author has declared no competing interests.
Acknowledgments
This work was supported in part by the Clinical and Translational Science Award (Grant No. UL1TR001117) to Duke
University from the National Institutes of Health (NIH),
United States.
References
[1] Tenenbaum JD, Shah NH, Altman RB. Translational bioinformatics. In: Shortliffe EH, Cimino JJ, editors. Biomedical informatics. London: Springer-Verlag; 2014. p. 721-54.
[2] Butte AJ, Chen R. Finding disease-related genomic experiments within an international repository: first steps in translational bioinformatics. In: AMIA Annual Symposium Proceedings 2006. p. 106-10.
[3] Denny JC. Surveying recent themes in translational bioinformatics: big data in EHRs, omics for drugs, and personal genomics. Yearb Med Inform 2014;9:199-205.
[4] Kulynych J, Greely H. Every patient a subject: when personalized medicine, genomic research, and privacy collide. 2014, http://www.slate.com/articles/technology/future_tense/2014/12/when_personalized_medicine_genomic_research_and_privacy_collide.
[5] Hudson KL, Collins FS. Bringing the common rule into the 21st century. N Engl J Med 2015;373:2293-6.
[6] Institute of Medicine (US) Roundtable on Evidence-Based Medicine. Leadership commitments to improve value in healthcare: finding common ground: workshop summary. Washington (DC): National Academies Press (US); 2009, http://www.ncbi.nlm.nih.gov/books/NBK52847/.
[7] Fleurence R, Selby JV, Odom-Walker K, Hunt G, Meltzer D, Slutsky JR, et al. How the patient-centered outcomes research institute is engaging patients and others in shaping its research agenda. Health Aff 2013;32:393-400.
[8] Embi PJ, Payne PR. Evidence generating medicine: redefining the research-practice relationship to complete the evidence cycle. Med Care 2013;51:S87-91.
[9] Luce BR, Kramer JM, Goodman SN, Connor JT, Tunis S, Whicher D, et al. Rethinking randomized clinical trials for comparative effectiveness research: the need for transformational change. Ann Intern Med 2009;151:206-9.
[10] Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. AMIA Jt Summits Transl Sci Proc 2010;2010:1-5.
[11] Frankovich J, Longhurst CA, Sutherland SM. Evidence-based medicine in the EMR era. N Engl J Med 2011;365:1758-9.
[12] Longhurst CA, Harrington RA, Shah NH. A green button for using aggregate patient data at the point of care. Health Aff 2014;33:1229-35.
[13] Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 2012;366:883-92.
[14] Kho AN, Pacheco JA, Peissig PL, Rasmussen L, Newton KM, Weston N, et al. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 2011;3:79re1.
[15] Richesson RL, Hammond WE, Nahm M, Wixted D, Simon GE, Robinson JG, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc 2013;20:e226-31.
[16] Collins FS, Hudson KL, Briggs JP, Lauer MS. PCORnet: turning a dream into reality. J Am Med Inform Assoc 2014;21:576-7.
[17] Tenenbaum JD, Christian V, Cornish MA, Dolor RJ, Dunham AA, Ginsburg GS, et al. The MURDOCK study: a long-term initiative for disease reclassification through advanced biomarker discovery and integration with electronic health records. Am J Transl Res 2012;4:291-301.
[18] Rasmussen LV, Thompson WK, Pacheco JA, Kho AN, Carrell DS, Pathak J, et al. Design patterns for the development of electronic health record-driven phenotype extraction algorithms. J Biomed Inform 2014;51:280-6.
[19] Richesson RL, Rusincovitch SA, Wixted D, Batch BC, Feinglos MN, Miranda ML, et al. A comparison of phenotype definitions for diabetes mellitus. J Am Med Inform Assoc 2013;20:e319-26.
[20] Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 2009;106:9362-7.
[21] Pendergrass SA, Ritchie MD. Phenome-wide association studies: leveraging comprehensive phenotypic and genotypic data for discovery. Curr Genet Med Rep 2015;3:92-100.
[22] Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, et al. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet 2011;89:529-42.
[23] Pendergrass SA, Brown-Gentry K, Dudek S, Frase A, Torstenson ES, Goodloe R, et al. Phenome-wide association study (PheWAS) for detection of pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) network. PLoS Genet 2013;9:e1003087.
[24] Schadt EE. The changing privacy landscape in the era of big data. Mol Syst Biol 2012;8:612.
[25] McCarthy JJ, McLeod HL, Ginsburg GS. Genomic medicine: a decade of successes, challenges, and opportunities. Sci Transl Med 2013;5:189sr4.
[26] Baskar S, Aziz PF. Genotype-phenotype correlation in long QT syndrome. Glob Cardiol Sci Pract 2015;2015:26.
[27] Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, et al. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med 2014;370:2408-17.
[28] Johnson DB, Dahlman KH, Knol J, Gilbert J, Puzanov I, Means-Powell J, et al. Enabling a genetically informed approach to cancer medicine: a retrospective evaluation of the impact of comprehensive tumor profiling using a targeted next-generation sequencing panel. Oncologist 2014;19:616-22.
[29] Wagle N, Grabiner BC, Van Allen EM, Amin-Mansour A, Taylor-Weiner A, Rosenberg M, et al. Response and acquired resistance to everolimus in anaplastic thyroid cancer. N Engl J Med 2014;371:1426-33.