Meisel 2018
Abstract
The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups
to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting,
which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of
metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing
technologies for identifying and tracking microbial community members across space and time. However, they also
stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-
abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the
community can improve software usability and shared new computational tools for metagenomic processing,
assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits
for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more
consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to
incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas
where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease,
as well as gaps of knowledge in the field that require future funding and focus.
Keywords: Microbiome, Metagenomics, Bioinformatics, Biodefense, Biothreats, Pathogen detection, Longitudinal analysis
* Correspondence: [email protected]
1 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, College Park, MD, USA
17 Present address: Department of Computer Science – MS-132, Rice University, P.O. Box 1892, Houston, TX 77005-1892, USA
Full list of author information is available at the end of the article
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Meisel et al. Microbiome (2018) 6:197 Page 2 of 10
Fig. 1 Different sectors and institutions represented at the January 2018 M3 Meet-up
4. Robust bioinformatics tools are critical for future progress. These tools must be developed to better match the needs of end users and must be subject to critical validation.
5. Data standards are essential for ensuring the quality and usefulness of shared datasets, but overly onerous reporting requirements discourage sharing. In cases where privacy is a concern, we must also develop solutions that allow for secure storage and processing of sensitive data.
These key recommendations are summarized in Table 1 and more extensively discussed below.
Sequencing-based assays frequently lack sensitivity
While the biodefense community has benefited from high-throughput sequencing strategies, these methods are not always as sensitive as required. In some cases, culturing is still the most reliable method for detecting pathogens because standard sequencing pipelines are not always available, and achieving required sequencing depths may be cost-prohibitive. Dr. Sarah Allard (UMD SPH) shared her work from CONSERVE (Center of Excellence at the Nexus of Sustainable Water Reuse, Food, and Health), whose mission is to enable the safe use of non-traditional irrigation water sources on food crops [9]. Dr. Allard used both culture-based and sequence-based methods to detect foodborne pathogens in water samples. She concluded that culture-based techniques are currently the most sensitive pathogen detection strategies and that sequencing analysis sensitivity and stringency vary strongly by method.
From a public health perspective, quantification of viable organisms contributing to disease is essential but cannot be achieved with metagenomic analysis alone. Culturing and other approaches are important for gaining insight into the metabolic activity of the microbes in a community [10]. Additionally, researchers must often
Table 1 Outline of current research gaps and future goals discussed at the January 2018 M3 Meeting
Columns: Research gaps | Current limitations | Community goals
Research gap: Tracking microbial communities across time and topography (Key Conclusions 1 and 3)
Importance: studies incorporating temporal and/or spatial sampling allow us to detect important shifts in community dynamics
Application example: detecting the spread of infection in a hospital or of a pathogen contaminating crops and spreading food-borne illness
Current limitations:
• Sequencing strategies are not able to quantify viable organisms (which is essential for biodefense applications)
• Lack of well-established statistical approaches for exploring longitudinal microbiome data
• Increased sample size makes these studies more expensive and harder to obtain sufficient statistical power for all subjects/time points/regions
Community goals:
• Collection, sequencing, and sharing of more time series datasets
• Development of statistical methods and tools to help analyze longitudinal and/or geospatial microbiome datasets
Research gap: Looking beyond bacterial pathogens (Key Conclusion 2)
Importance: viral and fungal components of the microbiome are often under-explored, despite their potential implications in biodefense
Application example: better understanding the transmission of infectious viruses, like influenza
Current limitations:
• Lack of a universally distributed marker gene (viruses)
• Difficult to obtain sufficient material from low biomass environments
• High levels of host contamination
• Incomplete databases
Community goals:
• More consistent database curation and maintenance (potentially incentivized financially or with publications)
• Improved gene function identification
Research gap: Development and application of metagenomic analysis tools (Key Conclusion 4)
Importance: computational tools need to be developed to help improve the utility of high-throughput sequencing strategies for biodefense problems
Application example: improved metagenome assembly methods could better delineate between different strains of a pathogen in samples
Current limitations:
• Tools for metagenome pre-processing, assembly, and binning are not always sensitive or fast enough for detection of pathogens in a sample
• As sequencing technologies advance, we need new tools to handle output from long- and short-read technologies, as well as single-cell metagenomics approaches
Community goals:
• Easy to install, open-access software with comprehensive documentation detailing best and worst use cases
• Defined metrics for critical assessment and validation of existing tools
• Software and database versions should be more consistently reported in the literature and preserved for future replication of analyses
Research gap: Navigating the trade-off between speed and accuracy (Key Conclusion 4)
Importance: metagenomic analyses used for pathogen detection and identification are time-sensitive
Application example: deciding if a food product should be recalled due to contamination
Current limitations:
• Current algorithms vary in speed and accuracy (often sacrificing one for the other)
• Large datasets, error-prone heuristics, and coarse resolution of k-mer-based methods present challenges
Community goals:
• Better documentation of available tools to help users optimize their software choice based on their available resources
• Improvements in sequencing technologies and tools/algorithms to improve both speed and accuracy
Research gap: Storing and sharing data (Key Conclusion 5)
Importance: access to publicly available datasets will help in verification of results and advance of scientific knowledge. Scientists need to be encouraged to move their data out of private silos and into shared databases
Current limitations:
• Not all data can be shared because it is important to protect personally identifiable information or intellectual property rights
• Lack of sufficient infrastructure or manpower to upload or store datasets at scale
Community goals:
• Defined quality standard to maintain usable, open repositories
• Improved ways for secure interrogation of genomic datasets that cannot be openly shared due to privacy regulations
make a trade-off between the sensitivity of their detection methods and the computational costs of analyzing increasingly deep sequencing datasets. Even partial culturing of select organisms or samples can help shift this trade-off. As commented during a breakout session, “you can’t always sequence your way out of it.”
Few studies look beyond bacterial pathogens
Shotgun metagenomics and a decrease in the cost of DNA sequencing have enabled researchers to analyze the genetic potential of microorganisms directly from an environmental sample. However, the majority of microbiome and metagenome studies focus only on the prokaryotic component of the community, while few have explored the roles of fungi or viruses in these microbial communities. This is due, in large part, to limitations in resources, laboratory procedures, and in the case of viruses, the lack of a universally distributed marker gene. Additional barriers to mycobiome and virome studies include the difficulty of obtaining sufficient material from low biomass environments, high levels of host contamination, incomplete databases, and a lack of available wet lab protocols and computational analysis pipelines. At the meeting, it was noted that central repositories for shared protocols do exist (e.g., protocols.io [11]), and a concerted effort in viral protocol sharing has been made by the Gordon and Betty Moore Foundation, which funds VERVE Net [12]. Proposed goals to address other barriers included providing financial and/or publication incentives for database curation and maintenance and focusing work on gene function identification. Since the NCBI SRA already contains many metagenomic sequencing datasets, it may be worthwhile to identify novel fungal and viral genomes from existing datasets to optimize data usage, as this approach has been employed in previous studies of environmental viruses [13].
Despite the aforementioned barriers to fungal and viral metagenomics, additional research in this area can
significantly contribute to biodefense. One such important topic is the spread of viral pathogens. Invited seminar speaker Dr. Don Milton (UMD SPH) presented his work on the transmission of the influenza virus in college dormitories [14]. The Centers for Disease Control and Prevention (CDC) suggests that human influenza transmission mainly occurs by droplets made when people with flu cough, sneeze, or talk. However, Dr. Milton explained that dueling reviews have disputed the importance of airborne transmission [15–20]. He presented NGS data showing that exhaled breath of symptomatic influenza cases contains infectious virus in fine particles, suggesting that aerosol exposures are likely an important mode of transmission.
Tracking microbial communities across time and topography
Temporal and biogeographic sequencing studies provide increased resolution of microbial community shifts. In the context of biodefense, this is important for detecting and containing outbreaks. Additionally, these studies provide insight into environmental changes, which may contribute to epidemics by causing shifts in disease vectors and/or spurring human migration to new regions or densely populated urban areas. Several presentations at the meeting shared spatiotemporal microbiome analyses of different environments. Dr. Sean Conlan (NIH, NHGRI) presented his work using metagenomics to study outbreaks of nosocomial infections and identified the transfer of plasmids from patients to the hospital environment [21, 22]. Gherman Uritskiy (JHU) and Dr. Sarah Preheim (JHU) used a combination of marker gene and metagenomics approaches to characterize the changes in environmental microbiomes in response to perturbations. Uritskiy studied halite endoliths from the Atacama Desert in Chile over several years and showed how they were significantly impacted by rainstorms. Dr. Preheim compared a biogeochemical model to microbial communities’ changes in a lake over the spring and summer to reveal the influence of energy availability on microbial population dynamics.
While time series datasets provide valuable information, they are much more difficult to analyze with current statistical methods and models than data from cross-sectional sampling strategies [23, 24]. Among other reasons, this is because it is difficult to identify the optimal sampling frequency, the compositional nature of microbiome data frequently violates assumptions of statistical methods, and the commonly available software tools are often insufficient for required complex comparisons. Addressing this, Dr. J Gregory Caporaso (NAU) presented QIIME 2 (https://qiime2.org) and shared his team’s QIIME 2 plugin, q2-longitudinal, which incorporates multiple methods for characterizing longitudinal and paired-sample marker gene datasets [25].
Development and application of metagenomic analysis tools is critical for progress
Computational methods required for metagenomic analyses include taxonomic abundance profiling, taxonomic sequence classification and annotation, functional characterization, and metagenomic assembly. Many of the presentations at the meeting shared new and/or improved tools for different aspects of microbiome studies. Victoria Cepeda (UMD) described how her tool, MetaCompass, uses reference genomes to guide metagenome assembly [26], and Gherman Uritskiy (JHU) presented his pipeline, metaWRAP, for the pre-processing and binning of metagenomes [27]. Furthermore, Brian Ondov (UMD, NIH, NHGRI) shared his implementation of the MinHash containment estimation algorithm to screen metagenomes for the presence of genomes and plasmids [28]. Data visualization is important for accurately interpreting microbiome data analyses, and Dr. Héctor Corrada-Bravo (UMD) demonstrated how to use his lab’s tool, Metaviz [29], for interactive statistical analysis of metagenomes.
Conventional metagenomic analyses often reflect the most abundant elements from a complex sample and cannot detect rare elements with confidence. Dr. Nicholas Bergman (NBACC) shared a more sensitive single-cell metagenomics approach that allows for increased detection of all elements of a community sample. Dr. Bergman’s talk also emphasized the necessity of improving sensitivity, preventing contamination, eliminating biases, and increasing efficiency for sequencing-based techniques.
Bioinformatics tools should better match the needs of end users
Many discussions at the meeting focused on how the field can optimize tool utility. It was agreed that scientists should always carefully evaluate the strengths and weaknesses of available methods, either via existing “bake-off” studies or through the available documentation, to ensure they are using the best tools to address their specific problem. Tool developers should disclose the limits of their methods and advise on the types of data their software is best suited to analyze. Developers should also work towards producing software that is easy to download and install, providing comprehensive documentation for their tools, and ensuring open access for the academic community. As a community, we should encourage publications to list not only cases and data types where methods perform best, but also where they underperform or even fail. Additional studies, like the Critical Assessment of Metagenome Interpretation (CAMI) [30, 31], Microbiome Quality Control project [32], or challenges run under the aegis of PrecisionFDA [33], should be conducted to help characterize the strengths and weaknesses of different approaches and evaluate their impact on data analysis and interpretation.
Some meeting attendees are currently contributing to these goals. Dr. Nathan Olson (UMD, NIST) presented his evaluation of different 16S rRNA marker gene survey bioinformatic pipelines using mixture samples. Additionally, Dr. Daniel Nasko (UMD) characterized how genomic database growth affects study findings, showing that different versions of the RefSeq database strongly influenced species-level taxonomic classifications from metagenomic samples [34]. Because the version of software and databases used can significantly affect the findings, this information should be reported more consistently in the literature. Furthermore, we should consider strategies to preserve previous software and database versions to enable future replication of analyses.
Bioinformatics tools must better navigate the trade-off between speed and accuracy
Metagenomic analysis methods vary in central processing unit (CPU) time, memory, and disk resource usage, and this is not always clearly reported in software publications. Additionally, method scalability relative to size or type of input data also varies considerably. Optimizing speed and accuracy is especially important for biodefense applications. For instance, improvements in NGS analysis allowing for collection and analysis of samples in a clinically relevant time frame can help effectively track hospital outbreaks and prevent the spread of infection [35]. Furthermore, confidence in the accuracy of these analyses is required to execute appropriate plans of action and prevent panic. Recently, findings of Bacillus strains on the International Space Station that were genomically similar to pathogenic Bacillus anthracis required more detailed characterization to ensure that their presence was not a concern for the health of the crew [36–38]. B. anthracis was also initially reported to be found in the NYC subway system, along with Yersinia pestis, the pathogen responsible for the plague [39]. After public attention prompted further analysis, the authors found no evidence that these organisms were present and found no evidence of pathogenicity [40, 41], again highlighting the importance of careful evaluation and interpretation of results, especially those with severe public health consequences.
Many different strategies for speeding up analyses were discussed at the meeting, including hardware, software, and algorithm choice. Some hardware considerations for the speed of analyses include balancing CPUs with co-processors such as graphics processing units (GPUs) or field-programmable gate arrays (FPGAs), server configuration in terms of the amount of random access memory (RAM), or disk storage type and speed. Programs and algorithms vary in accuracy as well as ease of parallelization. Often a slower yet parallelizable algorithm is preferred to one that is not parallelizable. If a program supports parallelism, consideration should be given to the type of hardware required. For example, some available options include large multicore servers for multithreaded applications, cluster nodes for distribution of compute jobs, or cloud computing solutions. Other strategies might involve analyzing only a subset of the data or using a smaller, application-specific reference database. Finally, strategies discussed for speeding up time-critical analyses included employing a multi-tiered approach (e.g., a quick first pass followed by more detailed analyses [42]) and considering the suitability of various sequencing platforms for certain applications. Interventions or optimizations were discussed with regard to their impact on analysis accuracy and interpretation of results. Preferred solutions are the ones that provide both the desired speed and accuracy, though more often than not there is a trade-off between the two. The optimal balance also depends on the use case. Assessment and validation methods are required to characterize a method’s speed and accuracy. It will be up to the subject matter experts to determine the desired accuracy level for each case and the extent to which they can sacrifice accuracy for speed.
Data needs to be moved out of private silos and into public repositories
Data sharing is a recurring challenge within the biological community, especially as DNA/RNA sequencing becomes more ubiquitous and tangible outside of core facilities [43]. This challenge is prevalent across multiple scientific disciplines and was recently highlighted by the National Research Council as a priority for microbial forensics [44]. There are numerous reasons data are not being shared, including the need to protect personally identifiable information or intellectual property rights prior to publication and the lack of sufficient infrastructure or manpower to upload at scale. However, leveraging this diversity and breadth of data will be important for an effective biodefense capacity, as well as other bioscience applications like healthcare, pharmaceuticals, agriculture, and industry. In order to incentivize data sharing, we need to evaluate and improve publicly available resources for storing and processing data.
Inherent altruism or obligation to share data should be met with as little friction as possible, and we need to incentivize openness. One incentive is academic credit through authorship on publications, though this will require combined efforts of researchers, journal editors, and funding agencies to better define what contributions constitute data authorship and what responsibilities data authors have [45, 46]. Another potential incentive is the availability of free software for data analysis, and meeting participants debated the desirability and sustainability of service-based options (e.g., MG-RAST [47]) compared to user-installable software options (e.g., QIIME [48],
mothur [49]). At the meeting, Dr. Nur A. Hasan (CosmosID, Inc.) highlighted the cloud-based metagenome tools and databases his company has to offer. There are also strong movements towards software sharing, such as the Astrophysics Source Code Library [50] and the Materials Resource Registry at NIST [51].
Some quality standard is likely needed to maintain usable, open repositories. Where that standard is set can affect how much data is shared. For example, a high bar may ensure high-quality sequences and comprehensive metadata but minimize sharing, while a lower quality bar will more likely move data out of silos. The solution may be a combination of repositories with varying standards or a single repository which allows for varying degrees of annotation completeness and allows the user to modify searches based on that feature. It is important to note that a single repository may be difficult to reliably curate and manage at scale. Another option is distributed but federated systems, like those used by the US Virtual Astronomical Observatory [52]. Groups like the Genomic Standards Consortium [53, 54] are working towards improving data quality by supporting projects such as Minimum Information about any Sequence (MIxS) [55], which establishes standards for describing genomic data and provides checklists to help with annotation. We need to build a community consensus on how much metadata is required to make reporting less onerous for data providers but ensure data usability by others in the field.
Incentivizing open data sharing should not be the only solution, as some sensitive data cannot be openly shared due to privacy regulations (e.g., human genomes and Health Insurance Portability and Accountability Act regulations). Other sectors, such as the financial industry, have long been working on solutions to enable storage, transit, and operations on protected data. These solutions include software-based approaches (e.g., homomorphic encryption, Yao’s protocol, secure fault-tolerant protocols, oblivious transfer) and hardware-based approaches (e.g., AES full disk encryption for data storage, Intel® Software Guard Extensions for secure operations). Dr. Stephanie Rogers presented the GEMStone 2.0 project from B. Next, an IQT Lab, called SIG-DB, which explores homomorphic encryption and Intel Software Guard Extensions (SGX) to securely search genomic databases [56]. Early results of applying these solutions to biological data are promising and should be explored more fully.
Conclusions
Overall, this meeting successfully brought together scientists from academia, government, and industry to present their research and discuss how high-throughput genomics methods have stimulated interest and progress in biodefense and pathogen detection. Notably, meeting participants used NGS tools to identify the transfer of microbes from patients to their hospital environments, track the transmission of influenza in a community living space, study environmental shifts over time, and evaluate the safety of using non-traditional water sources on food crops. These studies, and others, have been partly driven by cheaper, more reliable sequencing technologies and improvements in computational analysis tools. Open-source software for sequence processing and quality control, taxonomic annotation, metagenomic assembly and binning, and data visualization has been essential for growth. Continued development of these resources will result in significant scientific advances.
Despite this progress, there are several limitations to using NGS approaches for biodefense problems. First and foremost, sequencing methods are unable to accurately quantify viable organisms from metagenomic samples, which is essential for identifying potential threats to public health. Beyond that, applications for which NGS approaches are well-suited still present many challenges. Although sequencing costs are steadily declining, it remains expensive to process, computationally analyze, and store the increasingly large datasets that are generated. Confident detection of infectious, but potentially rare pathogens in a community often requires very deep sequencing, and scientists must make the appropriate speed, cost, and accuracy trade-offs to best answer their research questions. In many cases, sequencing experiments may need to be complemented with culturing, enrichment, or other targeted approaches. Because of these limitations, and others, researchers must be extremely careful when interpreting data to identify biothreats; reporting false positives without critical validation can have significant fiscal and public health consequences. Developing the capacity to identify not only when a potential pathogen is present but also at what levels it is actively contributing to an infectious disease will greatly improve our response to biothreats. Another area that requires further investigation is the detection of antimicrobial resistance. While only briefly highlighted in the meeting talks about influenza and nosocomial tracing, antimicrobial resistance poses a significant threat to public health and biodefense. Current metagenomic sequencing methods allow us to identify antimicrobial resistance genes from different environments; however, these techniques cannot determine whether these genes are actively being expressed and are currently not practical for widespread adoption in clinical settings [57].
To date, few microbiome studies have focused on viral and fungal/eukaryotic organisms, despite their potentially important community interactions and roles in pathogenesis. In order to generate relevant virome and mycobiome datasets, we must improve sample processing techniques and dedicate resources to effectively
curate and maintain publicly available databases. We also need to develop advanced statistical toolkits for analyzing longitudinal studies. In general, tool developers should focus on creating user-friendly, adaptable resources, with comprehensive documentation and clear descriptions of default settings and optional parameters. These tools must be critically evaluated for their appropriate use cases; however, when looking for emerging threats, it will be necessary to develop validation approaches that do not require the use of gold standards.
In order to encourage additional growth, the greater scientific community should invest in expanding and enforcing clear standards for genomic datasets. If set appropriately, these standards will help incentivize data sharing and improve the quality and usability of public repositories. Additional focus should be on strengthening best practices and solutions for handling sensitive datasets that are subject to privacy regulations. Moving forward, active conversations between researchers and policymakers will be essential to expand and implement these ideas in biodefense.
Activity (IARPA), via the Army Research Office (ARO) under Federal Award No. W911NF-17-2-0089. HC was supported by the NIH, R01 grant GM114267. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, ARO, or the US Government. The contributions of ALB, NHB, MJR, DDS, and SR were funded under Contract No. HSHQDC-15-C-00064 awarded by the Department of Homeland Security (DHS) Science and Technology Directorate (S&T) for the operation and management of the National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and Development Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the DHS or S&T. In no event shall DHS, NBACC, S&T, or Battelle National Biodefense Institute have any responsibility or liability for any use, misuse, inability to use, or reliance upon the information contained herein. DHS does not endorse any products or commercial services mentioned in this publication. JGC was supported in part by the National Cancer Institute of the National Institutes of Health under the awards for the Partnership of Native American Cancer Prevention U54CA143924 (UACC) and U54CA143925 (NAU) and by the National Science Foundation award 1565100. SC was supported by NIH Intramural Research. JD and GU were supported in part by the NSF, grant DEB1556574 to JD. BDO was supported by the Intramural Research Program of the National Human Genome Research Institute and National Institutes of Health and utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov).
Availability of data and materials
Not applicable
Medicine, Johns Hopkins University, Baltimore, MD, USA. 17 Present address: Department of Computer Science – MS-132, Rice University, P.O. Box 1892, Houston, TX 77005-1892, USA.

Received: 21 August 2018 Accepted: 18 October 2018

References
1. Drew TW, Mueller-Doblies UU. Dual use issues in research - a subject of increasing concern? Vaccine. 2017;35:5990–4.
2. Gardy JL, Loman NJ. Towards a genomics-informed, real-time, global pathogen surveillance system. Nat Rev Genet Nature Publishing Group. 2018;19:9–20.
3. Robinson ER, Walker TM, Pallen MJ. Genomics and outbreak investigation: from sequence to consequence. Genome Med. BioMed Central. 2013;5:36.
4. Miller RR, Montoya V, Gardy JL, Patrick DM, Tang P. Metagenomics for pathogen detection in public health. Genome Med. BioMed Central. 2013;5:81.
5. Lipkin WI. The changing face of pathogen discovery and surveillance. Nat Rev Microbiol. 2013;11:133–41. https://www.ncbi.nlm.nih.gov/pubmed/23268232.
6. Forbes JD, Knox NC, Ronholm J, Pagotto F, Reimer A. Metagenomics: the next culture-independent game changer. Front Microbiol. Frontiers. 2017;8:1069.
7. Mid-Atlantic Microbiome Meet-up main groups.io Group [Internet]. [cited 2018 May 4]. Available from: https://m3.groups.io/g/main/.
8. Winter 2018 Mid-Atlantic Microbiome Meetup Biodefense and Pathogen Detection Agenda [Internet]. [cited 2018 May 4]. Available from: https://cpb-us-e1.wpmucdn.com/blog.umd.edu/dist/d/418/files/2017/10/WinterM3_agenda_final-27afpqx.pdf
9. CONSERVE: A Center of Excellence at the Nexus of Sustainable Water Reuse, Food, and Health, year 1 achievements (March 2016–February 2017) [Internet]. Available from: https://static1.squarespace.com/static/578101761b631b1a87aa0a3c/t/59f8f8e8e31d19ae528310e9/1509488877173/CONSERVE_annual_report.pdf
10. Singer E, Wagner M, Woyke T. Capturing the genetic makeup of the active microbiome in situ. ISME J. Nature Publishing Group; 2017;11:1949–1963.
11. Teytelman L, Stoliartchouk A, Kindler L, Hurwitz BL. Protocols.io: virtual communities for protocol development and discussion. Plos Biol. Public Library of Science. 2016;14:e1002538.
12. VERVE Net [Internet]. protocols.io. Available from: protocols.io/g/verve-net.
13. Paez-Espino D, Eloe-Fadrosh EA, Pavlopoulos GA, Thomas AD, Huntemann M, Mikhailova N, et al. Uncovering Earth’s virome. Nature Nature Research. 2016;536:425–30.
14. Yan J, Grantham M, Pantelic J, Bueno de Mesquita PJ, Albert B, Liu F, et al. Infectious virus in exhaled breath of symptomatic seasonal influenza cases from a college community. Proc Natl Acad Sci U S A. 2018;115:1081–6.
15. Killingley B, Nguyen-Van-Tam J. Routes of influenza transmission. Influenza Other Respir Viruses Wiley/Blackwell (10.1111). 2013;7(Suppl 2):42–51.
16. Tellier R. Aerosol transmission of influenza A virus: a review of new studies. J R Soc Interface The Royal Society. 2009;6(Suppl 6):S783–90.
17. Bridges CB, Kuehnert MJ, Hall CB. Transmission of influenza: implications for control in health care settings. Clin Infect Dis. 2003;37:1094–101.
18. Tellier R. Review of aerosol transmission of influenza a virus. Emerging Infect Dis Centers for Disease Control and Prevention. 2006;12:1657–62.
19. Lemieux C, Brankston G, Gitterman L, Hirji Z, Gardam M. Questioning aerosol transmission of influenza. Emerging Infect Dis. 2007;13:173–4; author reply 174–5.
20. Brankston G, Gitterman L, Hirji Z, Lemieux C, Gardam M. Transmission of influenza A in human beings. Lancet Infect Dis Elsevier. 2007;7:257–65.
21. Conlan S, Park M, Deming C, Thomas PJ, Young AC, Coleman H, et al. Plasmid Dynamics in KPC-Positive Klebsiella pneumoniae during long-term patient colonization. mBio. 2016;7:e00742–16.
22. Weingarten RA, Johnson RC, Conlan S, Ramsburg AM, Dekker JP, Lau AF, et al. Genomic analysis of hospital plumbing reveals diverse reservoir of bacterial plasmids conferring carbapenem resistance. Bonomo RA, editor. mBio. American Society for Microbiology; 2018;9:e02011–e02017.
23. Faust K, Lahti L, Gonze D, de Vos WM, Raes J. Metagenomics meets time series analysis: unraveling microbial community dynamics. Curr Opin Microbiol Elsevier Current Trends. 2015;25:56–66.
24. Gerber GK. The dynamic microbiome. FEBS Lett Wiley-Blackwell. 2014;588:4131–9.
25. Bokulich N, Zhang Y, Dillon M, Rideout JR, Bolyen E, Li H, et al. q2-longitudinal: a QIIME 2 plugin for longitudinal and paired-sample analyses of microbiome data. bioRxiv Cold Spring Harbor Laboratory. 2017:223974. https://doi.org/10.1101/223974.
26. Cepeda V, Liu B, Almeida M, Hill CM, Koren S, Treangen TJ, et al. MetaCompass: reference-guided assembly of metagenomes. 2017. https://doi.org/10.1101/212506.
27. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. BioMed Central. 2018;6:158.
28. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. BioMed Central. 2016;17:132.
29. Wagner J, Chelaru F, Kancherla J, Paulson JN, Zhang A, Felix V, et al. Metaviz: interactive statistical and visual analysis of metagenomic data. Nucleic Acids Res. 2018;514:59.
30. Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, et al. Critical assessment of metagenome interpretation—a benchmark of metagenomics software. Nat Methods Nature Publishing Group. 2017;14:1063–71.
31. Bremges A, AC MH. Critical assessment of metagenome interpretation enters the second round. mSystems. Am Soc Microbiol J. 2018;3:537.
32. Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol Nature Publishing Group. 2017;35:1077.
33. Altman RB, Prabhu S, Sidow A, Zook JM, Goldfeder R, Litwack D, et al. A research roadmap for next-generation sequencing informatics. Sci Transl Med American Association for the Advancement of Science. 2016;8:335ps10.
34. Nasko DJ, Koren S, Phillippy AM, Treangen TJ. RefSeq database growth influences the accuracy of k-mer-based species identification. bioRxiv. 2018. https://www.biorxiv.org/content/early/2018/04/19/304972, https://doi.org/10.1186/s13059-018-1554-6.
35. Snitkin ES, Zelazny AM, Thomas PJ, Stock F, NISC Comparative Sequencing Program Group, Henderson DK, et al. Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Sci Transl Med. American Association for the Advancement of Science. 2012;4:148ra116.
36. Venkateswaran K, Singh NK, Checinska Sielaff A, Pope RK, Bergman NH, van Tongeren SP, et al. Non-toxin-producing Bacillus cereus strains belonging to the B. anthracis clade isolated from the International Space Station. Bik H, editor. mSystems. 2017;2:e00021–e00017.
37. van Tongeren SP, Roest HIJ, Degener JE, Harmsen HJM. Bacillus anthracis-like bacteria and other B. cereus group members in a microbial community within the International Space Station: a challenge for rapid and easy molecular detection of virulent B. anthracis. Schuch R, editor. PLoS ONE. Public Library of Science; 2014;9:e98871.
38. Venkateswaran K, Checinska Sielaff A, Ratnayake S, Pope RK, Blank TE, Stepanov VG, et al. Draft genome sequences from a novel clade of Bacillus cereus sensu lato strains, isolated from the International Space Station. Genome Announc. American Society for Microbiology Journals. 2017;5:e00680–17.
39. Afshinnekoo E, Meydan C, Chowdhury S, Jaroudi D, Boyer C, Bernstein N, et al. Geospatial resolution of human and bacterial diversity with city-scale metagenomics. CELS Elsevier. 2015;1:1–16.
40. Ackelsberg J, Rakeman J, Hughes S, Petersen J, Mead P, Schriefer M, et al. Lack of evidence for plague or anthrax on the New York City subway. CELS. 2015;1:4–5.
41. Afshinnekoo E, Meydan C, Chowdhury S, Jaroudi D, Boyer C, Bernstein N, et al. Modern methods for delineating metagenomic complexity. CELS. 2015;1:6–7.
42. Bazinet AL, Ondov BD, Sommer DD, Ratnayake S. BLAST-based validation of metagenomic sequence assignments. PeerJ. PeerJ Inc; 2018;6:e4892.
43. Langille MGI, Ravel J, Fricke WF. “Available upon request”: not good enough for microbiome data! Microbiome. BioMed Central. 2018;6:8.
44. National Research Council. Science needs for microbial forensics. Developing initial international research priorities. Washington: National Academies Press; 2014.
45. Bierer BE, Crosas M, Pierce HH. Data authorship as an incentive to data sharing. N Engl J Med. 2017;376:1684–7.
46. Credit for Data Sharing [Internet]. 2018 [cited 6 Aug 2018]. Available from:
https://www.aamc.org/initiatives/research/485818/datasharing.html
47. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, et al. The
metagenomics RAST server – a public resource for the automatic
phylogenetic and functional analysis of metagenomes. BMC Bioinformatics.
BioMed Central. 2008;9:386.
48. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello
EK, et al. QIIME allows analysis of high-throughput community sequencing
data. Nat Methods. 2010;7:335–6.
49. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al.
Introducing mothur: open-source, platform-independent, community-
supported software for describing and comparing microbial communities.
Appl Environ Microbiol American Society for Microbiology. 2009;75:7537–41.
50. ASCL.net [Internet]. [cited 2018 Aug 6]. Available from: http://ascl.net
51. Materials Resource Registry [Internet]. [cited 2018 Aug 6]. Available from:
https://materials.registry.nist.gov
52. Hanisch RJ, Berriman GB, Lazio TJW, Emery Bunn S, Evans J, McGlynn TA, et
al. The virtual astronomical observatory: re-engineering access to
astronomical data. Astron Comput. 2015;11:190–209.
53. Field D, Sterk P, Kottmann R, De Smet JW, Amaral-Zettler L, Cochrane G, et
al. Genomic standards consortium projects. Stand Genomic Sci Michigan
State University. 2014;9:599–601.
54. Field D, Amaral-Zettler L, Cochrane G, Cole JR, Dawyndt P, Garrity GM, et al.
The Genomic Standards Consortium. Plos Biol. Public Library of Science.
2011;9:e1001088.
55. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, et al.
Minimum information about a marker gene sequence (MIMARKS) and
minimum information about any (x) sequence (MIxS) specifications. Nat
Biotechnol Nature Publishing Group. 2011;29:415–20.
56. Titus AJ, Flower A, Hagerty P, Gamble P, Lewis C, Stavish T, et al. SIG-DB:
Leveraging homomorphic encryption to securely interrogate privately held
genomic databases. Markel S, editor. PLoS computational biology. Public
Library of Science. 2018;14:e1006454.
57. Ellington MJ, Ekelund O, Aarestrup FM, Canton R, Doumith M, Giske C, et al.
The role of whole genome sequencing in antimicrobial susceptibility testing
of bacteria: report from the EUCAST subcommittee. Clin Microbiol Infect
Elsevier. 2017;23:2–22.