1-s2 0-S1097276515003408-Main
Review
High-Throughput Sequencing Technologies
Jason A. Reuter,1 Damek V. Spacek,1 and Michael P. Snyder1,*
1Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.molcel.2015.05.004
The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine
has been made possible only because of the extraordinary advancements in DNA sequencing technologies
over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing
array of sequencing assays developed around them, as well as the challenges facing current sequencing
platforms and their clinical application.
Introduction
The human genome sequence was completed in draft form in
2001 (Lander et al., 2001; Venter et al., 2001). Shortly thereafter,
the genome sequences of several model organisms were determined (Chinwalla et al., 2002; Gibbs et al., 2004; Chimpanzee
Sequencing and Analysis Consortium, 2005). These feats were
accomplished with Sanger DNA sequencing, which was limited
by low throughput and high cost; indeed, the first human genome
sequence was estimated to cost 0.5–1 billion dollars. These
limitations reduced the potential of DNA sequencing for other
applications, such as personal genome sequencing. Following
the release of the finished human genome (International
Human Genome Sequencing Consortium, 2004), the National
Human Genome Research Institute (NHGRI) created a $70 million
DNA sequencing technology initiative aimed at achieving a
$1,000 human genome in 10 years (Schloss, 2008), and a flurry of
high-throughput sequencing (HTS) technologies emerged.
To put this initiative in perspective, improvements to traditional
Sanger sequencing had decreased the per base cost by around
100-fold by the completion of the Human Genome Project
(Schloss, 2008). To reach the $1,000 genome threshold,
however, an additional leap of five orders of magnitude was
necessary. Much of this divide has been traversed: the cost of
a genome sequence (without interpretation) is presently less
than $2,000. The road to this milestone involved many commercial HTS platforms, which differ in their details but typically
follow a similar general paradigm: template preparation, clonal
amplification, followed by cyclical rounds of massively parallel
sequencing. The specific strategy employed by each platform
determines the quality, quantity, and biases of the resulting
sequence data and the platform's usefulness for particular
applications.
Several excellent reviews have covered HTS platform strategies in great depth (Metzker, 2010; Morey et al., 2013). Many
important platforms are not covered here, including Roche/
454's pyrophosphate Genome Sequencer (Margulies et al.,
2005) and Helicos' single-molecule Heliscope sequencer (Harris
et al., 2008) as well as the Polonator (Shendure et al., 2005),
ABI's SOLiD (Valouev et al., 2008), and Complete Genomics'
DNA nano-array sequencer (Drmanac et al., 2010). Instead, we
focus on the most commonly used platforms today as well as
more recent developments. We also provide an overview of
586 Molecular Cell 58, May 21, 2015 ©2015 Elsevier Inc.
Figure 1. Timeline and Comparison of Commercial HTS Instruments
[Figure not reproduced. The timeline covers 2004 to 2014. Instruments named in the figure: 454 GS-20 pyrosequencer; Solexa/Illumina sequence analyzer; ABI SOLiD; ABI SOLiD 3; ABI SOLiD 5500xl W; Polonator G.007; Helicos Heliscope; Illumina GAII; Illumina GAIIx; Illumina HiSeq 2000; Illumina HiSeq 2500; Illumina HiSeq 3000; Illumina HiSeq X Ten; Illumina MiSeq; Illumina NextSeq 500; Roche/454 GS FLX+; Roche/454 GS Junior; Ion Torrent Ion PGM; Ion Torrent Ion Proton; Pacific Biosciences RS II; Complete Genomics.]
Pacific Biosciences
Single-molecule real-time (SMRT) sequencing was pioneered by
Nanofluidics, Inc. and commercialized by Pacific Biosciences.
Template preparation involves ligation of single-stranded,
hairpin adapters onto the ends of digested DNA or cDNA molecules, generating a capped template (SMRT-bell). By using a
strand-displacing polymerase, the original DNA molecule can
be sequenced multiple times, thereby increasing accuracy (Travers et al., 2010). Importantly, clonal amplification is avoided,
allowing direct sequencing of native, and potentially modified,
DNA. DNA synthesis occurs in zeptoliter-sized chambers, called
zero-mode waveguides (ZMWs), in which a single polymerase is
immobilized at the bottom of the chamber (Levene et al., 2003)
(Figure 3A). The physics of these chambers reduces background
noise such that phosphate-labeled versions of all four nucleotides can be present simultaneously. Thus, polymerization
occurs continuously, and the DNA sequence can be read in
real-time from the fluorescent signals recorded in a video (Eid
et al., 2009).
Released in 2010, the RS II remains Pacific Biosciences' only
commercially available machine. However, altering the chemistry and doubling the number of ZMWs to 150 k per SMRT cell
have greatly enhanced performance. Using the latest chemistry,
each SMRT cell produces 50 k reads and up to 1 Gb of data in
4 hr. The average read lengths are >14 kb, but individual reads
can be as long as 60 kb. As with most single-molecule sequencing
platforms, high error rates (11%) are evident for single-pass
reads, and these errors are dominated by indels. Sequencing
errors, however, are distributed randomly, allowing accurate
consensus calls with increasing coverage or multiple passes
around the same template, so-called circular consensus sequences (Carneiro et al., 2012; Koren et al., 2012). By avoiding
clonal amplification, SMRT sequencing is also much less sensitive to GC sequence content than other platforms (Loomis et al.,
2013). This suite of characteristics makes SMRT sequencing
particularly useful for projects involving de novo assembly of
small bacterial and viral genomes as well as large genome
finishing (English et al., 2012). Reconstructing structural variation
(SV) in the genome (Chaisson et al., 2015) and isoform usage
in the transcriptome (Sharon et al., 2013) are also key areas
where SMRT sequencing has clear advantages over short read
technologies. However, lower throughput and higher per base
sequencing costs currently limit the scope of most genome-wide studies.
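The benefit of multiple passes around the same template can be illustrated with a small calculation. The sketch below is not the circular consensus algorithm used by any platform; it assumes, purely for illustration, that each pass errs independently at a given position with the single-pass rate quoted above (11%) and that the consensus is a simple majority vote. The function name `majority_error` is hypothetical. Because real single-pass errors are dominated by indels, this substitution-style model is only a rough guide.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that a strict majority of passes is wrong at one position.

    Illustrative assumptions (not from the source): errors are independent
    between passes, and every erroneous pass makes the same wrong call,
    which makes the estimate pessimistic.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use a positive odd number of passes to avoid ties")
    needed = passes // 2 + 1  # smallest number of wrong passes that wins the vote
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )


if __name__ == "__main__":
    # 0.11 is the single-pass error rate quoted in the text.
    for n in (1, 3, 5, 9, 15):
        print(f"{n:2d} passes: consensus error ~ {majority_error(0.11, n):.2e}")
```

Under these assumptions the consensus error falls rapidly as passes are added, which is the intuition behind randomly distributed errors being correctable by coverage.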
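Table 1. Selected HTS Methods
Method; Purpose (entries lost in extraction are shown as [missing]; the Reference column was not recovered)
RNA-seq; Transcript analysis
[missing]; Transcription
Nascent-seq; Transcription
[missing]; Transcription
Ribo-seq; Translation
[missing]; Replication
Hi-C; Chromatin conformation
[missing]; Chromatin conformation
[missing]; Chromatin conformation
[missing]; Genome localization
[missing]; Genome methylation
[missing]; Genome methylation
DNase-seq; Open chromatin
[missing]; Open chromatin
[missing]; RNA structure
Structure-seq; RNA structure
[missing]; RNA-protein interactions
[missing]; RNA-protein interactions
[missing]; Enhancer assay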
Table 2. Examples of Consortia-Based Projects
Columns: Initiative; Purpose; Website (only the entries below were recovered)
www.1000genomes.org
www.encodeproject.org
www.roadmapepigenomics.org
www.hmpdacc.org
Genotype-Tissue Expression Program; www.commonfund.nih.gov/GTEx/index
http://www.immuneprofiling.org
https://esp.gs.washington.edu/drupal
www.cancergenome.nih.gov
www.icgc.org
www.genome.gov/27546194
www.mendelian.org
www.commonfund.nih.gov/Diseases/index
www.genome.gov/27558493
www.benchtobassinet.com
www.niagads.org/adsp
such as the number of sites undergoing RNA editing (Li et al.,
2009).
Understanding the structure and biology of these newly discovered transcripts has led to the development of additional HTS
applications. For instance, microRNA-target discovery has been
facilitated by sequencing signatures of miRNA-mediated mRNA
decay, using parallel analysis of RNA ends (PARE) (German
et al., 2008). Furthermore, RNA immunoprecipitation chip (RIP-chip) and subsequently RIP-seq were used to show that approximately 20% of the lncRNAs associate with Polycomb repressive
complex 2 (PRC2), a chromatin-modifying complex (Khalil et al.,
2009; Zhao et al., 2010). Given these links to chromatin, methods
analogous to ChIP-seq were developed, such as chromatin isolation by RNA purification (ChIRP-seq), to determine the genomic
localization of lncRNAs (Chu et al., 2011). HTS applications have
also made it possible to determine transcript structure both
in vitro (parallel analysis of RNA structure; PARS) and in vivo
(Structure-seq), providing insight into the effects of various
structural features on translation efficiency, splicing, and polyadenylation (Ding et al., 2014; Kertesz et al., 2010). More recently,
systematic interrogation of sequence-function relationships for
RNA-protein interactions has been made possible using a high-throughput biochemical assay called RNA on a massively parallel
array (RNA-MaP) (Buenrostro et al., 2014). The use of these
assays, and many others, has enabled researchers to study
RNA biology both comprehensively and in great detail, thereby
enhancing our appreciation for the varied roles RNA plays in
normal cellular homeostasis as well as human disease.
Microbiome Sequencing
Advances in HTS have enabled extensive cataloging of metagenomic samples, providing insight into the diversity of microbial
species from a wide variety of sources, including the ocean,
soil, and human body. These studies use both 16S rRNA gene
sequencing to determine phylogenetic relationships and
more comprehensive shotgun sequencing to predict detailed
species and gene composition. In particular, much attention
has been paid to characterizing the diverse microbes resident
in healthy human populations (Human Microbiome Project Consortium, 2012). These studies found extensive variation both
across body site habitats and among different individuals, giving rise to
the concept of a personal microbiome. Microbial diversity, or
the number and abundance distribution of microorganisms in
a given niche, also correlates with several human diseases. For
instance, an increase in diversity is associated with bacterial
vaginosis (Fredricks et al., 2005), whereas obesity and inflammatory bowel disease exhibit a decrease in the diversity of gut
microbes (Qin et al., 2010; Turnbaugh et al., 2009). Although
transplant studies in mice have demonstrated a direct link
between the gut microbiome, energy metabolism, and obesity
(Turnbaugh et al., 2006), causal relationships for the majority of
human diseases are not well established. A deeper understanding will require more detailed characterizations of the dynamics
of microbiomes across health states as well as more integrative
studies to investigate the functional interplay between the microbiota, the host, and the environment.
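The definition of microbial diversity given above (the number and abundance distribution of microorganisms in a niche) can be made concrete with two common summary statistics. The sketch below assumes per-taxon read counts are already available, for example from 16S rRNA gene profiling; it computes richness (the number of observed taxa) and the Shannon index, which is one widely used diversity measure but not necessarily the one used in the studies cited here. The function names and the example counts are hypothetical.

```python
from math import log


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over observed taxa.

    H is 0 when a single taxon accounts for every read and reaches
    ln(richness) when all observed taxa are equally abundant.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical per-taxon read counts for two samples.
    even_sample = [25, 25, 25, 25]
    skewed_sample = [97, 1, 1, 1]
    for name, sample in (("even", even_sample), ("skewed", skewed_sample)):
        print(name, richness(sample), round(shannon_index(sample), 3))
```

Both example samples have the same richness, yet the skewed sample has a much lower Shannon index, which shows why abundance distribution matters alongside the raw number of taxa.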
Genome Sequencing of Rare Diseases
The capacity to sequence genomes, exomes, and transcriptomes has profoundly influenced our understanding of the ge-
Limitations of Current HTS Technologies
It is becoming increasingly clear that while the technologies of
today may be capable of providing population-level sequencing
to both researchers and clinicians, key limitations remain. From a
technological perspective, accuracy and coverage across the
genome are still problematic, particularly for GC-rich regions
and long homopolymer stretches (Ross et al., 2013). In addition,
the short read lengths produced by most current platforms
severely limit our ability to accurately characterize large repeat
regions, many indels, and SV, leaving significant portions of
the genome opaque or inaccurate (Snyder et al., 2010). The
establishment of a gold standard genome, as envisioned by
the Genome in a Bottle Consortium (Zook and Salit, 2011), as
well as standards for data processing, variant calling, and reporting as set out in the CLARITY Challenge (Brownstein et al., 2014),
will be valuable for comparing and reporting the accuracy of
different platforms and studies. Given the limitations and biases
of different platforms, it is also likely that accurate genome
sequencing will use a combination of technologies.
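The two sequence contexts singled out above, GC-rich regions and long homopolymer stretches, are easy to flag in a reference sequence before interpreting coverage or accuracy. The sketch below is a minimal illustration, not a method from the cited studies; the function names, the minimum run length of five, and the example sequence are all assumptions chosen for the example.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def homopolymer_runs(seq: str, min_len: int = 5):
    """Return (start, base, length) for each run of one base of at least min_len.

    The threshold min_len is an illustrative choice, not a value from the text.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires it to repeat n or more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    example = "ACGTTTTTTACGGGCGCGCCC"  # hypothetical sequence
    print(round(gc_fraction(example), 2))
    print(homopolymer_runs(example))
```

In practice such checks would be applied in sliding windows along a genome so that regions with extreme GC content or long runs can be compared against observed coverage.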
In addition to genomes, quantitative analysis of complete
transcriptomes, with individual allelic and spliced isoforms, is
hindered by short reads. Improvements in the throughput and
accuracy of current long-read technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies, as well as the use
of synthetic long-read methods in which longer fragments can
be sequenced and assembled from short reads will help overcome these limitations (Tilgner et al., 2015). Although both the
research and medical communities are pressing forward with
current technologies, these limitations will also continue to drive
the innovation of new sequencing platforms (reviewed by Schadt
et al., 2010).
HTS in the Coming Era of Personalized Medicine
To date, clinical HTS has most often been employed on focused
regions of the genome or in the context of small pathogen
identification. For instance, prenatal tests designed to non-invasively detect chromosomal abnormalities in cell-free DNA from
maternal blood are clinically available (e.g., Ariosa Diagnostics'
Harmony Test and BGI's NIFTY Test). Similarly, targeted HTS
of clinically actionable mutations is being utilized to guide the
diagnosis and treatment of cancer (e.g., Foundation Medicine's
FoundationONE test). HTS has also been employed in clinical
contexts to monitor pathogen outbreaks, such as methicillin-resistant S. aureus infections (Koser et al., 2012). The development and use of these focused assays will continue to expand,
but the full promise of personalized medicine relies upon the
routine clinical application of more comprehensive techniques,
such as WGS, which still faces significant challenges.
In order for large-scale genomics to become fully integrated
into the clinic, we need to reduce the costs and timescales associated with storage and interpretation of genome data. Most
importantly, however, we must improve our ability to understand
the biological and clinical consequences of variants of unknown
significance. This class of alterations is the most common in personal genome sequences and includes novel variants that affect
the coding sequence of known disease-causing genes but can
also refer to variants in genes previously unlinked to disease or
in regulatory regions of the genome. Interpretation of these variants
Cheng, Y., Ma, Z., Kim, B.-H., Wu, W., Cayting, P., Boyle, A.P., Sundaram, V.,
Xing, X., Dogan, N., Li, J., et al.; Mouse ENCODE Consortium (2014). Principles
of regulatory information conservation between mouse and human. Nature
515, 371–375.
Chimpanzee Sequencing and Analysis Consortium (2005). Initial sequence
of the chimpanzee genome and comparison with the human genome. Nature
437, 69–87.
Chinwalla, A., Cook, L., Delehaunty, K., Fewell, G., Fulton, L., Fulton, R.,
Graves, T., Hillier, L., Mardis, E., and McPherson, J. (2002). Initial sequencing
and comparative analysis of the mouse genome. Nature 420, 520–562.
Chu, C., Qu, K., Zhong, F.L., Artandi, S.E., and Chang, H.Y. (2011). Genomic
maps of long noncoding RNA occupancy reveal principles of RNA-chromatin
interactions. Mol. Cell 44, 667–678.
Church, G.M. (2005). The personal genome project. Mol. Syst. Biol. 1. Published online December 13, 2005.
Churchman, L.S., and Weissman, J.S. (2011). Nascent transcript sequencing
visualizes transcription at nucleotide resolution. Nature 469, 368–373.
Cokus, S.J., Feng, S., Zhang, X., Chen, Z., Merriman, B., Haudenschild, C.D.,
Pradhan, S., Nelson, S.F., Pellegrini, M., and Jacobsen, S.E. (2008). Shotgun
bisulphite sequencing of the Arabidopsis genome reveals DNA methylation
patterning. Nature 452, 215–219.
Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA Sequencing
Reveals Widespread Pausing and Divergent Initiation at Human Promoters.
Science 322, 1845–1848.
Crawford, G.E., Holt, I.E., Whittle, J., Webb, B.D., Tai, D., Davis, S., Margulies,
E.H., Chen, Y., Bernat, J.A., Ginsburg, D., et al. (2006). Genome-wide mapping
of DNase hypersensitive sites using massively parallel signature sequencing
(MPSS). Genome Res. 16, 123–131.
Ding, L., Ley, T.J., Larson, D.E., Miller, C.A., Koboldt, D.C., Welch, J.S.,
Ritchey, J.K., Young, M.A., Lamprecht, T., McLellan, M.D., et al. (2012). Clonal
evolution in relapsed acute myeloid leukaemia revealed by whole-genome
sequencing. Nature 481, 506–510.
Ding, Y., Tang, Y., Kwok, C.K., Zhang, Y., Bevilacqua, P.C., and Assmann,
S.M. (2014). In vivo genome-wide profiling of RNA secondary structure reveals
novel regulatory features. Nature 505, 696–700.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and
Ren, B. (2012). Topological domains in mammalian genomes identified by
analysis of chromatin interactions. Nature 485, 376–380.
Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A.,
Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of
transcription in human cells. Nature 489, 101–108.
Dohm, J.C., Lottaz, C., Borodina, T., and Himmelbauer, H. (2008). Substantial
biases in ultra-short read data sets from high-throughput DNA sequencing.
Nucleic Acids Res. 36, e105.
Doolittle, W.F. (2013). Is junk DNA bunk? A critique of ENCODE. Proc. Natl.
Acad. Sci. USA 110, 5294–5300.
Dostie, J., Richmond, T.A., Arnaout, R.A., Selzer, R.R., Lee, W.L., Honan, T.A.,
Rubio, E.D., Krumm, A., Lamb, J., Nusbaum, C., et al. (2006). Chromosome
Conformation Capture Carbon Copy (5C): a massively parallel solution for
mapping interactions between genomic elements. Genome Res. 16, 1299–1309.
Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani,
B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., et al. (2010). Human
genome sequencing using unchained base reads on self-assembling DNA
nanoarrays. Science 327, 78–81.
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., et al. (2009). Real-time DNA sequencing from single
polymerase molecules. Science 323, 133–138.
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
English, A.C., Richards, S., Han, Y., Wang, M., Vee, V., Qu, J., Qin, X., Muzny,
D.M., Reid, J.G., Worley, K.C., and Gibbs, R.A. (2012). Mind the gap: upgrad-
Hansen, R.S., Thomas, S., Sandstrom, R., Canfield, T.K., Thurman, R.E., et al.
(2010). Sequencing newly replicated DNA reveals widespread plasticity in
human replication timing. Proc. Natl. Acad. Sci. USA 107, 139–144.
Harris, T.D., Buzby, P.R., Babcock, H., Beer, E., Bowers, J., Braslavsky, I.,
Causey, M., Colonell, J., Dimeo, J., Efcavitch, J.W., et al. (2008). Single-molecule DNA sequencing of a viral genome. Science 320, 106–109.
Huang, F.W., Hodis, E., Xu, M.J., Kryukov, G.V., Chin, L., and Garraway, L.A.
(2013). Highly recurrent TERT promoter mutations in human melanoma.
Science 339, 957–959.
Human Microbiome Project Consortium (2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214.
Jain, M., Fiddes, I.T., Miga, K.H., Olsen, H.E., Paten, B., and Akeson, M. (2015).
Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12,
351–356.
Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.-A., Schmitt,
A.D., Espinoza, C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294.
Johnson, D.S., Mortazavi, A., Myers, R.M., and Wold, B. (2007). Genome-wide
mapping of in vivo protein-DNA interactions. Science 316, 1497–1502.
Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K.,
Ward, L.D., Birney, E., Crawford, G.E., Dekker, J., et al. (2014). Defining
functional DNA elements in the human genome. Proc. Natl. Acad. Sci. USA
111, 6131–6138.
Kertesz, M., Wan, Y., Mazor, E., Rinn, J.L., Nutter, R.C., Chang, H.Y., and
Segal, E. (2010). Genome-wide measurement of RNA secondary structure in
yeast. Nature 467, 103–107.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A.,
Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., et al. (2005). Genome
sequencing in microfabricated high-density picolitre reactors. Nature 437,
376–380.
Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D.,
Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009).
Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci.
USA 106, 11667–11672.
Kheradpour, P., Ernst, J., Melnikov, A., Rogov, P., Wang, L., Zhang, X., Alston,
J., Mikkelsen, T.S., and Kellis, M. (2013). Systematic dissection of regulatory
motifs in 2000 predicted human enhancers using a massively parallel reporter
assay. Genome Res. 23, 800–811.
Khodor, Y.L., Rodriguez, J., Abruzzi, K.C., Tang, C.H.A., Marr, M.T., and Rosbash, M. (2011). Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila. Genes Dev. 25, 2502–2512.
Kidd, J.M., Graves, T., Newman, T.L., Fulton, R., Hayden, H.S., Malig, M.,
Kallicki, J., Kaul, R., Wilson, R.K., and Eichler, E.E. (2010). A human genome
structural variation sequencing resource reveals insights into mutational
mechanisms. Cell 143, 837–847.
Korbel, J.O., Urban, A.E., Affourtit, J.P., Godwin, B., Grubert, F., Simons, J.F.,
Kim, P.M., Palejev, D., Carriero, N.J., Du, L., et al. (2007). Paired-end mapping
reveals extensive structural variation in the human genome. Science 318,
420–426.
Koren, S., Schatz, M.C., Walenz, B.P., Martin, J., Howard, J.T., Ganapathy, G.,
Wang, Z., Rasko, D.A., McCombie, W.R., Jarvis, E.D., and Phillippy, A.M.
(2012). Hybrid error correction and de novo assembly of single-molecule
sequencing reads. Nat. Biotechnol. 30, 693–700.
Koser, C.U., Holden, M.T.G., Ellington, M.J., Cartwright, E.J.P., Brown, N.M.,
Ogilvy-Stuart, A.L., Hsu, L.Y., Chewapreecha, C., Croucher, N.J., Harris, S.R.,
et al. (2012). Rapid whole-genome sequencing for investigation of a neonatal
MRSA outbreak. N. Engl. J. Med. 366, 2267–2275.
Kuleshov, V., Xie, D., Chen, R., Pushkarev, D., Ma, Z., Blauwkamp, T., Kertesz,
M., and Snyder, M. (2014). Whole-genome haplotyping using long reads and
statistical methods. Nat. Biotechnol. 32, 261–266.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J.,
Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al.; International Human
Genome Sequencing Consortium (2001). Initial sequencing and analysis of
the human genome. Nature 409, 860–921.
Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou,
S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq
guidelines and practices of the ENCODE and modENCODE consortia.
Genome Res. 22, 1813–1831.
Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A.,
Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (2014).
Discovery and saturation analysis of cancer genes across 21 tumour types.
Nature 505, 495–501.
Levene, M.J., Korlach, J., Turner, S.W., Foquet, M., Craighead, H.G., and
Webb, W.W. (2003). Zero-mode waveguides for single-molecule analysis at
high concentrations. Science 299, 682–686.
Li, J.B., Levanon, E.Y., Yoon, J.-K., Aach, J., Xie, B., LeProust, E., Zhang, K.,
Gao, Y., and Church, G.M. (2009). Genome-wide identification of human RNA
editing sites by parallel DNA capturing and sequencing. Science 324, 1210–1213.
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy,
T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al.
(2009). Comprehensive mapping of long-range interactions reveals folding
principles of the human genome. Science 326, 289–293.
Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., Lin, D., Lu, L., and Law, M. (2012).
Comparison of next-generation sequencing systems. J. Biomed. Biotechnol.
2012, http://dx.doi.org/10.1155/2012/251364.
Loomis, E.W., Eid, J.S., Peluso, P., Yin, J., Hickey, L., Rank, D., McCalmon, S.,
Hagerman, R.J., Tassone, F., and Hagerman, P.J. (2013). Sequencing the un-
Meissner, A., Mikkelsen, T.S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A.,
Zhang, X., Bernstein, B.E., Nusbaum, C., Jaffe, D.B., et al. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature
454, 766–770.
Mellmann, A., Harmsen, D., Cummings, C.A., Zentz, E.B., Leopold, S.R., Rico,
A., Prior, K., Szczepanowski, R., Ji, Y., Zhang, W., et al. (2011). Prospective
genomic characterization of the German enterohemorrhagic Escherichia coli
O104:H4 outbreak by rapid next generation sequencing technology. PLoS
ONE 6, e22751.
Metzker, M.L. (2010). Sequencing technologies - the next generation. Nat.
Rev. Genet. 11, 31–46.
Morey, M., Fernandez-Marmiesse, A., Castineiras, D., Fraga, J.M., Couce,
M.L., and Cocho, J.A. (2013). A glimpse into past, present, and future DNA
sequencing. Mol. Genet. Metab. 110, 3–24.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., and
Snyder, M. (2008). The transcriptional landscape of the yeast genome defined
by RNA sequencing. Science 320, 1344–1349.
Navin, N., Kendall, J., Troge, J., Andrews, P., Rodgers, L., McIndoo, J., Cook,
K., Stepansky, A., Levy, D., Esposito, D., et al. (2011). Tumour evolution
inferred by single-cell sequencing. Nature 472, 90–94.
Ng, S.B., Bigham, A.W., Buckingham, K.J., Hannibal, M.C., McMillin, M.J., Gildersleeve, H.I., Beck, A.E., Tabor, H.K., Cooper, G.M., Mefford, H.C., et al.
(2010). Exome sequencing identifies MLL2 mutations as a cause of Kabuki
syndrome. Nat. Genet. 42, 790–793.
Patwardhan, R.P., Hiatt, J.B., Witten, D.M., Kim, M.J., Smith, R.P., May, D.,
Lee, C., Andrie, J.M., Lee, S.-I., Cooper, G.M., et al. (2012). Massively parallel
functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 30,
265–270.
Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, K.S., Manichanh, C., Nielsen,
T., Pons, N., Levenez, F., Yamada, T., et al.; MetaHIT Consortium (2010). A human gut microbial gene catalogue established by metagenomic sequencing.
Nature 464, 59–65.
Quick, J., Quinlan, A.R., and Loman, N.J. (2014). A reference bacterial genome
dataset generated on the MinION™ portable single-molecule nanopore
sequencer. GigaScience 3, http://dx.doi.org/10.1186/2047-217X-3-22.
Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D.,
Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden,
E.L. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680.
Roadmap Epigenomics Consortium, Kundaje, A., Meuleman, W., Ernst, J.,
Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang,
J., et al. (2015). Integrative analysis of 111 reference human epigenomes.
Nature 518, 317–330.
Ross, M.G., Russ, C., Costello, M., Hollinger, A., Lennon, N.J., Hegarty, R.,
Nusbaum, C., and Jaffe, D.B. (2013). Characterizing and measuring bias in
sequence data. Genome Biol. 14, R51.
Rothberg, J.M., Hinz, W., Rearick, T.M., Schultz, J., Mileski, W., Davey, M.,
Leamon, J.H., Johnson, K., Milgrew, M.J., Edwards, M., et al. (2011). An integrated semiconductor device enabling non-optical genome sequencing.
Nature 475, 348–352.
Schadt, E.E., Turner, S., and Kasarskis, A. (2010). A window into third-generation sequencing. Hum. Mol. Genet. 19 (R2), R227–R240.
Schloss, J.A. (2008). How to get genomes at one ten-thousandth the cost. Nat.
Biotechnol. 26, 1113–1115.
Selvaraj, S., Dixon, J.R., Bansal, V., and Ren, B. (2013). Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat.
Biotechnol. 31, 1111–1118.
Sephton, C.F., Cenik, C., Kucukural, A., Dammer, E.B., Cenik, B., Han, Y.,
Dewey, C.M., Roth, F.P., Herz, J., Peng, J., et al. (2011). Identification of
neuronal RNA targets of TDP-43-containing ribonucleoprotein complexes.
J. Biol. Chem. 286, 1204–1215.
Sharon, D., Tilgner, H., Grubert, F., and Snyder, M. (2013). A single-molecule
long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014.
Shendure, J., Porreca, G.J., Reppas, N.B., Lin, X., McCutcheon, J.P., Rosenbaum, A.M., Wang, M.D., Zhang, K., Mitra, R.D., and Church, G.M. (2005).
Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728–1732.
Smith, M.G., Gianoulis, T.A., Pukatzki, S., Mekalanos, J.J., Ornston, L.N.,
Gerstein, M., and Snyder, M. (2007). New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon
mutagenesis. Genes Dev. 21, 601–614.
Snyder, M., Du, J., and Gerstein, M. (2010). Personal genome sequencing:
current approaches and challenges. Genes Dev. 24, 423–431.
Su, Z., Łabaj, P.P., Li, S., Thierry-Mieg, J., Thierry-Mieg, D., Shi, W., Wang, C.,
Schroth, G.P., Setterquist, R.A., Thompson, J.F., et al.; SEQC/MAQC-III
Consortium (2014). A comprehensive assessment of RNA-seq accuracy,
reproducibility and information content by the Sequencing Quality Control
Consortium. Nat. Biotechnol. 32, 903–914.
The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073.
Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen,
E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. (2012). The
accessible chromatin landscape of the human genome. Nature 489, 75–82.
Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F.,
Harel, I., Bustamante, C., Rasmussen, M., and Snyder, M. (2015). Comprehensive transcriptome analysis using synthetic long read sequencing reveals
molecular co-association of distant splicing events. Nat. Biotechnol. 33.
Travers, K.J., Chin, C.S., Rank, D.R., Eid, J.S., and Turner, S.W. (2010).
A flexible and efficient template format for circular consensus sequencing
and SNP detection. Nucleic Acids Res. 38, e159.
Turnbaugh, P.J., Ley, R.E., Mahowald, M.A., Magrini, V., Mardis, E.R., and
Gordon, J.I. (2006). An obesity-associated gut microbiome with increased
capacity for energy harvest. Nature 444, 1027–1031.
Turnbaugh, P.J., Hamady, M., Yatsunenko, T., Cantarel, B.L., Duncan, A., Ley,
R.E., Sogin, M.L., Jones, W.J., Roe, B.A., Affourtit, J.P., et al. (2009). A core gut
microbiome in obese and lean twins. Nature 457, 480–484.
Uemura, S., Aitken, C.E., Korlach, J., Flusberg, B.A., Turner, S.W., and Puglisi,
J.D. (2010). Real-time tRNA transit on single translating ribosomes at codon
resolution. Nature 464, 1012–1017.
Valouev, A., Ichikawa, J., Tonthat, T., Stuart, J., Ranade, S., Peckham, H.,
Zeng, K., Malek, J.A., Costa, G., McKernan, K., et al. (2008). A high-resolution,
nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res. 18, 1051–1063.
Van Allen, E.M., Wagle, N., Sucker, A., Treacy, D.J., Johannessen, C.M.,
Goetz, E.M., Place, C.S., Taylor-Weiner, A., Whittaker, S., Kryukov, G.V.,
et al.; Dermatologic Cooperative Oncology Group of Germany (DeCOG)
(2014). The genetic landscape of clinical resistance to RAF inhibition in metastatic melanoma. Cancer Discov. 4, 94–109.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G.,
Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence
of the human genome. Science 291, 1304–1351.
Voskoboynik, A., Neff, N.F., Sahoo, D., Newman, A.M., Pushkarev, D., Koh,
W., Passarelli, B., Fan, H.C., Mantalas, G.L., Palmeri, K.J., et al. (2013). The
genome sequence of the colonial chordate, Botryllus schlosseri. Elife 2,
e00569.
Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool
for transcriptomics. Nat. Rev. Genet. 10, 57–63.
Wang, Y., Waters, J., Leung, M.L., Unruh, A., Roh, W., Shi, X., Chen, K.,
Scheet, P., Vattathil, S., Liang, H., et al. (2014). Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160.
Wang, Y., Yang, Q., and Wang, Z. (2015). The evolution of nanopore
sequencing. Front. Genet. 5, 449.
Wheeler, D.A., Srinivasan, M., Egholm, M., Shen, Y., Chen, L., McGuire, A., He,
W., Chen, Y.-J., Makhijani, V., Roth, G.T., et al. (2008). The complete genome
of an individual by massively parallel DNA sequencing. Nature 452, 872–876.
Worthey, E.A., Mayer, A.N., Syverson, G.D., Helbling, D., Bonacci, B.B.,
Decker, B., Serpe, J.M., Dasu, T., Tschannen, M.R., Veith, R.L., et al. (2011).
Making a definitive diagnosis: successful clinical application of whole exome
sequencing in a child with intractable inflammatory bowel disease. Genet.
Med. 13, 255–262.
Yang, Y., Muzny, D.M., Reid, J.G., Bainbridge, M.N., Willis, A., Ward, P.A.,
Braxton, A., Beuten, J., Xia, F., Niu, Z., et al. (2013). Clinical whole-exome
sequencing for the diagnosis of mendelian disorders. N. Engl. J. Med. 369,
1502–1511.
Zhang, Z.D., Du, J., Lam, H., Abyzov, A., Urban, A.E., Snyder, M., and Gerstein, M. (2011). Identification of genomic indels and structural variations using
split reads. BMC Genomics 12, 375.
Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song,
J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953.
Zook, J.M., and Salit, M. (2011). Genomes in a bottle: creating standard
reference materials for genomic variation - why, what and how? Genome
Biol. 12, 31.