Bexfield 2010 Metagenomic - Virus


The Veterinary Journal xxx (2010) xxx–xxx

Contents lists available at ScienceDirect

The Veterinary Journal


journal homepage: www.elsevier.com/locate/tvjl

Review

Metagenomics and the molecular identification of novel viruses


Nicholas Bexfield a,⇑, Paul Kellam b
a Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK
b The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK

Article info

Article history:
Accepted 20 October 2010
Available online xxxx

Keywords:
Metagenomics
Virus discovery
Animals
Computational transcriptome subtraction
Hybridisation

Abstract

There have been rapid recent developments in establishing methods for identifying and characterising viruses associated with animal and human diseases. These methodologies, commonly based on hybridisation or PCR techniques, are combined with advanced sequencing techniques termed ‘next generation sequencing’. Allied advances in data analysis, including the use of computational transcriptome subtraction, have also impacted the field of viral pathogen discovery. This review details these molecular detection techniques, discusses their application in viral discovery, and provides an overview of some of the novel viruses discovered. The problems encountered in attributing disease causality to a newly identified virus are also considered.
© 2010 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +44 1223 765631; fax: +44 1223 330886.
E-mail address: [email protected] (N. Bexfield).

1090-0233/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tvjl.2010.10.014

Please cite this article in press as: Bexfield, N., Kellam, P. Metagenomics and the molecular identification of novel viruses. The Veterinary Journal (2010), doi:10.1016/j.tvjl.2010.10.014

Introduction

Given that animal pathogens (in particular viruses) are considered a significant source of emerging human infections (Cleaveland et al., 2001), the identification and optimal characterisation of novel organisms affecting both domestic and wild animal populations is central to protecting both human and animal health. Recent outbreaks of human infection caused by influenza H7N7 virus transmitted from poultry (Koopmans et al., 2004) and H1N1 virus transmitted from pigs (Dawood et al., 2009; Itoh et al., 2009) are cases in point, highlighting the need for ongoing, vigilant epidemiological surveillance of such pathogens in animal populations. Moreover, epidemiological studies strongly suggest that novel infectious agents remain to be discovered (Woolhouse et al., 2008) and may be contributing to a host of cancers, autoimmune disorders, and degenerative diseases in humans (Relman, 1999; Dalton-Griffin and Kellam, 2009). Similar, yet-to-be-identified viruses may be contributing to the pathogenesis of similar diseases in animals.

Viruses can be identified by a wide range of techniques. Traditional methods include electron microscopy, cell culture, inoculation studies and serology (Storch, 2007). Whereas many of the viruses known today were first identified by these techniques, the methods have limitations. For instance, many viruses cannot be cultivated in the laboratory and can only be characterised by molecular methods (Amann et al., 1995), and in recent years we have seen the increasing use of these techniques in pathogen discovery (Fig. 1). One such approach uses sequence information from known pathogens to identify related but undiscovered agents through cross-hybridisation. Examples include microarray (Wang et al., 2002) and subtractive (Lisitsyn et al., 1993) hybridisation-based methods. Another advance has involved PCR amplification of the pathogen genome, either where there is complete knowledge of the pathogen to be amplified (conventional PCR) or where this information is limited (degenerate PCR). Other PCR methods, such as sequence-independent single primer amplification, degenerate oligonucleotide primed PCR, random PCR and rolling circle amplification, also have the capacity to detect completely novel pathogens.

Hybridisation and PCR-based methods are more effective if the sample to be analysed is first enriched for virus, a process achieved by removing host and other contaminating nucleic acids. Most hybridisation and PCR methods yield amplified products that require definitive identification by sequencing. Advances in sequencing that have facilitated virus discovery include the arrival of ‘next or second generation sequencing’, which can generate very large amounts of sequence data.

Technological advances have also resulted in the development of metagenomics, the culture-independent study of the collective set of microbial populations (microbiome) in a sample by analysing the sample’s nucleotide sequence content (Petrosino et al., 2009). The different microorganisms constituting a microbiome can include bacteria, fungi (mostly yeasts) and viruses. Examples of microbiomes in mammalian biology include the microbial populations inhabiting the human intestine or mucosal surfaces both in health and disease.

To date, the study of the viral microbiome (virome) has been applied to a range of biological and environmental samples including human (Breitbart et al., 2003; Zhang et al., 2006; Finkbeiner et al.,
2008) and equine (Cann et al., 2005) intestinal contents, bat guano (Li et al., 2010), sea water (Breitbart et al., 2002; Angly et al., 2006; Williamson et al., 2008), marine sediment (Breitbart et al., 2004), fresh water (Breitbart et al., 2009; Djikeng et al., 2009), hot springs (Schoenfeld et al., 2008), soil (Fierer et al., 2007) and plants (Coetzee et al., 2010). Early results from a large initiative to describe the human microbiome associated with health and disease have recently been published (Nelson et al., 2010), and such findings, together with those of other studies, are likely to lead to the discovery of a wealth of previously unknown viruses.

This review describes the current molecular techniques available for the detection of viruses infecting animals and humans. We begin by discussing hybridisation and PCR-based methods and describe advances that have facilitated the detection of completely novel viruses. Advances in sequencing methodology and data analysis, such as transcriptome subtraction, are also appraised. The review concludes with an assessment of the problems encountered when attempting to attribute disease causality to a newly discovered virus.

Fig. 1. A schematic overview of the molecular methods currently available for viral discovery. Hybridisation methods include microarray and subtractive hybridisation techniques such as representational difference analysis. PCR-based methods include degenerate PCR, degenerate oligonucleotide primed PCR (DOP-PCR), sequence-independent single primer amplification (SISPA), random PCR and rolling circle amplification (RCA).

Hybridisation-based methods

Microarray techniques

Microarrays consist of high-density oligonucleotide probes (or segments of DNA) immobilised on a solid surface. Any complementary sequences (labelled with fluorescent nucleotides) in a test sample hybridise to the probe on the microarray. The results of hybridisation are detected and quantified by fluorescence-based detection and thus the relative abundance of nucleic acid sequences in a sample can be determined (Clewley, 2004).

Two types of microarray techniques are commonly used for virus identification. The first uses short oligonucleotide probes (sensitive to single-base mismatches) to detect or identify known viruses or sub-types of known viruses. Such a technique has been used to discriminate between human herpes viruses (for example Foldes-Papp et al., 2004). The second type of microarray method employs long oligonucleotide probes (60 or 70 bp) that allow for sequence mismatches (Wang et al., 2002). Microarray applications have been used in the discovery of novel animal viruses such as a coronavirus in a Beluga whale (Mihindukulasuriya et al., 2008), the bornavirus that causes proventricular dilation disease in wild psittacine birds (Kistler et al., 2008), and an enterovirus associated with tongue erosions in bottle-nose dolphins (Nollens et al., 2009). In human medicine they have been used to characterise SARS-CoV (Wang et al., 2003), and to identify novel coronaviruses and rhinoviruses in asthmatics (Kistler et al., 2007), a gammaretrovirus in prostate tissue (Urisman et al., 2006) and cardioviruses in the gastro-intestinal tract (Chiu et al., 2008).

Microarray technology is a powerful tool as it screens for a large number of potential pathogens simultaneously (Wang et al., 2002; Palacios et al., 2007; Xiao-Ping et al., 2009). The method does have limitations, however. Interpreting hybridisation signals is not trivial, often involving the empirical characterisation of signals produced by known viruses and the development of specialised software (Urisman et al., 2005). Furthermore, microarray techniques utilise probes with a finite specificity for a particular pathogen or small group of pathogens, so that novel or highly divergent strains or viruses can be difficult to detect. Non-specific binding of test material to hybridisation probes can also result in loss of test sensitivity. Despite these limitations, microarrays have proven extremely effective in novel pathogen discovery.

Subtractive hybridisation

This form of hybridisation identifies sequence differences between two related samples and is based on the principle of removing common nucleic acid sequences from two samples while
leaving differing sequences intact. Such a process can be applied to any pair of nucleic acid sources such as ‘treated’ vs. ‘untreated’ or ‘diseased’ vs. ‘undiseased’ tissue, or to samples obtained prior to and after experimental infection (Muerhoff et al., 1997).

Subtractive hybridisation uses two nucleic acid sources termed ‘tester’ and ‘driver’, with only the tester containing pathogen sequences (Ambrose and Clewley, 2006). DNA in both the tester and driver is digested by restriction enzymes and adaptors are ligated to the DNA fragments from the tester sample only. The two DNA populations are mixed, denatured and annealed to form three types of molecule: (1) tester/tester; (2) hybrids of tester/driver, and (3) driver/driver. The tester/tester molecules should now be enriched for pathogen sequences and are preferentially and exponentially amplified by primers specific for the adaptors present on both DNA strands. The tester/driver molecules, which only contain an adaptor on one DNA strand, undergo linear amplification but are then removed by enzymatic digestion. The driver/driver molecules have no adaptors and are not amplified. Sufficiently enriched in this way, the tester sample is sequenced and the pathogen identified.

An example of a subtractive hybridisation method is representational difference analysis (RDA) (Lisitsyn et al., 1993). Despite its impressive performance in model systems, RDA has had limited success in the discovery of novel viruses, largely due to the requirement for two highly matched nucleic acid sources. Restriction enzyme digestion also leads to an increased DNA complexity and the risk of inefficient subtractive hybridisation, a particular problem with samples containing large amounts of host DNA, such as serum or plasma. Despite these limitations, RDA has been used to identify the agent causing Kaposi’s sarcoma (human herpesvirus-8) (Chang et al., 1994), torque teno or transfusion-transmitted virus (TTV) (Nishizawa et al., 1997) and the hepatitis GBV-A and GBV-B viruses (Simons et al., 1995b).

PCR-based methods

Degenerate PCR

Conventional PCR is frequently used to identify or exclude the presence of a virus in samples. Given that the method relies on the annealing of specific primers complementary to the pathogen’s genomic sequence of interest, it is unsuitable for the detection of novel viruses where there are marked sequence differences from the primers. Prior knowledge of the viral sequence is therefore a pre-requisite. An alternative PCR method, degenerate PCR, uses primers designed to anneal to highly-conserved sequence regions shared by related viruses.

Because these regions are almost never completely conserved, primers generally include some degeneracy that permits binding to all or the most common known variants on the conserved sequence (Rose et al., 1998). The overall aim is to achieve a balance between covering all possible viral variants within a family (i.e. primers with high degeneracy) and creating an unwieldy number of different primers. At high levels of degeneracy, only a small proportion of primers are able to prime DNA synthesis, whereas a large proportion of the remaining primers will be able to anneal but, because of sequence mismatches, will be refractory to PCR extension. The maximum level of degeneracy is usually fixed at approximately 256, and degeneracy can be reduced by using codon usage tables (Wada et al., 1992) and inter-codon dinucleotide frequencies (Smith et al., 1983).

Degenerate primers are used to detect viruses, including novel viruses, from existing sufficiently homologous virus families. Such primers have been used in the identification of pig endogenous retrovirus (PERV) (Patience et al., 1997), numerous macaque gammaherpesviruses (Van Devanter et al., 1996; Rose et al., 1997), a novel alphaherpesvirus associated with death in rabbits (Jin et al., 2008), and a novel chimpanzee polyomavirus (Johne et al., 2005). Novel viruses infecting humans detected using this technique include hepatitis G virus (Simons et al., 1995a), a hantavirus (sin nombre virus) (Nichol et al., 1993), coronaviruses (Sampath et al., 2005), and parainfluenza viruses 1–3 (Corne et al., 1999).

Sequence-independent single primer amplification (SISPA)

Sequence-independent amplification of viral nucleic acid avoids the potential limitations of other methods, particularly the lack of microarray hybridisation due to genetic divergence from known viruses, the absence of a matched sample for subtractive hybridisation, and the failure of PCR amplification using conventional or degenerate primers. The advantages of these methods are their ability to detect novel viruses highly divergent from those already known, their relative speed and simplicity of use and their lack of bias in identifying particular groups of viruses (Delwart, 2007).

A sequence-independent amplification technique termed sequence-independent single primer amplification (SISPA) was introduced almost two decades ago to identify viral nucleic acid of unknown sequence present in low amounts (Reyes and Kim, 1991). SISPA was used first to sequence the norovirus genome from human faeces (Matsui et al., 1991), in addition to an astrovirus (Matsui et al., 1993) and a rotavirus (Lambden et al., 1992) infecting humans. Originally the SISPA method involved endonuclease digestion of DNA, followed by directional ligation of an asymmetric adaptor or primer on to both ends of the DNA molecule (Reyes and Kim, 1991). Common end sequences of the adaptor allowed the DNA to be amplified in a subsequent PCR reaction using a complementary single primer. Due to the low complexity of a viral genome, enzymatic digestion produces a large amount of a limited number of fragments. After amplification these are visible as discrete bands on an agarose gel and can be sequenced and identified (Allander et al., 2001). Since animal and bacterial genomes are larger and more complex, restriction digestion generates many different-sized fragments, the amplification of which can result in ‘smears’ on agarose gel.

One of the disadvantages of sequence-independent amplification techniques is the contemporaneous amplification of ‘contaminating’ host and bacterial nucleic acid. Enriching methods that reduce such ‘background’ genomic material include filtration, ultra-centrifugation, density-gradient ultra-centrifugation and enzymatic digestion of non-viral nucleic acids using DNAse and RNAse (Delwart, 2007). These techniques take advantage of the differential protection afforded to the virus genome by nucleocapsids and capsids. However, as viral nucleic acid not protected by such capsids is removed by the purification process and not amplified, some potential assay sensitivity is lost. Furthermore, the random nature of the amplification reaction means that great care must be taken to maintain PCR integrity and prevent cross-contamination.

The original SISPA method has now been modified to include steps to detect both RNA and DNA viruses, to enrich for virus, and to remove host genomic and contaminating nucleic acid (Allander et al., 2001; Djikeng et al., 2008). Novel human and animal viruses detected in clinical samples using these modified methods include parvoviruses (Allander et al., 2001; Jones et al., 2005), a coronavirus (van der Hoek et al., 2004), an adenovirus (Jones et al., 2007a), an orthoreovirus (Victoria et al., 2008), a picornavirus (Jones et al., 2007b), and a porcine pestivirus (Kirkland et al., 2007).


Degenerate oligonucleotide primed PCR (DOP-PCR)

This sequence-independent amplification technique, termed degenerate oligonucleotide primed PCR (DOP-PCR), was initially developed for genome mapping studies (Telenius et al., 1992), but has more recently been modified to detect viral genomic material (Nanda et al., 2008). DOP-PCR uses primers with a short (4–6 nucleotide) 3′-anchor sequence; anchors of 4 and 6 nucleotides typically occur every 256 (4^4) and 4096 (4^6) bp, respectively. The anchor is preceded by a non-specific degenerate sequence of 6–8 nucleotides for random priming. Immediately upstream of the non-specific degenerate sequence, each primer also contains a defined 5′-sequence of 10 nucleotides. Because of the degenerate sequence, each reaction includes a mixture of several thousand different primers.

At low stringency during the first few DOP-PCR amplification cycles, at least 12 consecutive nucleotides from the 3′ end of the primer anneal to DNA sequences on the PCR template. In subsequent cycles at higher stringency, these initial PCR products are amplified further using the same primer population. DOP-PCR, when followed by sequencing of the product, has the advantage of facilitating the detection of both RNA and DNA viruses without a priori knowledge of the infectious agent (Nanda et al., 2008).

Random PCR

This further, alternative sequence-independent amplification technique is known as ‘random’ PCR (Froussard, 1992). The method is commonly used to amplify and label probes with fluorescent dyes for microarray analysis but has also been used in the identification of novel viruses. Unlike SISPA, random PCR has no requirement for an adaptor ligation step. Whereas ‘conventional’ PCR utilises a pair of complementary ‘forward’ and ‘reverse’ primers to amplify DNA in both directions, random PCR utilises two different primers and two separate PCR reactions. The single primer used in the first PCR reaction has a defined sequence at its 5′ end, followed by a degenerate hexamer or heptamer sequence at the 3′ end. A second PCR reaction is then performed with a specific primer complementary to the 5′ defined region of the first primer, thus enabling amplification of products formed in the first reaction.

Random PCR has been used extensively for the detection of both DNA and RNA viruses and is currently the molecular method most commonly used to identify unknown viruses. Viruses infecting animals identified using this technique include a dicistrovirus associated with ‘honey-bee colony collapse disorder’ (Cox-Foster et al., 2007), a seal picornavirus (Kapoor et al., 2008), and circular DNA viruses in the faeces of wild-living chimpanzees (Blinkova et al., 2010). Random PCR has also proved successful in detecting novel viruses infecting humans including a parvovirus (Allander et al., 2005), a coronavirus (Fouchier et al., 2004), and a polyomavirus in patients with respiratory tract disease (Allander et al., 2007), a parechovirus (Li et al., 2009c), a picornavirus (Li et al., 2009b), and a bocavirus in patients with diarrhoea (Kapoor et al., 2009), a human gammapapillomavirus in a patient with encephalitis (Li et al., 2009a), and several viruses in children with acute flaccid paralysis (Blinkova et al., 2009; Victoria et al., 2009).

Rolling circle amplification (RCA)

A ‘rolling circle’ sequence-independent amplification technique makes use of the fact that circular DNA molecules such as plasmids or viral genomes replicate through a rolling circle mechanism. RCA mimics this natural process without requiring prior knowledge of the viral sequence, utilising random hexamer primers that bind at multiple locations on a circular DNA template, and a polymerase enzyme, such as bacteriophage Φ29 DNA polymerase. The polymerase enzyme has a strong strand-displacing capability, high processivity (approximately 70 000 bases/binding event), and proof-reading activity (Esteban et al., 1993). When the polymerase enzyme comes ‘full circle’ on a circular viral genome it displaces the 5′ end of the strand it has synthesised and continues to extend the new strand multiple times around the DNA circle. Random primers can then anneal to the displaced strand and convert it to double-stranded DNA (Dean et al., 2001). By using multiply-primed RCA, unknown circular DNA templates can be exponentially amplified. The long, double-stranded DNA products can then be cut with a restriction enzyme to release linear fragments the length of the circle, which can be sequenced and identified.

Although technically more demanding than other methods of sequence-independent amplification, the RCA approach has facilitated the identification of a novel variant of bovine papillomavirus type-1 (Rector et al., 2004b) and of novel papillomaviruses in a Florida manatee (Rector et al., 2004a). This method has also yielded the full genomic sequences of polyomaviruses (Johne et al., 2006b), an anellovirus (Niel et al., 2005), circoviruses (Johne et al., 2006a) and a wasp polydnavirus (Espagne et al., 2004). Through the use of a combination of RCA and SISPA, nine anelloviruses found in human plasma and cat saliva have been detected and characterised (Biagini et al., 2007).

Sequencing methods

Most hybridisation and PCR methods generate products that require definitive identification by sequencing. One method of achieving this is the commonly used ‘chain termination method’, often referred to as ‘Sanger’ or ‘dideoxy sequencing’. This method is based on the DNA polymerase-dependent synthesis of a complementary DNA strand in the presence of natural 2′-deoxynucleotides (dNTPs) and 2′,3′-dideoxynucleotides (ddNTPs) that serve as non-reversible synthesis terminators (Sanger et al., 1977). A limitation of this technique in terms of virus identification can be the requirement to clone viral sequences into bacteria prior to sequencing, although direct sequencing of PCR products can also be employed. When cloning is performed using this method, host-related bias can occur (Hall, 2007), and, as only a relatively limited number of clones can be sequenced, methods to enrich for virus prior to amplification are required.

Recently, use of the Sanger method has been partially succeeded by ‘next generation sequencing’ technologies that circumvent the need for cloning by using highly efficient in vitro DNA amplification (Morozova and Marra, 2008). Next generation sequencing technology includes the 454 pyrosequencing-based instrument (Roche Applied Sciences), genome analysers (Illumina) and the SOLiD system (Applied Biosystems). This approach dramatically increases cost-effective sequence throughput, albeit at the expense of sequence read-length.

Compared to read-lengths in the region of up to 900 bp produced by modern automated Sanger instruments, read-lengths of approximately 76–106 bp are generated by Illumina and of 250–400 bp by 454 technology. The comparatively short read-length of next generation sequencing technologies is, however, compensated for by the large number of ‘reads’ generated. Typically, a modern Sanger instrument produces 100 kilobases of sequence data, whereas 454 sequencing can generate up to 400 megabases and Illumina sequencing technology up to 20 gigabases of sequence data/run (Metzker, 2010).

Bioinformatics

Several different approaches have been used to analyse data produced by sequencing methods. To date, the majority of novel
viruses have been discovered using Basic Local Alignment Search Tool (BLAST) programmes that compare detected nucleotide sequences to those in a database, and rely on the fact that novel viruses have some homology to known viruses. Detecting distant viral relatives or completely new viruses can, however, be problematic. For instance, a proportion of sequences (5–30%) derived from animal samples by sequence-independent amplification methods, and an even greater fraction of sequences derived from environmental samples, do not have nucleotide or amino acid sequences similar to those of viruses listed in existing databases (Delwart, 2007). However, using these methods, viruses have been identified that are distantly related to known viruses.

Several approaches can be used to increase the likelihood of identifying virus, including ‘querying’ translated DNA sequences against a translated DNA database, as evolutionary relationships remain detectable for longer at the amino acid than at the nucleotide level. The computational generation of theoretical ancestral sequences, and their subsequent use in sequence similarity searches, may also improve identification of highly divergent viral sequences (Delwart, 2007). Computational biologists have also developed ingenious new algorithms and techniques to analyse data produced by next generation sequencing to aid the identification of novel viruses (Wooley et al., 2010).

Before viruses are identified, the hybridisation and PCR methods previously described generally require both an initial step, to enrich for virus, and an amplification step (Fig. 2A). Enrichment can result in loss of viral nucleic acid, thus reducing test sensitivity, and amplification can generate bias towards a dominant (potentially host-derived) sequence. A method known as transcriptome subtraction has been developed for viral discovery (Weber et al., 2002) with the advantage that it can be performed without the need for enrichment or amplification (Fig. 2B). Transcriptome subtraction is based on the principle that genes are transcribed (expressed) to produce mRNA, which can be converted in vitro to a single-stranded DNA product, complementary DNA (cDNA). The sequencing of this cDNA, rather than genomic DNA, therefore allows the transcribed portion of the genome to be analysed. In view of the large number of transcripts present, sequencing is usually performed using next generation technologies.

The technique works on the assumption that a sample infected with a virus would contain host and viral transcripts. Host transcript sequences are identified by alignment against public databases and subtracted; in the case of a human sample, these include reference sequences such as the human RefSeq RNA, mitochondrial or assembled chromosome sequences in the National Center for Biotechnology Information (NCBI) databases. After aligning and subtracting human sequences against databases, non-matched virus-enriched sequences will remain and can be further studied. With the completion of the sequencing of several animal genomes, transcriptome subtraction techniques are applicable to a variety of other species, and the possibility exists to use both public databases and subtraction against un-infected control material.

A transcriptome subtraction method has been used to identify a previously unknown polyomavirus in human Merkel cell carcinoma (Feng et al., 2008) and to identify an uncharacterised arenavirus associated with three transplant-related deaths (Palacios et al., 2008). This technique has the advantage of being able to identify very small amounts of virus: in the case of the polyomavirus detailed above, only 10 viral transcripts/cell were present. Given that each cell contains approximately 1 million host transcripts, only a small proportion of the cellular RNA is virus-derived. Providing every cell is infected, even at very low levels, 10 million sequence ‘reads’ gives a >99.99% probability of detecting at least one viral sequence (Fig. 3). Such a large number of reads is readily obtainable using next generation technology such as the Illumina platform. However, the technique does have limitations: if only 1/10 cells is infected, or a sequencing methodology is used which produces only 50 000 sequence reads, the probabilities of detecting viral sequence decrease to approximately 60% and 5%, respectively.

Fig. 3. Graphic representation of the probability of detecting viral sequences based on the viral genome-transcript sequence frequency and the number of sequence ‘reads’ generated (lines with symbols).
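
The detection probabilities quoted above are consistent with a simple sampling argument, sketched here in Python. The sketch assumes that each read is an independent draw and that a fraction f of all transcripts is viral, so the chance of seeing at least one viral read among N reads is 1 − (1 − f)^N. The review does not state the model behind Fig. 3. The value f = 10⁻⁶ (one viral transcript per million) is an assumption, chosen because it reproduces the quoted figures, and the function name is illustrative.

```python
import math


def detection_probability(viral_fraction: float, n_reads: int) -> float:
    """Probability of at least one viral read: 1 - (1 - f)**N.

    Assumes reads are independent draws from the transcript pool.
    log1p/expm1 keep numerical precision when f is very small.
    """
    return -math.expm1(n_reads * math.log1p(-viral_fraction))


# Assumed baseline: one viral transcript per million transcripts (f = 1e-6)
baseline = detection_probability(1e-6, 10_000_000)            # approx. 99.995%
one_in_ten_cells = detection_probability(1e-7, 10_000_000)    # approx. 63%
fewer_reads = detection_probability(1e-6, 50_000)             # approx. 4.9%

for label, p in [
    ("every cell infected, 10 million reads", baseline),
    ("1/10 cells infected, 10 million reads", one_in_ten_cells),
    ("every cell infected, 50 000 reads", fewer_reads),
]:
    print(f"{label}: {p:.3%}")
```

With the 10 viral transcripts/cell reported for the polyomavirus (f of about 10⁻⁵), the same formula gives a probability indistinguishable from 1 for 10 million reads, which is why read depth rather than transcript abundance is the limiting factor in that case.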

Fig. 2. Sequence of events in the molecular detection of viruses: (A) Samples processed by hybridisation or PCR require steps to enrich for virus before amplified products are sequenced and identified. Enrichment may result in decreased assay sensitivity, and amplification can generate bias towards a dominant sequence; (B) Transcriptome subtraction methods can be performed without enrichment or amplification with direct sequencing of nucleic acids extracted from a sample of interest. Subsequent subtraction of resulting sequences from databases facilitates virus identification.

Identification of viral sequences and proof of causation

While many newly identified viruses infecting animals and humans were initially found in patients with particular clinical signs or symptoms, most have not been causally associated with particular diseases. The detection of viruses in such contexts may merely reflect the presence of a virus in a sample or the ability of a virus to replicate within a particular disease environment, rather than the virus directly causing the disease. For example, although several infectious agents have been found in samples from human patients with multiple sclerosis (Johnson et al., 1984; Challoner et al., 1995; Perron et al., 1997; Thacker et al., 2006), causal roles in pathogenesis have never been attributed (Munz et al., 2009). Similarly, herpes simplex virus type-2 (HSV-2) was strongly implicated as the cause of cervical cancer in humans for many years until human papilloma virus DNA was identified in biopsies (Durst et al., 1983).


Henle–Koch postulates are a well known set of criteria that must be fulfilled by a microorganism for it to be proven as the cause of disease. The ability to culture viruses in vitro and the detection of antibodies against viruses led to new proposals for the demonstration of causality (Rivers, 1937). Advances in technology have resulted in new challenges to the assigning of causation and sequence-based approaches to virus identification have led to the formulation of guidelines defining the relationship between the presence of viral sequences and disease (Fredericks and Relman, 1996). Such guidelines have been used to link hepatitis C virus (HCV) with non-A, non-B hepatitis (Kuo et al., 1989), and human herpesvirus-8 with Kaposi’s sarcoma (Moore and Chang, 1995; Noel et al., 1996), but are often ignored in the race to assign significance to virus discovery.

In infectious disease research a balance must be struck between the prompt identification of highly significant new human pathogens such as pandemic swine H1N1 influenza (Dawood et al., 2009), and clearly defining the more tenuous connection between xenotropic murine leukaemia virus-related virus (XMRV) and chronic fatigue syndrome (Lombardi et al., 2009). Epidemiological, immunological and sequence-based criteria should support any proposed link between an infectious organism and the disease under study. Establishing causality must also involve an appreciation of the full range of genetic diversity of the viral species, as it is well established that distinct viral genotypes or even minor genetic variations can result in large changes in viral pathogenicity.

Conclusions

Viral identification is an ever-evolving discipline where new technologies are likely to have significant impact over the coming decades. The further development of hybridisation and PCR-based methods, the increased availability of next generation sequencing, improvements in transcriptome subtraction methods, continued expansion of viral and animal genome databases, and improved bioinformatic tools will all facilitate the acceleration of this identification process.

Conflict of interest statement

Neither of the authors of this paper has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper.

References

Allander, T., Emerson, S.U., Engle, R.E., Purcell, R.H., Bukh, J., 2001. A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species. Proceedings of the National Academy of Sciences (USA) 98, 11609–11614.
Allander, T., Tammi, M.T., Eriksson, M., Bjerkner, A., Tiveljung-Lindell, A., Andersson, B., 2005. Cloning of a human parvovirus by molecular screening of respiratory tract samples. Proceedings of the National Academy of Sciences (USA) 102, 12891–12896.
Allander, T., Andreasson, K., Gupta, S., Bjerkner, A., Bogdanovic, G., Persson, M.A., Dalianis, T., Ramqvist, T., Andersson, B., 2007. Identification of a third human polyomavirus. Journal of Virology 81, 4130–4136.
Amann, R.I., Ludwig, W., Schleifer, K.H., 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiological Reviews 59, 143–169.
Ambrose, H.E., Clewley, J.P., 2006. Virus discovery by sequence-independent genome amplification. Reviews in Medical Virology 16, 365–383.
Angly, F.E., Felts, B., Breitbart, M., Salamon, P., Edwards, R.A., Carlson, C., Chan, A.M., Haynes, M., Kelley, S., Liu, H., Mahaffy, J.M., Mueller, J.E., Nulton, J., Olson, R., Parsons, R., Rayhawk, S., Suttle, C.A., Rohwer, F., 2006. The marine viromes of four oceanic regions. PLoS Biology 4, e368.
Biagini, P., Uch, R., Belhouchet, M., Attoui, H., Cantaloube, J.F., Brisbarre, N., de Micco, P., 2007. Circular genomes related to anelloviruses identified in human and animal samples by using a combined rolling-circle amplification/sequence-independent single primer amplification approach. Journal of General Virology 88, 2696–2701.
Blinkova, O., Kapoor, A., Victoria, J., Jones, M., Wolfe, N., Naeem, A., Shaukat, S., Sharif, S., Alam, M.M., Angez, M., Zaidi, S., Delwart, E.L., 2009. Cardioviruses are genetically diverse and cause common enteric infections in South Asian children. Journal of Virology 83, 4631–4641.
Blinkova, O., Victoria, J., Li, Y., Keele, B.F., Sanz, C., Ndjango, J.B., Peeters, M., Travis, D., Lonsdorf, E.V., Wilson, M.L., Pusey, A.E., Hahn, B.H., Delwart, E.L., 2010. Novel circular DNA viruses in stool samples of wild-living chimpanzees. Journal of General Virology 91, 74–86.
Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M., Segall, A.M., Mead, D., Azam, F., Rohwer, F., 2002. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences (USA) 99, 14250–14255.
Breitbart, M., Hewson, I., Felts, B., Mahaffy, J.M., Nulton, J., Salamon, P., Rohwer, F., 2003. Metagenomic analyses of an uncultured viral community from human feces. Journal of Bacteriology 185, 6220–6223.
Breitbart, M., Felts, B., Kelley, S., Mahaffy, J.M., Nulton, J., Salamon, P., Rohwer, F., 2004. Diversity and population structure of a near-shore marine-sediment viral community. Proceedings. Biological sciences/The Royal Society 271, 565–574.
Breitbart, M., Hoare, A., Nitti, A., Siefert, J., Haynes, M., Dinsdale, E., Edwards, R., Souza, V., Rohwer, F., Hollander, D., 2009. Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Cienegas, Mexico. Environmental Microbiology 11, 16–34.
Cann, A.J., Fandrich, S.E., Heaphy, S., 2005. Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes. Virus Genes 30, 151–156.
Challoner, P.B., Smith, K.T., Parker, J.D., MacLeod, D.L., Coulter, S.N., Rose, T.M., Schultz, E.R., Bennett, J.L., Garber, R.L., Chang, M., et al., 1995. Plaque-associated expression of human herpesvirus 6 in multiple sclerosis. Proceedings of the National Academy of Sciences (USA) 92, 7440–7444.
Chang, Y., Cesarman, E., Pessin, M.S., Lee, F., Culpepper, J., Knowles, D.M., Moore, P.S., 1994. Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi’s sarcoma. Science 266, 1865–1869.
Chiu, C.Y., Greninger, A.L., Kanada, K., Kwok, T., Fischer, K.F., Runckel, C., Louie, J.K., Glaser, C.A., Yagi, S., Schnurr, D.P., Haggerty, T.D., Parsonnet, J., Ganem, D., DeRisi, J.L., 2008. Identification of cardioviruses related to Theiler’s murine encephalomyelitis virus in human infections. Proceedings of the National Academy of Sciences (USA) 105, 14124–14129.
Cleaveland, S., Laurenson, M.K., Taylor, L.H., 2001. Diseases of humans and their domestic mammals: pathogen characteristics, host range and the risk of emergence. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 356, 991–999.
Clewley, J.P., 2004. A role for arrays in clinical virology: fact or fiction? Journal of Clinical Virology 29, 2–12.
Coetzee, B., Freeborough, M.J., Maree, H.J., Celton, J.M., Rees, D.J., Burger, J.T., 2010. Deep sequencing analysis of viruses infecting grapevines: virome of a vineyard. Virology 400, 157–163.
Corne, J.M., Green, S., Sanderson, G., Caul, E.O., Johnston, S.L., 1999. A multiplex RT-PCR for the detection of parainfluenza viruses 1–3 in clinical samples. Journal of Virological Methods 82, 9–18.
Cox-Foster, D.L., Conlan, S., Holmes, E.C., Palacios, G., Evans, J.D., Moran, N.A., Quan, P.L., Briese, T., Hornig, M., Geiser, D.M., Martinson, V., van Engelsdorp, D., Kalkstein, A.L., Drysdale, A., Hui, J., Zhai, J., Cui, L., Hutchison, S.K., Simons, J.F., Egholm, M., Pettis, J.S., Lipkin, W.I., 2007. A metagenomic survey of microbes in honey bee colony collapse disorder. Science 318, 283–287.
Dalton-Griffin, L., Kellam, P., 2009. Infectious causes of cancer and their detection. Journal of Biology 8, 67.
Dawood, F.S., Jain, S., Finelli, L., Shaw, M.W., Lindstrom, S., Garten, R.J., Gubareva, L.V., Xu, X., Bridges, C.B., Uyeki, T.M., 2009. Emergence of a novel swine-origin influenza A (H1N1) virus in humans. The New England Journal of Medicine 360, 2605–2615.
Dean, F.B., Nelson, J.R., Giesler, T.L., Lasken, R.S., 2001. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Research 11, 1095–1099.
Delwart, E.L., 2007. Viral metagenomics. Reviews in Medical Virology 17, 115–131.
Djikeng, A., Halpin, R., Kuzmickas, R., Depasse, J., Feldblyum, J., Sengamalay, N., Afonso, C., Zhang, X., Anderson, N.G., Ghedin, E., Spiro, D.J., 2008. Viral genome sequencing by random priming methods. BMC Genomics 9, 5.
Djikeng, A., Kuzmickas, R., Anderson, N.G., Spiro, D.J., 2009. Metagenomic analysis of RNA viruses in a fresh water lake. PLoS One 4, e7264.
Durst, M., Gissmann, L., Ikenberg, H., zur Hausen, H., 1983. A papillomavirus DNA from a cervical carcinoma and its prevalence in cancer biopsy samples from different geographic regions. Proceedings of the National Academy of Sciences (USA) 80, 3812–3815.
Espagne, E., Dupuy, C., Huguet, E., Cattolico, L., Provost, B., Martins, N., Poirie, M., Periquet, G., Drezen, J.M., 2004. Genome sequence of a polydnavirus: insights into symbiotic virus evolution. Science 306, 286–289.
Esteban, J.A., Salas, M., Blanco, L., 1993. Fidelity of phi 29 DNA polymerase. Comparison between protein-primed initiation and DNA polymerization. The Journal of Biological Chemistry 268, 2719–2726.
Feng, H., Shuda, M., Chang, Y., Moore, P.S., 2008. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science 319, 1096–1100.
Fierer, N., Breitbart, M., Nulton, J., Salamon, P., Lozupone, C., Jones, R., Robeson, M., Edwards, R.A., Felts, B., Rayhawk, S., Knight, R., Rohwer, F., Jackson, R.B., 2007. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Applied and Environmental Microbiology 73, 7059–7066.


Finkbeiner, S.R., Allred, A.F., Tarr, P.I., Klein, E.J., Kirkwood, C.D., Wang, D., 2008. Metagenomic analysis of human diarrhea: viral detection and discovery. PLoS Pathogens 4, e1000011.
Foldes-Papp, Z., Egerer, R., Birch-Hirschfeld, E., Striebel, H.M., Demel, U., Tilz, G.P., Wutzler, P., 2004. Detection of multiple human herpes viruses by DNA microarray technology. Molecular Diagnosis 8, 1–9.
Fouchier, R.A., Hartwig, N.G., Bestebroer, T.M., Niemeyer, B., de Jong, J.C., Simon, J.H., Osterhaus, A.D., 2004. A previously undescribed coronavirus associated with respiratory disease in humans. Proceedings of the National Academy of Sciences (USA) 101, 6212–6216.
Fredericks, D.N., Relman, D.A., 1996. Sequence-based identification of microbial pathogens: a reconsideration of Koch’s postulates. Clinical Microbiology Reviews 9, 18–33.
Froussard, P., 1992. A random-PCR method (rPCR) to construct whole cDNA library from low amounts of RNA. Nucleic Acids Research 20, 2900.
Hall, N., 2007. Advanced sequencing technologies and their wider impact in microbiology. The Journal of Experimental Biology 210, 1518–1525.
Itoh, Y., Shinya, K., Kiso, M., Watanabe, T., Sakoda, Y., Hatta, M., Muramoto, Y., Tamura, D., Sakai-Tagawa, Y., Noda, T., Sakabe, S., Imai, M., Hatta, Y., Watanabe, S., Li, C., Yamada, S., Fujii, K., Murakami, S., Imai, H., Kakugawa, S., Ito, M., Takano, R., Iwatsuki-Horimoto, K., Shimojima, M., Horimoto, T., Goto, H., Takahashi, K., Makino, A., Ishigaki, H., Nakayama, M., Okamatsu, M., Warshauer, D., Shult, P.A., Saito, R., Suzuki, H., Furuta, Y., Yamashita, M., Mitamura, K., Nakano, K., Nakamura, M., Brockman-Schneider, R., Mitamura, H., Yamazaki, M., Sugaya, N., Suresh, M., Ozawa, M., Neumann, G., Gern, J., Kida, H., Ogasawara, K., Kawaoka, Y., 2009. In vitro and in vivo characterization of new swine-origin H1N1 influenza viruses. Nature 460, 1021–1025.
Jin, L., Lohr, C.V., Vanarsdall, A.L., Baker, R.J., Moerdyk-Schauwecker, M., Levine, C., Gerlach, R.F., Cohen, S.A., Alvarado, D.E., Rohrmann, G.F., 2008. Characterization of a novel alphaherpesvirus associated with fatal infections of domestic rabbits. Virology 378, 13–20.
Johne, R., Enderlein, D., Nieper, H., Muller, H., 2005. Novel polyomavirus detected in the feces of a chimpanzee by nested broad-spectrum PCR. Journal of Virology 79, 3883–3887.
Johne, R., Fernandez-de-Luco, D., Hofle, U., Muller, H., 2006a. Genome of a novel circovirus of starlings, amplified by multiply primed rolling-circle amplification. Journal of General Virology 87, 1189–1195.
Johne, R., Wittig, W., Fernandez-de-Luco, D., Hofle, U., Muller, H., 2006b. Characterization of two novel polyomaviruses of birds by using multiply primed rolling-circle amplification of their genomes. Journal of Virology 80, 3523–3531.
Johnson, R.T., Griffin, D.E., Hirsch, R.L., Wolinsky, J.S., Roedenbeck, S., Lindo de Soriano, I., Vaisberg, A., 1984. Measles encephalomyelitis – clinical and immunologic studies. The New England Journal of Medicine 310, 137–141.
Jones, M.S., Kapoor, A., Lukashov, V.V., Simmonds, P., Hecht, F., Delwart, E., 2005. New DNA viruses identified in patients with acute viral infection syndrome. Journal of Virology 79, 8230–8236.
Jones 2nd, M.S., Harrach, B., Ganac, R.D., Gozum, M.M., Dela Cruz, W.P., Riedel, B., Pan, C., Delwart, E.L., Schnurr, D.P., 2007a. New adenovirus species found in a patient presenting with gastroenteritis. Journal of Virology 81, 5978–5984.
Jones, M.S., Lukashov, V.V., Ganac, R.D., Schnurr, D.P., 2007b. Discovery of a novel human picornavirus in a stool sample from a pediatric patient presenting with fever of unknown origin. Journal of Clinical Microbiology 45, 2144–2150.
Kapoor, A., Victoria, J., Simmonds, P., Wang, C., Shafer, R.W., Nims, R., Nielsen, O., Delwart, E., 2008. A highly divergent picornavirus in a marine mammal. Journal of Virology 82, 311–320.
Kapoor, A., Slikas, E., Simmonds, P., Chieochansin, T., Naeem, A., Shaukat, S., Alam, M.M., Sharif, S., Angez, M., Zaidi, S., Delwart, E., 2009. A newly identified bocavirus species in human stool. Journal of Infectious Diseases 199, 196–200.
Kirkland, P.D., Frost, M., Finlaison, D.S., King, K.R., Ridpath, J.F., Gu, X., 2007. Identification of a novel virus in pigs-Bungowannah virus: a possible new species of pestivirus. Virus Research 129, 26–34.
Kistler, A., Avila, P.C., Rouskin, S., Wang, D., Ward, T., Yagi, S., Schnurr, D., Ganem, D., DeRisi, J.L., Boushey, H.A., 2007. Pan-viral screening of respiratory tract infections in adults with and without asthma reveals unexpected human coronavirus and human rhinovirus diversity. The Journal of Infectious Diseases 196, 817–825.
Kistler, A.L., Gancz, A., Clubb, S., Skewes-Cox, P., Fischer, K., Sorber, K., Chiu, C.Y., Lublin, A., Mechani, S., Farnoushi, Y., Greninger, A., Wen, C.C., Karlene, S.B., Ganem, D., DeRisi, J.L., 2008. Recovery of divergent avian bornaviruses from cases of proventricular dilatation disease: identification of a candidate etiologic agent. Virology Journal 5, 88.
Koopmans, M., Wilbrink, B., Conyn, M., Natrop, G., van der Nat, H., Vennema, H., Meijer, A., van Steenbergen, J., Fouchier, R., Osterhaus, A., Bosman, A., 2004. Transmission of H7N7 avian influenza A virus to human beings during a large outbreak in commercial poultry farms in the Netherlands. Lancet 363, 587–593.
Kuo, G., Choo, Q.L., Alter, H.J., Gitnick, G.L., Redeker, A.G., Purcell, R.H., Miyamura, T., Dienstag, J.L., Alter, M.J., Stevens, C.E., Tegtmeier, G.E., Bonino, F., Colombo, M., Lee, W.-S., Kuo, C., Berger, K., Shuster, J.R., Overby, L.R., Bradley, D.W., Houghton, M., 1989. An assay for circulating antibodies to a major etiologic virus of human non-A, non-B hepatitis. Science 244, 362–364.
Lambden, P.R., Cooke, S.J., Caul, E.O., Clarke, I.N., 1992. Cloning of noncultivatable human rotavirus by single primer amplification. Journal of Virology 66, 1817–1822.
Li, L., Barry, P., Yeh, E., Glaser, C., Schnurr, D., Delwart, E., 2009a. Identification of a novel human gammapapillomavirus species. Journal of General Virology 90, 2413–2417.
Li, L., Victoria, J., Kapoor, A., Blinkova, O., Wang, C., Babrzadeh, F., Mason, C.J., Pandey, P., Triki, H., Bahri, O., Oderinde, B.S., Baba, M.M., Bukbuk, D.N., Besser, J.M., Bartkus, J.M., Delwart, E.L., 2009b. A novel picornavirus associated with gastroenteritis. Journal of Virology 83, 12002–12006.
Li, L., Victoria, J., Kapoor, A., Naeem, A., Shaukat, S., Sharif, S., Alam, M.M., Angez, M., Zaidi, S.Z., Delwart, E., 2009c. Genomic characterization of novel human parechovirus type. Emerging Infectious Diseases 15, 288–291.
Li, L., Victoria, J.G., Wang, C., Jones, M., Fellers, G.M., Kunz, T.H., Delwart, E., 2010. Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses. Journal of Virology 84, 6955–6965.
Lisitsyn, N., Lisitsyn, N., Wigler, M., 1993. Cloning the differences between two complex genomes. Science 259, 946–951.
Lombardi, V.C., Ruscetti, F.W., Das Gupta, J., Pfost, M.A., Hagen, K.S., Peterson, D.L., Ruscetti, S.K., Bagni, R.K., Petrow-Sadowski, C., Gold, B., Dean, M., Silverman, R.H., Mikovits, J.A., 2009. Detection of an infectious retrovirus, XMRV, in blood cells of patients with chronic fatigue syndrome. Science 326, 585–589.
Matsui, S.M., Kim, J.P., Greenberg, H.B., Su, W., Sun, Q., Johnson, P.C., DuPont, H.L., Oshiro, L.S., Reyes, G.R., 1991. The isolation and characterization of a Norwalk virus-specific cDNA. The Journal of Clinical Investigation 87, 1456–1461.
Matsui, S.M., Kim, J.P., Greenberg, H.B., Young, L.M., Smith, L.S., Lewis, T.L., Herrmann, J.E., Blacklow, N.R., Dupuis, K., Reyes, G.R., 1993. Cloning and characterization of human astrovirus immunoreactive epitopes. Journal of Virology 67, 1712–1715.
Metzker, M.L., 2010. Sequencing technologies – the next generation. Nature Reviews Genetics 11, 31–46.
Mihindukulasuriya, K.A., Wu, G., St Leger, J., Nordhausen, R.W., Wang, D., 2008. Identification of a novel coronavirus from a beluga whale by using a panviral microarray. Journal of Virology 82, 5084–5088.
Moore, P.S., Chang, Y., 1995. Detection of herpesvirus-like DNA sequences in Kaposi’s sarcoma in patients with and without HIV infection. The New England Journal of Medicine 332, 1181–1185.
Morozova, O., Marra, M.A., 2008. Applications of next-generation sequencing technologies in functional genomics. Genomics 92, 255–264.
Muerhoff, A.S., Leary, T.P., Desai, S.M., Mushahwar, I.K., 1997. Amplification and subtraction methods and their application to the discovery of novel human viruses. Journal of Medical Virology 53, 96–103.
Munz, C., Lunemann, J.D., Getts, M.T., Miller, S.D., 2009. Antiviral immune responses: triggers of or triggered by autoimmunity? Nature Reviews Immunology 9, 246–258.
Nanda, S., Jayan, G., Voulgaropoulou, F., Sierra-Honigmann, A.M., Uhlenhaut, C., McWatters, B.J., Patel, A., Krause, P.R., 2008. Universal virus detection by degenerate-oligonucleotide primed polymerase chain reaction of purified viral nucleic acids. Journal of Virological Methods 152, 18–24.
Nelson, K.E., Weinstock, G.M., Highlander, S.K., Worley, K.C., Creasy, H.H., Wortman, J.R., Rusch, D.B., Mitreva, M., Sodergren, E., Chinwalla, A.T., Feldgarden, M., Gevers, D., Haas, B.J., Madupu, R., Ward, D.V., Birren, B.W., Gibbs, R.A., Methe, B., Petrosino, J.F., Strausberg, R.L., Sutton, G.G., White, O.R., Wilson, R.K., Durkin, S., Giglio, M.G., Gujja, S., Howarth, C., Kodira, C.D., Kyrpides, N., Mehta, T., Muzny, D.M., Pearson, M., Pepin, K., Pati, A., Qin, X., Yandava, C., Zeng, Q., Zhang, L., Berlin, A.M., Chen, L., Hepburn, T.A., Johnson, J., McCorrison, J., Miller, J., Minx, P., Nusbaum, C., Russ, C., Sykes, S.M., Tomlinson, C.M., Young, S., Warren, W.C., Badger, J., Crabtree, J., Markowitz, V.M., Orvis, J., Cree, A., Ferriera, S., Fulton, L.L., Fulton, R.S., Gillis, M., Hemphill, L.D., Joshi, V., Kovar, C., Torralba, M., Wetterstrand, K.A., Abouellleil, A., Wollam, A.M., Buhay, C.J., Ding, Y., Dugan, S., FitzGerald, M.G., Holder, M., Hostetler, J., Clifton, S.W., Allen-Vercoe, E., Earl, A.M., Farmer, C.N., Liolios, K., Surette, M.G., Xu, Q., Pohl, C., Wilczek-Boney, K., Zhu, D., 2010. A catalog of reference genomes from the human microbiome. Science 328, 994–999.
Nichol, S.T., Spiropoulou, C.F., Morzunov, S., Rollin, P.E., Ksiazek, T.G., Feldmann, H., Sanchez, A., Childs, J., Zaki, S., Peters, C.J., 1993. Genetic identification of a hantavirus associated with an outbreak of acute respiratory illness. Science 262, 914–917.
Niel, C., Diniz-Mendes, L., Devalle, S., 2005. Rolling-circle amplification of Torque teno virus (TTV) complete genomes from human and swine sera and identification of a novel swine TTV genogroup. Journal of General Virology 86, 1343–1347.
Nishizawa, T., Okamoto, H., Konishi, K., Yoshizawa, H., Miyakawa, Y., Mayumi, M., 1997. A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology. Biochemical and Biophysical Research Communications 241, 92–97.
Noel, J.C., Hermans, P., Andre, J., Fayt, I., Simonart, T., Verhest, A., Haot, J., Burny, A., 1996. Herpesvirus-like DNA sequences and Kaposi’s sarcoma: relationship with epidemiology, clinical spectrum, and histologic features. Cancer 77, 2132–2136.
Nollens, H.H., Rivera, R., Palacios, G., Wellehan, J.F., Saliki, J.T., Caseltine, S.L., Smith, C.R., Jensen, E.D., Hui, J., Lipkin, W.I., Yochem, P.K., Wells, R.S., St Leger, J., Venn-Watson, S., 2009. New recognition of Enterovirus infections in bottlenose dolphins (Tursiops truncatus). Veterinary Microbiology 139, 170–175.
Palacios, G., Quan, P.L., Jabado, O.J., Conlan, S., Hirschberg, D.L., Liu, Y., Zhai, J., Renwick, N., Hui, J., Hegyi, H., Grolla, A., Strong, J.E., Towner, J.S., Geisbert, T.W., Jahrling, P.B., Buchen-Osmond, C., Ellerbrok, H., Sanchez-Seco, M.P., Lussier, Y., Formenty, P., Nichol, M.S., Feldmann, H., Briese, T., Lipkin, W.I., 2007. Panmicrobial oligonucleotide array for diagnosis of infectious diseases. Emerging Infectious Diseases 13, 73–81.


Palacios, G., Druce, J., Du, L., Tran, T., Birch, C., Briese, T., Conlan, S., Quan, P.L., Hui, J., Marshall, J., Simons, J.F., Egholm, M., Paddock, C.D., Shieh, W.J., Goldsmith, C.S., Zaki, S.R., Catton, M., Lipkin, W.I., 2008. A new arenavirus in a cluster of fatal transplant-associated diseases. The New England Journal of Medicine 358, 991–998.
Patience, C., Takeuchi, Y., Weiss, R.A., 1997. Infection of human cells by an endogenous retrovirus of pigs. Nature Medicine 3, 282–286.
Perron, H., Garson, J.A., Bedin, F., Beseme, F., Paranhos-Baccala, G., Komurian-Pradel, F., Mallet, F., Tuke, P.W., Voisset, C., Blond, J.L., Lalande, B., Seigneurin, J.M., Mandrand, B., 1997. Molecular identification of a novel retrovirus repeatedly isolated from patients with multiple sclerosis. The Collaborative Research Group on Multiple Sclerosis. Proceedings of the National Academy of Sciences (USA) 94, 7583–7588.
Petrosino, J.F., Highlander, S., Luna, R.A., Gibbs, R.A., Versalovic, J., 2009. Metagenomic pyrosequencing and microbial identification. Clinical Chemistry 55, 856–866.
Rector, A., Bossart, G.D., Ghim, S.J., Sundberg, J.P., Jenson, A.B., Van Ranst, M., 2004a. Characterization of a novel close-to-root papillomavirus from a Florida manatee by using multiply primed rolling-circle amplification: Trichechus manatus latirostris papillomavirus type 1. Journal of Virology 78, 12698–12702.
Rector, A., Tachezy, R., Van Ranst, M., 2004b. A sequence-independent strategy for detection and cloning of circular DNA virus genomes by using multiply primed rolling-circle amplification. Journal of Virology 78, 4993–4998.
Relman, D.A., 1999. The search for unrecognized pathogens. Science 284, 1308–1310.
Reyes, G.R., Kim, J.P., 1991. Sequence-independent, single-primer amplification (SISPA) of complex DNA populations. Molecular and Cellular Probes 5, 473–481.
Rivers, T.M., 1937. Viruses and Koch’s Postulates. Journal of Bacteriology 33, 1–12.
Rose, T.M., Strand, K.B., Schultz, E.R., Schaefer, G., Rankin Jr., G.W., Thouless, M.E., Tsai, C.C., Bosch, M.L., 1997. Identification of two homologs of the Kaposi’s sarcoma-associated herpesvirus (human herpesvirus 8) in retroperitoneal fibromatosis of different macaque species. Journal of Virology 71, 4138–4144.
Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M., Henikoff, S., 1998. Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Research 26, 1628–1635.
Sampath, R., Hofstadler, S.A., Blyn, L.B., Eshoo, M.W., Hall, T.A., Massire, C., Levene, H.M., Hannis, J.C., Harrell, P.M., Neuman, B., Buchmeier, M.J., Jiang, Y., Ranken, R., Drader, J.J., Samant, V., Griffey, R.H., McNeil, J.A., Crooke, S.T., Ecker, D.J., 2005. Rapid identification of emerging pathogens: coronavirus. Emerging Infectious Diseases 11, 373–379.
Sanger, F., Nicklen, S., Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences (USA) 74, 5463–5467.
Schoenfeld, T., Patterson, M., Richardson, P.M., Wommack, K.E., Young, M., Mead, D., 2008. Assembly of viral metagenomes from yellowstone hot springs. Applied and Environmental Microbiology 74, 4164–4174.
Simons, J.N., Leary, T.P., Dawson, G.J., Pilot-Matias, T.J., Muerhoff, A.S., Schlauder, G.G., Desai, S.M., Mushahwar, I.K., 1995a. Isolation of novel virus-like sequences associated with human hepatitis. Nature Medicine 1, 564–569.
Simons, J.N., Pilot-Matias, T.J., Leary, T.P., Dawson, G.J., Desai, S.M., Schlauder, G.G., Muerhoff, A.S., Erker, J.C., Buijk, S.L., Chalmers, M.L., 1995b. Identification of two flavivirus-like genomes in the GB hepatitis agent. Proceedings of the National Academy of Sciences (USA) 92, 3401–3405.
Smith, T.F., Waterman, M.S., Sadler, J.R., 1983. Statistical characterization of nucleic acid sequence functional domains. Nucleic Acids Research 11, 2205–2220.
Storch, G.A., 2007. Diagnostic virology. In: Knipe, D.M., Howley, P.M. (Eds.), Fields Virology, vol. 1. Lippincott, Williams & Wilkins, pp. 565–604.
Telenius, H., Carter, N.P., Bebb, C.E., Nordenskjold, M., Ponder, B.A.J., Tunnacliffe, A., 1992. Degenerate oligonucleotide-primed PCR – general amplification of target DNA by a single degenerate primer. Genomics 13, 718–725.
Thacker, E.L., Mirzaei, F., Ascherio, A., 2006. Infectious mononucleosis and risk for multiple sclerosis: a meta-analysis. Annals of Neurology 59, 499–503.
Urisman, A., Fischer, K.F., Chiu, C.Y., Kistler, A.L., Beck, S., Wang, D., DeRisi, J.L., 2005. E-Predict: a computational strategy for species identification based on observed DNA microarray hybridisation patterns. Genome Biology 6, R78.
Urisman, A., Molinaro, R.J., Fischer, N., Plummer, S.J., Casey, G., Klein, E.A., Malathi, K., Magi-Galluzzi, C., Tubbs, R.R., Ganem, D., Silverman, R.H., DeRisi, J.L., 2006. Identification of a novel gammaretrovirus in prostate tumors of patients homozygous for R462Q RNASEL variant. PLoS Pathogens 2, e25.
van der Hoek, L., Pyrc, K., Jebbink, M.F., Vermeulen-Oost, W., Berkhout, R.J., Wolthers, K.C., Wertheim-van Dillen, P.M., Kaandorp, J., Spaargaren, J., Berkhout, B., 2004. Identification of a new human coronavirus. Nature Medicine 10, 368–373.
Van Devanter, D.R., Warrener, P., Bennett, L., Schultz, E.R., Coulter, S., Garber, R.L., Rose, T.M., 1996. Detection and analysis of diverse herpesviral species by consensus primer PCR. Journal of Clinical Microbiology 34, 1666–1671.
Victoria, J.G., Kapoor, A., Dupuis, K., Schnurr, D.P., Delwart, E.L., 2008. Rapid identification of known and new RNA viruses from animal tissues. PLoS Pathogens 4, e1000163.
Victoria, J.G., Kapoor, A., Li, L., Blinkova, O., Slikas, B., Wang, C., Naeem, A., Zaidi, S., Delwart, E., 2009. Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. Journal of Virology 83, 4642–4651.
Wada, K., Wada, Y., Ishibashi, F., Gojobori, T., Ikemura, T., 1992. Codon usage tabulated from the GenBank genetic sequence data. Nucleic Acids Research 20, 2111–2118.
Wang, D., Coscoy, L., Zylberberg, M., Avila, P.C., Boushey, H.A., Ganem, D., DeRisi, J.L., 2002. Microarray-based detection and genotyping of viral pathogens. Proceedings of the National Academy of Sciences (USA) 99, 15687–15692.
Wang, D., Urisman, A., Liu, Y.T., Springer, M., Ksiazek, T.G., Erdman, D.D., Mardis, E.R., Hickenbotham, M., Magrini, V., Eldred, J., Latreille, J.P., Wilson, R.K., Ganem, D., DeRisi, J.L., 2003. Viral discovery and sequence recovery using DNA microarrays. PLoS Biology 1, E2.
Weber, G., Shendure, J., Tanenbaum, D.M., Church, G.M., Meyerson, M., 2002. Identification of foreign gene sequences by transcript filtering against the human genome. Nature Genetics 30, 141–142.
Williamson, S.J., Rusch, D.B., Yooseph, S., Halpern, A.L., Heidelberg, K.B., Glass, J.I., Andrews-Pfannkoch, C., Fadrosh, D., Miller, C.S., Sutton, G., Frazier, M., Venter, J.C., 2008. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples. PLoS One 3, e1456.
Wooley, J.C., Godzik, A., Friedberg, I., 2010. A primer on metagenomics. PLoS Computational Biology 6, e1000667.
Woolhouse, M.E., Howey, R., Gaunt, E., Reilly, L., Chase-Topping, M., Savill, N., 2008. Temporal trends in the discovery of human viruses. Proceedings. Biological Sciences/The Royal Society 275, 2111–2115.
Xiao-Ping, K., Yong-Qiang, L., Qing-Ge, S., Hong, L., Qing-Yu, Z., Yin-Hui, Y., 2009. Development of a consensus microarray method for identification of some highly pathogenic viruses. Journal of Medical Virology 81, 1945–1950.
Zhang, T., Breitbart, M., Lee, W.H., Run, J.Q., Wei, C.L., Soh, S.W., Hibberd, M.L., Liu, E.T., Rohwer, F., Ruan, Y., 2006. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biology 4, e3.
