The Phylogeny of Staphylococcus aureus - Which Genes Make the Best Intra-Species Markers?
Received 21 October 2005
Revised 21 January 2006
Accepted 30 January 2006

The ability to make informed decisions on the suitability of alternative marker loci is central for
population and epidemiological investigations. This issue was addressed using Staphylococcus
aureus as a model population by generating nucleotide sequence data from 33 gene fragments in
a representative sample of 30 strains. Supplementing the data with pre-existing multilocus
sequence typing data, an intra-species tree based on ~17.8 kb of sequence was reconstructed
and the goodness of fit of each individual gene tree was computed. No strong association was
noted between gene function per se and phylogenetic reliability, but it is suggested that candidate
loci should possess at least the average degree of nucleotide diversity for all genes in the
genome. In the case of S. aureus this threshold is >1 % mean pairwise diversity.
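The >1 % rule of thumb can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and counts simple per-site mismatches (an uncorrected distance), which may differ from the estimator used in the study. The function names, the default threshold argument and the toy loci are hypothetical and serve only as illustration.

```python
from itertools import combinations


def pairwise_diversity_percent(seqs):
    """Mean pairwise percentage of differing sites among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        # Count mismatched sites for this pair of strains.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
        n_pairs += 1
    return 100.0 * total / n_pairs


def candidate_loci(alignments, threshold_percent=1.0):
    """Names of loci whose mean pairwise diversity exceeds the threshold."""
    return sorted(
        name
        for name, seqs in alignments.items()
        if pairwise_diversity_percent(seqs) > threshold_percent
    )


if __name__ == "__main__":
    # Toy alignments (hypothetical data, not taken from the study).
    toy = {
        "locus_a": ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"],
        "locus_b": ["ACGTACGT", "ACGTACGT", "ACGTACGT"],
    }
    for name, seqs in toy.items():
        print(name, round(pairwise_diversity_percent(seqs), 2))
    print(candidate_loci(toy))
```

In practice the threshold would be set to the average diversity across the genome of the species in question rather than hard-coded.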
the generation of MLST data. (ii) Although recombination does occur (Robinson & Enright, 2004),
S. aureus is basically clonal, which allows the reconstruction of a reasonably robust tree. This
then facilitates comparisons between individual gene trees and a consensus tree. (iii) The data
will provide a valuable phylogenetic framework for this important human pathogen.

DNA replication and processing, regulators; n=9), housekeeping (HK; central and intermediary
metabolism; n=13), and cell envelope and cellular processes (CE; n=5). We also characterized
conserved genes of unknown function (UF; n=7) and orphans (OR; unknown function, no similarity
to other genes in the database; n=6). Genes of unknown function are referred to throughout using
the SA ORF numbers proposed by Kuroda et al. (2001), except SA2439, which has subsequently been
renamed sasF (Robinson & Enright, 2004).
(with the first 2000 trees discarded as ‘burn-in’). A 50 % majority rule consensus tree was then
calculated using PAUP* version 4.0b10 (Swofford, 2000) with the posterior probabilities indicating
the percentage of optimal trees supporting each node.

Fit to the consensus tree. As very variable genes make a larger contribution, in terms of
informative sites, to the consensus tree than very uniform genes, variable genes are more likely
to show a closer fit. In order to draw independent comparisons between individual gene trees and
the consensus, we constructed a further 37 consensus trees, in each case excluding a single gene.
We then compared each of these consensus trees in turn with the gene tree corresponding to the
excluded gene. We used the Shimodaira–Hasegawa (S-H) test (Shimodaira, 2002) in order to rank
each gene with respect to the differences in likelihood values between individual gene trees and
the corresponding consensus tree (using the concatenated data as the reference). The S-H test was
implemented in PAUP* version 4.0b10 (Swofford, 2000); a lower likelihood difference (S-H score)
reflects a closer fit to the consensus tree (FCT).

RESULTS

Table 2 gives the mean pairwise percentage nucleotide diversity (p), the mol% G+C content, the
codon adaptation index (CAI) and the dS/dN ratios of all the gene loci employed in this study.
The genes are ranked according to the likelihood differences between individual gene trees and
the consensus tree (FCT). The value of p for all genes was 1.28 %. Five of the six most uniform
genes were classified as IP genes (16S rRNA, 0.0 %; sarA, 0.02 %; tufA, 0.2 %; serS, 0.3 %; and
sigB, 0.4 %). At the other extreme, three genes appeared unusually diverse [agrC (IP), 5.5 %;
aapA (CE), 5.3 %; and SA1619 (OR), 4.0 %]. Although the dS/dN ratio varies substantially both
within and between gene classes, none of the genes showed evidence of positive selection. The
orphans tended to exhibit low dS/dN ratios (mean 3.1, median 2.8), suggesting a low level of
functional constraint and rapid evolution. This is consistent with the non-essentiality of these
genes, as indicated by their absence from the sequenced genomes of the closely related species
S. epidermidis.

The phylogeny of S. aureus lineages

Data for 37 loci were concatenated to produce a total of ~17.8 kb for each of the 30 strains, and
used to produce the unrooted Bayesian tree presented in Fig. 2. This tree is broadly consistent
with one previously published based on the concatenated sequences of the seven MLST genes, but
is more robust and contains no unresolved branches. The tree confirms the division into two main
groups, as reported previously (Feil et al., 2003; Holden et al., 2004; Robinson et al., 2005),
with Group 1 being further subdivided into Groups 1a and 1b. ST55 is an exceptional genotype,
previously classified as Group 1 but appearing, from the current data, to fall at an
intermediate position between the two main groups.

Group 1a contains the major MRSA clones ST36 (EMRSA-16), ST22 (EMRSA-15) and ST45 (the Berlin
clone) (Aires de Sousa & de Lencastre, 2004; Oliveira et al., 2002), as well as the common MSSA
clone ST30 from which ST36 is thought to have evolved (Enright et al., 2000). Interestingly, and
in contrast to the MLST tree, these data suggest that ST45 and ST30 share a common ancestor.
Group 1b contains no major nosocomial lineages, although ST59 was found to be relatively common
amongst intravenous drug users in Brighton, UK (Monk et al., 2004), and is an important
community-acquired MRSA from the USA (Pan et al., 2005; Vandenesch et al., 2003). Group 2
contains the related CC8 clones (EMRSA-1, 2, 4, 5, 6, 11 and 17), which include the first MRSA
lineage to be described (Crisostomo et al., 2001), ST5 (EMRSA-3; the New York/Japan clone)
(Oliveira et al., 2002) and ST1, which is the genotype of sequenced strains MSSA476 (Holden et
al., 2004) and MW2 (Kuroda et al., 2001). Group 2 also contains a relatively high number of
sporadic or asymptomatic genotypes (e.g. ST20, ST9, ST13, ST101, ST7, ST97) and exhibits shorter
branch lengths and lower clade credibility values than Group 1a.

Relationship between gene function and fit to the consensus tree

We ranked each gene tree with respect to its fit to a consensus tree (FCT) reconstructed
excluding the gene under examination using the S-H test, as described in Methods (Table 2). All
the genes showed significantly lower likelihood scores (P<0.001) against the consensus tree
(compared with the concatenated data) using the S-H test. The gene showing the closest FCT (i.e.
the smallest likelihood difference) was sasF (SA2439), which is of unknown function but likely to
encode a surface-associated protein as it contains an LPXTG motif (Roche et al., 2003); it was
one of several putative cell-wall-associated genes used for a fine-scale study of the
micro-evolution of MRSA clonal lineages by Robinson & Enright (2003). This result is surprising, as
MRSA lineages. Although this is an improvement on the existing tree, the branching order cannot
be reconstructed with complete confidence in some parts of the tree. Would the tree be improved
by the addition of yet more data? Rokas et al. (2003) examined phylogenetic congruence in eight
yeast species and concluded that the concatenated data of a minimum of 20 genes are required to
produce a robust tree. Although in terms of nucleotide sites our dataset of 38 gene fragments is
of a similar size to the 20 genes of Rokas et al. (2003), the use of a higher number of
independently evolving genes should increase the performance of the data. Therefore we feel that
the intra-species tree we present here would not be greatly improved by the addition of yet more
data. The broad consistency of this phylogeny with the basic groupings previously inferred from
MLST genes (Feil et al., 2003), sas genes (Robinson et al., 2005), adhesin genes (Kuhn et al.,
2006), AFLP clustering (Melles et al., 2004), PFGE (Grundmann et al., 2002) and microarray
analysis (Lindsay et al., 2006) provides additional support.

The topology within Group 2 remains relatively poorly supported, and contrasts with the much
longer branches evident in Group 1a. This difference between these groups has also been noted in
an analysis of MLST and sas genes (Robinson et al., 2005). One possibility is that the globally
disseminated Group 1a clones (clonal complexes, ‘CCs’, 30, 45 and 22) may be particularly
efficient at out-competing close relatives and that the longer branch lengths in Group 1a reflect
a higher rate of stochastic extinction than in Group 2. The relatively poor clade credibility
scores in Group 2 are also consistent with a higher rate of recombination in Group 2 strains,
although comparisons of the two groups using various tests for recombination did not produce
strong evidence to support this view (data not shown).

Our data also suggest that Group 1 strains can be further subdivided into Group 1a and Group 1b,
a division not recognized in previous phylogenetic studies. Although there is generally little
association between phylogenetic distribution and epidemiological source, it is noteworthy that
Group 1b contains no major nosocomial lineages, whereas Group 1a contains the two major MRSA
clones currently circulating in the UK (STs 36 and 22), as well as the Berlin clone (ST45).
Future studies aimed at identifying the genetic factors underlying the ability to rapidly
disseminate might therefore focus on comparisons between Group 1a and Group 1b strains. An
interesting observation in this context is the high degree of divergence between Group 1a and
Groups 1b and 2 at aapA (see supplementary Fig. S2); an examination of the region surrounding
this gene might therefore shed some light on the epidemiological differences between the two
groups.

Although the phylogenetic emphasis of this study was on the relationships between the major
clonal lineages, we included duplicates of four STs (5, 22, 36 and 121). In each case, these
duplicates differed at five or fewer positions in the concatenated sequence of 17 814 sites
(<0.03 %). This is consistent with a comparative genome analysis of two ST1 isolates (MSSA476
and MW2) which only revealed 285 single base changes in all orthologous gene pairs (~1 in
10,000 sites) (Holden et al., 2004). These results confirm the high degree of genetic
relatedness between isolates sharing identical STs. However, a more extensive investigation of
intra-clonal differences has proved successful in providing detailed hypotheses concerning the
emergence of closely related MRSA clones (Robinson & Enright, 2003). This study utilized the
highly variable sas genes, and our current results suggest that these genes might also be highly
informative for reconstructing deeper relationships within the S. aureus population. A second
study, utilizing variable adhesin genes, provided some evidence that recombination is more
common within, rather than between, clonal complexes (Kuhn et al., 2006).

Gene function, diversity and informative trees

These data are not only relevant for studies on S. aureus, but also provide clues as to the
extent to which the current criteria for choosing gene loci for phylogenetic, systematic or
epidemiological studies can be justified or relaxed. Here we find little evidence to justify the
current emphasis on housekeeping genes, at least on an intra-species level, and indeed our
results for S. aureus suggest that the MLST genes for this species rate amongst the poorest
phylogenetic markers. In contrast, the three genes which score highly against the consensus tree
are putatively associated with the cell wall (sasF), modified in antibiotic-resistant strains
(pbpB) or an orphan (SA1619), all of which would have been avoided under classical MLST criteria.

We suggest that the emphasis on gene choice for intra-species phylogenetic markers should be
shifted to the more tangible parameter of nucleotide diversity, with gene function being regarded
as secondary. Clearly, gene function and diversity are not always independent; ‘informational
pathway’ genes (in particular 16S rRNA) should generally be avoided due to the extremely low
levels of diversity of these genes. It is not clear what determines the point at which extra
variation ceases to improve the tree, and more specifically why variation in reasonable excess of
1 % generally does not result in a closer fit to the consensus phylogeny. Nevertheless, this
analysis provides a convenient ‘rule of thumb’ for identifying genes which are likely to contain
sufficient diversity, i.e. those containing at least the average for all genes. Our results also
raise the possibility of a correlation between G+C content and closeness of fit to a consensus
tree. Given the large number of potential candidate loci for each gene it may therefore also be
sensible to avoid those with extreme G+C contents.

We emphasize that we do not advocate changes to any established MLST scheme. The current MLST
scheme for S. aureus has proved extremely successful in understanding the population structure of
this species and for assigning isolates to particular lineages. In a highly clonal organism,
almost any gene will typically provide the same basic lineage assignments – in the case of S.
aureus this is clear from the
J. E. Cooper and E. J. Feil
broad consistency of different genes as well as pan-genome techniques such as PFGE (Grundmann et
al., 2002) and microarray analysis (Lindsay et al., 2006). However, individual genes may vary in
their utility to reconstruct the relationships between these lineages, and we find no evidence to
suggest that MLST genes can be considered the most reliable in this regard.

Concluding remarks

We present the most robust tree to date of the natural S. aureus population, and identify three
distinct groups within the population. We propose an emphasis on gene diversity, rather than gene
function, when identifying suitable phylogenetic markers. Although this may necessitate
preliminary work on candidate loci before final genes are chosen, we argue that this represents a
sensible investment of resources. Finally, our analysis differs from studies on more deep-rooted
phylogenies (i.e. those between genera or orders) (Zeigler, 2003). In this case, the presence of
sufficient diversity is not likely to be problematic and the use of ‘core’ genes may well be
justified. At an intra-species level, however, given the choice of many candidate ubiquitous
genes, we argue that the presence of sufficient diversity should be considered first and
foremost, and other considerations relating to gene function should be secondary.

ACKNOWLEDGEMENTS

This work was funded by an MRC Career Development Award to E. J. F. We are grateful to Eduardo
Rocha for calculation of the CAI values, to Mark Enright for the provision of strains and to
Ashley Robinson for constructive comments on the manuscript.

REFERENCES

Aires de Sousa, M. & de Lencastre, H. (2004). Bridges from hospitals to the laboratory: genetic portraits of methicillin-resistant Staphylococcus aureus clones. FEMS Immunol Med Microbiol 40, 101–111.
Bapteste, E., Susko, E., Leigh, J., MacLeod, D., Charlebois, R. L. & Doolittle, W. F. (2005). Do orthologous gene phylogenies really support tree-thinking? BMC Evol Biol 5, 33.
Crisostomo, M. I., Westh, H., Tomasz, A., Chung, M., Oliveira, D. C. & de Lencastre, H. (2001). The evolution of methicillin resistance in Staphylococcus aureus: similarity of genetic backgrounds in historically early methicillin-susceptible and -resistant isolates and contemporary epidemic clones. Proc Natl Acad Sci U S A 98, 9865–9870.
Enright, M. C., Day, N. P., Davies, C. E., Peacock, S. J. & Spratt, B. G. (2000). Multilocus sequence typing for characterization of methicillin-resistant and methicillin-susceptible clones of Staphylococcus aureus. J Clin Microbiol 38, 1008–1015.
Feil, E. J., Maiden, M. C., Achtman, M. & Spratt, B. G. (1999). The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol 16, 1496–1502.
Feil, E. J., Smith, J. M., Enright, M. C. & Spratt, B. G. (2000). Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 154, 1439–1450.
Feil, E. J., Cooper, J. E., Grundmann, H. & 9 other authors (2003). How clonal is Staphylococcus aureus? J Bacteriol 185, 3307–3316.
Gevers, D., Cohan, F. M., Lawrence, J. G. & 8 other authors (2005). Opinion: re-evaluating prokaryotic species. Nat Rev Microbiol 3, 733–739.
Grundmann, H., Hori, S., Enright, M. C., Webster, C., Tami, A., Feil, E. J. & Pitt, T. (2002). Determining the genetic structure of the natural population of Staphylococcus aureus: a comparison of multi-locus sequence typing with pulsed-field gel electrophoresis, randomly amplified polymorphic DNA analysis, and phage typing. J Clin Microbiol 40, 4544–4546.
Hanage, W. P., Fraser, C. & Spratt, B. G. (2005). Fuzzy species among recombinogenic bacteria. BMC Biol 3, 6.
Holden, M. T., Feil, E. J., Lindsay, J. A. & 42 other authors (2004). Complete genomes of two clinical Staphylococcus aureus strains: Evidence for the rapid evolution of virulence and drug resistance. Proc Natl Acad Sci U S A 101, 9786–9791.
Huelsenbeck, J. P. & Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755.
Jolley, K. A., Kalmusova, J., Feil, E. J., Gupta, S., Musilek, M., Kriz, P. & Maiden, M. C. (2000). Carried meningococci in the Czech Republic: a diverse recombining population. J Clin Microbiol 38, 4492–4498.
Kuhn, G., Francioli, P. & Blanc, D. S. (2006). Evidence for clonal evolution among highly polymorphic genes in methicillin-resistant Staphylococcus aureus. J Bacteriol 188, 169–178.
Kumar, S., Tamura, K. & Nei, M. (2004). MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform 5, 150–163.
Kunst, F., Ogasawara, N., Moszer, I. & 148 other authors (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256.
Kuroda, M., Ohta, T., Uchiyama, I. & 34 other authors (2001). Whole genome sequencing of meticillin-resistant Staphylococcus aureus. Lancet 357, 1225–1240.
Leski, T. A. & Tomasz, A. (2005). Role of penicillin-binding protein 2 (PBP2) in the antibiotic susceptibility and cell wall cross-linking of Staphylococcus aureus: evidence for the cooperative functioning of PBP2, PBP4, and PBP2A. J Bacteriol 187, 1815–1824.
Lindsay, J. A., Moore, C. E., Day, N. P., Peacock, S. J., Witney, A. A., Stabler, R. A., Husain, S. E., Butcher, P. D. & Hinds, J. (2006). Microarrays reveal that each of the ten dominant lineages of Staphylococcus aureus has a unique combination of surface-associated and regulatory genes. J Bacteriol 188, 669–676.
Maiden, M. C., Bygraves, J. A., Feil, E. & 10 other authors (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A 95, 3140–3145.
Melles, D. C., Gorkink, R. F., Boelens, H. A. & 8 other authors (2004). Natural population dynamics and expansion of pathogenic clones of Staphylococcus aureus. J Clin Invest 114, 1732–1740.
Monk, A. B., Curtis, S., Paul, J. & Enright, M. C. (2004). Genetic analysis of Staphylococcus aureus from intravenous drug user lesions. J Med Microbiol 53, 223–227.
Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3, 418–426.
Oliveira, D. C., Tomasz, A. & de Lencastre, H. (2002). Secrets of success of a human pathogen: molecular evolution of pandemic clones of meticillin-resistant Staphylococcus aureus. Lancet Infect Dis 2, 180–189.
1304 Microbiology 152
Pan, E. S., Diep, B. A., Charlebois, E. D., Auerswald, C., Carleton, H. A., Sensabaugh, G. F. & Perdreau-Remington, F. (2005). Population dynamics of nasal strains of methicillin-resistant Staphylococcus aureus – and their relation to community-associated disease activity. J Infect Dis 192, 811–818.
Pinho, M. G., Filipe, S. R., de Lencastre, H. & Tomasz, A. (2001). Complementation of the essential peptidoglycan transpeptidase function of penicillin-binding protein 2 (PBP2) by the drug resistance protein PBP2A in Staphylococcus aureus. J Bacteriol 183, 6525–6531.
Rice, P., Longden, I. & Bleasby, A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16, 276–277.
Robinson, D. A. & Enright, M. C. (2003). Evolutionary models of the emergence of methicillin-resistant Staphylococcus aureus. Antimicrob Agents Chemother 47, 3926–3934.
Robinson, D. A. & Enright, M. C. (2004). Evolution of Staphylococcus aureus by large chromosomal replacements. J Bacteriol 186, 1060–1064.
Robinson, D. A., Monk, A. B., Cooper, J. E., Feil, E. J. & Enright, M. C. (2005). Evolutionary genetics of the accessory gene regulator (agr) locus in Staphylococcus aureus. J Bacteriol 187, 8312–8321.
Roche, F. M., Massey, R., Peacock, S. J., Day, N. P., Visai, L., Speziale, P., Lam, A., Pallen, M. & Foster, T. J. (2003). Characterization of novel LPXTG-containing proteins of Staphylococcus aureus identified from genome sequences. Microbiology 149, 643–654.
Rokas, A., Williams, B. L., King, N. & Carroll, S. B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 798–804.
Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Sharp, P. M. & Li, W. H. (1987). The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Syst Biol 51, 492–508.
Sieradzki, K. & Tomasz, A. (1999). Gradual alterations in cell wall structure and metabolism in vancomycin-resistant mutants of Staphylococcus aureus. J Bacteriol 181, 7566–7570.
Spratt, B. G. & Maiden, M. C. (1999). Bacterial population genetics, evolution and epidemiology. Philos Trans R Soc Lond B Biol Sci 354, 701–710.
Swofford, D. L. (2000). PAUP* – Phylogenetic Analysis Using Parsimony*, and Other Methods. Sunderland, MA: Sinauer Associates.
Vandenesch, F., Naimi, T., Enright, M. C. & 8 other authors (2003). Community-acquired methicillin-resistant Staphylococcus aureus carrying Panton-Valentine leukocidin genes: worldwide emergence. Emerg Infect Dis 9, 978–984.
Zeigler, D. R. (2003). Gene sequences useful for predicting relatedness of whole genomes in bacteria. Int J Syst Evol Microbiol 53, 1893–1900.