Infection and Immunity, April 2002, p. 1971-1983, Vol. 70, No. 4
0019-9567/02/$04.00+0 DOI: 10.1128/IAI.70.4.1971-1983.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520,1 Department of Infectious Disease Epidemiology, Imperial College School of Medicine, University of London, London W2 1PG,2 Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom3
Received 20 September 2001/ Returned for modification 28 December 2001/ Accepted 7 January 2002
One striking feature of group A streptococcal disease is the different degree to which throat and skin infections occur (8, 34). In many temperate regions of the world, the incidence of pharyngitis peaks in winter, whereas impetigo, although less common, peaks during the summer months. In contrast, for many tropical host populations, impetigo is hyperendemic year round, whereas throat infection, be it pharyngitis or asymptomatic carriage, ranges from very low to moderate levels. These distinct epidemiological trends in streptococcal disease can lead to wide spatial and temporal distances between organisms infecting the throat or the skin and thereby might limit the opportunities for lateral gene exchange.
The recognition of distinct throat and skin strains of S. pyogenes emerged from several decades of field epidemiology, and this concept is now widely accepted in the streptococcal field (1, 8, 10, 14, 17, 18, 31, 34, 35, 41, 47). M proteins form surface fibrils which contain the antigenic targets of a serological typing scheme; the type-specific determinants lie at the amino-terminal ends of the fibril tips (15, 24). Certain M-types were frequently isolated from cases of pharyngitis but were rarely isolated from impetigo lesions. Likewise, other M-types were found to be far more common in impetigo. From this arose the concept of distinct sets of throat and skin types. Using a related typing scheme based on variation in the emm genes which encode the M proteins, >150 emm types have been characterized according to nucleotide sequence heterogeneity at the 5' ends (2, 20). Antibodies directed to the type-specific epitopes of M proteins are protective and thereby emm genes are under strong selection by the host immune response.
Closer examination of emm gene structure reveals four emm gene subfamilies, defined by nucleotide sequence divergence at the 3' end (27). Analysis of the chromosomal content of emm subfamily genes, and their relative arrangement, led to the identification of five basic patterns (designated A through E), which account for 99% of all strains. A worldwide collection of genetically diverse isolates, recovered from cases of pharyngitis or impetigo over a time frame spanning nearly 50 years, revealed that isolates with emm patterns A, B, and C have a strong tendency to be recovered from the throat, whereas emm pattern D strains are most often isolated from impetigo lesions; as a group, pattern E strains display no obvious tissue site preference (7).
It has since been confirmed in population-based studies that emm pattern A-C isolates represent the historical throat types and pattern D isolates represent the skin types. Nearly all pharyngeal isolates (>99%) collected from hospitals in Rome were found to be of emm types typically giving rise to emm patterns A-C or E, whereas <1% were of emm types associated with pattern D (16). In a rural aboriginal community in tropical Australia where impetigo is hyperendemic, not a single case of GAS pharyngitis was detected during the 25-month surveillance period; of the total GAS isolates recovered, nearly 85% were of either emm pattern D or E, with relatively few emm pattern A-C strains found (5). An experimental model for streptococcal impetigo lends further support to a strong association between emm pattern D strains and superficial infection at the skin (44, 45). Although the differences in relative emm pattern distribution among throat and skin isolates are statistically significant, the associations between emm pattern A-C and the throat, and emm pattern D and the skin, are not absolute.
Sequence divergence among neutral (i.e., housekeeping) genes has proved useful for identifying discrete ecological populations (12). emm pattern A-C strains and pattern D strains can be regarded as distinct ecological populations because they exploit different niches. The emergence of a new ecological population from an ancestral population is believed to begin with some adaptive change that allows the exploitation of a new ecological niche. Subsequently, the periodic emergence of fitter variants (periodic selection) within the new ecological population results in the purging of neutral gene diversity within this population but not in the ancestral population (and vice versa). If recombination arising from lateral gene transfer between the new and ancestral populations remains sufficiently low, multiple rounds of periodic selection events are expected to lead to substantial neutral gene divergence between the two ecological populations, but not within a population. This is an early step in the evolution of new bacterial species (12).
In this report, the extent of genetic recombination and neutral gene divergence, both within and between the emm pattern subpopulations, is measured for seven housekeeping loci. No evidence for neutral gene divergence between strains with differing emm patterns or between isolates recovered from the throat versus skin was observed. The findings suggest that recombination between the two ecological populations is too high to allow for the emergence of neutral gene sequence clusters. In cases where neutral alleles are randomly distributed with respect to ecologically distinct populations, genetic variation that is strongly associated with the different ecological populations may be directly responsible for adaptation to the ecological niche, that is to say, emm gene products (or closely linked genes) may have a direct role in tissue tropism.
Genotyping. Chromosomal DNA used as a template for PCR was prepared from freshly grown bacteria according to previously described methods (6).
Multilocus sequence typing (MLST) was determined as previously reported (19). In brief, internal fragments of the glucose kinase (gki), glutamine transporter protein (gtr), glutamate racemase (murI), DNA mismatch repair protein (mutS or hexA), transketolase (recP or tktB), xanthine phosphoribosyl transferase (xpt), and acetyl-CoA acetyltransferase (yqiL, not atoB as previously indicated) genes were amplified by PCR and subjected to nucleotide sequence determination. The relative positions of the seven housekeeping loci on the M1 genome (22) are as follows: SPy0140 (yqiL), SPy0361 (murI), SPy1136 (xpt), SPy1506 (gtr), SPy1529 (gki), SPy1676 (recP), and SPy2148 (mutS). For each locus, every different sequence was assigned a distinct allele number, and each isolate was defined by a series of seven integers (the allelic profile) corresponding to the alleles at the seven loci, in the order (alphabetical) gki-gtr-murI-mutS-recP-xpt-yqiL. Isolates with identical allelic profiles were assigned to the same sequence type (ST).
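The allele and ST bookkeeping described above can be sketched as follows. This is a minimal illustration only: the function names and toy sequences are hypothetical, and numbering alleles in order of first appearance is an assumption made for the example (it is not how allele numbers are assigned in the curated MLST scheme).

```python
# Loci in the alphabetical order used for the allelic profile.
LOCI = ["gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL"]

def assign_alleles(sequences_by_isolate):
    """Give each distinct sequence at a locus its own allele number."""
    allele_numbers = {locus: {} for locus in LOCI}
    profiles = {}
    for isolate, seqs in sequences_by_isolate.items():
        profile = []
        for locus in LOCI:
            table = allele_numbers[locus]
            # A sequence not seen before at this locus gets the next integer.
            profile.append(table.setdefault(seqs[locus], len(table) + 1))
        profiles[isolate] = tuple(profile)
    return profiles

def assign_sts(profiles):
    """Isolates with identical allelic profiles receive the same ST."""
    st_of_profile = {}
    return {
        isolate: st_of_profile.setdefault(profile, len(st_of_profile) + 1)
        for isolate, profile in profiles.items()
    }

# Toy data (not real S. pyogenes sequences): isoC differs only at xpt.
toy = {
    "isoA": {locus: "ACGT" for locus in LOCI},
    "isoB": {locus: "ACGT" for locus in LOCI},
    "isoC": {**{locus: "ACGT" for locus in LOCI}, "xpt": "ACGA"},
}
profiles = assign_alleles(toy)
sts = assign_sts(profiles)
```

In this toy set, isoA and isoB share one ST, whereas isoC is a single locus variant with its own ST.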
emm sequence typing was determined as reported previously (19). emm sequence typing is based on the 5' end of the central emm gene within the emm chromosomal region (for map, see references 5 and 6). There is a very strong correspondence between the M type, as determined by serology, and the emm type that meets the stated definition (2, 20). Until validation is complete, new emm types are designated emmst, which stands for emm sequence type (20) and is not to be confused with ST, which refers to the MLST allelic profile. A complete listing of emm types found in association with S. pyogenes is maintained on the Internet (www.cdc.gov/ncidod/biotech/strep/strains.html).
The emm pattern was determined for at least one representative of each unique emm type-ST combination (n = 104) according to previously described methods (5, 6) involving a PCR-based mapping approach that uses oligonucleotide primers specific for each of the four major lineages that arise from phylogenetic trees based on the 3' ends of emm genes (26, 27).
Computations and phylogenetic analysis. A matrix of pair-wise differences in allelic profiles between strains was constructed, and cluster analysis was performed based on the proportion of loci having shared alleles, using the unweighted pair-group method with arithmetic averages (UPGMA) and the percent disagreement distance measure (Statistica version 5.5; StatSoft, Tulsa, Okla.).
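As an illustration of the distance measure, the percent disagreement between two allelic profiles can be computed as the fraction of loci at which the allele numbers differ. The sketch below uses hypothetical profiles and function names; it is not the Statistica implementation.

```python
def percent_disagreement(profile_a, profile_b):
    """Fraction of loci at which two allelic profiles carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    mismatches = sum(a != b for a, b in zip(profile_a, profile_b))
    return mismatches / len(profile_a)

def distance_matrix(profiles):
    """Pair-wise distances for every ordered pair of named profiles."""
    names = sorted(profiles)
    return {
        (x, y): percent_disagreement(profiles[x], profiles[y])
        for x in names
        for y in names
    }

# Hypothetical seven-locus profiles (gki-gtr-murI-mutS-recP-xpt-yqiL).
example_profiles = {
    "ST_a": (1, 1, 1, 1, 1, 1, 1),
    "ST_b": (1, 1, 1, 1, 1, 2, 1),  # single locus variant of ST_a
    "ST_c": (2, 3, 1, 4, 5, 2, 6),  # shares only murI with ST_a
}
matrix = distance_matrix(example_profiles)
```

With seven loci, a single locus variant lies at a distance of 1/7 (about 0.14) and a double locus variant at 2/7 (about 0.29), so both pair-wise distances fall below 0.3.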
The 212 isolates of S. pyogenes were represented by 100 STs, or clones. In order to minimize any sampling bias that might arise from inclusion of single and double locus variants (i.e., clonal complexes), representative strains were selected at a linkage distance of 0.3, based on the UPGMA dendrogram (number of STs = 72). The 28 STs eliminated by linkage distance 0.3 calculations are as follows: 3, 4, 7, 8, 12, 16, 18, 21, 23, 26, 29, 38, 41, 45, 50, 58, 61, 66, 68, 69, 70, 74, 75, 77, 79, 84, 94, and 96. The 72 STs at a linkage distance of 0.3 were not chosen at random, but were selected based on being the most representative of the S. pyogenes strains. First, in the few instances where there was more than one emm type associated with an ST, an isolate of the ST was included in the set of 72. The second ranking criterion was that the ST which was most abundant among the set of 212 isolates was retained. The third criterion was that STs which had a single locus variant within the 100-ST set were retained. Finally, if all else was equal, the ST having a variant allele with the lowest allelic frequency among the set of 212 GAS isolates was removed. In all instances, the eliminated ST was of the same emm pattern as the other characterized descendants of that node, except for ST7 and ST61.
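Grouping STs at a given linkage distance can be sketched with a small average-linkage (UPGMA) routine, shown below under stated assumptions: the profiles are hypothetical, the cutoff is treated as inclusive, and the ranking criteria used to pick the representative ST within each group are not implemented.

```python
from itertools import combinations

def mismatch_fraction(p, q):
    """Percent disagreement between two allelic profiles."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def upgma_groups(profiles, cutoff=0.3):
    """Merge clusters by average linkage until the closest pair exceeds cutoff."""
    clusters = [[name] for name in sorted(profiles)]

    def average_linkage(c1, c2):
        # UPGMA distance: arithmetic mean over all member pairs.
        total = sum(
            mismatch_fraction(profiles[x], profiles[y]) for x in c1 for y in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_linkage(clusters[i], clusters[j]) > cutoff:
            break  # every remaining merge would lie above the cutoff
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical profiles: ST_a, ST_b, and ST_c form a clonal complex.
toy_profiles = {
    "ST_a": (1, 1, 1, 1, 1, 1, 1),
    "ST_b": (1, 1, 1, 1, 1, 2, 1),
    "ST_c": (1, 1, 1, 1, 3, 2, 1),
    "ST_d": (5, 6, 7, 8, 9, 10, 11),
}
groups = upgma_groups(toy_profiles, cutoff=0.3)
```

Here the three closely related STs collapse into one group from which a single representative would be chosen, while ST_d remains on its own.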
For analysis of the subset of isolates (N = 91) recovered from either the upper respiratory tract (URT) or impetigo lesions, the criteria stated above were also applied for selecting strains at a linkage distance of 0.3. In addition, the emm pattern was determined for all 91 of these isolates.
Phylogenetic trees of the nucleotide sequences from housekeeping loci were constructed using the maximum likelihood (ML) method available in the PAUP* package (version 4.0, beta 8). The initial start-up trees were obtained using the neighbor-joining method. ML trees were reconstructed using optimized values for the transition-to-transversion ratio (Ts/Tv) and the α parameter, which describes the extent of rate variation among nucleotide sites assuming a discrete gamma distribution with eight categories; both were estimated from the empirical data during tree constructions. Optimization of the desired evolutionary model of DNA substitution and the parameters was done using hierarchical likelihood ratio tests (29), with the aid of MODELTEST, version 3.0 (40).
For congruency analysis, the HKY85 + G model for DNA substitution was used for all tree constructions. An ML method was used to determine the extent of congruency between housekeeping gene trees (21, 28). For each housekeeping gene, the differences in log likelihood (Δ-ln L) were computed between the ML tree for the reference gene and the ML trees of the other housekeeping loci. The comparison trees were constructed so that their topologies were used with the sequence data from the reference tree, but with branch lengths optimized to maximize the likelihood of the second tree. To determine whether differences in log likelihood scores were significant, 200 random tree topologies were generated for each reference gene, and the likelihoods for the random trees were estimated by optimizing the branch lengths. The differences in likelihood between the reference tree and the random tree topologies are considered a null distribution of Δ-ln L values, as would be obtained when there is no more similarity in topology among gene trees than expected by chance. If the Δ-ln L values between ML trees of different loci fall within the 99th percentile of the null distribution for random tree topologies, then the ML trees are considered to be significantly different and thereby incongruent. A previous report found that this method was not adversely affected by very similar sequences that contained little phylogenetic signal (21).
Split decomposition analysis was performed using SplitsTree (version 3.1). Distance measures were derived using the optimized model of evolution as determined using the ML approach. Splits graphs were also constructed using uncorrected Hamming distance measures (30).
Nucleotide diversity (average pair-wise sequence divergence; π) and average nucleotide divergence between populations (K) were calculated using DnaSP (version 3.52).
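The two quantities can be illustrated with a simplified calculation: the mean proportion of differing sites over all pairs of sequences within a population, and the same mean taken over all pairs drawn from two different populations. This sketch is an assumption-laden approximation (uncorrected distances, every sequence weighted equally, toy sequences); DnaSP may apply corrections and weightings that are not reproduced here.

```python
from itertools import combinations, product

def p_distance(seq1, seq2):
    """Uncorrected proportion of sites that differ between aligned sequences."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def within_population_diversity(seqs):
    """Mean pair-wise divergence among sequences of one population."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def between_population_divergence(pop1, pop2):
    """Mean divergence over all pairs with one sequence from each population."""
    pairs = list(product(pop1, pop2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Toy 10-bp alignments standing in for two subpopulations.
pop_ac = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
pop_d = ["ACGTACGTAC", "ACGTACGAAC"]

pi_ac = within_population_diversity(pop_ac)
pi_d = within_population_diversity(pop_d)
k_between = between_population_divergence(pop_ac, pop_d)
```

When the between-population value is no larger than the within-population values, as in this toy case, the two groups are not separated by sequence divergence.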
FIG. 1. Cluster analysis of strains of defined emm pattern. The dendrogram was constructed using UPGMA and shows all unique emm type-ST combinations (n = 104). Labels at branch tips indicate emm type, followed by emm pattern group (A-C [ABC], D, or E); for cases where there is more than one ST associated with a given emm type, the string ends with a numerical assignment. In all instances, isolates of a given emm type share the same emm pattern grouping. Large asterisks denote nodes at linkage distances of 0.6, with descendants represented by six or more distinct clones (i.e., STs) and/or emm types. From the top of the dendrogram, the clusters are comprised of the following numbers of distinct emm types, STs, and emm patterns: cluster 1, six emm types and six STs (five of pattern A-C, one of pattern D); cluster 2, eight emm types and seven STs (six of pattern D, one of pattern E); cluster 3, seven emm types and nine STs (three of pattern A-C, six of pattern D); cluster 4, six emm types and six STs (one of pattern A-C, one of pattern D, four of pattern E).
Of the 212 isolates of S. pyogenes, 91 were actually known to be recovered from either the URT or impetigo lesions. Forty-eight emm types were found among the throat and skin isolates. Also, 57 STs and 61 unique emm type-ST combinations are represented among this subset of strains. Of the 91 isolates, 55 (60.4%) originated from the URT and 36 (39.6%) from impetigo lesions. For the 55 URT isolates, the majority were of emm pattern A-C (37 [67.3%]), 3 (5.5%) were of emm pattern D, and 15 (27.3%) were of emm pattern E. For the 36 impetigo isolates, only a small minority were of emm pattern A-C (4 [11.1%]), 17 (47.2%) were of emm pattern D, and 15 (41.7%) were of emm pattern E. Thus, among this set of isolates, which were originally selected in large part based on diversity in emm type (19), the correlation was high between emm pattern A-C and URT infection and between emm pattern D and impetigo.
For the 61 unique emm type-ST combinations of URT and impetigo isolates, a matrix of pair-wise distances between allelic profiles was constructed and the similarity between strains was evaluated by cluster analysis (Fig. 2). Both throat and skin isolates were widely distributed throughout the dendrogram. Of the 17 emm type-STs represented by >2 isolates, 10 contained only URT isolates and 3 were limited to impetigo isolates, whereas 4 had a mixture of both URT and impetigo isolates. Three distinct, major clusters were evident, and each cluster contains six or more unique emm type-ST combinations separated by linkage distances of <0.6. One cluster consists of a mix of both throat and skin isolates, with seven unique emm type-STs and seven URT and eight impetigo isolates; both emm patterns A-C and D are represented. However, two clusters were homogenous with regard to the tissue site origin of strains. One cluster was represented by six unique emm type-STs and contained seven impetigo isolates that were all emm pattern D. Another cluster of eight unique emm type-STs was represented by 10 URT isolates, all of emm pattern A-C. Thus, some noticeable degree of clustering is evident for both URT and impetigo isolates.
FIG. 2. Cluster analysis of strains of defined tissue site of isolation. The dendrogram shows all unique emm type-ST combinations (n = 61) of isolates known to be recovered from either the URT or impetigo lesions. Labels at branch tips indicate emm type, followed by emm pattern group (A-C [ABC], D, or E); for cases where there is more than one ST associated with a given emm type, the string ends with a numerical assignment. In all instances, isolates of a given emm type share the same emm pattern grouping. Large asterisks denote nodes at linkage distances of 0.6, with descendants represented by six or more distinct clones (i.e., STs) and/or emm types. The number of URT ( ) and impetigo ( ) isolates for each emm type-ST combination is indicated to the left of the branch tip labels.
Another approach for determining whether the throat and skin strains of S. pyogenes comprise distinct evolutionary lineages is to directly assess the phylogenetic relationships inferred from the housekeeping gene sequences of each strain. Since the DNA fragments obtained for each locus are only about 450 bp, a phylogenetic tree generated by the ML method was constructed for concatenates of the seven housekeeping loci for known URT and impetigo isolates (Fig. 3A). The concatenated alleles are 3,134 bp in length; this exceeds the sequence lengths tested in computer simulations showing that longer sequences improve the accuracy of phylogenetic inference (43). Taxa were selected based on a linkage distance of 0.3 in the UPGMA dendrogram (Fig. 2). This truncation value eliminates from the analysis multiple isolates that differ from each other at only one or two of the seven housekeeping loci used in MLST and which are likely to have arisen following recent clonal diversification (19). Also, by evaluating slightly deeper phylogenetic relationships, any effects of sampling bias (e.g., multiple isolates of the same emm type) are minimized.
FIG. 3. ML trees of concatenated housekeeping alleles. Unrooted, radial gene trees generated by the ML method, using the concatenated sequences of the seven housekeeping alleles for each strain (nucleotide sequence length = 3,134 bp), are shown. (A) Taxa for throat (n = 20) and impetigo (n = 22) isolates were selected at a linkage distance of 0.3; the most appropriate model for evolution was used (TrN + G + I). (B) Taxa for emm pattern A-C (n = 17) and D (n = 19) strains were selected at a linkage distance of 0.3; the most appropriate model for evolution was used (HKY85 + G + I). To determine the significance of the observed groupings, bootstrap analysis with 1,000 replicates was performed, using trees reconstructed by the neighbor-joining method to avoid excessive computational time while incorporating the same ML substitution parameters. Bootstrap values of >50% are indicated. The scale bars indicate the number of nucleotide substitutions per site. In panel A, strains for which there were descendants isolated from both tissue sites based on the dendrogram in Fig. 2 are indicated (*); branch labels indicate throat (T) or skin (S) sources of each isolate.
A phylogenetic tree generated by the ML method was also constructed for concatenates of the seven housekeeping loci for emm pattern A-C and D isolates, which as a group display strong preferences for the throat and skin, respectively (Fig. 3B). Taxa were selected based on a linkage distance of 0.3 (Fig. 1). Two branches that were supported by bootstrap values of >70% contained a mix of emm pattern A-C and D isolates, consistent with the idea that certain pattern A-C and D strains might share a recent common ancestor. Furthermore, isolates represented by one of the well-supported branches were known to be recovered from both tissue sites (m3ABC from URT; m53D, m91D, and M93D2 from impetigo lesions). However, most branches of the ML tree were not well supported, and therefore, it is difficult to draw conclusions on evolutionary relationships between emm pattern A-C and D strains. Nonetheless, the tree provided no evidence for any sharp division between emm pattern A-C and D strains.
The weak support for most branches in the ML trees can be explained by the very low phylogenetic signal in the housekeeping alleles and/or a history of recombination. Very low levels of average pair-wise sequence divergence (π) were observed for alleles found within both the emm pattern A-C and D subpopulations (Table 1). This finding is similar to the result obtained for the entire S. pyogenes population, as analyzed in a prior report (21). Furthermore, the average nucleotide divergence (K) between the emm pattern A-C and D populations was also low. Of the 3,134 bp of total nucleotide sequence for the seven housekeeping genes, 129 nucleotide sites are polymorphic for the combined alleles found among emm pattern A-C and D strains selected at a linkage distance of 0.3. Of these, 50 polymorphic sites (38.8%) are shared among pattern A-C and D strains.
TABLE 1. DNA divergence between emm pattern A-C and D subpopulations
) for emm pattern A-C and D strains (Table 1), which approaches unity for all loci (range, 1.016 to 1.071) (37).
Recombination between S. pyogenes strain subpopulations. Recombination can disrupt phylogenetic relationships. One approach for determining the extent of recombination is to estimate the level of congruency between tree topologies for different genes. Congruent trees signify relatively low levels of recombination, whereas incongruent trees indicate that recombination has been sufficiently frequent to eliminate the phylogenetic signal.
The extent of congruence between tree topologies for all pair-wise comparisons of the seven housekeeping loci was determined by the ML method (Table 2). Congruence of gene tree topologies was determined for all 72 STs of S. pyogenes, selected at a linkage distance of 0.3 based on the UPGMA dendrogram (Fig. 1) and representing all three emm pattern groups. Out of a possible 42 pair-wise tree comparisons, there were three examples of significant congruence between trees, whereby the difference in likelihoods of the trees was less than the 99th percentile of the null distribution of random tree topologies (Fig. 4, gtr, recP, and xpt). However, for each instance, the comparisons fell just marginally outside of the 99th percentile of the null distribution of random tree topologies.
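The decision rule for a single comparison can be sketched as follows: an observed likelihood difference counts as congruent only if it is smaller than the value that 99% of the random-topology differences exceed. The likelihood values below are hypothetical, and the empirical percentile convention (taking the sorted value at the 1% position) is an assumption, since the exact calculation is not specified in the text.

```python
def lower_bound(null_deltas, tail=0.01):
    """Value below which roughly `tail` of the random-tree differences fall."""
    ordered = sorted(null_deltas)
    cut = int(tail * len(ordered))  # e.g. position 2 of 200 random trees
    return ordered[cut]

def is_congruent(observed_delta, null_deltas, tail=0.01):
    """True if the gene trees are more similar than 99% of random topologies."""
    return observed_delta < lower_bound(null_deltas, tail)

# Hypothetical likelihood differences for 200 random tree topologies.
null_deltas = [100.0 + i for i in range(200)]
threshold = lower_bound(null_deltas)

close_pair = is_congruent(40.0, null_deltas)     # far better than random
distant_pair = is_congruent(150.0, null_deltas)  # within the null distribution
```

A comparison that falls just below the threshold would correspond to the marginal cases described above.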
TABLE 2. Statistical tests of congruence of loci
FIG. 4. ML analysis of gene tree congruence. The ML trees of each housekeeping locus are compared with the ML trees from the other six housekeeping loci. The differences in log likelihood scores (Δ-ln L) are shown between housekeeping loci (squares) and 200 trees of random topology (diamonds). The lower 99th percentile of the likelihood differences between the ML tree for each locus and the 200 random tree topologies is indicated (vertical line).
The extent of intragenic recombination within each of the housekeeping loci was assessed by split decomposition analysis using the alleles at each of the seven loci (Fig. 5). In splits graphs, conflicting phylogenetic signals are depicted by parallel edges, and the extent of both bifurcating and networked evolution can be illustrated. For the yqiL locus, all 22 alleles that have been identified among the 212 isolates of S. pyogenes are shown; a fit parameter of 99 indicates that near complete resolution of the relationships between alleles was achieved. The four alleles occupying the corners of the central network (yqiL01, yqiL04, yqiL10, and yqiL15) differ from one another at two nucleotide sites (109 and 237), and all four possible combinations of two nucleotides-two sites are found. Strains from all three emm pattern groupings are represented by alleles that occupy the central network. The non-networked branches that emanate from three of the central alleles represent housekeeping alleles found in strains corresponding to all emm patterns. The split decomposition findings for yqiL are indicative of intragenic recombination between alleles of strains belonging to different emm patterns.
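The observation that all four two-site combinations occur among the central yqiL alleles corresponds to what is often called the four-gamete condition. A minimal check is sketched below with toy sequences (not the real yqiL alleles) and a hypothetical function name; note that all four combinations can also arise through recurrent mutation, so the check flags a conflict rather than proving recombination.

```python
def all_four_combinations(alleles, site_a, site_b):
    """True if two biallelic sites show all four nucleotide combinations.

    Site positions are 1-based, as in the text.
    """
    combos = {(seq[site_a - 1], seq[site_b - 1]) for seq in alleles}
    bases_a = {a for a, _ in combos}
    bases_b = {b for _, b in combos}
    return len(bases_a) == 2 and len(bases_b) == 2 and len(combos) == 4

# Toy alleles that vary at positions 2 and 4 only.
network_alleles = ["ACGTA", "ATGTA", "ACGAA", "ATGAA"]
tree_like_alleles = ["ACGTA", "ATGTA", "ATGAA"]

conflict = all_four_combinations(network_alleles, 2, 4)
no_conflict = all_four_combinations(tree_like_alleles, 2, 4)
```

Three combinations can be explained by two successive mutations on a tree; the fourth requires either recombination or a repeated mutation, which is why such sites produce a box in a splits graph.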
FIG. 5. Split decomposition analysis of housekeeping alleles. Splits graphs are shown for all known alleles of yqiL (A) and mutS (B) found in S. pyogenes. Representation by isolates of the three emm pattern groupings is indicated for each allele. For yqiL, nucleotide differences between the four alleles occupying the corners of the central network are indicated. Pair-wise distances between sequences were estimated using the HKY85 model of evolution incorporating ML optimized parameters. Nearly identical graphs were obtained using uncorrected Hamming distances. The scale bars indicate the number of nucleotide substitutions per site. A fit parameter of 100 indicates that all conflicts in phylogenetic signals are depicted in the graph. Splits graphs for the other five housekeeping loci are available from the authors upon request.
Several alleles of yqiL and mutS are shared by strains belonging to different emm pattern groups (Fig. 5). The emm pattern distribution of shared alleles among strains selected at a linkage distance of 0.3 is summarized for all seven loci (Table 3). For the 212 isolates of S. pyogenes, a total of 197 alleles are found for the seven housekeeping loci. For the 72 STs selected at a linkage distance of 0.3, 180 unique alleles are represented. Of these, 64 (35.6%) are shared among the 72 strains that differ from each other by MLST at two or more loci. Of the 64 shared alleles, only 20 are restricted to strains of a single emm pattern. Twenty-four alleles are common to both emm pattern A-C and D strains.
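The tally of shared alleles can be sketched as below: an allele is counted as shared if it occurs in two or more strains, and it is then classed as restricted to one emm pattern or distributed across patterns. The strain records and function name are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict

def classify_shared_alleles(strains):
    """Split shared alleles into pattern-restricted and cross-pattern sets.

    `strains` is a list of (emm_pattern, {locus: allele_number}) records.
    """
    carriers = defaultdict(list)
    for pattern, profile in strains:
        for locus, allele in profile.items():
            carriers[(locus, allele)].append(pattern)

    restricted, across_patterns = [], []
    for allele_key, patterns in carriers.items():
        if len(patterns) < 2:
            continue  # found in a single strain, so not shared
        if len(set(patterns)) == 1:
            restricted.append(allele_key)
        else:
            across_patterns.append(allele_key)
    return restricted, across_patterns

# Hypothetical strains typed at two loci only.
toy_strains = [
    ("A-C", {"gki": 1, "yqiL": 4}),
    ("D", {"gki": 1, "yqiL": 10}),
    ("D", {"gki": 2, "yqiL": 10}),
    ("E", {"gki": 3, "yqiL": 15}),
]
restricted, across_patterns = classify_shared_alleles(toy_strains)
```

In this toy set, gki allele 1 is shared across patterns A-C and D, whereas yqiL allele 10 is shared only among pattern D strains.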
TABLE 3. Distribution of shared alleles among strains according to emm pattern group
The distribution of shared alleles among isolates known to be recovered from either the URT or impetigo lesions was also calculated. For the 91 URT and impetigo isolates (representing 57 STs), a total of 133 alleles were found for all seven housekeeping loci. For the 42 STs selected at a linkage distance of 0.3, 120 alleles are represented. Of these, 38 (31.7%) alleles are shared among the 42 strains that differ from each other at two or more loci. Of the 38 shared alleles, only eight are restricted to strains recovered from either the throat or skin. Thirty shared alleles (78.9%) are distributed among both URT and impetigo isolates.
In summary, for isolates that differ from all others at two or more housekeeping loci, the majority of shared alleles are distributed among strains belonging to different emm pattern groups or among isolates recovered from different tissue sites. It is highly probable that this distribution arose, in part, from horizontal gene movements involving housekeeping alleles and/or emm genes. The alternative hypothesis (mutation only) is less likely because it would require at least three independent events as follows: mutations in two or more housekeeping loci plus multiple changes in the emm type-emm pattern locus.
Recombination within emm pattern subpopulations. Several lines of evidence support the notion that there is extensive recombination between strains of different emm pattern groups and between URT and impetigo isolates. The extent of congruence between tree topologies for all pair-wise comparisons of the seven housekeeping loci was also determined separately for each emm pattern grouping (Table 2). Taxa were selected based on a linkage distance of 0.3 in the UPGMA dendrogram of allelic profiles (Fig. 1). emm pattern D strains showed differences in log likelihood scores falling within the 99th percentile of the null distribution for trees of random topology at all loci (Table 2). Gene tree comparisons among emm pattern E strains revealed only one example where the difference in log likelihood was less than that of the random trees (for gtr, when optimized to the recP gene tree). Pattern A-C strains had five comparisons for which the likelihood differences were less than those with the random trees, although two of these comparisons fell just marginally outside of the 99th percentile of the random distribution (all plots are available from the authors upon request). Thus, for all emm patterns, most comparisons between gene trees showed no significant congruence. The slight differences in tree congruency among strains of the three emm pattern groups seem largely insignificant and may merely reflect the number of isolates that have a recent common ancestor retained in the data set by truncation at a genetic distance of 0.3.
In summary, these data indicate that there has been a history of recombination among housekeeping loci within each of the three emm pattern groupings. The findings for all three emm pattern groups of S. pyogenes are in sharp contrast to those of Escherichia coli, for which nearly all ML tree comparisons of seven housekeeping loci fall outside the 99th percentile of the random distribution, even for taxa selected at linkage distances of >0.3, indicating significantly higher levels of congruency (21).
An analysis of ecological isolation depends on the accurate classification of each strain as throat or skin tropic. Based on epidemiological findings, the emm patterns provide an approximation of throat- and skin-tropic strains (patterns A-C and D, respectively) and segregate those strains whose tissue site preference is less clear (emm pattern E). An alternate approach is to categorize each isolate based on the actual tissue site from which it was recovered. The drawback here is that strains having only weak tissue site preferences are included in the analysis. Despite the different strengths and weaknesses for each of the two classification schemes, their findings were highly concordant.
The data provide strong evidence for a lack of clustering of S. pyogenes strains according to emm pattern. The UPGMA dendrogram derived from MLST allelic profiles shows that strains of different emm pattern groups are widely distributed along the branches. This finding is further supported by the lack of evolutionary divergence between emm pattern A-C and D strains, as revealed by a phylogenetic tree constructed using the concatenated alleles of the seven housekeeping loci. For all housekeeping loci, splits graphs show networked evolution suggestive of intragenic recombination and involving strains representing all three emm pattern groups. Furthermore, recombination between strains of S. pyogenes has been sufficiently frequent over the long term to result in a lack of congruency between the gene trees for the seven housekeeping loci. Whereas emm type often equates with clone or clonal complex (19), the strains of the many emm types that comprise each emm pattern subpopulation are genetically distant. Thus, recombination has obscured the phylogenetic relationships between the emm pattern subpopulations, but in most cases, it has not been significant enough to mask the relationships between isolates of the same emm type. BURST analysis offers a new and powerful tool for estimating the ratio of recombination to mutation, based on recent clonal diversification; however, even with 212 isolates, the present S. pyogenes sample set is not sufficiently large to make these estimates (21). This is because the data set was chosen to represent diversity within the S. pyogenes population, and thus there were relatively few large clonal complexes of strains which are needed for BURST.
For isolates that were known to be derived from either the URT or impetigo lesions, the UPGMA dendrogram revealed a prominent cluster of throat-derived strains (all pattern A-C) and a second major cluster of skin-derived strains (all pattern D). One possible explanation for these two clusters is that different strains arose via clonal diversification that did not alter the emm pattern and the descendant strains retained their tissue tropisms. However, genetic recombination between throat and skin strains is clearly evident. A third major cluster contains a mix of both throat and skin strains, and furthermore, both throat and skin isolates are widely distributed throughout the dendrogram. There is a lack of evolutionary divergence between throat and skin isolates, as indicated by a phylogenetic tree constructed using the concatenated alleles of the seven housekeeping loci. When the entire set of URT- and impetigo-derived strains is considered, the vast majority of alleles shared by different strains are found among both throat and skin isolates. The majority of shared alleles are also found among strains of different emm patterns. Importantly, no fixed nucleotide differences were found in any of the housekeeping genes that distinguished throat isolates from skin isolates or emm pattern A-C strains from emm pattern D strains.
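The notion of a fixed nucleotide difference can be illustrated with a short sketch. It assumes that a site counts as fixed when no nucleotide observed in one group is observed in the other at that alignment column; the function name and the toy sequences are hypothetical and are not data from this study.

```python
def fixed_differences(group1, group2):
    """Return 0-based alignment positions where the two groups share no base.

    A column is reported only if the set of bases seen in group1 and the
    set seen in group2 do not overlap at all.
    """
    if not group1 or not group2:
        raise ValueError("both groups need at least one sequence")
    seqs = list(group1) + list(group2)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        bases1 = {s[i] for s in group1}
        bases2 = {s[i] for s in group2}
        if bases1.isdisjoint(bases2):
            sites.append(i)
    return sites


# Toy alignments (invented): polymorphic sites shared between the groups.
throat = ["ACGTAC", "ACGTAT"]
skin = ["ACGAAC", "ACGTAC"]
shared_polymorphism = fixed_differences(throat, skin)

# Toy alignments (invented): one column separates the groups completely.
diverged = fixed_differences(["ACGT", "ACGT"], ["ACCT", "ACCT"])
```

In the first toy case every variable column has at least one base in common between the groups, so no site is reported; this is the pattern the text describes for the housekeeping genes.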
In summary, there is no clear evidence that genetic variation in housekeeping loci is tightly linked to emm pattern or to actual tissue site origin. Therefore, niche separation appears to have little or no effect on neutral gene divergence.
Our findings on the relative lack of sequence divergence among neutral genes of strains from different clonal complexes or different emm pattern subpopulations closely parallel findings on genes encoding extracellular putative virulence factors of S. pyogenes (42). In that study, for 11 concatenated genes exceeding 26 kb in length, derived from isolates of 12 emm types (emm types typical of emm patterns A-C and E only), with few exceptions the bootstrap confidence limits for the ancestral nodes were <70%. Furthermore, the topologies of individual gene trees were not congruent with the topology of the tree made with concatenated sequences; the findings of split decomposition and maximum χ² analysis suggest that multiple recombination events contribute to allelic diversity. Also consistent with our data on the relationship between emm type and ST (19), restricted allelic variation is observed for extracellular virulence factor genes among strains of the same M type (42).
Humans are believed to be the sole natural host for S. pyogenes, and this bacterial species is not known to have an environmental reservoir. Horizontal gene movements between different strains of S. pyogenes are likely to be favored when the donor and recipient strains occupy the same human host tissue site at the same time. Presumably, this occurs during coinfection (or cocolonization) at either the throat or skin (5, 11). Long-term carriage at the URT might further increase the probability for coinfection with multiple strains. But even for strains with a strong tissue site preference, there are occasions when an organism ends up at the other tissue site. Secondary infection of the URT can arise following impetigo, although transfer of streptococci in the opposite direction (from throat to skin) is far less common (8, 23). Perhaps strains that lack tissue site preferences (e.g., emm pattern E) act as shuttles and facilitate the movement of genes between throat- and skin-specific strains (6). The primary vehicle(s) for lateral gene exchange between strains of S. pyogenes is not known for certain, but generalized transduction by bacteriophage is likely to play an important role (9).
Horizontal gene transfer and genetic recombination can lead to random associations between loci (linkage equilibrium), as found for the neutral housekeeping loci. Recombination between emm pattern A-C and D strains should result in the dissociation of emm pattern from the genes which confer tissue tropism, unless the emm genes themselves, or loci that are physically close to emm on the chromosome, directly contribute to tissue site preference. Therefore, the strong nonrandom association that is observed between emm pattern and tissue site provides compelling evidence that emm genes, or emm-linked genes, act as determinants of host tissue tropism. In fact, all physically distant loci that are in strong linkage disequilibrium with emm pattern A-C or D are strong candidates for having a tissue-specific role in GAS infection. Analysis of the population genetic structure of a bacterial species in combination with careful epidemiologic sampling can provide new insights on the molecular basis for pathogenicity and virulence.
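A nonrandom association of this kind can be quantified from a simple 2 x 2 table of emm pattern against tissue site. The sketch below computes the uncorrected chi-square statistic and the phi coefficient; the counts are invented for illustration and are not the counts observed in this study, and the function name is hypothetical.

```python
import math


def two_by_two_association(a, b, c, d):
    """Chi-square (1 d.f., no continuity correction) and phi coefficient
    for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    # phi ranges from -1 to 1; 0 means the two classifications are independent.
    phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)
    chi2 = n * phi ** 2
    return chi2, phi


# Invented counts. Rows: emm pattern A-C, emm pattern D.
# Columns: recovered from throat, recovered from skin.
chi2_linked, phi_linked = two_by_two_association(40, 10, 10, 40)

# Invented counts with no association between pattern and site.
chi2_random, phi_random = two_by_two_association(25, 25, 25, 25)
```

Applied to pairs of housekeeping loci, the same kind of statistic would be expected to stay near zero under frequent recombination, which is what makes a strong association involving emm pattern stand out.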
The emm gene products (M and M-like proteins) are virulence factors, displaying a wide array of functions that include specific binding of several tissue and plasma proteins of the human host. In some instances, M protein-derived phenotypes are uniquely associated with emm pattern. For example, binding activity for immunoglobulin A is limited to emm pattern E (3). The high-affinity-binding site for human plasminogen (known as PAM [plasminogen-associated M protein]) is restricted to emm pattern D strains (46). Through the action of streptokinase, a plasminogen activator produced by S. pyogenes, bacterial-bound plasminogen can be converted to plasmin, a broad-spectrum protease. Conceivably, this M protein-based phenotype specifically enhances infection at the skin.
Structural domains within M and M-like proteins, whose functional significance is largely unknown, also correlate strongly with emm pattern. For example, emm pattern A through D strains have the class I antigenic epitope which is lacking in emm pattern E (class II) (4). The emm patterns themselves are defined based on differences within the 3' region of emm genes, which encodes a peptidoglycan-spanning domain; the combination of domains for each emm pattern is unique but is shared by all strains of that emm pattern (26, 27). Four peptidoglycan-spanning domain forms are defined, and they differ markedly from one another in both amino acid sequence composition and length, suggesting that each domain form reflects an adaptation to a different cell wall structure. Those phenotypes, or combinations of phenotypes, that are specifically associated with either emm pattern A-C or D strains are strong candidates for having a tissue-specific function.
Perhaps the best proven example of a tissue-specific adaptive trait for S. pyogenes infection is the secreted cysteine proteinase activity due to SpeB (45). An epidemiological survey shows highest mean levels of SpeB activity for emm pattern D strains, whereas pattern A-C strains have significantly lower levels. Direct evidence for a role of SpeB in impetigo is provided by speB mutants, which show a loss in virulence and reproductive growth when tested in an experimental model for skin infection. Based on the genome of strain SF370 (M-type 1, emm pattern A-C), the emm and speB loci lie 13 kb apart on the chromosome (22). The genotypic basis for differential levels of SpeB activity is not precisely known, and since the regulation of SpeB activity is complex (13, 25, 32, 33, 36, 38, 39), there may be several different adaptive mutations which can lead to the "fitter" phenotype. The strong association between the emm pattern D genotype and the high SpeB activity phenotype is presumably maintained either by coinheritance due to chromosomal linkage, or alternatively, by strong coselective pressures. Loss of either trait is expected to result in a large loss of fitness for that isolate, because it would no longer be well-adapted to the skin, yet it would not necessarily be well-adapted to the throat either.
In a clonal population of bacteria, recombination rates are low and linkage disequilibrium is strong for all loci in the genome. High rates of recombination disrupt genetic linkages, leading to random associations between neutral housekeeping genes. Since S. pyogenes displays random associations between housekeeping alleles, irrespective of emm pattern grouping, it should be feasible to use predictive models to identify loci that have a tissue-specific adaptive function. Even against a background of randomly associated neutral genes, key adaptive loci will retain strong nonrandom associations because of epistatic effects. In-depth studies in population genomics should lead to the identification of the genotypes and phenotypes that are uniquely associated with infection at either the throat or skin. In addition to providing deeper insight on pathogenic mechanisms, tissue-specific adaptive loci are ideal targets for vaccination.
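One common summary of how far a set of allelic profiles departs from random association is the index of association, I_A = V_O / V_E - 1, where V_O is the observed variance in the number of loci at which pairs of isolates differ and V_E is the variance expected if alleles at different loci were independent. The sketch below is a minimal illustration under stated assumptions: the per-locus diversity h_j is estimated as the fraction of isolate pairs that differ at locus j, the profiles are invented, and this is not necessarily the statistic or software used in the study.

```python
from itertools import combinations


def index_of_association(profiles):
    """I_A = V_O / V_E - 1 for a list of equal-length allelic profiles."""
    if len(profiles) < 2:
        raise ValueError("need at least two profiles")
    n_loci = len(profiles[0])
    pairs = list(combinations(profiles, 2))
    # K: number of loci at which each pair of isolates differs.
    ks = [sum(a != b for a, b in zip(p, q)) for p, q in pairs]
    mean_k = sum(ks) / len(ks)
    v_obs = sum((k - mean_k) ** 2 for k in ks) / len(ks)
    # Expected variance if loci were independent: sum of h_j * (1 - h_j).
    v_exp = 0.0
    for j in range(n_loci):
        h_j = sum(p[j] != q[j] for p, q in pairs) / len(pairs)
        v_exp += h_j * (1 - h_j)
    if v_exp == 0:
        raise ValueError("no allelic variation in the sample")
    return v_obs / v_exp - 1


# Invented profiles: two clones with perfectly linked alleles.
clonal = [(1, 1, 1), (1, 1, 1), (2, 2, 2), (2, 2, 2)]
# Invented profiles: every combination of two alleles at two loci.
shuffled = [(1, 1), (1, 2), (2, 1), (2, 2)]

ia_clonal = index_of_association(clonal)
ia_shuffled = index_of_association(shuffled)
```

With samples this small the absolute values are biased, so only the contrast is meaningful: the clonal set yields a clearly higher index than the set in which alleles are freely combined.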
This work was supported by grants from the Wellcome Trust (to B.G.S.), the National Institutes of Health (AI-28944, to D.E.B., and GM-60793, to D.E.B. and B.G.S.), the American Heart Association (grant-in-aid, to D.E.B.), and a Brown-Coxe Postdoctoral Fellowship (to A.K.). M.C.E. is a Royal Society University Research Fellow.