Infection and Immunity, January 2006, p. 578-585, Vol. 74, No. 1
0019-9567/06/$08.00+0 doi:10.1128/IAI.74.1.578-585.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Graduate Group in Infectious Diseases and Immunity, School of Public Health, University of California, Berkeley, California 94720
Received 3 June 2005/ Returned for modification 25 August 2005/ Accepted 24 October 2005
Strains of C. trachomatis infecting humans are subdivided into two biovars, the trachoma biovar, consisting of strains infecting columnar epithelial tissue, and the lymphogranuloma venereum (LGV) biovar, which is made up of strains infecting primarily lymphatic tissue. Strains in each biovar are further subdivided by serological typing using monoclonal antibodies that recognize epitope differences on the surface-exposed major outer membrane protein (MOMP) (46, 51). The trachoma biovar includes serovars A through K, of which serovars A, B, Ba, and C are associated with ocular trachoma and serovars D through K are associated with urogenital infection. The lymphogranuloma venereum biovar consists of three serovars, L1, L2, and L3. Additional strain differentiation within serovars has been achieved by nucleotide sequence analysis of ompA, the gene encoding MOMP (45). ompA is one of the most polymorphic single-copy genes known in bacteria; sequence variation has been detected at over 25% of its nucleotide sites, resulting in a comparable level of amino acid sequence polymorphism (11). MOMP is the main target of host immune response in humans, and its variability is thought to be due to immune selection (4-6). A large body of ompA sequence data currently exists, and these data have been used extensively as the primary point of reference for delineating relationships among strains (11, 15, 28, 29, 47, 48).
It can be questioned, however, whether ompA truly reflects the variation among strains of C. trachomatis. Phylogenetic analysis of ompA subdivides strains into three distinct and well-supported groups, the B-complex (serovars B, Ba, D, E, L1, and L2), the C-complex (serovars A, C, H, I, Ia, J, K, and L3), and the intermediate complex (serovars F and G) (11, 15, 28, 29, 47, 48). As noted over a decade ago by Fitch et al. (11) and subsequently by Stothard and others (47), these divisions are not congruent with groupings based on the tissue tropisms and pathobiological profiles of C. trachomatis. More recently, sequence characterization of genes encoding other putative surface-exposed proteins in C. trachomatis has yielded strain groupings that are generally consistent with strain pathobiology but which are discordant with the ompA phylogeny (15, 48). The apparent discordance between strain relationships based on ompA phylogeny and those based on other features of C. trachomatis biology has not been explained. One possible explanation is recombination between strains. There is evidence that recombination has occurred within the ompA gene (3, 17, 29); recombination involving other genes in the C. trachomatis genome might yield gene combinations with different phenotypic profiles. This raises the more general question of the extent to which recombination has played a role in shaping the chlamydial genomes and in generating diversity among strains.
To gain a better understanding of the genomic relationships and sequence diversity among strains of C. trachomatis, we have performed comparative sequence analysis on representative genome segments from each of 16 serovars. Our genome survey includes loci from six genes encoding housekeeping enzymes, five noncoding regions, and a gene for an outer membrane protein in addition to ompA. We augment these analyses with parallel characterizations of recently published sequence data for four genes in the polymorphic membrane protein (pmp) family (15, 48). Overall, these analyses yield a picture of the tempo and mode of C. trachomatis evolution that is strikingly different from that based on the ompA gene alone.
DNA isolation from culture. The DNA was isolated using a standard protocol employing proteinase K digestion, phenol-chloroform-isoamyl extraction, and ethanol precipitation (10).
Loci of interest. Primers were designed to amplify portions of six genes encoding housekeeping enzymes, five noncoding regions along with portions of flanking coding sequence, and the complete porB and ompA genes. Each PCR product was prepared for sequencing using the exonuclease I and shrimp alkaline phosphatase procedure (Amersham Pharmacia Biotech, Piscataway, NJ) (21). The PCR products were sequenced using ABI Big-Dye Terminator chemistry and an ABI 377 sequencer (Applied Biosystems, Foster City, CA). To ensure accuracy, each locus was amplified twice and sequenced in both directions (fourfold coverage). Any discrepancies in the consensus sequence derived from each 4x sequence were resolved through visual inspection of the electropherogram output. A listing of the sequenced regions and the GenBank accession numbers are shown in Table 1. Orthologous regions from the mouse pneumonitis (MoPn) biovar of C. trachomatis strain Nigg were retrieved from the genome sequence (AE002160) (38). The nucleotide sequences for pmpC (AF519747 to AF519765), pmpE (AY184140 to AY184154), pmpH (AY184155 to AY184169), and pmpI (AY184170 to AY184184) were retrieved from GenBank.
TABLE 1. Genome regions sequenced
The average nucleotide variation at synonymous (πs) and nonsynonymous (πa) sites was calculated using the Nei-Gojobori method (31).
Recombination. Aligned sequences were tested for recombination using the software package RDP (Recombination Detection Program), version 2. This package implements a set of six published methods found to be sensitive for the identification of recombination and to yield the fewest false-positive findings (27, 36, 37). These six methods are RDP (26), GENECONV (35), Bootscan (40), MaxChi (42), Chimaera (37), and SiScan (14). Each method employs a different test for detecting potentially recombinant regions within aligned sequences. The null hypothesis is clonality, i.e., that the pattern of sequence variation among the aligned sequences shows no indication of recombination. Recombination was deemed to occur in a locus if clonality was rejected by three or more tests at a significance level of P < 0.001.
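The decision rule stated above (clonality rejected by three or more of the six tests at P < 0.001) can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout, and the example P values are assumptions made for this sketch and are not part of the RDP package or of the study's data.

```python
# Hypothetical helper illustrating the "three or more tests at P < 0.001" rule.
# The method names follow the text; nothing here calls the RDP software itself.
METHODS = ("RDP", "GENECONV", "Bootscan", "MaxChi", "Chimaera", "SiScan")


def recombination_supported(p_values, alpha=0.001, min_methods=3):
    """Return True if clonality is rejected by at least min_methods tests.

    p_values maps a method name to the P value reported for one locus;
    methods that reported no signal are simply absent from the mapping.
    """
    rejecting = [m for m in METHODS if m in p_values and p_values[m] < alpha]
    return len(rejecting) >= min_methods


# Made-up P values for two hypothetical loci (not results from the study).
example_locus_a = {"RDP": 1e-6, "GENECONV": 2e-5, "MaxChi": 4e-4, "SiScan": 0.02}
example_locus_b = {"RDP": 0.3, "GENECONV": 0.0005}

print(recombination_supported(example_locus_a))  # True: three tests below 0.001
print(recombination_supported(example_locus_b))  # False: only one test below 0.001
```

Requiring agreement among several methods, each with a different test statistic, is a conservative choice: a signal reported by a single method is not counted as recombination.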
… (πtotal) was 0.0015; the corresponding values for synonymous sites (πs) and nonsynonymous sites (πa) were 0.0022 and 0.0013, respectively. The πs/πa ratio was 1.65, a low value compared to other microbes, where ratios are typically in the 2 to 20 range (33).
TABLE 2. Summary of nucleotide sequence variation in sampled regions of the C. trachomatis genome
… πtotal for this region was 0.0073, about three times higher than that of the housekeeping gene loci.
The porB gene encodes a surface-exposed protein with porin activity (24); MOMP is also a porin, and the porB sequences thus provide a contrast to the ompA sequences (Table 2). porB is much less variable than ompA, exhibiting only 11 variable nucleotide sites over the 1,023-bp gene, 8 of which resulted in amino acid replacement. The πtotal (0.0032) was in the same range as the corresponding values for the housekeeping gene and noncoding loci. The level of synonymous substitution (πs = 0.0029) was nearly equal to that observed for the housekeeping gene regions, but the nonsynonymous substitution level was higher (πa = 0.0034). The πs/πa ratio of 0.86 indicates that nucleotide substitutions in porB favor amino acid change. Interestingly, strain D/IC-CAL8 contained a substitution at base 977 resulting in a premature stop codon and a predicted protein with a 15-amino-acid truncation.
The full ompA gene in each of the surveyed strains was sequenced to verify the serovar and strain designations (Table 2); the data were consistent with sequence data from previous studies. Among the 18 serovar strains sampled, 331 of 1,194 nucleotide sites were polymorphic (27.7%), a level of variability considerably higher than any of the other sampled regions. The πtotal for ompA was 0.1215, a value 15 to 60 times higher than that seen in the housekeeping gene, noncoding, and porB regions. More remarkable was the πs of 0.2893, indicating a synonymous substitution rate 40 to 145 times greater than that detected in the other regions.
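The two summary statistics used here, the fraction of polymorphic sites and the average pairwise nucleotide difference per site, can be illustrated with a short sketch. It assumes an ungapped alignment of equal-length sequences with no ambiguity codes, uses a made-up toy alignment, and does not separate synonymous from nonsynonymous sites as the Nei-Gojobori method does; the function names are chosen for this example only.

```python
from itertools import combinations


def polymorphic_fraction(alignment):
    """Fraction of alignment columns showing more than one nucleotide state."""
    length = len(alignment[0])
    variable = sum(1 for i in range(length) if len({s[i] for s in alignment}) > 1)
    return variable / length


def nucleotide_diversity(alignment):
    """Average number of pairwise differences per site over all sequence pairs."""
    length = len(alignment[0])
    pairs = list(combinations(alignment, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical sequences, not study data).
toy = ["ATGCATGCAT", "ATGCATGAAT", "ATGTATGAAT"]

print(polymorphic_fraction(toy))            # 0.2 (2 of 10 columns vary)
print(round(nucleotide_diversity(toy), 4))  # 0.1333 (4 differences / 3 pairs / 10 sites)

# The reported ompA figure follows from the counts given in the text.
print(round(100 * 331 / 1194, 1))           # 27.7
```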
Table 2 summarizes published sequence data for four pmp genes, pmpC, pmpE, pmpH, and pmpI (15, 48); though the strains used in those studies did not fully overlap the strains used in this study, representative sequences are provided for each serovar. The pmp genes are part of a nine-member paralogous gene family thought to encode outer membrane proteins (44), though only three, PmpE, PmpG, and PmpH, have been demonstrated to be surface exposed (30, 49). Each of the four pmp gene sequences exhibited more differences than the housekeeping gene, noncoding, and porB regions, but none was as variable as ompA. Interestingly, the two that encode proteins known to be surface exposed, pmpE and pmpH, exhibited much higher nucleotide variability (πtotal = 0.0262 to 0.0367) than pmpC and pmpI (πtotal = 0.0051 to 0.0055). The range for synonymous substitutions varied considerably (πs = 0.0053 to 0.1009) but, again, was well below that for ompA.
Sequence divergence of human strains from the mouse strain of C. trachomatis. To assess whether differences in mutation rates might account for the observed variability in nucleotide polymorphism patterns shown in Table 2, the nucleotide divergence (K) at each gene region was measured relative to the orthologous gene region of the outgroup MoPn strain; the extent to which the divergence values parallel the corresponding variation in πtotal, πs, and πa provides an indication of locus-specific differences in mutation rates. As shown in Table 3, the nucleotide divergence over all sites (Ktotal) varied less than twofold among the sampled gene regions, with the pmp genes exhibiting the most divergence. The divergence values for synonymous sites in coding regions were even more similar in magnitude (Ks = 0.528 to 0.597). Sequence divergence at nonsynonymous sites exhibited a somewhat greater range, with the housekeeping gene and porB sequences registering divergence values (Ka) of about 0.05, whereas the values for the ompA and pmp genes ranged two- to threefold larger.
TABLE 3. Sequence divergence between human and mouse strains of C. trachomatis
… (πs/πa) and between (Ks/Ka) populations provides an additional perspective. The within and between ratios for ompA and the pmp genes are similar in magnitude (3.6 to 6.6), whereas the ratios for porB and the housekeeping genes are distinctly different (1.7 versus 10.6 and 0.7 versus 11.4, respectively). This variation is likely due to differences in selection pressures operating at the disparate loci.
Phylogenetic reconstruction. Gene trees were constructed for ompA, porB, and each individual housekeeping gene and noncoding region (data not shown) using the MoPn strain of C. trachomatis as the outgroup. The gene tree for ompA (Fig. 1) is consistent with ompA trees reported previously (11, 15, 28, 29, 47, 48). The ompA sequence alignments include a large number of parsimony informative sites and yield a gene tree with strong bootstrap support. As previously noted, this tree subdivides the serovars into three distinct groups, the B-complex (serovars B, Ba, D, E, L1, and L2), the C-complex (serovars A, C, H, I, Ia, J, K, and L3), and the intermediate complex (serovars F and G).
FIG. 1. Gene tree for ompA nucleotide sequences from 18 different human-specific strains of C. trachomatis. The tree was based on uncorrected p-distances and was generated using the neighbor-joining method with the MoPn strain ompA sequence as the root. Branch lengths are proportional to the number of substitutions per nucleotide site. The numbers at the nodes are percent bootstrap values for 1,000 replications.
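The uncorrected p-distance named in the caption is simply the proportion of compared sites at which two aligned sequences differ. A minimal sketch of the pairwise distance matrix that would feed a neighbor-joining step is shown below; the strain labels and sequences are invented, gap handling by skipping gapped columns is an assumption of this sketch, and the tree-building and bootstrap steps themselves are left to a phylogenetics package.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ; columns with a gap are skipped."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(named_seqs):
    """Return a dict keyed by (name_x, name_y) holding all pairwise p-distances."""
    names = list(named_seqs)
    return {(x, y): p_distance(named_seqs[x], named_seqs[y]) for x in names for y in names}


# Hypothetical aligned sequences; "outgroup" stands in for a rooting sequence.
toy_seqs = {
    "outgroup": "ATGCCGTA",
    "strain1": "ATGACGTA",
    "strain2": "ATGAC-TA",
}
matrix = distance_matrix(toy_seqs)

print(matrix[("outgroup", "strain1")])  # 0.125 (1 difference over 8 sites)
print(matrix[("strain1", "strain2")])   # 0.0 (identical over the 7 ungapped sites)
```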
FIG. 2. Phylogenetic relationships of 18 different human-specific strains of C. trachomatis based on concatenated nucleotide sequences from segments of nine genes encoding housekeeping enzymes, six intergenic noncoding segments, and the porB gene. The tree was constructed as described for Fig. 1.
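Concatenation of the kind described in the caption amounts to joining each strain's aligned segments in a fixed locus order so that every strain contributes one long sequence. The sketch below assumes each locus is already aligned and contains every strain; the locus and strain names are placeholders, and sorting the locus names is just one way to fix the order.

```python
def concatenate_loci(loci, strains):
    """Join per-locus aligned sequences into one sequence per strain.

    loci maps a locus name to a dict of {strain: aligned sequence}.
    Loci are joined in sorted name order so every strain uses the same layout.
    """
    order = sorted(loci)
    concatenated = {}
    for strain in strains:
        parts = []
        for locus in order:
            if strain not in loci[locus]:
                raise KeyError(f"{strain} missing from locus {locus}")
            parts.append(loci[locus][strain])
        concatenated[strain] = "".join(parts)
    return concatenated


# Placeholder loci and strains (not the regions listed in Table 1).
toy_loci = {
    "locusA": {"strain1": "ATG", "strain2": "ATA"},
    "locusB": {"strain1": "CC", "strain2": "CT"},
}
result = concatenate_loci(toy_loci, ["strain1", "strain2"])

print(result["strain1"])  # ATGCC
print(result["strain2"])  # ATACT
```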
Notably, the two D serovar strains were nearly identical in ompA sequence but were differentiated in the concatenated sequence tree (Fig. 2). In the latter tree, the D/IC-CAL8 strain is in the clade with the E/Bour and F/IC-CAL3 strains, whereas the D/UW-3 strain is in the clade containing the other urogenital strains. The sequence analysis differentiating these two strains was repeated to verify that the assessments were not a consequence of sample mix-up or sequencing error. These two strains would be considered clinically identical based on serological and ompA classification but are clearly different elsewhere in their genomes.
Recombination. Each of the sampled gene regions was tested for evidence of recombination using the six test algorithms included in the RDP. No trace of recombination was detected in any of the housekeeping gene, noncoding, or porB gene regions. In contrast, the ompA sequences were found to deviate from clonality by all six recombination tests (P < 0.001); this finding is consistent with previous reports of recombination in ompA (3, 17, 29). To test for recombination between gene regions, the non-ompA sequence data were concatenated and analyzed by RDP. Again, no indication of recombination was detected. Analysis of the published pmp gene sequence data provided evidence for recombination within pmpE and pmpH, the two most diverse members of the family, but not for pmpC or pmpI. In contrast to ompA, however, the pmpE and pmpH gene trees are similar in branching topology to the concatenated genome sequence tree.
Variation in nucleotide substitution rate. This study demonstrates a spectrum of nucleotide substitution patterns among different loci in C. trachomatis: the rates of substitution at synonymous sites vary over 100-fold, and the rates of replacement substitution vary over 50-fold. Although substitution rate differences approaching this magnitude have been noted in comparisons of homologous sequences between species (18, 33), to the best of our knowledge the extent of rate variation detected within the human C. trachomatis serovars is unprecedented. There is no simple apparent explanation for the variation in substitution rates, i.e., there are no striking differences in GC content, nor is there evidence of significant codon usage bias among the coding loci. Although only small segments of the genome were sampled, there is no indication that the rate differences can be attributed to chromosomal position or to location of the coding sequence on a leading or lagging strand. There is experimental evidence suggesting that in some microbial species spontaneous mutation rates increase in highly transcribed genes (18, 52) and in genomes under environmental stress (2); the former mechanism might account for the high rate of substitution in ompA given that it is one of the most highly expressed genes in the C. trachomatis genome. Neither of these mechanisms can be excluded based on the data in hand; whether either is actually in play in C. trachomatis is an open question.
Variation in the rates of nonsynonymous substitution can be linked to predicted variation in selection pressures. The very high level of replacement substitution in the ompA gene can be attributed to immune selection pressure on the protein it encodes, MOMP, as it is known that MOMP elicits a strong immune response (4, 20). The high level of replacement substitution in pmpE and pmpH may also be associated with immune selection pressure, since both encode surface-exposed proteins (49). The surface exposure of the less variable PmpC and PmpI proteins is not known, and nothing is known of their immunogenic potential. The PorB protein is surface exposed but appears not to be a natural target for the immune response (22); this may account for its relatively low level of replacement substitution, a level only marginally higher than that seen for the genes encoding the housekeeping enzymes.
The pattern of nucleotide substitution in the noncoding regions suggests that they may also be under selection pressure, presumably to conserve regulatory element binding sites. Two lines of evidence support this idea, both derived from comparison of the noncoding sequences from the human and mouse strains. First, the sequence divergence in the noncoding regions is substantially lower than what would be expected if the substitutions in the noncoding regions were neutral, assuming that neutral substitution rates are reflected in the divergence values for synonymous sites in the coding regions (noncoding K = 0.211, versus Ks = 0.528 to 0.597). This rationale has been applied to account for differences between K and Ks in other bacteria (19). Second, the noncoding region sequence alignments contain multiple runs of invariant sequence greater than 15 bp in length; such runs would not be expected, given a random mutation model.
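The second line of evidence, runs of invariant sequence longer than 15 bp, can be checked by scanning an alignment for stretches of consecutive columns in which every sequence carries the same base. The following is a sketch under stated assumptions: the alignment is ungapped and of equal length, "greater than 15 bp" is encoded as a minimum run length of 16, and the toy sequences are invented for illustration.

```python
def invariant_runs(alignment, min_length=16):
    """Return half-open (start, end) column ranges where all sequences agree.

    Only runs of at least min_length consecutive invariant columns are kept;
    the default of 16 corresponds to runs greater than 15 bp.
    """
    runs = []
    start = None
    length = len(alignment[0])
    for i in range(length + 1):
        # The extra iteration at i == length closes a run that reaches the end.
        invariant = i < length and len({s[i] for s in alignment}) == 1
        if invariant and start is None:
            start = i
        elif not invariant and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Toy two-sequence alignment: 20 invariant columns, 1 variable, 5 invariant.
toy_alignment = ["A" * 20 + "C" + "G" * 5, "A" * 20 + "T" + "G" * 5]

print(invariant_runs(toy_alignment))                # [(0, 20)]
print(invariant_runs(toy_alignment, min_length=5))  # [(0, 20), (21, 26)]
```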
The substantial differences seen with synonymous site substitution rates at the different loci in C. trachomatis contrast with the relative uniformity of divergence rates at synonymous sites between the human C. trachomatis strains and the mouse strain MoPn. This pattern might be accounted for under an evolutionary scenario in which it is assumed that synonymous substitutions are essentially neutral and serve as a molecular clock for events in C. trachomatis evolution (23). Under this scenario, the uniformity of divergence rates indicates a common point in time for the split between the mouse and the human strain genomes. Along the human strain lineage, however, different genes diverged at different points in time, with ompA at the deepest remove in time, then the pmp genes and, most recently, the housekeeping and porB genes. A possible implication of this is that the divergence of the ompA gene and possibly the pmp genes began before the division of the genomic lineages leading to the contemporary serovars.
The alternative to this scenario is that the variation in substitution rates among the genes within the human C. trachomatis strains developed after the division of the contemporary serovar lineages. This scenario would entail extraordinary locus-specific mechanisms involving acceleration of substitution rates at some gene loci (notably ompA), intense selection constraints extending to the codon level within the genes exhibiting very low substitution rates, or some combination of these. Obviously, explication of such mechanisms, should they exist, would be of considerable interest for advancing understanding of the biology of C. trachomatis.
Phylogenetic relationships among serovars within C. trachomatis. The comparative sequence analyses presented here provide evidence that the C. trachomatis genome has evolved along at least two distinct phylogenetic trajectories. The phylogenetic branching pattern of the ompA gene represents one of the trajectories. The ompA tree distinguishes three very strongly supported groups (Fig. 1); ompA variants within a group differ in sequence by 0 to 8%, whereas variants in different groups differ by 12 to 20%. The other phylogenetic trajectory is represented, at least to the extent indicated by the data described here, by the remainder of the C. trachomatis genome. Tree building based on the multiple sites sampled from around the genome as well as from the four pmp genes yields a coherent and internally consistent tree with a topology different from that of the ompA tree (Fig. 2). Significantly, the four main branches of this genome tree coincide with the tissue tropisms and patterns of disease presentation associated with C. trachomatis infection in the human host. This differentiation of serovars is supported by recent microarray-based genome surveys (8). It is worth speculating that the divergent phylogenetic trajectories observed here may reflect different selection pressures associated with the biphasic life cycle of C. trachomatis: the general genomic trajectory tied to the organism's functioning in the intracellular milieu and the ompA trajectory determined by immune and possibly other host niche selection pressures during its extracellular phase.
Recombination. The discordant phylogeny of ompA compared to other genes in the C. trachomatis genome is prima facie evidence of recombination involving ompA (36). Mosaic gene structures of ompA have been identified in previous studies and attributed to recombination involving segments within the ompA gene (3, 17, 29). This study provides two additional lines of evidence for recombination involving ompA. First, all six computational approaches used by the Recombination Detection Program indicated a high probability of recombination events among the aligned ompA sequences. Second, we detected two strains that unambiguously fall into different genome groups based on sequence differences in the sampled gene set but which would be classified in serovar D based on ompA sequence typing; the most parsimonious explanation is that an entire ompA gene has moved from one background genome type to another. Thus, it appears possible the entire ompA gene as well as ompA gene segments can undergo recombinational transfer between strains.
Recombination provides a ready explanation for the dissimilarity of ompA genes among otherwise biologically related strains, for example, the LGV strains. Recombination can also account for two recently reported observations of incongruent associations between genomic markers and serovar types. In the first case, clinical isolates of ocular origin are differentiated from urogenital strains by carriage in the former of defective trp operon genes; urogenital strains have functional trp operons. Although isolates in serovar B are typically of ocular origin and have defective trp operons, some urogenital tract isolates carrying intact trp operons have been identified as belonging to serovar B (7); it is possible that these isolates are the result of a transfer of an ompA gene from an ocular strain into a urogenital strain genomic background. The second apparent anomaly is the detection in some clinical isolates of serovar incongruent pmpC sequences (15); though this was attributed to recombination involving the pmpC gene, given the findings of this study it is more likely that the recombinational transfer involves the ompA gene.
The likelihood that recombination involving ompA has occurred in C. trachomatis prompts the question of whether it occurs elsewhere in the genome as well. The RDP provides statistical evidence of recombination in the pmpE and pmpH genes but finds no significant deviation from clonality across the remainder of the genome. To the extent recombination may occur at the two pmp loci, it appears not to have resulted in substantial departures from the general genome phylogeny as defined by the concatenated sequence data. There are a sufficient number of polymorphic sites in the segments of genome sampled here to have allowed detection of a recombination event had one occurred. Thus, the available evidence argues against genetic fluidity in the C. trachomatis genome. Rather, it suggests that recombination is probably uncommon and is localized to a few sites, most prominently at ompA. Although diverse recombination mechanisms (including gene conversion) have been invoked to account for localized genetic variation in other organisms (9, 16, 41), the details of recombination in C. trachomatis remain to be identified and are questions for future study.
Finally, a clinical and epidemiological consequence of ompA recombination is that serovar classification based on ompA sequence variation does not necessarily reflect the genetic content of the remainder of the C. trachomatis genome. To the extent that the content of the genome determines the pathobiology and the epidemiological success of the organism, strain typing based on ompA alone may paint an incomplete picture. The same consideration applies for studies looking at possible associations between chlamydia infection and other maladies, such as cervical cancer. A classification system for C. trachomatis that incorporates more extensive genomic characterization would be beneficial.
This work was supported in part by a Faculty Bridging Grant to G.F.S.
Present address: Virus and Prion Diseases of Livestock Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010.