Infection and Immunity, September 2008, p. 4000-4008, Vol. 76, No. 9
0019-9567/08/$08.00+0 doi:10.1128/IAI.00516-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Anh-Hue T. Tu,¶ and
Ann E. Loraine
Department of Genetics, University of Alabama Birmingham, Birmingham, Alabama 35294
Received 25 April 2008/ Returned for modification 28 May 2008/ Accepted 10 June 2008
The genome sequences of several species of mycoplasma are known (see http://cbi.labri.fr/outils/molligen/). Other than Ureaplasma, which utilizes the hydrolysis of urea to generate ATP, all of the mycoplasma species that have sequenced genomes are glycolytic. Some species of mycoplasma such as M. arthritidis are nonglycolytic, and thus it was anticipated that the sequence of the M. arthritidis genome would reveal new aspects of mycoplasma physiology. Nonglycolytic mycoplasmas generally catabolize arginine as a major source of energy, and, indeed, the required genes for arginine catabolism were identified in the genome sequence of M. arthritidis.
Described here is the genome sequence of the virulent M. arthritidis strain 158L3-1 (3, 31). Strain 158L3-1 is a lysogen containing the 16-kb genome of the MAV1 bacteriophage. No additional prophages were noted in the genome. The sequence revealed features offering insight into the mechanisms by which the mycoplasma causes chronic inflammation, including two families of genes that are predicted to code for phase-variable proteins and a predicted gene product related to but distinct from MAM.
We also describe a transposon library of M. arthritidis strain 158, the nonlysogenic parent of 158L3-1. The library was created using minitransposons derived from Tn4001 that insert into the genome at essentially random sites. The genomic location of the minitransposon was determined for 1,113 library members. Using criteria for gene inactivation previously developed for transposon mutagenesis of Mycoplasma pulmonis, we determined that 218 of the predicted protein coding regions, including three ribosomal protein genes, were inactivated.
Genome analysis. Open reading frames (ORFs) and the potential start and stop sites of genes were identified using the NCBI GLIMMER 3 server located at http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi (5). Gene products were annotated by analysis using a combination of BLAST and the Clusters of Orthologous Groups database at NCBI. The identification of orthologs in the genomes of Bacillus subtilis, M. pulmonis, and Mycoplasma genitalium was facilitated by additional BLAST analysis at http://genolist.pasteur.fr/SubtiList/, http://genolist.pasteur.fr/MypuList/, and http://cbi.labri.fr/outils/molligen/. Final annotations of each ORF to determine its likelihood as a coding region and its most likely 5' start took into consideration potential start codons (ATG and GTG start codons were favored over TTG starts), alignments to orthologs, and potential Shine-Dalgarno sequences. Proteins with predicted signal peptides or transmembrane regions were detected by using the InterProScan Web service at http://www.ebi.ac.uk/Tools/webservices/services/interproscan. Potential promoters in genes containing upstream poly(T) or poly(A) tracts were identified using BPROM at http://www.softberry.com/berry.phtml. Amino acid sequence identity between protein pairs was calculated using the default setting of the SIM application at http://us.expasy.org/. Clustal W analysis of protein families was performed using MacVector (Accelrys), version 7.2.
Library construction. Plasmid pTF85 carrying the minitransposon Tn4001TF1 was constructed by inserting the tetM gene into the BamHI site of pMT85, a plasmid containing a mini Tn4001 but without a tetracycline resistance determinant (36). The tetM gene was obtained by amplifying a 2.3-kb region of transposon Tn916 using the primer pair GTAATGGATCCCTCTCTTTGATAAAAAATTG and CTTATTGGATCCGAAACCATATTTATATAACAAC. Plasmid pTF20 carrying the minitransposon Tn4001TF2 was constructed by amplifying a 3.2-kb region comprising tetM and its upstream promoter and downstream terminator sequences from Tn916 using the primers GGTTGTGGATCCTTGTGGGTACTTTTAGGGC and CTTATTGAATTCGAAACCATATTTATATAACAAC. This fragment was ligated into the BamHI/EcoRI sites of pMT85. The gentamicin resistance determinant of pMT85 was then removed by digestion with BglII and XbaI, the vector was blunt-ended with T4 DNA polymerase, and an SalI linker, CCGGTCGACCGG (NEB), was inserted into the blunted BglII/XbaI site.
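The cloning strategy above depends on each oligonucleotide carrying the restriction site used for ligation. The following is a minimal sketch, not part of the original methods, that checks the oligonucleotide sequences quoted in the text for the BamHI, EcoRI, and SalI recognition sequences. The dictionary labels and the function name are hypothetical and were chosen only for illustration; the sequences themselves are copied from the paragraph above.

```python
# Recognition sequences of the restriction enzymes named in the text.
# All three are palindromic, so checking one strand is sufficient.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "SalI": "GTCGAC"}

# Oligonucleotides copied from the text. The keys are hypothetical labels.
OLIGOS = {
    "tetM_2.3kb_fwd": "GTAATGGATCCCTCTCTTTGATAAAAAATTG",
    "tetM_2.3kb_rev": "CTTATTGGATCCGAAACCATATTTATATAACAAC",
    "tetM_3.2kb_fwd": "GGTTGTGGATCCTTGTGGGTACTTTTAGGGC",
    "tetM_3.2kb_rev": "CTTATTGAATTCGAAACCATATTTATATAACAAC",
    "SalI_linker": "CCGGTCGACCGG",
}


def sites_in(oligo):
    """Return the sorted names of enzymes whose recognition site occurs in the oligo."""
    return sorted(name for name, site in SITES.items() if site in oligo.upper())


if __name__ == "__main__":
    for label, seq in OLIGOS.items():
        print(label, sites_in(seq))
```

Under this check, both primers for the 2.3-kb fragment carry a BamHI site, consistent with insertion into the BamHI site of pMT85, whereas the 3.2-kb fragment primers carry one BamHI site and one EcoRI site, consistent with directional ligation into the BamHI/EcoRI sites.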
Mycoplasma strain 158 was transformed using the polyethylene glycol-mediated method as described previously (29). Plasmids pTF85 and pTF20 do not replicate in mycoplasmas, and transformants are obtained only when the minitransposon transposes into the mycoplasma genome. Transformants were selected on agar supplemented with 5 µg of tetracycline/ml. Individual transformant colonies were picked and grown at 37°C in 1 ml of broth containing tetracycline. Each culture was frozen at –80°C in broth supplemented with 15% glycerol. Most (90%) of the library consists of transformants containing Tn4001TF1, our first-generation minitransposon for M. arthritidis. Later, we switched to using Tn4001TF2 because higher transformation frequencies were obtained.
Mapping of transposon insertion sites. The genomic location of the transposon was determined for each transformant by DNA sequence analysis of the junction between the transposon and the adjacent genomic DNA. The junction was amplified by inverse PCR. The sequence of the PCR product was compared to the complete genome sequence for identification of the transposon insertion site. The inverse PCR conditions and primers were similar to those used to map the transposon location in transformants of M. pulmonis (24) except that genomic DNA was digested with Sau3AI instead of NlaIII, and the primers for inverse PCR were CCTTCGGAAAAAGAGTTGGTAGC and TCCTGCGTTATCCCCTGATTC. The inverse PCR product was purified from an agarose gel, and the nucleotide sequence was determined using the primer TTTGAGTGAGCTGATACCGCTCGC.
Confirmation of gene dispensability. To confirm that an ORF was dispensable, at least one transformant with the gene disrupted was analyzed by direct PCR to verify that the transposon integration site was correct and to determine whether an intact copy of the gene might be present in the template preparation. To confirm the genomic location of the transposon, the primer described above to sequence the inverse PCR products was paired with a gene-specific primer that annealed to sequences in the adjacent genomic DNA. A negative control consisted of amplification of a wild-type mycoplasma DNA preparation that lacked the transposon. The possible presence of an intact copy of the gene was assessed by using two gene-specific primers that flanked the site of transposon integration. The lack of a PCR product would indicate that no intact gene was present in the transformant. Template DNA from the wild-type parent strain was used as a positive control. If the PCR data confirmed the location of the transposon and indicated that no intact copy of the gene was present in the transformant, it was concluded that the gene was indeed mutable. If the transposon location was confirmed but a PCR product corresponding to an intact copy of the gene was also obtained, the transformant was subcloned. Individual filter clones were analyzed by direct PCR as described above to determine whether the transposon still resided in the gene and whether an intact copy of the gene was present in the template. Again, if in any subclone the transposon disrupted the gene and no intact copy of the gene was identified by PCR, it was concluded that the gene was mutable.
Nucleotide sequence accession number. The M. arthritidis genome sequence was deposited in the GenBank database under accession number CP001047.
As predicted for an organism that catabolizes arginine, genes coding for arginine deiminase, ornithine carbamoyltransferase, and carbamate kinase were identified. The genes coding for the latter two enzymes are apparently organized as an operon. M. arthritidis may be able to catabolize compounds in addition to arginine for energy production (20). Genes coding for the enzymes required to convert glycerol to pyruvate and then to lactate were identified. Other predicted enzymes include alcohol dehydrogenase and acetate kinase, suggesting that acetaldehyde and acetyl phosphate can be substrates for generating ATP. The predicted gene products are inconsistent with a report of oxidation of fatty acids in M. arthritidis (27).
Other than the bacteriophage MAV1 genome, no mobile elements were evident in the genome of 158L3-1. The MAV1 genome is integrated at nucleotide positions 706579 through 722218. A 230-bp fragment that is homologous to the left end of the MAV1 genome was identified at nucleotide positions 38110 to 38340. The location of this MAV1 DNA fragment may be of interest because it is 228 bp upstream of the 5' end of the mam gene (Marth_orf036), suggesting that mam may have originally been of phage origin. Compared to the MAV1 genome sequence that was reported previously (30), the MAV1 sequence of 158L3-1 has one nucleotide addition, one deletion, and two substitutions. These differences result in a single amino acid change in the predicted MAV1 DNA integrase and a lengthening of the MAV1 HtpN protein by 23 amino acids at its amino terminus. Other than the MAV1 integrase, the only other potential site-specific DNA recombinase encoded by the genome is the Marth_orf195 gene product. This predicted protein has 62 amino acids that share amino acid similarity with a portion of the carboxy half of the IS1630 transposase from Mycoplasma penetrans. Marth_orf195 may be a truncated gene coding for a nonfunctional recombinase. M. arthritidis, therefore, appears to lack the intricate site-specific DNA inversion systems found in some species of mycoplasma (14, 15, 17, 22).
A barrier to genetic transformation of M. arthritidis is the MarI restriction enzyme, an isoschizomer of AluI (29). Strain 158L3-1 DNA is resistant to cleavage by AluI and contains AGCT sequences modified by methylation of the C nucleotide. In addition to MarI, the MAV1 genome is predicted to code for a restriction and modification system of unknown specificity (30). No obvious gene that would encode the MarI endonuclease was identified in the genome sequence, but a candidate for the gene encoding the MarI DNA methyltransferase is Marth_orf138. The Marth_orf138 product has features characteristic of cytosine-specific DNA methyltransferases and has 75% amino acid sequence identity to the HhaI methyltransferase. Marth_orf138 was likely acquired as a result of horizontal gene transfer because the top hit by BLASTn analysis of the nucleotide collection (nonredundant nucleotide) database was the HhaI DNA methyltransferase gene of Haemophilus haemolyticus (71% nucleotide identity; E value of 6e-123). It is apparent that M. arthritidis lacks the complex type I restriction systems found in some mycoplasmas such as M. pulmonis (11). Marth_orf681 is related to genes coding for the HsdM subunit of type I restriction enzymes, but no type I activity in M. arthritidis is predicted because genes coding for the other subunits required for enzymatic activity, HsdS and HsdR, are absent.
Phase variation and gene families. One mechanism by which many mycoplasmas evade the host's adaptive immune responses is through the phase-variable production of critical surface proteins (6). In some cases, phase variation is achieved by slipped-strand mispairing (SSM) at a run of homonucleotides located upstream of the gene's coding region (9). Gain or loss of nucleotides in this region acts as an on/off switch for promoter activity by changing the spacing between the promoter's –10 and –35 regions. To identify sequences that might be associated with phase variation, we examined the genome of 158L3-1 to identify homonucleotide repeats of at least 10 bp in length. Twenty-nine homonucleotide repeat regions were identified. Twenty-five of these repeats were located in the predicted promoter region of genes that likely code for phase-variable surface proteins (Table 1). Many of the predicted phase-variable genes are members of one of two families (the massive surface protein, or Msp, and major surface antigen, or MSA, families). One of the Msp family members (Marth_orf471) was not initially identified as having a homonucleotide repeat because the length of the repeat was only 7 bp. However, the structure of the promoter region of this gene was essentially the same as for other family members. Therefore, a total of 26 gene products are predicted to be subject to phase variation. In nearly all cases, the homonucleotide repeat on the gene's coding strand consisted of poly(T). This is in contrast to the vlp genes of Mycoplasma hyorhinis, which have poly(A) in the promoter region of the coding strand (35).
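The search for homonucleotide repeats of at least 10 bp can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the procedure actually used for the genome analysis: the function name, the coordinate convention (0-based, end-exclusive), and the toy sequence are all hypothetical. Only the strand supplied is scanned, so a poly(T) tract on the coding strand would be reported as poly(A) if the opposite strand were given.

```python
import re


def homonucleotide_runs(seq, min_len=10):
    """Return (start, end, base, length) for each run of a single base.

    Only runs of at least min_len identical bases are reported.
    Coordinates are 0-based and end-exclusive on the strand supplied.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) further copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a 12-bp poly(T) tract and a 7-bp poly(A) tract.
    toy = "GATTACA" + "T" * 12 + "GGCC" + "A" * 7 + "CG"
    print(homonucleotide_runs(toy))             # default 10-bp threshold
    print(homonucleotide_runs(toy, min_len=7))  # relaxed threshold
```

Lowering the threshold, as in the second call, is the kind of relaxation that would be needed to detect a shorter tract such as the 7-bp repeat upstream of Marth_orf471.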
TABLE 1. Predicted phase-variable genes containing homonucleotide repeats in the putative promoter
Eleven of the 26 predicted phase-variable proteins belong to the Msp family. Each of the MspA through MspK proteins is considered to be phase variable; the proteins are encoded by ORFs Marth_orf057, Marth_orf358, Marth_orf469, Marth_orf481, Marth_orf492, Marth_orf497, Marth_orf647, Marth_orf653, Marth_orf150, Marth_orf471, and Marth_orf727 (Table 1), respectively. An additional Msp protein (MspL, encoded by Marth_orf0275) may not be phase variable because the promoter lacks homonucleotide repeats. Most of the Msp proteins have a predicted mass of 200 or 300 kDa. The proteins apparently have a single transmembrane domain and should be exposed on the outside surface of the mycoplasma except for a short cytoplasmic domain at the amino terminus. There are no Msp orthologs identified in other bacteria.
A Clustal W analysis of the Msp proteins is illustrated in Fig. 1. The most divergent proteins are MspK and MspL, but these proteins are clearly in the Msp family. For example, MspK and MspI share 40.3% amino acid sequence identity in a 1,005-residue region, and MspL and MspF share 43.7% amino acid identity in a 955-residue region. The shortest protein is MspJ (337 amino acids), which shares 95.7% amino acid sequence identity with MspI in a 308-residue region. A 5-kb region that includes the mspJ gene (Marth_orf471) and the 3' half of the mspD gene (Marth_orf481) is shown in Fig. 2. Four different single nucleotide changes in this region of the 158L3-1 genome (deletion of T at nucleotide position 410113, insertion of A at position 410668, substitution of C for T at 411146, and deletion of A at 411505) would merge ORFs Marth_orf471, Marth_orf472, and additional downstream sequences into a single gene that would code for a 922-amino-acid protein that shares 91.8% sequence identity with MspI. It is probable that Marth_orf471 and Marth_orf472 are remnants of a single ancestral msp gene.
FIG. 1. Clustal W analysis of the Msp protein family. The scale indicates the expected number of substitutions per site.
FIG. 2. Schematic of a 5-kb region (nucleotides 409106 to 414105) of the genome sequence illustrating the single nucleotide changes that would merge the ORFs of strain 158L3-1 (top) into an ancestral msp gene (bottom). Numbers refer to Marth_orf designations. Not all of Marth_orf481 is shown.
FIG. 3. Schematic illustrating features of MSA family members including the locations and lengths of poly(A) or poly(T) tracts in promoter regions. Other than black regions, which are nonhomologous, regions of like color are homologous.
The MAA proteins are thought to contribute to adherence to host tissue, and MAA2 may not be the only MSA family member (Fig. 3) with a role in adherence. Some proteins within the MSA family such as MAA2 and MIA are strongly recognized by the immune system, and the phase-variable production of many of the MSA proteins likely contributes to immune avoidance (26, 33). Other than putative adhesins and MAM, little is known about factors that might contribute to pathogenesis. The M. arthritidis genome codes for peptide methionine sulfoxide reductase (Marth_orf807), required for full virulence in M. genitalium (7), and a potential hemolysin (Marth_orf698).
M. arthritidis transposon library. Unlike previous descriptions of mycoplasmal transposon libraries (12, 13), the M. arthritidis library was created with minitransposons that could initially insert into the genome but then would be unable to transpose to secondary sites. The minitransposon Tn4001TF1 was used to generate the first 1,000 mutants in the library, and minitransposon Tn4001TF2 was used for the remaining mutants. With rare exceptions, the genomic location of the transposon was different for each member of the library. A total of 1,113 different genomic sites for transposon insertion were mapped. We estimate that the library contains null mutations in 226 different genes, using the criteria that null mutations are produced when the transposon truncates at least 10% from the 5' end or 15% from the 3' end of an ORF. These criteria are based in part on results from transposon mutagenesis of M. pulmonis (12). An exception, however, was made for predicted lipoprotein gene products. A lipoprotein gene was considered inactivated even if less than 10% of the 5' end was truncated, provided that the fatty acid addition site would be missing from the remainder of the gene product. Our final table of mutated genes (see Table S1 in the supplemental material) has 218 entries, not 226. For eight genes, analysis of the transposon's genomic location by direct PCR failed to confirm the inverse PCR sequencing data, or an intact copy of the gene was detected by direct PCR in each of the subclones of the transformant. Similar findings of genes that were disrupted in the primary transformant but not in subclones thereof have been reported for transposon mutagenesis of M. genitalium (13) and M. pulmonis (12).
Of the 218 genes disrupted in M. arthritidis (see Table S1 in the supplemental material), 36 have essential homologs in M. genitalium, 26 have essential homologs in M. pulmonis, and 18 have homologs that are essential in both species (Table 2). Only a few of the dispensable M. arthritidis genes that are essential in one or both of the other species of mycoplasma have potential paralogs within the M. arthritidis genome that might render the genes expendable. Perhaps most interesting are the six dispensable M. arthritidis genes that have essential homologs in B. subtilis. Three of these are the ribosomal protein S15, S18, and L15 genes. The truncated protein in the disrupted S15, S18, and L15 genes would be 78%, 61%, and 58%, respectively, of the full-length wild-type protein. Other genes important for protein synthesis that were disrupted in the library included those coding for methionyl-tRNA formyltransferase and polypeptide deformylase. Although these genes are essential in M. genitalium and M. pulmonis, formylation is not essential for initiation of protein synthesis in all bacteria, and Mycoplasma hyopneumoniae lacks these genes altogether (18, 23, 28). Other genes of interest that were disrupted in the library included those coding for potential virulence factors such as the MAM superantigen, the Marth_orf698 hemolysin, and several members of the MSA family (Marth_orf220, -221, -224, -459, -746, and -793).
TABLE 2. Dispensable M. arthritidis genes that are essential in M. pulmonis or M. genitalium
Although phase-variable proteins have been described for several species of mycoplasma, the number of such proteins predicted from the genome sequence of M. arthritidis is high compared to M. pulmonis, M. genitalium, and M. pneumoniae but probably not as high as some other species such as M. gallisepticum (19, 21). The total length of the coding regions for the genes listed in Table 1 is more than 90 kb, accounting for 11% of the genome. Homonucleotide tracts located within the promoter regions of genes that are phase-variably expressed were first described in M. hyorhinis (35). The frequency of phase variation is high, about 10⁻³ per CFU, and results from SSM events that vary the length of the repeats and thus the spacing between the –10 and –35 sites. With 26 genes demonstrating phase variation at a frequency of 10⁻³, about 1 CFU in 40 will undergo a phase variation event each generation. Several of the phase-variable genes also have tandem repeats in the coding region that would be subject to SSM, resulting in variation in the size of the protein, as has been shown previously for the MIA protein of M. arthritidis (Marth_orf00584) (26). Cultures of M. arthritidis consist of complex cell populations producing a varied repertoire of proteins. Not all transformants in the transposon library will be isogenic, and care must be taken in the assessment of mutant phenotypes.
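The estimate of about 1 CFU in 40 can be reproduced with a short calculation. The sketch below assumes, as the estimate in the text implicitly does, that each of the 26 genes switches independently and at the same frequency of 10⁻³ per generation; the function name is hypothetical.

```python
def fraction_switching(n_genes, rate_per_gene):
    """Probability that at least one of n_genes switches phase in one generation.

    Assumes independent switching at the same per-gene rate (an assumption
    made for illustration, not a measured property of every gene).
    """
    return 1.0 - (1.0 - rate_per_gene) ** n_genes


# 26 predicted phase-variable genes, each switching at about 1e-3 per generation.
p = fraction_switching(26, 1e-3)

if __name__ == "__main__":
    print(f"fraction of CFU with at least one switch: {p:.4f}")
    print(f"equivalent to about 1 CFU in {1 / p:.0f}")
```

Because the per-gene rate is small, the result is close to the simple product 26 × 10⁻³, which is the approximation behind the figure of about 1 in 40.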
Based on the number of genes that are essential for growth in M. genitalium, about one-half of the M. arthritidis genome should consist of essential genes. The nonessential half of the genome, around 400 kb, would represent the effective target size for transposon integration. Within the nonessential regions of the genome, the density of insertion sites would be about one site per 360 bp (400 kb of genome/1,100 transposon insertion sites). Genome coverage of the library is thus inadequate to conclude that any gene not hit by the transposon is essential.
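The conclusion that coverage is inadequate can be made more concrete with a simple model. The sketch below assumes that insertions fall uniformly at random across the roughly 400-kb nonessential target, so that the number of insertions in a gene follows a Poisson distribution; the gene lengths used are hypothetical examples, and the model ignores the truncation criteria applied when calling null mutations.

```python
import math

TARGET_BP = 400_000  # approximate nonessential portion of the genome (from the text)
N_SITES = 1_100      # approximate number of mapped insertion sites (from the text)


def mean_spacing_bp(target_bp=TARGET_BP, n_sites=N_SITES):
    """Average distance between insertion sites within the target."""
    return target_bp / n_sites


def prob_missed_by_chance(gene_len_bp, target_bp=TARGET_BP, n_sites=N_SITES):
    """Poisson probability that a nonessential gene receives no insertion.

    Assumes uniform random insertion over the target (an idealization).
    """
    expected_hits = n_sites * gene_len_bp / target_bp
    return math.exp(-expected_hits)


if __name__ == "__main__":
    print(f"mean spacing: {mean_spacing_bp():.0f} bp")
    for length in (300, 1_000, 3_000):  # hypothetical gene lengths in bp
        print(length, f"{prob_missed_by_chance(length):.3f}")
```

Under these assumptions, a short nonessential gene has a substantial chance of escaping disruption purely by chance, which is why the absence of an insertion cannot be taken as evidence that a gene is essential.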
We had anticipated that unconfirmable mutants would be rare in the M. arthritidis library. In minitransposons, the transposase gene is not inside the transposon but is located elsewhere on the vector backbone. Consequently, Tn4001TF1 and Tn4001TF2 should not be capable of transposition into secondary sites. Indeed, no evidence of secondary transposition of the minitransposons was detected. Previous studies of transposon mutagenesis in mycoplasma have reported unconfirmable mutants, that is, genes that were disrupted by the transposon in primary transformants but could not be subsequently verified (12, 13). These studies did not use a minitransposon. As Tn4001 and its derivatives transpose by an excision-insertion mechanism, we had attributed unconfirmable mutants to active transposition within the genome. In the current study using minitransposons, the location of the transposon could not be confirmed for 32 of 338 (9%) primary transformants. For our M. pulmonis library constructed with the non-minitransposon Tn4001T, confirmation of the location of the transposon was not obtained for 97 of 464 (21%) primary transformants. Thus, the use of a minitransposon did not eliminate unconfirmable transposon insertions but did reduce their frequency by roughly half.
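The comparison between the two libraries follows directly from the counts reported above. The following sketch simply recomputes the two proportions and the relative reduction; the function and variable names are hypothetical.

```python
def unconfirmed_fraction(unconfirmed, total):
    """Fraction of primary transformants whose insertion site was not confirmed."""
    return unconfirmed / total


# Counts taken from the text.
mini = unconfirmed_fraction(32, 338)   # M. arthritidis, minitransposons
full = unconfirmed_fraction(97, 464)   # M. pulmonis, Tn4001T

# Relative reduction in the unconfirmed fraction when a minitransposon is used.
relative_reduction = 1.0 - mini / full

if __name__ == "__main__":
    print(f"minitransposon library: {mini:.1%}")
    print(f"Tn4001T library:        {full:.1%}")
    print(f"relative reduction:     {relative_reduction:.0%}")
```

The two libraries were made in different species, so the comparison is indicative only and does not isolate the effect of the transposon design.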
Considering the essential roles of ribosomal proteins S15, S18, and L15 in most bacteria, the dispensability of these proteins in M. arthritidis is surprising. Although we have knocked out the S18 gene in M. pulmonis, the S15, S18, and L15 genes are essential in M. genitalium. It appears that the structure of the ribosome may differ, perhaps in subtle ways, between species of mycoplasma. The ribosomes of other bacteria may exhibit different properties as well because six of the 30S ribosomal proteins including S15 are nonessential in Escherichia coli (1).
Published ahead of print on 23 June 2008.
Supplemental material for this article may be found at http://iai.asm.org/.
Present address: Department of Microbiology, Immunology, and Molecular Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095.
¶ Present address: Biology Department, Georgia Southwestern State University, Americus, GA 31709.
Present address: Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28223.