Infection and Immunity, March 2008, p. 1016-1023, Vol. 76, No. 3
0019-9567/08/$08.00+0 doi:10.1128/IAI.01535-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Laboratory of Genomic Research on Pathogenic Bacteria,1 Department of Bacterial Infections, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan,2 Laboratory of Comparative Genomics, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan,3 Laboratory of Molecular Gene Technics, Department of Genetic Resources Technology, Faculty of Agriculture, Kyushu University, Fukuoka, Japan,4 Division of Bioenvironmental Science, Frontier Science Research Center, University of Miyazaki, Miyazaki, Japan5
Received 20 November 2007/ Returned for modification 16 December 2007/ Accepted 6 January 2008
Although infections caused by V. parahaemolyticus are usually sporadic cases caused by various serotypes of this organism, strains with O3:K6 and a few other serotypes, such as O4:K68, O1:K25, and O1:K untypeable (O1:KUT), have caused an increasing number of worldwide outbreaks of gastroenteritis since 1996 (1, 4, 5, 6, 7, 18, 19, 21, 26). These strains showed almost identical fragment patterns of pulsed-field gel electrophoresis (PFGE) and arbitrarily primed PCR and are referred to as "pandemic clones" (2, 19, 21, 26, 36).
In 2003, we determined the entire genomic sequence of a pandemic V. parahaemolyticus strain, RIMD2210633, with the O3:K6 serotype (16) and identified 4,832 protein-coding genes (ORFs) on the two circular chromosomes (chromosomes 1 and 2) of the strain. A large pathogenicity island (Vp-PAI) was identified on chromosome 2, where one set of type III secretion system (TTSS) genes and two copies of tdh genes encoding almost identical TDHs were located (16). Many genes encoding potential virulence factors were also identified at this chromosomal locus. While genome sequencing can disclose the presence of these potential virulence factors, genomic comparisons between pathogenic and nonpathogenic strains within a species should be useful for identifying virulence determinants crucial to the pathogenicity of this organism. DNA microarrays have successfully been used for comparative genomic analyses of many pathogenic bacteria to gain insights into genomic evolution or diversity in individual bacteria and to identify genes that correlate with diseases. In this study, we constructed a DNA microarray targeting all ORFs of V. parahaemolyticus RIMD2210633 and performed a comparative genomic hybridization (CGH) analysis of various V. parahaemolyticus strains to determine the difference in gene repertoire between pathogenic and environmental (nonpathogenic) strains and between pandemic and nonpandemic strains.
TABLE 1. V. parahaemolyticus strains used in this study
. All transformants were cultured and stored as frozen stocks in LB broth containing 20% (wt/vol) glycerol. Eight V. parahaemolyticus ORFs that were not present in RIMD2210633 were prepared by PCR using the following template DNA: genomic DNA of strain AQ3996 (trh1) and strain AQ4729 (trh2, ureR, nicD, and ureC) and pVP-RET1 DNA (ret, orf560, and orf218).
Construction of DNA microarray. The microarray contains PCR products derived from all of the ORFs of V. parahaemolyticus strain RIMD2210633 (16) with the exception of VPA1386. Each V. parahaemolyticus ORF was amplified from the frozen stock using M13-Forward and M13-Reverse primers. After checking the quality and size of each of the PCR products by agarose gel electrophoresis, the PCR products were spotted on poly-L-lysine-coated glass slides (Matsunami Glass Ind., Ltd., Osaka, Japan) by using the SPBIO spotter (Hitachi Software Engineering, Tokyo, Japan).
Isolation and labeling of chromosomal DNA. Genomic DNA was prepared from each strain using a DNeasy tissue kit (Qiagen) according to the manufacturer's instructions. Two micrograms of genomic DNA was suspended in 24 µl H2O, combined with 20 µl of the random hexamer solution (supplied by the BioPrime Array CGH Genomic Labeling System [Invitrogen]), heated to 95°C for 5 min, and chilled on ice. Five microliters of 10x deoxynucleoside triphosphate-aminoallyl dUTP mix (5 mM dATP, dCTP, and dGTP, 2 mM dTTP, and 3 mM aminoallyl dUTP) and 1 µl of the Klenow fragment solution were added to the mixture and incubated for 1 h at 37°C. The aminoallyl-labeled DNA was purified by phenol-chloroform extraction and ethanol precipitation. Precipitated DNA was dried and resuspended in a 10-µl solution of 50 mM NaHCO3 (pH 9.0), and 10 µl Cy3 or Cy5 monofunctional dye (GE Healthcare) dissolved in dimethyl sulfoxide was added to the solution. One-tenth of one reaction vial of Cy3 or Cy5 dye supplied by the manufacturer was used per reaction. After incubation for 1 h at room temperature in the dark, unincorporated dye was removed by using CentriSep spin columns (Princeton Separations, Inc.). Cy3- or Cy5-labeled probes were recovered by means of ethanol precipitation and resuspended in 18 µl of H2O. Equal amounts of probes labeled with Cy3 and Cy5 were mixed, and 3 µl of 10% sodium dodecyl sulfate (SDS), 6 µl of 10 mg/ml yeast tRNA, and 15 µl of 20x SSC (3 M NaCl, 0.3 M trisodium citrate·2H2O, pH 7.0) were added to the mixture. The probe mixture was then heated for 5 min at 96°C, quickly chilled on ice, and centrifuged for 1 min at 14,000 rpm in a microcentrifuge. Each probe mixture was incubated for 5 min at 55°C just before being applied to a microarray slide.
Hybridization and detection of microarray signals. The prewarmed probe mixture was applied to a microarray slide and covered with a MAUI AO lid (BioMicro Systems). After the microarray slide had been sealed, it was placed in the MAUI hybridization chamber and incubated for 16 h at 55°C. After hybridization, the microarray slides were washed twice with the 2x SSC-0.1% SDS solution at 55°C for 10 min, twice with the 0.2x SSC-0.1% SDS solution at room temperature for 10 min, and finally twice with the 0.2x SSC solution at room temperature for another 10 min. Washed microarray slides were dried by centrifugation and scanned with a microarray scanner, the ScanArray Express Lite (Perkin Elmer Life and Analytical Sciences), and the data were processed with ScanArray Express software. For each test strain, three experiments were carried out independently.
Data analysis. The signal intensities of each spot were quantified with the ScanArray Express software, and further data analyses were performed with Microsoft Excel software and the microarray genomic analysis program GACK (http://falkow.stanford.edu/whatwedo/software) (13). After the data with low signal intensities or slide abnormalities were excluded, the data set was analyzed by GACK using cutoff lines, each of which was determined for every microarray experiment.
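As an illustration of the preprocessing described here, the following is a minimal sketch of computing per-spot log2 ratios of test signal to reference signal after dropping low-signal spots. It is not GACK and does not reproduce GACK's cutoff selection. The function name `log2_ratios`, the `min_signal` floor of 200, the choice to filter on the reference channel only, and the intensity values are all assumptions made for the example; the text does not state them.

```python
import math

def log2_ratios(test, reference, min_signal=200.0):
    """Return {orf: log2(test / reference)} for spots that pass a signal floor.

    min_signal is a hypothetical cutoff; the value used in the study is not stated.
    """
    ratios = {}
    for orf, ref_signal in reference.items():
        test_signal = test.get(orf)
        # Skip spots missing from the test scan, spots whose reference signal is
        # too weak to trust, and non-positive test signals (log2 is undefined).
        if test_signal is None or ref_signal < min_signal or test_signal <= 0:
            continue
        ratios[orf] = math.log2(test_signal / ref_signal)
    return ratios

# Hypothetical intensities for three spots (illustrative only).
example_ref = {"VP0001": 4000.0, "VP0002": 3200.0, "VP0003": 50.0}
example_test = {"VP0001": 4000.0, "VP0002": 400.0, "VP0003": 45.0}
ratios = log2_ratios(example_test, example_ref)
print(ratios)
```

The floor is applied to the reference channel only because a gene that is truly absent from the test strain is expected to give a weak test signal, and filtering on that channel would discard exactly the spots of interest.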
To determine the presence or absence of each of the spots, log2 values of the test strain divided by those of the reference strain for each spot were analyzed by GACK; spots with an estimated probability of presence of 0% were considered "absent" and those with an estimated probability of presence between 0% and 50% "divergent." Since we performed three independent experiments for each strain, the ORFs were categorized as "present," "absent," or "divergent" when they received the same judgment in more than one experiment. When the judgments of three experiments were all different, the ORFs were judged "uncertain." Absence of some variably present genes and regions was verified by PCR and sequencing. When an ORF was absent or divergent in at least more than one test strain, the ORF was judged to be variably present. When absent or divergent ORFs continually appeared and occupied more than 80% of a locus, the locus was designated a variably present gene cluster.
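The voting rule across the three experiments can be sketched as follows. This is an illustrative reading of the rule rather than the authors' code: the function names are hypothetical, the per-experiment calls are assumed to have been produced already (for example by GACK), and `min_strains` is left as a parameter because the wording "at least more than one test strain" does not settle the exact threshold.

```python
from collections import Counter

def consensus_call(calls):
    """Combine the per-experiment calls for one ORF in one strain.

    A call that occurs in more than one experiment wins; if every
    experiment disagrees, the ORF is reported as "uncertain".
    """
    label, count = Counter(calls).most_common(1)[0]
    return label if count > 1 else "uncertain"

def is_variably_present(calls_by_strain, min_strains=1):
    """True if the ORF is absent or divergent in at least min_strains strains.

    min_strains=1 is an assumption; set it to 2 for the stricter reading.
    """
    hits = sum(1 for call in calls_by_strain.values()
               if call in ("absent", "divergent"))
    return hits >= min_strains

print(consensus_call(["present", "present", "absent"]))
print(consensus_call(["absent", "divergent", "present"]))
```

Keeping "uncertain" as its own label, instead of forcing a call, means such ORFs never count toward the variably present set in this sketch.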
Microarray data accession number. The microarray data have been submitted to the Gene Expression Omnibus (series record number GSE10020). Final processed data are presented in the supplemental material.
With the aid of a CGH analysis using the DNA microarray, we compared the genomic composition of 22 strains of V. parahaemolyticus. These test strains were chosen from strains of various serotypes, years of isolation, and sources and consisted of three environmental strains and 19 pathogenic strains (14 KP-positive strains and 5 KP-negative strains) (Table 1). Of the 14 KP-positive strains, 3 were pandemic and the others nonpandemic strains. An overview of the results of CGH analyses is provided in Fig. 1. The final data sets from the CGH analyses are available in the supplemental material. Of 4,832 RIMD2210633 ORFs, a total of 675 (14.0%) were absent or highly divergent in at least more than 1 strain among the 22 strains (referred to as "variably present genes" in this paper; see Materials and Methods). The remaining 4,157 ORFs (86%) were conserved in all of the tested strains and represent the core gene set of V. parahaemolyticus. We analyzed the functions of the 675 variably present genes according to their clusters of orthologous groups (COG) classification (the database of the V. parahaemolyticus genome project [http://genome.gen-info.osaka-u.ac.jp/bacteria/vpara/genome.html]). A large proportion (51.1%) of the variably present genes were not assigned to any COG category, and this ratio was significantly higher than that for the whole genome (25.5%). The most abundant functional group was that for "cell envelope biogenesis, outer membrane," followed by that for "DNA replication, recombination, and repair."
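The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are chosen for the example.

```python
total_orfs = 4832       # ORFs of RIMD2210633 on the array
variable_orfs = 675     # variably present genes

core_orfs = total_orfs - variable_orfs
pct_variable = 100 * variable_orfs / total_orfs
pct_core = 100 * core_orfs / total_orfs

# Share of genes without a COG category: variably present set vs. whole genome.
unassigned_fold = 51.1 / 25.5

print(core_orfs, round(pct_variable, 1), round(pct_core, 1), round(unassigned_fold, 1))
```

The roughly twofold excess of unassigned genes is a ratio of the two reported percentages only; it says nothing about statistical significance, which would require the underlying counts.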
FIG. 1. Genomic composition of V. parahaemolyticus strains. Each row corresponds to a spot on the microarray, and they are arranged according to the chromosomal positions of the RIMD2210633 genes. The numbers on the top correspond to strain numbers in Table 1. The colors indicate the status of the ORFs, as follows: blue, present; pale blue, divergent; yellow, absent; gray, uncertain. Representative variably present regions are indicated on the left of the columns. Asterisks indicate the positions of tRNA flanking the variably present regions. GC percentages of the genome are on the right.
TABLE 2. Size, position, GC percentage, and content of the large variably present gene clusters identified in this study
Characteristics of the variably present gene clusters larger than 10 ORFs are listed in Table 2. Among them were five putative prophage regions (VPC-03, VPC-05, VPC-07, VPC-08, and VPC-10), with the genes in these prophage regions highly divergent or completely absent in many test strains (Fig. 1; Table 2). Such a variation of prophages has also been observed in Escherichia coli O157 strains (24, 25). Other notable features of the variably present gene clusters are the type I restriction-modification system (VPC-02), superintegron (VPC-06), insertion sequence (IS), and transposon (VPC-06, VPC-11, and VPC-12). ISs and transposons are mobile elements that transfer horizontally. Three of the variably present gene clusters, including two prophage regions (VPC-03 and VPC-07), were flanked by a tRNA gene, which suggests that these clusters were integrated into the chromosome by using tRNA loci as target sites (Fig. 1; Table 2) (10). The presence of various mobile elements in the clusters may partly account for the relatively high percentage of genes for the "DNA replication, recombination and repair" COG category among variably present genes.
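One way to turn the cluster definition (absent or divergent ORFs occupying more than 80% of a locus, with clusters larger than 10 ORFs reported) into a procedure is sketched below. The paper does not describe how loci were delimited, so the greedy left-to-right scan, the requirement that a cluster start and end on a variably present ORF, and the name `find_clusters` are all assumptions of this example.

```python
def find_clusters(flags, min_fraction=0.8, min_size=10):
    """Return inclusive (start, end) index pairs of variably present clusters.

    flags[i] is True when the i-th ORF, in chromosomal order, is variably
    present. A cluster starts and ends on a flagged ORF, flagged ORFs must
    make up more than min_fraction of it, and it must span more than
    min_size ORFs.
    """
    clusters = []
    i, n = 0, len(flags)
    while i < n:
        if not flags[i]:
            i += 1
            continue
        best_end, hits = i, 0
        # Extend to the furthest flagged ORF that keeps the window above
        # the required fraction of flagged ORFs.
        for j in range(i, n):
            hits += 1 if flags[j] else 0
            if flags[j] and hits / (j - i + 1) > min_fraction:
                best_end = j
        if best_end - i + 1 > min_size:
            clusters.append((i, best_end))
        i = best_end + 1
    return clusters

# Hypothetical stretch: two runs of six flagged ORFs split by one conserved ORF.
toy_flags = [False] * 5 + [True] * 6 + [False] + [True] * 6 + [False] * 5
print(find_clusters(toy_flags))
```

In the toy input the single conserved ORF does not break the cluster, because 12 of the 13 ORFs in the window are still flagged.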
Most of the variably present genes that were classified in the "cell envelope biogenesis" category were found in VPC-01 (corresponding to VP0187 to VP0238). This region encodes the genes for the biosynthesis of lipopolysaccharides and capsular polysaccharides which serve as major antigens of V. parahaemolyticus. In fact, among the serotype O3:K6 strains with the same serotype as RIMD2210633 (strains 7, 20, and 22; lanes 1 to 3 in Fig. 2), genes in this region were fully conserved, whereas only the genes in the first half (VP0187 to VP0213) were conserved in the two test strains with serotypes O3:K20 and O3:K5 (strains 1 and 19; lanes 4 and 5 in Fig. 2). This suggests that the first and second halves of this region are involved in the biosynthesis of O and K antigens, respectively.
FIG. 2. Comparison of locus VP0187 to VP0238 in various serotypes of V. parahaemolyticus strains. Serotype and number of strains are shown at the top. Blue, yellow, pale blue, and gray show, respectively, present, absent, divergent, and uncertain genes.
We first compared two O3:K6 strains isolated in 1996 and 2001 (strains 20 and 22, respectively) and one O4:K68 strain (strain 21) isolated in 2001. Comparison of the genomic content among these three pandemic strains showed that their genomic organizations closely resemble each other. A clear difference between these strains was seen only in the region specifying O and K antigens. In addition, the region encoding VP1884 to VP1891 was absent in the two pandemic strains isolated in 2001 (strains 20 and 21). These results agree with the hypothesis based on the findings of PFGE (2, 5) and arbitrarily primed PCR analysis (19, 26) that the O4:K68 strain originated from the pandemic O3:K6 clone.
We next examined the pandemic and nonpandemic strains for genetic differences and identified 65 pandemic-strain-specific genes that are located in 11 chromosomal regions (see the supplemental material). Many of these genes were hypothetical protein genes but were located in variably present gene clusters, such as the type I restriction-modification system, phage, superintegron, or IS element. This suggests that the pandemic strains emerged not via a single genetic event or mutation from the nonpandemic strains but via multiple genetic events, including insertion of several large gene clusters.
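The search for group-specific genes amounts to a set comparison over the consensus calls. The sketch below is illustrative only: the function name, the ORF labels, the toy calls, and the decision to count only "present" as presence are assumptions, and the strain numbers are used purely as dictionary keys.

```python
def group_specific_orfs(calls, group, others):
    """Return ORFs present in every strain of `group` and in no strain of `others`.

    calls maps ORF -> {strain: consensus call}. Only "present" counts as
    presence here; "divergent" and "uncertain" are treated as not present,
    which is an assumption of this sketch.
    """
    specific = []
    for orf, by_strain in calls.items():
        in_group = all(by_strain.get(s) == "present" for s in group)
        in_others = any(by_strain.get(s) == "present" for s in others)
        if in_group and not in_others:
            specific.append(orf)
    return specific

# Hypothetical calls for three made-up ORFs in five strains.
toy_calls = {
    "orfA": {20: "present", 21: "present", 22: "present", 9: "absent", 10: "divergent"},
    "orfB": {20: "present", 21: "present", 22: "present", 9: "present", 10: "absent"},
    "orfC": {20: "present", 21: "absent", 22: "present", 9: "absent", 10: "absent"},
}
print(group_specific_orfs(toy_calls, group=[20, 21, 22], others=[9, 10]))
```

The same function could be applied to any other pair of strain groups by changing the two strain lists.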
In some previous studies, pandemic and nonpandemic strains of the same serotype (O3:K6) were compared by means of the DNA subtraction method, resulting in the identification of several pandemic-strain-specific regions (28; M. Nishibuchi et al., personal communication). Our microarray analysis confirmed their results but also indicated that more pandemic-strain-specific regions are present in pandemic strains.
Using PCR analysis, Hurley et al. found that four of the genomic islands (GIs) (VPaI-1, -4, -5, and -6) were present specifically in the pandemic group of V. parahaemolyticus isolates (10). Genes on the four pandemic-group-specific GIs (corresponding to VPC-02, -07, -08, and -11) showed a similar distribution among the isolates used in our analysis (Fig. 1) (see the supplemental material): most genes within these regions were present only in pandemic group isolates. In addition, VP1071 to VP1085 on VPC-03 (corresponding to VPaI-3 in the Hurley study) were also pandemic group specific in our analysis.
In many pathogenic bacteria, proteins encoded on prophage regions are known to affect cellular processes of the host cells (17). It is thus possible that some of the pandemic-strain-specific ORFs in our study may have added to the organism some novel phenotype which is related to pandemic phenotypes. BlastP analysis of the pandemic-strain-specific hypothetical genes showed that several ORFs were homologous to nuclease, acetyltransferase, and RTX toxin but most showed no significant homology to proteins with known functions. A systematic functional analysis of the pandemic-strain-specific ORFs may be required to identify the genes that are crucial for the emergence of pandemic clones.
Genomic comparison between pathogenic and nonpathogenic strains of V. parahaemolyticus. To determine what genes are responsible for the virulence of V. parahaemolyticus for humans, we compared the genomic content of three environmental (nonpathogenic) isolates (strains 1 to 3) and 19 clinical isolates (strains 4 to 22) to identify the genes or DNA regions present specifically in pathogenic strains. We could not find any genes that were common to all the clinical isolates but not to the environmental ones. However, by comparing the KP-positive strains (strains 9 to 22) with KP-negative clinical (strains 4 to 8) and environmental (strains 1 to 3) strains, we found that only VPC-12 (from VPA1310 to VPA1396) was exclusively conserved in KP-positive pathogenic strains and not in KP-negative strains (Fig. 1 and 3) (see the supplemental material). This cluster was previously described as the "pathogenicity island (Vp-PAI)" (16). We could not find any other genes that were unique and common to all of the KP-positive clinical strains, thus suggesting that VPC-12 has a strong correlation with pathogenicity in KP-positive V. parahaemolyticus.
FIG. 3. Comparison of VPC-12 genes (VPA1310 to VPA1396) among pathogenic (both KP-negative and -positive) and nonpathogenic V. parahaemolyticus strains. Features of each strain are shown at the top. Blue, yellow, pale blue, and gray show, respectively, present, absent, divergent, and uncertain genes. Representative gene functions are indicated on the right.
Although this Vp-PAI of ca. 80 kb had previously been predicted on chromosome 2 of strain RIMD2210633 because of its low G+C content (39.8%) compared with the average G+C content of the genome (45.4%) (10, 16), the genome sequence data were not sufficient to define the boundary of this region. Our genomic comparison between KP-positive clinical strains and environmental ones more precisely defined the boundaries of the Vp-PAI (VPA1310 to VPA1396) (see the supplemental material). Investigation of the sequences around the boundaries may provide clues as to how the KP-positive clinical isolates of V. parahaemolyticus acquired the Vp-PAI.
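The G+C comparison that first flagged this region (39.8% against a genome average of 45.4%) rests on a simple base count. The following sketch shows how G+C content of a sequence, and of sliding windows along it, could be computed; the function names and the window and step sizes are assumptions, and the input strings are toy sequences, not Vp-PAI data.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0

def sliding_gc(seq, window, step):
    """Return (start, G+C %) for each full window along seq."""
    return [(start, gc_percent(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

print(gc_percent("ATGC"))
print(sliding_gc("AAAAGGGG", window=4, step=4))
```

Windows whose G+C content falls well below the genome average are candidates for horizontally acquired DNA, but, as the paragraph notes, composition alone does not fix the boundaries of such a region.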
The Vp-PAI was not identified in the KP-negative clinical strains (strains 4 to 8) (Fig. 1) (see the supplemental material). Many of the KP-negative clinical isolates are known to possess the TDH-related hemolysin (trh) and urease operon (ure) genes in close proximity to each other (11, 12, 29), and the KP-negative clinical isolates used in this study also possessed the trh and ure genes (see the supplemental material). Since the presence of a pathogenicity island-like structure has been predicted on the genome of KP-negative strains (12), investigation of the regions around the trh and ure genes may lead to the identification of additional virulence factors or genomic island(s) unique to the KP-negative clinical isolates of V. parahaemolyticus.
Comparative genomic analysis by microarray is a powerful tool to identify the characteristic genes or chromosomal regions within a species. In the study presented here, we used this type of analysis for V. parahaemolyticus and were able to identify the genes and genomic regions that are specifically present in pathogenic or pandemic strains. In particular, it was demonstrated that VPC-12 on chromosome 2 has a strong correlation with the pathogenicity of KP-positive V. parahaemolyticus. However, how the genes within VPC-12 contribute to the pathogenesis of KP-positive V. parahaemolyticus is not yet fully understood. Expression profiling analysis using microarray and functional analysis of putative virulence genes and hypothetical genes within VPC-12 can be expected to lead to a more detailed understanding of the mechanism of pathogenicity by V. parahaemolyticus.
We are grateful to T. Tobe, H. Abe and Y. Ogura for their valuable advice, D. Okuzaki at the DNA-chip Development Center for Infectious Diseases RIMD, Osaka University, for technical support in analyzing the microarray data, and N. Hinomizu for technical assistance with preparing the microarray.
Published ahead of print on 14 January 2008.
Supplemental material for this article may be found at http://iai.asm.org/.