ABSTRACT
The type b capsule of pathogenic Haemophilus influenzae is a critical factor for H. influenzae survival in the blood and the establishment of invasive infections. Other pathogenic factors associated with type b strains may also play a role in invasion and sustained bacteremia, leading to the seeding of deep tissues. To date, the gene encoding haemocin is the only noncapsular gene found to be specific to type b strains. Here we report the discovery of an approximately 16-kb genetic locus, HiGI1, that is present primarily in type b strains. Pulsed-field gel electrophoresis and Southern hybridization were used to map this new locus between secG (HI0445) and fruA (HI0446), which are contiguous in Rd, a nonpathogenic derivative of a serotype d strain. The locus is inserted at the 3′ end of tRNA4Leu and has regions whose G+C content differs from the average genomic G+C content of H. influenzae. An integrase gene, which encodes a CP4-57-like integrase, is located downstream of tRNA4Leu. Hybridization probes based on sequences within the HiGI1 locus were used to screen 61 H. influenzae strains (2 type a, 22 type b, 2 type c, 1 type d, 3 type e, 7 type f, and 24 nontypeable H. influenzae [NTHi]) from our collection. The HiGI1 locus exists in all 22 type b strains and two NTHi strains and is likely to have been acquired by an ancestral type b strain.
Haemophilus influenzae causes a variety of human infections. Type b capsule, LOS, and pili have been shown to play important roles in pathogenesis (13, 29). Encapsulated H. influenzae type b (Hib) strains cause invasive infections, including meningitis and septicemia, in infants and children, while H. influenzae strains of other capsule types (a, c, d, e, and f) rarely cause invasive infections. Encapsulated strains only occasionally colonize the upper respiratory tract, whereas nontypeable H. influenzae (NTHi) strains often colonize the respiratory tract and can cause a variety of respiratory infections, such as otitis media, sinusitis, bronchitis, and conjunctivitis (31).
The entire genomic DNA sequence of H. influenzae strain Rd, a nonencapsulated, nonpathogenic derivative of a serotype d strain, became available in 1995 (11, 45). The Rd genome is estimated to be 270 kb smaller than that of virulent type b strain Eagan (6). Capsule type b (24), pili (hif) (44), tryptophanase (tna) (27), and haemocin (hmc) (30) genes are present in Hib strains and are not found in Rd. The cap b, hif, and tna loci are each flanked by direct repeats. The cap b gene cluster, containing a duplication of two ∼18-kb segments, lies between direct repeats of IS1016 (21, 24). In each ∼18-kb segment, there is a central serotype-specific region II which has a substantially lower G+C content, 32% (43). The hif gene cluster is inserted between pepN (HI1614) and purE (HI1615). This cluster has a G+C content of 39%, typical of H. influenzae. Analysis of the regions flanking the pilus gene cluster of a type b strain reveals a duplication of the 57-bp pur regulatory region (44). The tryptophanase (tna) genes are situated between nlpD (HI0706) and mutS (HI0707), are found at the same map location in all indole-positive strains, and are missing from Rd, type d, and type e genomes. Most interestingly, this locus is flanked by 43-bp direct repeats of paired Haemophilus uptake signal sequences (USSs) (27). The hmc locus produces haemocin, a protein that is toxic to all non-Hib strains and that appears to play a role in the onset of invasive type b disease in the infant rat model (26, 30).
Many bacterial pathogens contain virulence genes located on pathogenicity islands, which may be derived from integrated bacteriophages that are associated with tRNA or single-stranded RNA genes or, alternatively, might arise from insertion sequence-mediated gene transfer (8, 16). Tizard et al. have proposed that loci similar to pathogenicity islands that either do not contain virulence genes or have not yet been shown to contain virulence genes be called genetic islands (42). Genetic islands may represent a class of genetic elements whose acquisition contributes to microbial evolution (40).
In a search for other potential virulence genes that might contribute to the ability of Hib strains to cause invasive diseases, we identified, and report here, a ∼16-kb locus in strain Eagan that appears to be found primarily in type b strains. It is situated between secG (HI0445) and fruA (HI0446), is adjacent to the tRNA4Leu gene, is flanked by 23-bp direct repeats (DR1), has regions that differ in G+C content from the rest of the genome, and contains a phage-related integrase gene, suggesting that it could be of bacteriophage origin. We propose to call this locus HiGI1 (for H. influenzae genetic island 1).
MATERIALS AND METHODS
Bacterial strains. A total of 61 H. influenzae strains were used in this study: 22 Hib, 2 type a, 2 type c, 1 type d, 3 type e, 7 type f, and 24 NTHi. The majority of these isolates have been previously characterized (9, 15). Seventeen strains (one type a, nine type b, four type f, and three NTHi) were isolated from cerebrospinal fluid or blood. Hib strain Eagan, the source for the mapping, cloning, and sequencing procedures, was originally isolated from a child with meningitis (12). In contrast to the majority of Hib strains, which belong to multilocus enzyme phylogenetic division I, strain R9 (otherwise known as Rab) belongs to multilocus enzyme phylogenetic division II (32, 33). Strains AAr64 (14) and AAr117 (9) have lost type b capsules. NTHi strains 315-3 and 316-4 were isolated from the blood of patients with immunodeficiency disease. H. influenzae biogroup aegyptius strain ATCC 49252 was isolated from the blood of a Brazilian purpuric fever patient. Strain ATCC 11116 is a type strain of biogroup aegyptius (4). H. influenzae strains were grown on brain heart infusion plates, solidified with 1.2% agar and supplemented with 10% Levinthal base (28) and NAD (2 μg/ml), in a 35°C CO2 incubator.
The host Escherichia coli strains used in the cloning experiments were INVαF′ [F′ endA1 recA1 hsdR17(rK− mK+) supE44 thi-1 gyrA96 relA1 φ80lacZΔM15 Δ(lacZYA-argF)U169 λ−] and TOP10 [F′ mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacZX74 recA1 deoR araD139 Δ(ara-leu)7697 galU galK rpsL(Strr) endA1 nupG], both obtained from Invitrogen, Carlsbad, Calif., and DH5α [F− φ80d lacZΔM15 endA1 recA1 hsdR17(rK− mK+) supE44 thi-1 λ− gyrA96 Δ(lacZYA-argF)U169].
Pulsed-field gel electrophoresis (PFGE). The protocol for preparation of Haemophilus genomic DNA in InCert agarose plugs was adapted from that of the manufacturer (FMC BioProducts, Rockland, Maine). After digestion with restriction enzymes, DNA fragments were resolved by contour-clamped homogeneous electric field (CHEF) electrophoresis using a CHEF-DRIII apparatus (Bio-Rad Laboratories, Richmond, Calif.) with an electric field of 6 V cm−1 and an angle of 120°. DNA fragments were separated in 1% SeaKem HGT agarose (FMC) in 0.5× Tris-borate-EDTA buffer at 14°C. The pulse time was ramped from 1 to 15 s or from 10 to 30 s over 8 to 20 h, according to the size of the DNA fragments to be resolved.
DNA techniques. Isolation of genomic DNA was performed using a Wizard genomic DNA purification kit (Promega, Madison, Wis.). Plasmid extractions were carried out as specified for the Wizard Plus minipreps DNA purification system (Promega). The DNA preparations were quantitated on ethidium bromide-stained gels by applying GibcoBRL DNA quantitation standards (Life Technologies, Gaithersburg, Md.).
PCR. A low annealing temperature (40°C) and a relatively high primer concentration (up to 1 μg/100 μl) were used with the degenerate PCR primers LVIED and GADDY, which were based on two regions of conserved amino acids shared by response regulators from a variety of bacteria (39) (Table 1). A 100-μl reaction mixture consisted of 10 μl of 10× reaction buffer, 4 μl of MgCl2 (2 mM, final concentration), 2 μl of dimethyl sulfoxide, 10 μl of deoxynucleoside triphosphate (dNTP) mix (2 mM), 1 μl of forward primer (50 μM), 1 μl of reverse primer (50 μM), 0.5 μl of Taq DNA polymerase (5 U/μl), 10 μl of genomic DNA template (>100 μg per reaction), and 61.5 μl of H2O (7, 46). Long PCR amplification was performed by one of the following two methods. The first long PCR used Taq DNA polymerase (Promega). A 50-μl reaction mixture consisted of 5 μl of reaction buffer, 3 μl of MgCl2 (25 mM), 2 μl of dNTP mix (10 mM), 1 μl of forward primer (20 μM), 1 μl of reverse primer (20 μM), 1 μl of Taq DNA polymerase (5 U/μl), 1 μl of genomic DNA template (>200 ng), and 36 μl of H2O. One cycle of preamplification denaturation at 94°C for 30 s; 25 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 1 min, and extension at 72°C for 5 min; and one cycle of final extension at 72°C for 10 min were run on a thermal cycler (MJ Research, Watertown, Mass.). Long PCR products (5 kb) were cloned into the original pCR2.1 vector (Invitrogen). The second long PCR amplification was performed using the TaqPlus Long polymerase mixture (Stratagene, La Jolla, Calif.). A 50-μl reaction mixture consisted of 5 μl of 10× TaqPlus Long low-salt buffer, 2 μl of dNTP mix (10 mM), 1 μl of forward primer (20 μM), 1 μl of reverse primer (20 μM), 1 μl of TaqPlus Long polymerase mixture (5 U/μl), 1 μl of genomic DNA template (>200 ng), and 39 μl of H2O. One cycle of preamplification denaturation at 94°C for 30 s; 25 cycles of denaturation at 94°C for 10 s, annealing at 55°C for 1 min, and extension at 72°C for 10 min; and one cycle of final extension at 72°C for 15 min were run on a thermal cycler (MJ Research). Long PCR products were visualized and purified by agarose gel electrophoresis using crystal violet (35). Gel-purified long PCR products (11 kb) were cloned into the pCR-XL-TOPO vector (Invitrogen). Subsequently, a 5-kb ClaI fragment, FC5, was subcloned into the ClaI-linearized pGEM7 vector.
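The reaction mixtures above list stock concentrations and volumes, so final concentrations follow from simple dilution arithmetic (C_final = C_stock × V_stock / V_total). The following is a minimal sketch of that calculation for the first long PCR mixture; the function and variable names are illustrative assumptions and are not part of the original protocol.

```python
# Sketch: dilution arithmetic for a PCR mixture (names are hypothetical).

def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Return C_final = C_stock * V_stock / V_total, in the units of stock_conc."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul

# Component volumes (microliters) of the first long PCR mixture, as listed in the text.
taq_long_pcr_ul = {
    "reaction buffer": 5,
    "MgCl2 (25 mM)": 3,
    "dNTP mix (10 mM)": 2,
    "forward primer": 1,
    "reverse primer": 1,
    "Taq DNA polymerase": 1,
    "genomic DNA template": 1,
    "H2O": 36,
}

# The component volumes should add up to the stated 50-ul reaction.
total_ul = sum(taq_long_pcr_ul.values())
mgcl2_final_mM = final_concentration(25, 3, total_ul)
dntp_final_mM = final_concentration(10, 2, total_ul)

print(f"total volume: {total_ul} ul")
print(f"MgCl2 final concentration: {mgcl2_final_mM:.2f} mM")
print(f"dNTP final concentration: {dntp_final_mM:.2f} mM")
```

Under these inputs the volumes sum to 50 μl, and the MgCl2 and dNTP mix are diluted to 1.5 mM and 0.4 mM, respectively.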
Table 1. Oligonucleotide primer sequences used for PCR
Nucleotide sequencing and analysis. DNA sequencing of clone F2 was carried out by the dideoxy chain termination method (37) with a Sequenase 2.0 sequencing kit (U.S. Biochemical, Cleveland, Ohio) in conjunction with 35S (Sigma Chemical, St. Louis, Mo.). Double-stranded DNAs from three other overlapping clones (F5E, FC5, and F10G) were subjected to automated sequencing at the DNA Sequencing Core, University of Michigan, Ann Arbor, with reagents from a dye terminator kit (Applied Biosystems). MacVector sequence analysis software (version 5.0; Oxford Molecular Group) was used to analyze the DNA sequences for identification of open reading frames, restriction sites, base composition, and codon frequency. The nucleotide sequences were used in searches against the GenBank, EMBL, DDBJ, and PDB databases. The predicted amino acid sequence of each open reading frame was used in searches against the GenBank CDS translation, PDB, SwissProt, PIR, and PRF databases, using the BLAST 2.0 program (1). The codon letter G+C content of genes in each region was calculated and compared with that of H. influenzae Rd, accessed from the CUTG (codon usage tabulated from GenBank) database at the Kazusa DNA Research Institute web site (www.kazusa.or.jp) (34).
DNA probes and hybridization. Probes for mapping and probes (I, II, and III) for screening were PCR amplified. Probe IVa was a KpnI/SacI-digested fragment from clone F2. Probe IVb was a HincII-digested fragment from clone F5E. DNA was labeled with digoxigenin-11-dUTP by the random-primed method as specified by the manufacturer (Boehringer Mannheim, Indianapolis, Ind.). Restriction-digested DNAs were electrophoresed in SeaKem HGT agarose gels and then transferred to a positively charged nylon membrane (Boehringer Mannheim). For analysis of the distribution of HiGI1 among different strains, bacterial genomic DNA was denatured by adding dot blot-denaturing solution (4 M NaOH, 100 mM EDTA) and was pipetted onto a positively charged nylon membrane (Boehringer Mannheim). Hybridizations were performed under stringent conditions: at 65°C in 5× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate)–1.0% (wt/vol) blocking reagent–0.1% N-lauroylsarcosine–0.02% sodium dodecyl sulfate, and the membranes were washed at 65°C in 0.5× SSC containing 0.1% sodium dodecyl sulfate.
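The hybridization and wash buffers are given as multiples of SSC, with 1× SSC defined in the text as 0.15 M NaCl plus 0.015 M sodium citrate. The short sketch below converts the two stated strengths (5× and 0.5×) to molar salt concentrations; the helper name is a hypothetical one introduced only for illustration.

```python
# Sketch: molar salt concentrations for a given SSC strength (helper name is hypothetical).

# 1x SSC as defined in the text.
SSC_1X_M = {"NaCl": 0.15, "sodium citrate": 0.015}

def ssc_molarity(fold):
    """Scale the 1x SSC salt concentrations by the given fold strength."""
    if fold <= 0:
        raise ValueError("fold strength must be positive")
    return {salt: round(conc * fold, 6) for salt, conc in SSC_1X_M.items()}

hybridization = ssc_molarity(5)    # 5x SSC used for hybridization
wash = ssc_molarity(0.5)           # 0.5x SSC used for the washes

print("hybridization:", hybridization)
print("wash:", wash)
```

The wash buffer therefore carries one-tenth of the salt of the hybridization buffer (0.075 M versus 0.75 M NaCl), which is what makes the wash the more stringent step at the same temperature.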
Oligonucleotide sequences. Oligonucleotide sequences for degenerate PCR primers, long PCR primers, and probe preparations are listed in Table 1.
Nucleotide sequence accession number. The nucleotide sequence of HiGI1 has been submitted to GenBank and assigned accession no. AF198256.
RESULTS
Discovery of a locus present in Eagan and other Hib isolates. A pair of degenerate PCR primers, LVIED and GADDY, was used to identify potential response regulators of two-component regulatory systems in H. influenzae strain Eagan (7). One 300-bp PCR fragment, f-1, was found in strain Eagan but not in the published sequence of Rd (11). In a preliminary screen, hybridization analysis using f-1 as a probe indicated that f-1 sequences were present uniformly in 21 tested Hib isolates but not in H. influenzae strains with other capsular types (one type a, one type c, one type d, two type e, and seven type f) or in 15 NTHi strains (7). A larger clone, F2, a 1,837-bp TaqI-digested fragment, was isolated from strain Eagan using the f-1 probe, and its entire sequence was also absent in Rd. We then proceeded to map this region in strain Eagan and to delineate its complete size and sequence.
Location and boundary sequence analysis of the unique type b locus. PFGE was used in combination with Southern hybridization to position the F2 region on an existing large-scale restriction map of strain Eagan for the enzymes EagI, NaeI, RsrI, and SmaI (6). This location was further refined by using PCR probes based on Rd sequences, looking for DNA fragments that hybridized both to a given PCR probe and to probe IVa (Table 1; Fig. 1). This dual approach localized the strain Eagan-specific DNA to a region located between secG (HI0445, protein translocation protein) and fruA (HI0446, fructose-permease IIBC component), which are contiguous in the Rd genome (11). Using sequences within clone F2 and the flanking known genes, long PCR was used to isolate clones containing the rest of the region (Materials and Methods). The sequence of the region was determined, and two maps of the region are shown in Fig. 1. For reasons described below, we have decided to name this region H. influenzae genetic island 1 (HiGI1). The left boundary of HiGI1 is the end of the tRNA4Leu gene. The HiGI1 sequence is flanked by direct repeats. The left-most member of the first direct repeat (DR1L) is 23 bp in length (5′-ttcaagtctcgcccagagcacca-3′) and is almost completely contained within the 3′ end of the tRNA4Leu gene. The right-most member of the first direct repeat (DR1R) is 22 bp in length (5′-ttcacttctcgcccag_gcacca-3′), and 20 of 23 bases are identical between DR1L and DR1R (differences are underlined). DR2L starts 162 bp to the right of DR1L, is 22 bp in length, and is a perfect match to DR2R (DR2, 5′-ttagtaaccaaaatagtaacca-3′). Each repeat consists of two 9-bp identical units (underlined). DR2R is located 49 bp to the left of DR1R. Of these strain Eagan sequences, only DR1L is retained in Rd. A stretch of six 10-bp short direct repeats (5′-gtctttaatt-3′) can also be found between DR1L and DR2L. In strain Eagan, inverted repeat 1 is located downstream of the tRNA4Leu coding sequence (Fig. 2).
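The repeat statistics quoted above can be checked directly from the printed sequences. The following is a minimal sketch, assuming the underscore in DR1R denotes a single-base gap relative to DR1L (written here as "-"); the function names are hypothetical and are not taken from any analysis software used in the study.

```python
# Sketch: verify the direct-repeat statistics from the sequences given in the text.

DR1L = "ttcaagtctcgcccagagcacca"
DR1R = "ttcacttctcgcccag-gcacca"  # "-" stands for the gap printed as "_" in the text
DR2 = "ttagtaaccaaaatagtaacca"

def count_identities(a, b):
    """Count aligned positions with the same base; gap positions never match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")

def repeated_units(seq, k):
    """Return {k-mer: (first_start, second_start)} for non-overlapping repeated k-mers."""
    found = {}
    for i in range(len(seq) - k + 1):
        unit = seq[i:i + k]
        j = seq.find(unit, i + k)  # search only past the end of the first copy
        if j != -1 and unit not in found:
            found[unit] = (i, j)
    return found

identical = count_identities(DR1L, DR1R)
units = repeated_units(DR2, 9)

print(f"DR1L length: {len(DR1L)} bp; DR1R length: {len(DR1R.replace('-', ''))} bp")
print(f"identical positions between DR1L and DR1R: {identical} of {len(DR1L)}")
print(f"9-bp units repeated within DR2: {units}")
```

With the sequences as printed, this reports 20 identical positions out of 23 for DR1 and a single repeated 9-bp unit, tagtaacca, within DR2, in agreement with the description above.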
Fig. 1. (A) Genetic organization of the HiGI1 locus and positions of clones. Open reading frames homologous to genes found in a variety of phages are indicated by diagonal lines in boxes. Dashed vertical lines, USS; < and >, long PCR primers; *orf14, containing the 300-bp f-1 sequences. (B) Partial restriction map. Only the restriction enzymes used to position clone F2 between secG and fruA are shown.
Fig. 2. Nucleotide sequences at the boundaries of the HiGI1 locus. Both ends of HiGI1 contain a single copy of DR1 (boldface): DR1L on the left boundary and DR1R on the right boundary. Rd lacks the whole 15,660-bp sequence of HiGI1 but retains one copy of the 23-bp sequence DR1L. Another pair of repeats, DR2L and DR2R, are underlined. SDR, a stretch of six 10-bp short direct repeats; IR, inverted repeats.
Identification of genes in this locus. The locus consists of 18 open reading frames between two direct repeats (Fig. 1). Just downstream of the tRNA4Leu coding region, orf1 shows a high degree of amino acid sequence similarity (52%) to E. coli prophage CP4-57 integrase, SlpA. The predicted amino acid sequence of the next open reading frame (orf2) shares significant similarity (score, 291; similarity, 53%) with phage phi-R73 primase. The predicted amino acid sequence of orf11 is homologous to phage D3 terminase. The predicted amino acid sequences of orf13 and orf16 are similar to those of phage phi-105 holin and ORF25, respectively. The predicted amino acid sequences of the last two open reading frames, orf17 and orf18, show low-level (BLAST score, <80) similarity to gp35 and gp36 of Streptomyces temperate phage phi-C31, respectively (Table 2). This is consistent with the recent conclusion about the evolutionary relationships among prophages that all double-stranded DNA phage genomes are mosaic in nature and capable of horizontal exchange (20).
The predicted amino acid sequences of the remaining 11 open reading frames, however, show only very low-level similarity (BLAST score, <50) to genes from diverse origins, such as Mycobacterium tuberculosis and Plasmodium falciparum (Table 2).
Table 2. Summary of BLAST search and G+C content for each open reading frame
Base composition of DNA and codon letter G+C content. The average G+C content of HiGI1 is approximately 41%, slightly higher than the genomewide average of approximately 38%, but the distribution of G+C is uneven. The base composition of a region that contains the prophage CP4-57 integrase homologue and the phi-R73 primase homologue, designated region I, is 36.3% G+C. The G+C content of region II, which contains orf3 to orf7, is 41.6%. Region III, containing orf8 to orf10, has a low G+C content (31.2%). An 8-kb fragment designated region IV, which contains several phage-related genes and others, has a 45.4% G+C content (Fig. 3A; Table 2). The codon bias in the four regions reflects the base composition of each region: G- or C-ending codons account for 29.0, 40.1, 24.9, and 46.4% of the codons in regions I to IV, respectively (Table 3).
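The two quantities compared in this section, overall G+C content and the G+C content of the third codon letter, can be computed as in the following minimal sketch. The toy open reading frame and the function names are hypothetical and are used only to illustrate the calculation; they are not sequences or software from this study.

```python
# Sketch: overall G+C content and third-codon-letter G+C content (toy input, hypothetical names).

def gc_content(seq):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def gc3_content(cds):
    """Return the percentage of codons ending in G or C for an in-frame coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_letters = cds[2::3]  # every third base, starting at codon position 3
    return gc_content(third_letters)

# Toy open reading frame (codons: ATG AAA GCG TTA TAA); not taken from HiGI1.
toy_orf = "ATGAAAGCGTTATAA"

print(f"overall G+C: {gc_content(toy_orf):.1f}%")
print(f"third-letter G+C: {gc3_content(toy_orf):.1f}%")
```

For simplicity the sketch counts the stop codon as a codon; a real analysis might exclude it and would pool all genes in a region before computing the percentage.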
Fig. 3. (A) G+C content of each region of HiGI1 and region covered by each probe. (B) Hybridization of HiGI1 locus DNA probes to chromosomal DNA of various H. influenzae strains. The presence or absence of hybridization is indicated by + or −, respectively. ∗, cerebrospinal fluid or blood isolate.
Table 3. Codon letter G+C content
Distribution of HiGI1 among H. influenzae strains. Five probes, corresponding to regions of HiGI1 differing in G+C content, were used to determine whether homologous sequences were present in the genomic DNA preparations of 61 H. influenzae strains from our collection. Probes I, II, III, IVa, and IVb hybridized to all 22 type b strains and 2 NTHi strains. The Hib strains comprise 9 strains isolated from patients with invasive diseases and 13 isolates from the upper respiratory tract, including 2 strains that have lost expression of type b capsules. Thus, this genetic island not only is associated with Hib strains causing invasive diseases but also exists in Hib strains isolated from the upper respiratory tract. It is noteworthy, however, that two NTHi strains, AAr176 and Mr31, also hybridized to probe II. The same hybridization analysis also indicated that the entire locus is absent from 2 type a, 2 type c, 1 type d, 3 type e, 7 type f, and 10 other NTHi strains, including the invasive nontypeable strains 315-3 and 316-4 (Fig. 3B).
USS. Analysis of the sequences between the two direct repeats (DR1) reveals nine USS sites (Fig. 1), four in the plus orientation (5′-AAGTGCGGT) and five in the minus orientation (5′-ACCGCACTT). None of the sites are in inverted-repeat pairs. The mean distance between sites is 1,050 bp, with a range of 333 to 2,833 bp. This is comparable to the genomewide mean distance between sites of 1,248 bp, with a range of 50 bp to 8 kb (38). Three USS sites fall into region II and four are located in region III, with the remaining two in region IV. All nine USS sites are located in open reading frames. In Rd, only 65% of the 1,465 copies of USS sites are found in open reading frames, while about 86% of the genome is coding sequence (38).
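A scan of the kind summarized above can be sketched as follows: search a sequence for the 9-bp USS core in the plus orientation and for its reverse complement (the minus orientation), then compute the distances between neighboring sites. The toy sequence and function names are hypothetical; this is an illustrative sketch, not the procedure used in the study.

```python
# Sketch: locate USS cores on both strands and measure site spacing (toy input, hypothetical names).

USS_PLUS = "AAGTGCGGT"  # plus-orientation core given in the text

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]

def find_uss(seq):
    """Return a sorted list of (start, strand) for every USS core found in seq."""
    seq = seq.upper()
    hits = []
    for core, strand in ((USS_PLUS, "+"), (reverse_complement(USS_PLUS), "-")):
        start = seq.find(core)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(core, start + 1)
    return sorted(hits)

def spacings(hits):
    """Return the distances between the start positions of consecutive sites."""
    positions = [pos for pos, _ in hits]
    return [b - a for a, b in zip(positions, positions[1:])]

# Toy sequence with one plus-orientation and one minus-orientation site.
toy = "TTT" + "AAGTGCGGT" + "CCCCC" + "ACCGCACTT" + "GG"
sites = find_uss(toy)

print("minus-orientation core:", reverse_complement(USS_PLUS))
print("sites:", sites)
print("spacings:", spacings(sites))
```

The reverse complement of AAGTGCGGT is ACCGCACTT, matching the minus-orientation sequence quoted in the text; applied to the full HiGI1 sequence, the same scan would yield the site positions from which a mean spacing can be calculated.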
DISCUSSION
We have characterized a ∼16-kb locus from H. influenzae strain Eagan that appears primarily in type b strains. This locus has several features characteristic of a genetic island (16). It contains 18 open reading frames differing in G+C content from the average H. influenzae genome (Table 2), is adjacent to the tRNA4Leu gene, is bracketed by two 23-bp and two 22-bp direct repeats, and possesses a prophage CP4-57 integrase homologue. Genetic islands are thought to arise when a large region of foreign DNA is inserted into a bacterial genome. We are thus naming this locus H. influenzae genetic island 1 (HiGI1). While the HiGI1 locus differs in G+C content from other H. influenzae regions, it does contain nine H. influenzae uptake sequences (38).
tRNA loci are often targets for the integration of bacteriophages and pathogenicity islands into the chromosomes of various bacterial species, such as Pseudomonas aeruginosa (19), Vibrio cholerae (22), Yersinia pseudotuberculosis (5), and E. coli (3). In H. influenzae, leucine tRNA loci seem to be the most favored sites for phage integration. Two cryptic prophages, Mu-like phage (11) and φflu (20), are found in Rd. There are no clear boundary sequences around the proposed Mu-like phage, while φflu is found to be integrated into tRNA4Leu. The temperate phage HP1c1 (17) is capable of integrating into tRNA4Leu. However, Mu-like phage, φflu, and phage HP1c1 are not known to play any role in virulence (18). HiGI1 is located at the 3′ end of the tRNA4Leu gene, a rare tRNA gene as opposed to the more abundant tRNA1 and tRNA2. This rare tRNA4Leu gene might also act as a regulator for genes that frequently use this leucine-specific codon. In uropathogenic E. coli strain 536, pathogenicity island II was found to be inserted into the leuX locus, which encodes the rare tRNA4Leu, and to be deleted at a frequency of 10−3 to 10−4 per cell per generation (23). The deletion event also distorted the leuX locus and was shown to affect the expression of several virulence properties, such as type 1 fimbriae, flagella, serum resistance (36), and uropathogenesis (41).
There are two sets of direct repeats (DR1 and DR2) in the flanking regions of the HiGI1 locus. The first set of repeats, DR1L and DR1R, was probably created during HiGI1 integration into tRNA4Leu. The direct repeats that flank genetic islands play an important role in their integration or excision (16). The excision of pathogenicity islands I and II from uropathogenic E. coli 536 occurs by recombination between repeated sequences within tRNA coding sequences (3). However, we do not know whether the excision of HiGI1 can occur. The second set of direct repeats is internal to the HiGI1 locus. The role (if any) and origin of these repeats are not known, nor is it known whether they facilitate rearrangement or deletion of genetic elements in the HiGI1 locus. Whether this locus has gone through rearrangement or deletion in different strains, particularly in the two NTHi strains that possess only region II of the HiGI1 locus (Fig. 3), remains to be explored.
HiGI1 is present in all Hib strains and two NTHi strains in our collection. Two possible mechanisms might have contributed to the distribution of HiGI1 in type b strains. HiGI1 could have been of bacteriophage origin, “HiGI1φ,” which might have played an important role in the distribution of HiGI1 within H. influenzae. However, we do not know whether this putative HiGI1φ was transferable between different strains. A more likely scenario is that a type b ancestral strain acquired the HiGI1 locus before it diverged into different type b strains. As for the HiGI1-possessing nontypeable strains, they might have acquired HiGI1, or part of it, by horizontal uptake of DNA and homologous recombination, because HiGI1 evolved to contain several H. influenzae USS sites. The HiGI1 locus could also have been acquired by an ancestral nontypeable strain, with subsequent recombination between two direct repeats resulting in the loss of all or part of the HiGI1 locus from most nontypeable strains.
The G+C contents of region I (36.3%) and region II (41.6%) do not differ very much from the genome average (38%). This indicates that they might have been acquired from species with a G+C content similar to that of H. influenzae or that the base composition of such acquired DNA has gradually adapted to the host genome over time (25). However, the G+C contents of region III (31.2%) and region IV (45.4%) vary substantially from the genomewide average. In A+T-rich H. influenzae, the average G+C content of the third codon letter is only 29.1% in 1,709 genes of Rd (34). In G+C-rich M. tuberculosis (∼65% G+C), there is a strong bias toward G- or C-ending codons for every amino acid; the G+C content at the third position of codons is 83% (2). The G+C usage in the third codon position of regions III and IV (24.9 and 46.4%, respectively) shows a strong bias toward each region's overall G+C content. These observations support the evidence discussed above that the HiGI1 locus might have been acquired by phage-mediated gene transfer; furthermore, the element originally transferred might have been composed of at least four different elements from different sources.
In Rd, a cryptic Mu-like phage (11) with a relatively high G+C content (∼50%) is located in the interval from 1.56 to 1.59 Mb on the genome. Two regions of 14,441 and 8,239 bp in this area contain no USS site. However, there are USSs in cryptic prophage φflu (11) and phage HP1c1 (10). The distribution of USSs in the Rd genome is not entirely random, and USSs are overrepresented in the intergenic regions. Most USS sequences in the H. influenzae genome appear as inverted-repeat pairs just beyond the 3′ ends of genes (38). In contrast, the USS sequences in HiGI1 are single and are found within coding regions. So far, the only similarity between the newly identified HiGI1 locus and the rest of the genome is that both contain USS sites.
Our results suggest that the HiGI1 locus might have resulted from a phage-mediated transfer, as evidenced by its location adjacent to the tRNA4Leu gene and its possession of a prophage CP4-57 integrase gene homologue just downstream of the tRNA gene. The G+C content and codon usage of HiGI1 are different from those of the rest of the host genome. To date, there is no experimental evidence to indicate that HiGI1 is a pathogenicity island. It is, however, conserved in Hib strains, which are responsible for most invasive diseases, and is absent from the majority of other strains studied. These facts raise the possibility that it might be a virulence-associated region. As we continue our studies on the HiGI1 locus, we will dissect its structure among different H. influenzae strains and evaluate its possible role in the virulence of pathogenic strains.
ACKNOWLEDGMENT
This work was supported in part by Public Health Service grant RO1 AI25630 from the National Institute of Allergy and Infectious Diseases to J.R.G.
Notes
Editor: J. T. Barbieri
FOOTNOTES
- Received 2 December 1999.
- Returned for modification 14 January 2000.
- Accepted 7 February 2000.
- Copyright © 2000 American Society for Microbiology