Infection and Immunity, May 2001, p. 3271-3285, Vol. 69, No. 5
0019-9567/01/$04.00+0 DOI: 10.1128/IAI.69.5.3271-3285.2001
Copyright © 2001, American Society for Microbiology. All rights reserved.
Complete DNA Sequence and Analysis of the Large
Virulence Plasmid of Shigella flexneri
Malabi M. Venkatesan,1,* Marcia B. Goldberg,2,* Debra J. Rose,3 Erik J. Grotbeck,3 Valerie Burland,3 and Frederick R. Blattner3
Department of Enteric Infections, Division of
Communicable Diseases and Immunology, Walter Reed Army Institute of
Research, Silver Spring, Maryland 20910 (1);
Infectious Disease Division, Massachusetts General Hospital,
Boston, Massachusetts 02114 (2); and
Laboratory of Genetics, University of Wisconsin, Madison,
Wisconsin 53706 (3)
Received 19 September 2000/Returned for modification 22 November
2000/Accepted 31 January 2001
ABSTRACT
The complete sequence of the 210-kb Shigella
flexneri 5a virulence plasmid was determined and analyzed.
Shigella spp. cause dysentery and diarrhea by invasion and
spread through the colonic mucosa. Most of the known
Shigella virulence determinants are encoded on a large
plasmid that is unique to virulent strains of Shigella and
enteroinvasive Escherichia coli; these known genes account for approximately 30 to 35% of the virulence plasmid. In the complete sequence of the virulence plasmid, 286 open reading frames (ORFs) were
identified. An astonishing 153 (53%) of these were related to known
and putative insertion sequence (IS) elements; no known bacterial
plasmid has previously been described with such a high proportion of IS
elements. Four new IS elements were identified. Fifty putative proteins
show no significant homology to proteins of known function; of these,
18 have a G+C content of less than 40%, typical of known virulence
genes on the plasmid. These 18 potentially constitute unknown virulence
genes. Two alleles of shet2 and five alleles of
ipaH were also identified on the plasmid. Thus, the plasmid
sequence suggests a remarkable history of IS-mediated acquisition of
DNA across bacterial species. The complete sequence will permit
targeted characterization of potential new Shigella virulence determinants.
INTRODUCTION
Shigella spp. continue to
be a major health problem worldwide, causing an estimated 1 million
deaths and 163 million cases of dysentery annually (29),
predominantly in children younger than 5 years of age in developing
countries. Shigella spp. cause bacillary dysentery in humans
by invading and replicating in epithelial cells of the colon, causing
an intense inflammatory reaction, characterized by abscess formation
and ulceration, which damages the colonic epithelium.
Shigella entry into susceptible host cells requires the
coordinated expression of numerous genes that are activated in response to environmental cues (29, 68). Upon contact with host
cells, the bacteria secrete factors (IpaB, IpaC, and IpaA) that induce large membrane ruffles on the host cell surface associated with cytoskeletal rearrangements that lead to internalization of bacteria into the host cell. Once internalized, Shigella spp. release
factors that cause lysis of the phagocytic vacuole, thereby releasing the bacteria into the cell cytoplasm, where the bacterial surface protein VirG (IcsA) assembles actin tails that propel the bacteria through the cell cytoplasm and into adjacent cells. Additional bacterial factors lead to release of proinflammatory cytokines, osmotic
leak of the mucosal epithelium (ShET1 and ShET2), and, in macrophages,
induction of cell death (IpaB) (20, 38, 75).
Most work on the molecular pathogenesis of Shigella has been
carried out in S. flexneri serotypes 2a and 5a. The entire
complement of genes critical for invasion of epithelial cells is
contained on a large 220-kb plasmid, termed the virulence plasmid or
the invasion plasmid, which is present in all pathogenic strains
(68). Known to be located on the virulence plasmid is a
locus of genes (ipa-mxi-spa) that encode proteins involved
in invasion of mammalian cells and which has homologs in
Salmonella, Yersinia, enteropathogenic Escherichia
coli, the plant pathogens Ralstonia solanacearum and Xanthomonas campestris, and the flagellar assembly loci of
Salmonella enterica serovar Typhimurium (21).
In addition to those mentioned above, several other virulence plasmid
proteins have been previously characterized (68).
Together, the known genes account for approximately 30 to 35% of the
virulence plasmid. Sequence analysis of the entire virulence plasmid
was undertaken to permit identification of all plasmid-based proteins,
including potential new determinants of virulence, to facilitate the
understanding of evolutionary assembly of the plasmid and mechanisms
that generate spontaneous deletions which lead to strain avirulence,
and to permit comparisons of virulence plasmids of different
Shigella serotypes and strains and of different bacteria that vary in virulence properties. Herein, initial analysis of the
virulence plasmid from S. flexneri 5a is presented.
MATERIALS AND METHODS
Bacterial strains and plasmids.
The virulence plasmid
pWR501, which was sequenced in this study, had been previously derived
from the native virulence plasmid pWR100 (39). In brief,
S. flexneri wild-type serotype 5a strain M90T, harboring
pWR100, was crossed with E. coli MC1061 carrying pMT999, a
Tn501-labeled, self-transmissible, temperature-sensitive plasmid. The cross resulted in M90T carrying a cointegrate of pWR100
and pMT999, which was subsequently mobilized into E. coli 395-1 by conjugation. Following passage of this strain at 42°C with
appropriate antibiotic selection, a pWR100 derivative from which pMT999
had been lost and in which the Tn501 integration was
retained was isolated and designated pWR501 (39). This
E. coli strain, which lacks the two smaller
Shigella plasmids, was the source for the isolation of
virulence plasmid DNA. Plasmid DNA was isolated using previously
published methods (70) and was subjected to two cycles of
CsCl banding and purification.
Library construction and sequencing.
Plasmid DNA was sheared
by nebulization and size fractionated by agarose gel electrophoresis
(31) to obtain DNA fragments in the range from 0.7 to 2.0 kb. Purified, end-repaired fragments were subcloned into M13Janus
(7), forming a random library. Clones were sequenced using
Prism dye terminator or BigDye terminator chemistry (P-E Applied
Biosystems, Foster City, Calif.). Data were collected on ABI377
sequencers. Sequence reads were assembled by Seqman II (DNASTAR,
Madison, Wis.). Finishing methods included sequencing of opposite ends
of linking clones, PCR-based techniques, and primer walking.
Sequence analysis.
The sequence was searched for potential
open reading frames (ORFs) by Genemark (5). Each ORF was
initially searched against the GenPept protein database using the
DeCypher II system, to determine identity or match with known proteins.
After subsequent inspection and further analysis of many of the ORFs,
annotations were created. Each gene was assigned a unique identifier
(ID; S number).
Annotation.
Sequence comparisons were performed using the
amino acid sequences predicted for the identified ORFs unless otherwise
indicated. Significant homology was defined as greater than 30%
identity over greater than 60% of both the query sequence and the
target sequence, as has been used in other genome analysis projects
(3). An exception was applied in the analysis of insertion
sequence (IS) elements: to avoid missing fragments of IS elements, we applied the criteria of greater than 50%
identity over greater than 60% of the query sequence only. Based on
sequence analysis and the extent of similarity to hypothetical proteins, potential ORFs were categorized as detailed below. In a few
specific circumstances, we designated protein similarity on the basis of lower
levels of amino acid identity than defined by these criteria; these
cases and all other exceptions to the criteria are noted in the text
where appropriate. The clusters of orthologous genes (COG) algorithm
(COGnitor [65]) was used to identify families to which
predicted proteins are related, and ProfileScan and Prosite Scan
algorithms were used for the identification of potential functional domains.
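The homology thresholds stated above amount to a simple decision rule, which can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study: the `Alignment` record, its field names, and the `is_significant` function are hypothetical, and coverage is assumed to mean aligned residues divided by full sequence length.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one pairwise protein alignment."""
    identity_pct: float   # percent identical residues in the aligned region
    query_aligned: int    # query residues covered by the alignment
    query_len: int        # full length of the query protein
    target_aligned: int   # target residues covered by the alignment
    target_len: int       # full length of the target protein


def is_significant(a: Alignment, is_element: bool = False) -> bool:
    """Apply the thresholds described in the Annotation section.

    General rule: >30% identity over >60% of BOTH query and target.
    IS element rule: >50% identity over >60% of the query only.
    """
    query_cov = a.query_aligned / a.query_len
    target_cov = a.target_aligned / a.target_len
    if is_element:
        return a.identity_pct > 50 and query_cov > 0.60
    return a.identity_pct > 30 and query_cov > 0.60 and target_cov > 0.60


# A short query matching a small part of a long target: this passes the
# relaxed IS rule (designed to catch fragments) but fails the general rule.
fragment = Alignment(identity_pct=55, query_aligned=200, query_len=300,
                     target_aligned=200, target_len=1000)
print(is_significant(fragment), is_significant(fragment, is_element=True))
```

The relaxed rule drops the target-coverage requirement, which is what allows truncated copies of a longer IS element to be recognized.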
Nucleotide sequence accession number.
The sequence described
here has been assigned GenBank accession number AF348706.
RESULTS AND DISCUSSION
Overview of annotation.
The fully assembled circular sequence
of pWR501, including the Tn501 selectable marker, consisted
of 221,851 bp, 213,491 bp of which was specific to the
Shigella virulence plasmid. Initial analysis of the sequence
identified 293 potential ORFs (Fig. 1), 286 of which were Shigella plasmid derived and 7 of which
belonged to Tn501. The 8,360-bp Tn501 insert,
which had been previously introduced into pWR501 (see Materials and
Methods), was located between sequence coordinates 198713 and 207065, with a 7-bp CCTAAAG target site duplication near the pWR501 replicon.

FIG. 1.
Circular map of the plasmid. Outer ring depicts ORFs and
their orientations, color coded according to functional category: 1, identical or essentially identical to known virulence-associated
proteins (red); 2, homologous to known pathogenesis-associated proteins
(pink); 3, highly homologous to IS elements or transposases (blue); 4, weakly homologous to IS elements or transposases (light blue); 5, homologous to proteins involved in replication, plasmid maintenance, or
other DNA metabolic functions (yellow); 6, no significant similarity to
any protein or ORF in the database (brown); 7, homologous or identical
to conserved hypothetical ORFs, i.e., proteins of unknown function
(orange); and 8, Tn501 insertion-associated genes (green).
The second ring shows complete IS elements. The third ring graphs G+C
content, calculated for each ORF and plotted around the mean value for
all ORFs, with each value color coded for the corresponding ORF. Scale
is in base pairs. The figure was generated by Genescene (DNASTAR).
pWR501 is a mosaic of potential pathogenesis-associated genes, IS
elements, maintenance genes, and unknown ORFs. Of the 286 Shigella-derived potential ORFs, 54 (19%) encode known
Shigella proteins (category 1, indicated by red banding in
Fig. 1). Thirty-seven of these are located within a 32-kb cluster of
uninterrupted ORFs, previously described, constituting the
ipa-mxi-spa loci or pathogenicity island (68).
The remaining 17 are distributed throughout the plasmid and include
five alleles of ipaH and one allele each of icsA
(virG), virA, icsP (sopA), virF,
virK, msbB, sepA, ipgH, shet2, phoN-Sf, trcA, and an apyrase gene.
Four previously unidentified ORFs (1.4% of pWR501 ORFs) encode
proteins that have significant sequence similarity to known virulence-associated proteins (category 2, indicated by pink banding in
Fig. 1); these are homologs of S. flexneri ShET2, the
Salmonella serovar Typhimurium intracellular growth and
virulence determinant MkaD, the E. coli lipopolysaccharide
biosynthesis-related protein RfbU, and a UDP-sugar hydrolase (16,
18, 38, 62).
Fifty-two percent of pWR501 ORFs (153 ORFs) are known and new IS
elements (reviewed in reference 32) as well as unknown IS elements (categories 3 and 4, indicated by blue and light blue in Fig.
1). Among these, 20 complete known and at least 4 new putative IS
elements were identified. The majority of the IS element ORFs are
partial copies and, together with the complete IS elements, are
arranged in blocks separated from non-IS related ORFs. In some regions,
the arrangement of the IS elements is complex, with fragments of one IS
element ORF interdigitated and often fused with another IS element or
fragment thereof, forming a mosaic pattern (Fig.
2).

FIG. 2.
Schematic representation of IS elements flanking
virulence-associated genes on pWR501. (A) The 32-kb
ipa-mxi-spa region; (B) the virA-virG region; (C)
the virF region. Sequence coordinates are indicated above
each map. frag, fragment; put tnp, putative transposase; seq, sequence;
URF, unknown ORF; inc, incomplete. (D) The region downstream of
virF (left side, as shown in panel C) shown at the base pair
level to demonstrate that the nucleotide marking the end of one IS
element or IS fragment is the start of another. The relevant bases are
indicated in bold underline.
In pWR501, most virulence-associated genes, including the
ipa-mxi-spa operons, the virG gene, and the ShET2
toxin gene, are flanked by one or more such mosaics of IS element ORFs
(Fig. 2). In addition, many of the unknown ORFs (discussed below) are
flanked by IS element ORFs and have G+C content of less than 40%.
Based on this genetic organization, the recombination events that led to the acquisition of many or most genetic loci and the assembly of the
large virulence plasmid almost certainly involved IS-mediated events.
To date, no known plasmid in any bacterial species has been described
with this degree of IS element content.
Coincident with the extremely high content of IS element-related ORFs
on pWR501 is an extremely low density of ORFs that encode proteins
predicted or known to be involved in other (i.e., non-IS element-related) functions in comparison with other plasmids. Thus,
while pWR501 contains 0.62 non-IS element-related genes per kb, the
Y. pestis LCD plasmid contains 0.87 per kb (61 ORFs over
70,509 nucleotides) (45) and the E. coli
O157:H7 plasmid contains 0.91 per kb (84 ORFs over 92,077 nucleotides)
(8). From a different perspective, 48% of pWR501 ORFs are
non-IS element related, whereas 82% of Y. pestis LCD
plasmid (45) and 87% of E. coli O157:H7
plasmid (8) ORFs are non-IS element related. Furthermore,
bacterial chromosomes contain relatively few IS elements. For example,
only 2% of the ORFs on the E. coli chromosome are thought
to encode transposons, phage, or plasmids, and the overall density of
genes outside these three categories is 0.906 per kb (4,201 ORFs over
4,639,221 nucleotides) (3). Thus, pWR501 appears to be, and to
have been, capable of extraordinary IS element-related recombination
events, which undoubtedly contributed to the unique evolution of
Shigella.
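The gene densities quoted above follow directly from the ORF counts and sequence lengths given in the text, as the short Python sketch below shows. The function and dictionary names are illustrative. The pWR501 entry rests on an assumption, since the text does not state which counts were used: here it is taken as the 286 Shigella-derived ORFs minus the 153 IS-related ORFs, over the 213,491 bp specific to the virulence plasmid.

```python
def genes_per_kb(n_orfs: int, length_bp: int) -> float:
    """Return the number of ORFs per kilobase of sequence."""
    return n_orfs / (length_bp / 1000)


# (ORF count, sequence length in bp) as quoted in the text.
replicons = {
    "Y. pestis LCD plasmid": (61, 70_509),
    "E. coli O157:H7 plasmid": (84, 92_077),
    "E. coli chromosome": (4_201, 4_639_221),
    # Assumption: 286 Shigella-derived ORFs minus 153 IS-related ORFs,
    # over the 213,491 bp specific to the virulence plasmid.
    "pWR501 (non-IS ORFs)": (286 - 153, 213_491),
}

densities = {name: genes_per_kb(n, bp) for name, (n, bp) in replicons.items()}

for name, density in densities.items():
    print(f"{name}: {density:.3f} per kb")
```

Under that assumption the computed pWR501 density rounds to the 0.62 per kb reported in the text, and the other three values round to the published figures.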
Twenty-five ORFs (9% of pWR501 ORFs) showed significant homology to
proteins involved in plasmid replication and DNA metabolic functions
(category 5, indicated by yellow banding in Fig. 1). The replicon
region, along with the origin of replication (ori) site, is
almost identical to the R100 replicon (40, 43). Other proteins in this category include two sets of toxin-antitoxin genes or
protein addiction modules, including the ccdAB module of
plasmid F and several homologs of full-sized or partial copies of a
reverse transcriptase associated with group II introns (1, 33).
Fifty putative ORFs (17% of pWR501 ORFs) had insufficient homology to
known proteins to be assigned a putative function. These constitute the
unknown ORFs, of which 33 had no significant similarity to any protein
or ORF in the database (category 6, indicated by brown in Fig. 1), and
17 had significant similarity to conserved hypothetical ORFs, i.e.,
proteins of unknown function (category 7, indicated by orange in Fig.
1). Many of those in the latter group have been identified in other
bacterial sequencing projects.
A SalI restriction map of the virulence plasmid of a
serotype 2a strain (pMYSH6000) had been previously constructed by
Sasakawa et al. (53, 54). The map of pMYSH6000 assembled
from that analysis differs grossly from that of pWR501 in that the
ipa-mxi-spa loci are in inverse orientation. The
SalI restriction map predicted from the sequence of pWR501
matches that seen upon restriction analysis of pWR501 DNA (data not
shown). However, the pWR501 SalI restriction map agrees only
in part with the SalI restriction map of pMYSH6000.
Approximately 60% of the individual SalI fragments of
pWR501 are of sizes determined for individual fragments of pMYSH6000
(e.g., pMYSH6000 fragments C, F, G, I, and J), but many fragments are
of different sizes, and the total number of fragments differs (23 in
pMYSH6000, versus 22 in pWR501). These differences suggest that the
virulence plasmids from these two serotypes have diverged in overall
organization as well as in nucleotide sequence.
IS elements.
Of the 153 ORFs related to IS element ORFs, 114 (39%) have homology to known IS element ORFs (32;
www-IS.biotoul.fr/is/is_family.html); of these, 71 belong to members of
the IS3 family, which includes many IS elements in addition
to IS3. Among the complete IS
elements, 20 have been previously identified in Shigella or
other organisms; these include five copies of IS629, four
copies of IS1294, two copies of IS2, three copies
of IS600, three copies of iso-IS1, one copy of
IS4, one copy of IS630, and one copy of
IS911 (Table 1). An IS element was considered partial or
incomplete when the homology of the element in pWR501 did not extend to
and include the inverted repeat of the target sequence, when it should
be present. Most of the IS element-related ORFs on pWR501 are partial copies of IS elements that are truncated at the ends, contain internal
deletions, or contain insertions of other IS elements within their
sequence, reflecting extensive genomic rearrangement following
acquisition, a pattern that has also been seen in the enteropathogenic
E. coli plasmid pB171 (67) and the Y. pestis plasmid pCD1 (45). Among all IS element ORFs
on pWR501, those that have not been previously identified on
Shigella virulence plasmids include IS10,
IS21, IS100, IS110, IS150,
IS1294, IS1328, and IS1353. Detailed
characterization of the IS element ORFs can be found at
http://www.genome.wisc.edu/.
Four new types of putative IS elements were identified in pWR501 and
were designated ISSfl1, ISSfl2,
ISSfl3, and ISSfl4. Their identification was
based on the following criteria: the sizes of the ORFs were comparable
to ORF sizes of known IS elements, the protein sequence similarity was
less than 55% to these known elements, and inverted and/or direct
repeats were present at either end of the putative element. At present,
there are no data as to whether these putative IS elements are
transposable. Both complete and incomplete copies of these new putative
IS elements are present in pWR501, comprising six new complete elements
and 26 ORFs in all. In addition, 13 ORFs showed less than significant
similarity to known IS elements or putative transposases (Table 1,
unknown ORFs). None of the putative transposases showed the
characteristic features of new IS elements; however, a more detailed
investigation of their sequence remains to be done.
The first new putative IS element, ISSfl1, consists of ORFs
S0204 and S0203, is 929 bp in length, has 20-bp inverted repeats, and
generates a 3-bp TTC direct repeat at the point of insertion. S0204 and
S0203 show similarity to OrfA and OrfB of IS1650 at the
amino acid level but not at the nucleotide level. Elsewhere on pWR501,
S0078-S0079 and S0101-S0102 constitute incomplete copies of
ISSfl1 consisting of bp 6 to 869 and bp 4 to 827, respectively, of ISSfl1. The second putative new IS element,
ISSfl2, is present in two almost identical copies, S0055 and
S0128. They encode proteins with 58 to 59% identity to the
Streptomyces coelicolor IS110 transposase, but as
with ISSfl1, they have no significant homology at the
nucleotide level, except for approximately 55 nucleotides in the center
of the element, which are similar to bp 1071 to 1127 of
IS110. A direct repeat sequence CCCCTATTA is seen
at either end of S0055 and S0128, identifying and duplicating the
target site for insertion. The third putative new IS element,
ISSfl3, consists of S0034, which has 33% identity at the
amino acid level to E. coli IS10 transposase,
without homology at the nucleotide level. ISSfl3 is inserted
within a Y. pestis IS100 sequence, with
duplication of sequences at the insertion site. Sequences 98%
identical at the amino acid level to S0034 have also been identified in
the enteropathogenic E. coli EAF plasmid pB171 (Orf5 and
Orf28 [67]).
The fourth putative new IS element, ISSfl4, is homologous to
the recently identified ISEc8, located on the chromosome of
enterohemorrhagic E. coli O157:H7, adjacent to the locus of
enterocyte effacement pathogenicity island (44, 57). These
elements are members of the IS66 family, which has also been
identified in Rhizobium sp., Agrobacterium sp.,
and Sinorhizobium meliloti (57). Homologous ORFs are also present on the enteropathogenic E. coli
plasmid pB171 (orf49 to orf51 [67]). ISEc8
(and ISSfl4) consists of three ORFs transcribed in the same
direction and flanked by 11- to 22-bp imperfect inverted repeats and an
8- or 9-bp target duplication site. On pWR501, three loci consisting of
ORFs homologous to all three ISEc8 ORFs are present (S0023
to S0025, S0116 to S0119, and S0216 to S0219). In addition, six partial
copies of the third ORF alone are distributed throughout the plasmid
(S0008, S0038, S0072, S0081, S0090, and S0230). Within the loci
containing all three ORFs, the third ORF appears to have undergone an
internal deletion in one case (S0116-S0117) and a frameshift in another (S0216-S0217). Two of the ISSfl4 sequences (S0116 to S0119
and S0216 to S0219) are flanked by an exact 11-bp inverted repeat GTAAGCGCCCC, which is present 71 bp upstream of the ATG
start site of the first ORF of the element (S0119 and S0219) and 21 bp
downstream of the last ORF (S0116 and S0216). ISSfl4 is the largest IS element on pWR501, 2,730 bp, which is comparable in size to
ISEc8 (2,443 bp). The ORFs belonging to ISSfl4
show from 36 to 62% identity over long stretches to the
ISEc8 ORFs.
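An inverted repeat means that, read along one strand, the sequence at one end of an element reappears at the other end as its reverse complement. The Python sketch below illustrates this with the 11-bp repeat GTAAGCGCCCC reported above. The helper names and the toy element are hypothetical, and the check is simplified to repeats sitting exactly at the termini, which is not claimed to be the authors' procedure.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element: str, left_repeat: str) -> bool:
    """True if the element starts with left_repeat and ends with its
    reverse complement, i.e. is bounded by an exact inverted repeat."""
    return (element.startswith(left_repeat)
            and element.endswith(reverse_complement(left_repeat)))


ir = "GTAAGCGCCCC"  # 11-bp inverted repeat reported for ISSfl4
# Toy element (invented interior sequence) bounded by the inverted repeat.
toy_element = ir + "ATGAAACCCGGGTTTTAA" + reverse_complement(ir)

print(reverse_complement(ir))
print(has_terminal_inverted_repeats(toy_element, ir))
```

A direct repeat, by contrast, would show the same sequence in the same orientation at both ends, which this check rejects.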
Several IS elements of one type show direct repeats at each end of the
insertion, while others belonging to the same group do not. Those that
show target site duplication may have been acquired more recently than
those that do not. For example, only two of the five copies of
IS629 and two of three iso-IS1 elements have
direct repeats. IS1294 elements have 4-bp direct repeats of
the sequence CTTG, although the duplication occurs not as the result of
target site duplication but as a result of insertion site specificity
(CTTG), with the duplicate copy of CTTG being the end of
IS1294 itself (66). Recent work with the
IS1294 found on the ColD-like resistance plasmid pUB2380
indicates that the IS1294 element mediates not only its own
transposition but also that of sequences adjacent to it in a
transposition mechanism resembling rolling circle replication with
single-stranded DNA intermediates (66). pWR501 contains
four complete copies of IS1294.
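Whether an inserted element is bracketed by a target site duplication can be checked by comparing the bases immediately to its left and right. The Python sketch below uses the 3-bp TTC duplication reported above for ISSfl1 as a toy case. The host sequence, the element sequence, and the function name are all invented for illustration, and coordinates are assumed to be 0-based and half-open.

```python
def has_target_site_duplication(host: str, start: int, end: int, k: int) -> bool:
    """Return True if the k bases immediately upstream of host[start:end]
    are identical to the k bases immediately downstream of it."""
    if start - k < 0 or end + k > len(host):
        return False  # not enough flanking sequence to compare
    return host[start - k:start] == host[end:end + k]


# Toy construct: flank + TTC + element + TTC + flank (all sequences invented).
element = "ACGTACGTAC"
host = "GATTACA" + "TTC" + element + "TTC" + "CATG"
start = host.index(element)
end = start + len(element)

print(has_target_site_duplication(host, start, end, k=3))
```

As the text notes for IS1294, identical flanking bases do not always arise from duplication of the target, so a positive result from such a check is suggestive rather than conclusive.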
A substantial portion of horizontally transferred genes, characterized
by atypical nucleotide composition or patterns of codon usage, are
associated with plasmid, phage, or transposon-related sequences,
including IS elements (42). The regions adjacent to these
horizontally transferred genes often contain remnants of these mobile
elements (42). The map of pWR501 indicates a unique
arrangement of known virulence-related genes and unknown ORFs separated
by blocks of IS element-related ORFs. The blocks of IS elements
presumably reflect the history of IS element-mediated acquisition of
the virulence genes and other ORFs that have contributed to the
assembly and evolution of the virulence plasmid.
For example, sequences homologous to the ipa-mxi-spa gene
cluster (ipaJ to spa40) are seen in other
bacteria, including the yscM-yopD gene cluster on the
Yersinia low-Ca2+-response (LCR) plasmid pCD1
(45). It is noteworthy that a partial copy of the
Yersinia IS100 (bp 1 to 1051 of the IS element's
1,954 bp) borders one end of the Shigella ipa-mxi-spa loci
and defines one end of the region of low G+C content characteristic of
this gene cluster. At the other end of the ipa-mxi-spa loci,
approximately 50 bp downstream of the stop codon for spa
ORF11, which is adjacent to spa40, is another sequence of
139 bp (sequence coordinates 129144 to 129282) identical to the portion
of the Yersinia LCR plasmid that borders the insertion of an
IS285 into the LCR plasmid. The 139-bp sequence includes the
28-bp left inverted repeat of IS285 and 100 bases into
orf2 of the IS element. The end of the 139-bp sequence in
pWR501 also signals the end of the low-G+C ipa-mxi-spa
region. The 139-bp sequence is therefore a remnant of a mobile genetic
element that was possibly involved in the horizontal transfer of the
ipa-mxi-spa genes into pWR501. It is interesting to note
that an IS100 has recently been described adjacent to
yscM on pCD1, and two partial IS285 elements
flank the yscM-yopD gene cluster in Yersinia
(45). Homologs of the Yersinia IS100 element have also been
identified in association with a pathogenicity island that
contains homologs of the ipa-mxi-spa genes of
Shigella and the yscM-yopD genes of
Yersinia on pAV511, a 154-kb native plasmid of the bean
pathogen Pseudomonas syringae pathovar phaseolicola
(27). The lack of IS elements within the ipa-mxi-spa pathogenicity island suggests that the
entire locus was acquired in a single recombination event. Furthermore, like the yscM-yopD cluster, the ipa-mxi-spa cluster lacks
a characteristic of most pathogenicity islands, the presence of
flanking tRNA genes (49). Absence of tRNA genes may have
resulted from genomic rearrangements following gene transfer;
alternatively, acquisition of the locus may not have involved insertion
into tRNA genes.
A predictable consequence of the presence of such a high density of IS
element sequences on the Shigella virulence plasmid is the
observed predisposition to frequent genomic alterations. For example,
the construction of S. flexneri 2a vaccine SC602 by targeted
deletion of virG (12) was accompanied by an
unexpected recombination between two IS629 sequences that
flanked the region, resulting in a larger than expected deletion, which
includes virA as well as virG (M. M. Venkatesan, unpublished data). A spontaneous avirulent isolate of
S. flexneri 5 (M90T-A3) contains a deletion of approximately
70 kb that includes both the ipa-mxi-spa locus and the
virG-virA locus (9), a segment flanked by
IS629 elements (Fig. 1). Similar deletions have been
described for the T-32 ISTRATI vaccine strain (71) and the
chromosomal she locus in a serotype 2a strain
(48). Another type of IS element-mediated alteration that
has been described for the Shigella virulence plasmid is the
spontaneous insertion of an IS1 element within the
virF gene, which converted a virulent strain to an avirulent
one (36).
G+C composition.
The G+C content of genes and gene flanking
regions has been used as a marker for phylogenetic origin of genes or
gene clusters, with those genes or gene clusters that have anomalous
base composition being thought to have been acquired more recently by
horizontal gene transfer. The average G+C content of pWR501 is 47.6%,
similar to the estimated G+C content of the Shigella
chromosome, 50.02% (G. Plunkett III et al., personal communication).
In contrast, essentially all of the characterized virulence-associated
genes encoded on the virulence plasmid have markedly lower G+C
composition, in the range of 30 to 35% (Fig.
3).

FIG. 3.
Plot of G+C content (y axis) of each ORF on
pWR501 (x axis). Unknown ORFs and selected known ORFs with
G+C content of less than 40% are labeled.
The overall G+C content of the ipa-mxi-spa region is 35%.
While the G+C content of the Yersinia yscM-yopD region
(44.8%) is similar to that of both the entire LCR plasmid and the
Y. pestis chromosome (45), the G+C content of
the sequenced region of pAV511 is 54%, significantly lower than the
overall figures of 59 to 61% reported for pathovars of P. syringae (27). The homology of the virulence genes
and flanking sequences of the pathogenicity islands among
Shigella, Yersinia, and P. syringae plasmids
indicates a common ancestry of these pathogenicity islands, with
subsequent evolutionary changes in gene composition and arrangements.
Based on G+C composition alone, it is tempting to speculate that the Shigella ipa, mxi, and spa genes were acquired
much later in evolution than the Yersinia genes.
Alternatively, Yersinia may have acquired the region from an
organism with a similar G+C content.
Altogether, 66 ORFs (22.5% of pWR501 ORFs) have a G+C content of less
than 40%; of these, 38 are found as a block within the ipa-mxi-spa loci, and 28 are distributed throughout the
remainder of the plasmid (Fig. 3). Sixteen of these 28 are found in
clusters of two or three genes of low G+C content, suggesting that they might have been acquired as a block of genes from a single donor organism. Four of these 28, virF (S0051), shet2
(S0097), virA (S0191), and sopA (S0292) are known
virulence factors (68). Most of the remainder encode
hypothetical proteins that lack significant similarity to any known
protein, many of which are of moderate size (200 to 500 amino acids
[aa]). This raises the possibility that like the known virulence
genes on pWR501, many of these hypothetical ORFs encode virulence
factors and were acquired by lateral transfer. These therefore
potentially represent as yet uncharacterized recently acquired
virulence determinants, some of which may have been acquired after the
emergence of the evolutionary group from which pWR501 was isolated.
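The per-ORF G+C screen discussed above can be sketched in a few lines of Python. The function names and the two toy ORF sequences are hypothetical. Only the 40% cutoff comes from the text, and ambiguous bases are assumed to be excluded from the denominator.

```python
def gc_percent(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


def low_gc_orfs(orfs: dict, threshold: float = 40.0) -> list:
    """Return the names of ORFs whose G+C content is below the threshold."""
    return [name for name, seq in orfs.items() if gc_percent(seq) < threshold]


# Toy sequences for illustration only; these are not pWR501 ORFs.
toy_orfs = {
    "orf_low": "ATGAAATTTAAA",
    "orf_high": "ATGGCCGCGGGC",
}

print(low_gc_orfs(toy_orfs))
```

Applied to the nucleotide sequence of every annotated ORF, a screen of this kind would produce the sort of per-ORF profile plotted in Fig. 3.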
Three pWR501 loci have G+C contents of 60% or greater (Fig. 3). The
first of these is S0214, which lies within a block of relatively high
G+C content genes that is homologous to an E. coli plasmid
ColIb-P9 locus. The second high-G+C content locus is S0264, which
lies immediately adjacent to a group of genes predicted to be involved
in plasmid transfer and stability functions, most similar to those of
plasmid R100 (see "The replicon," below). The third locus is S0269
to S0274, within Tn501.
Unknown ORFs.
ORFs that either showed no significant
similarity to any protein or ORF in the database or were homologous or
identical to conserved hypothetical ORFs (proteins of unknown function)
were jointly defined as unknown ORFs and are listed in Table
2. A discussion of several interesting
features of these ORFs follows.
Unknown ORFs with no significant similarity to hypothetical
proteins.
The longest unknown ORF is S0249, which putatively
encodes a 970-aa protein that has less than significant homology to the Mycobacterium tuberculosis probable DNA helicase II homolog
UvrD2 (700 aa [11]), largely between residues 335 and
422. While M. tuberculosis UvrD2 has not been extensively
characterized, E. coli uvrD encodes a 720-aa protein with
3'-5' DNA helicase activity (6) that is nonessential for
viability but is required for methyl-directed mismatch repair and
nucleotide excision repair and is believed to participate in
recombination and DNA replication. S0249 belongs to the superfamily of
DNA/RNA helicases (COG0210), as determined by COGnitor for
identification of families to which predicted proteins are related
(65). S0249 has an ATP/GTP binding site motif A (P loop)
at residues 156 to 163 (GSAGSGKT), similar to the ATP
binding motif in UvrD proteins (ProfileScan algorithm). The ATP binding
motif is a glycine-rich region, which typically forms a flexible loop
between a beta strand and an alpha helix and interacts with one of the
phosphate groups of the nucleotide. S0249 also has an ankyrin repeat 2 motif at residues 721 to 753. Ankyrin repeats are tandemly repeated
modules of 33 aa seen primarily in eukaryotic proteins, where they play
a role in protein-protein interactions (4). The few known
examples from prokaryotes and viruses are thought to be the result of
horizontal gene transfer (4). S0249 could represent a
helicase with a C-terminal end capable of interaction with other
accessory proteins involved in DNA replication, excision, and/or repair mechanisms.
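The ATP/GTP binding site motif A can be located with a simple pattern search, sketched below in Python. The regular expression encodes the P-loop consensus [AG]-x(4)-G-K-[ST] as catalogued in PROSITE (entry PS00017); treating this as the exact pattern applied by the scanning tools in this study is an assumption. The toy protein is invented: only the octapeptide GSAGSGKT and its position at residues 156 to 163 come from the text.

```python
import re

# P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST]; the lookahead lets
# overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein: str) -> list:
    """Return (start, end, motif) tuples using 1-based inclusive residue numbers."""
    hits = []
    for match in P_LOOP.finditer(protein):
        motif = match.group(1)
        start = match.start() + 1  # convert 0-based index to 1-based residue
        hits.append((start, start + len(motif) - 1, motif))
    return hits


# Toy protein: leucine filler (invented) placed around the motif so that it
# occupies residues 156 to 163, as reported for S0249.
toy_protein = "L" * 155 + "GSAGSGKT" + "L" * 20

print(find_p_loops(toy_protein))
```

Because the pattern is short, matches of this kind occur by chance in many proteins, so a hit is supporting evidence for nucleotide binding rather than proof of it.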
ORF S0141, located between icsB and ipgD within
the ipa-mxi-spa loci, has not been previously described in
the literature. The predicted protein has no significant homology to
any hypothetical protein. The ATG start site and the first nine amino
acids precede the start codon of ipgD, which is transcribed
in the opposite orientation. It remains to be determined whether S0141
is expressed in vivo.
S0103 has less than significant homology to HilC (SprA) of
Salmonella serovar Typhimurium and PerA of E. coli, both members of the AraC/XylS family of transcriptional
regulators, known to bind DNA via a helix-turn-helix (HTH) motif
(23, 56). In enteropathogenic E. coli, PerA
activates the eaeA gene, encoding intimin. HilC regulates
hilA expression, which in turn activates the expression of
invasion-associated genes. A sequence that is 52% similar to the HTH
motif characteristic of this family of proteins was detected in S0103
at residues 54 to 155 (Prosite Scan ID PS01124). HTH DNA binding motifs
were originally identified in bacterial proteins. They have since also
been found in eukaryotic DNA-binding proteins. It is not known what
genes might be regulated by S0103.
Unknown ORFs with significant similarity to previously described
proteins of unknown function.
Adjacent to and in the opposite
orientation of S0249 is an ORF (S0250) that encodes a 280-aa protein
with 95% identity to Shigella shf (48). In the
previously published sequence of shf, this locus was thought
to contain two ORFs, shf1 (predicted to encode a 133-aa
protein) and shf2 (predicted to encode a 145-aa protein).
The pWR501 sequence lacks a single T nucleotide that is present after
base 770 of the published sequence. The absence of this nucleotide in
pWR501 generates a single ORF encoding a 280-aa protein. COG analysis
places S0250 in the family of xylanases/chitin deacetylases (COG0726),
which function in carbohydrate transport and metabolism.
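The way a single extra nucleotide can split one reading frame into two can be illustrated with a toy sequence. The sketch below is an assumption-laden illustration, not the shf sequence: the DNA strings are invented, and `find_orfs` is a hypothetical helper that scans only the forward strand for ATG-to-stop ORFs.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna):
    """Return (start, n_codons) for each ATG-to-stop ORF on the forward strand.

    start is 1-based; n_codons excludes the stop codon. Within a frame, only
    the first ATG after the previous stop is used, so nested ORFs are not
    reported separately.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                orfs.append((start + 1, (i - start) // 3))
                start = None
    return sorted(orfs)

# Invented toy sequences: the only difference is one extra T.
upstream = "ATGAAACCC"
downstream = "GATGAAATGCCCGGGAAATAA"
with_t = upstream + "T" + downstream   # extra T present: two short ORFs
without_t = upstream + downstream      # extra T absent: one longer ORF

print(find_orfs(with_t))     # [(1, 3), (17, 4)]
print(find_orfs(without_t))  # [(1, 9)]
```

In the toy case the extra T creates an in-frame TGA that ends the first ORF early, and a downstream ATG then opens a second ORF in another frame. Without the T, the same bases read as one continuous ORF.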
S0250 forms an operon with S0251, S0252, and S0253, which encode
rfbU (capU), virK, and
msbB, respectively. RfbU is a UDP-sugar hydrolase and has
been described in Vibrio cholerae as an accessory protein
required for O-antigen biosynthesis (18). MsbB is an acyltransferase involved in fatty acyl modification of the O antigen (60). msbB genes are also present on the
Shigella chromosome and in the E. coli genome.
Mutations in msbB genes result in lipopolysaccharide that
has reduced toxicity (60). In Shigella, VirK
mutants have less virG mRNA than wild type, suggesting the
involvement of virK in posttranscriptional regulation of
virG expression (37). It is interesting to
consider that VirK, being encoded within the shf-capU-msbB
operon, may have a role in O-antigen biosynthesis. A locus of three
genes with significant homology and similar in organization to pWR501
genes shf, rfbU(capU), and virK is
also seen in the enteroaggregative E. coli plasmid pAA2
(J. R. Czeczulin, T. R. Whittam, I. R. Henderson, and
J. P. Nataro, submitted for publication) and E. coli
O157:H7 plasmid pO157 (8).
Two loci containing seven (S0208 to S0214) and two (S0235 and S0236)
ORFs are similar to unknown ORFs on plasmid ColIb-P9 (G. Sampei and K. Mizobuchi, unpublished data). Of note, ColIb-P9 was originally
described in Shigella sonnei. ORF S0210 shows 41% identity
to 135 residues of a 304-aa adenine methyltransferase (EcoVIII modification methylase), which contains an
N6-adenine-specific methylase signature motif
between residues 24 and 30 (ILTDPPY) (Prosite Scan ID PS00092). DNA
methyltransferases modify DNA within a recognition sequence and in most
cases are associated with a restriction endonuclease. The combination
of these two activities protects native bacterial DNA from cleavage and
facilitates cleavage of foreign DNA. In bacteria, DNA methylation may
also be involved in transposon movement and DNA mismatch repair and may
have important roles in virulence (26).
Salmonella serovar Typhimurium strains with mutations in DNA
adenine methylase show abnormalities in protein secretion, host cell
invasion, and M-cell cytotoxicity (22). It is not known
whether the DNA methylase activity of S0210 is associated with a
restriction endonuclease.
The identification of several unknown ORFs, many of which have
interesting homologies and many of which may be involved in virulence,
permits targeted investigation of genes that may have subtle albeit
important roles in Shigella pathogenesis. Most of the
virulence genes that have been previously characterized were identified
using in vitro assays designed to evaluate the ability of bacteria to
invade and multiply within epithelial cells. Many of the unknown ORFs
in pWR501 may have been missed in these screens by virtue of not being
essential to either invasion or intracellular multiplication. They may
nevertheless modulate the efficiency of invasion and intercellular
dissemination or be otherwise important to pathogenesis. Further
genetic and phenotypic analysis of each of these genes will permit
characterization of each one's role in pathogenesis.
Known ORFs.
S0042 (405 aa) is 47% identical and 62% similar
over 387 aa to a reverse transcriptase/maturase (RT) from
Sinorhizobium meliloti (419 aa) that is associated with a
group II intron (33). S0113 and S0114 are homologous to
short regions of the RT, apparently fragments (50 and 47 aa,
respectively), and S0199 and S0200 represent a second copy of the RT
element that is frameshifted. Group II introns are self-splicing RNAs
linked to mobile genetic elements, related to the nuclear introns of
pre-mRNA. They are commonly found in organellar genes of lower
eukaryotes and plants but have recently been described as associated
with IS elements in many prokaryotes, including E. coli
(13). In addition to their ribozyme core, some group II
introns encode proteins with reverse transcriptase activity associated
with endonuclease activity. In pWR501, the RT encoded by S0042 appears
to contain the seven domains characteristic of active RTs
(33). The S. meliloti group II intron with the encoded RT is inserted within an IS element. The association of group
II introns with IS elements ensures the spread and maintenance of the
introns. S0042 does not appear to be within an IS element, although
there is an IS629 element 700 bp upstream of the coding sequence and a putative transposase downstream. Furthermore, the presence of multiple copies of RT in pWR501 indicates that they might
originally have been acquired within IS elements. The IS629 sequence is immediately adjacent to a 170-bp sequence that is homologous to sequence on the enteropathogenic E. coli
plasmid pB171, suggesting DNA exchange between pWR501 and pB171.
Splicing efficiency of the S. meliloti group II intron
requires expression of the RT protein, which suggests that it has a
maturase activity (33). S0042 contains the
carboxy-terminal domain that specifies maturase activity but lacks the
zinc finger domain that specifies endonuclease activity.
A virulence plasmid-encoded operon that contributes to enhanced
survival and mutation frequencies, impCAB, has previously been characterized in S. flexneri strain SA100
(50). In pWR501, the impCAB operon is missing;
only the first 176 bp are present, beginning at sequence coordinate
157595. impCAB is similar to the chromosomal
umuDC operons of E. coli and
Salmonella serovar Typhimurium and the plasmid
mucAB operon; all function in DNA repair following
radiation- and chemical-induced mutagenesis.
The invasion locus of Shigella is a pathogenicity
island-like cluster that consists of 38 ORFs (S0130 to S0167) of the
ipa-mxi-spa operons within a stretch of 32 kb of the
virulence plasmid (beginning with the start codon of ipaJ at
sequence coordinate 97092 [S0130] and ending with the stop codon of
spa-orf11 [S0167] at sequence coordinate 129091). Genes
within this locus are critical for Shigella invasion of
mammalian cells, although certain genes outside this region are
required for optimal invasion of tissue culture cells. This region has
been previously mapped and sequenced, and genes within this region have
been extensively characterized (46). Notably, on pWR501,
the orientation of the entire gene cluster is the inverse of what had been
previously published (17). Potential explanations for this
inversion include (i) strain differences due to a true inversion during
evolution that may have been enhanced by the presence of flanking IS
elements or (ii) an artifact of the cloning approach that was used in
the prior sequencing projects. Strain-to-strain differences in the
arrangement of virulence genes have been previously described
(2).
Two, and perhaps three, ShET2-like toxin genes are present on pWR501.
The ShET2 enterotoxin (S0097, 566 aa) is present on the virulence
plasmids of all species of Shigella (38). S0012 (533 aa) represents a second allele of the ShET2 gene, with 40% identity to ShET2 over more than 90% of its length (Fig.
4). In addition, S0230 (266 aa) is 22%
identical to ShET2 over 55% of its length (Table 2 and Fig. 4).

FIG. 4. Alignment of ShET2 alleles on pWR501. The ShET2 toxin
(S0097) is compared with the second copy of ShET2 (S0012) and a smaller
ORF (S0030) with which it bears similarity. The similarities are
indicated as for BLAST searches.
There are five complete alleles of ipaH on pWR501,
designated ipaH1.4, ipaH2.5, ipaH4.5, ipaH7.8, and
ipaH9.8 (69). A single T residue at pWR501
sequence coordinate 213802, within the coding sequence of
ipaH1.4, a single T residue at pWR501 sequence coordinate 41551, within the coding sequence of ipaH2.5, and a single G
residue at pWR501 sequence coordinate 61847, within the coding sequence of ipaH7.8, were missed previously, leading to truncation of
ipaH1.4 and ipaH2.5 at their 3' ends and
ipaH7.8 at its 5' end. Thus, our sequence predicts that
ipaH1.4, ipaH2.5, ipaH4.5, ipaH7.8, and ipaH9.8
encode proteins of 575, 563, 574, 565, and 545 aa, respectively. The
amino-terminal halves are variable, while the carboxy-terminal halves
are conserved (Fig. 5), as described
previously (69). The variable amino-terminal halves
contain a leucine-rich repeat (delimited by asterisks in Fig. 5)
(28). The carboxy-terminal 13 residues of IpaH7.8 and
IpaH9.8 are absent in IpaH2.5 and IpaH4.5 and altered in IpaH1.4. The
G+C content of these five alleles is remarkable in that the
amino-terminal halves are lower in G+C content than the conserved
carboxy-terminal halves of all five alleles, which suggests a modular
unit whose two halves might have been acquired separately during
evolution and subsequently fused. ipaH1.4 and
ipaH2.5 are more similar to each other than to the other
three alleles. A schematic representation of the amino-terminal
alignment and the degree of sequence identity among the five
ipaH alleles is shown in Fig. 5. Although the function of
each ipaH allele is unknown, transcription of four of them, but not that of the ipaBCDA and mxi operons, was
markedly increased during growth in the presence of Congo red and in an
ipaD mutant, two conditions under which secretion through
the Mxi-Spa machinery is enhanced (15). Transcription of
the ipaH genes was also transiently activated upon entry
into epithelial cells. Recently, IpaH7.8 was shown to facilitate the
escape of Shigella from phagocytic vacuoles of mouse
macrophages and human monocytes (19).
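The comparison of G+C content between the two halves of a gene, or along its length as in a G+C distribution plot, reduces to simple base counting. The following is a minimal sketch under stated assumptions: the toy sequence is invented, and the function names and window sizes are illustrative choices, not the procedure used for the ipaH analysis.

```python
def gc_percent(dna):
    """Percent G+C of a DNA string (case-insensitive); 0.0 for an empty string."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return 100.0 * sum(dna.count(base) for base in "GC") / len(dna)

def gc_by_halves(dna):
    """G+C percent of the 5' half and the 3' half of a sequence."""
    mid = len(dna) // 2
    return gc_percent(dna[:mid]), gc_percent(dna[mid:])

def gc_windows(dna, window, step):
    """G+C percent for each full window, as (1-based start, percent) pairs."""
    return [(i + 1, gc_percent(dna[i:i + window]))
            for i in range(0, len(dna) - window + 1, step)]

# Invented toy sequence with an AT-rich 5' half and a GC-rich 3' half.
toy = "ATTATAGCAT" + "GCGCCGATGC"
print(gc_by_halves(toy))       # (20.0, 80.0)
print(gc_windows(toy, 10, 5))  # [(1, 20.0), (6, 70.0), (11, 80.0)]
```

A step change in the windowed values, as in the toy output, is the kind of signal that would be consistent with two segments of different origin, although it does not by itself demonstrate separate acquisition.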

FIG. 5. The five IpaH alleles of pWR501. (A) Alignment of the
amino-terminal halves of the five IpaH alleles, using the Clustal
method with PAM250 residue weight table (MEGALIGN algorithm; DNASTAR
software). Beginning with the sequence LADAV (residues 307 to 311 of
IpaH1.4), the sequences are conserved among all five alleles; the
carboxy termini are not shown. Asterisks, approximate extent of the
leucine-rich repeat sequence; arrow, beginning of region conserved in
all five alleles. (B) Sequence identity and divergence among the five
IpaH alleles. Percent divergence is calculated by comparing sequence
pairs in relation to the phylogeny reconstructed by MEGALIGN, whereas
percent identity is determined for individual pairs without regard to
phylogenetic relationship. (C) G+C content distribution of IpaH7.8.
Scale is in base pairs.
The replicon.
pWR501 has a replicon region highly homologous
to that of plasmid R100, which is within the RepFIIA family of
replicons. There are six known incompatibility (Inc) groups within this
family; plasmid R100 belongs to the IncFII group. In pWR501, the
replicon extends from sequence coordinates 208523 to 210861 and
contains the essential replication elements present in R100, including an ori and a G site (single-strand initiation site), which
directs the priming of single-stranded DNA templates for leading and/or lagging strand synthesis (Fig. 6)
(43, 64). Of note, the R100 plasmid was initially isolated
from an S. flexneri 2a strain.

FIG. 6. Replicon of pWR501. (A) Replicon region and flanking DNA
sequences (sequence coordinates 187081 to 214607), including the
insertion site of Tn501. (B) Expanded view of the replicon
shown in panel A (sequence coordinates 207312 to 214912), indicating
the position of inc RNA, ori, and the G site.
Distances are indicated in base pairs below each map.
The frequency of replication is regulated by control of the
synthesis of the plasmid-specific replication initiation protein RepA1,
which binds to the plasmid ori and assembles a replication complex (43). The principal copy control elements are (i)
an antisense RNA, inc RNA (or CopA in plasmid R1), which is
constitutively expressed and rapidly turned over and which exerts
negative control on the repA1 transcript (40);
(ii) RepA2, a trans-acting repressor of the repA1
promoter; and (iii) repA6, whose expression disrupts the
binding of inc RNA to the repA1 transcript
(73). All of these elements are present on pWR501. To our
knowledge, no function has been ascribed to repA4.
The pWR501 replicon is more than 75% identical to the R100 replicon at
both nucleotide and amino acid levels. Classification of replicons is
largely based on the extent of similarity of the sequence of RepA1
proteins. pWR501 RepA1 (251 aa in length) is 76% identical to residues
35 to 285 of R100 RepA1 (285 aa in length). Phylogenetic analysis of
the RepA1 proteins from several different replicons indicated that
pWR501 RepA1 is most closely related to RepA1 from R100 (data not shown).
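Percent identity figures such as the one quoted for RepA1 depend on which aligned positions are counted. As a hedged sketch, assuming a pre-computed pairwise alignment with "-" as the gap character and counting only columns where neither sequence has a gap (other conventions exist, and the alignment program used in this study is not assumed here), the calculation can be written as:

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions among aligned columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Invented four-residue toy alignment, not RepA1 sequence.
print(percent_identity("MKTL", "MRTL"))  # 75.0

# Residues 35 to 285 inclusive span 251 positions, matching the 251-aa pWR501 RepA1.
print(285 - 35 + 1)  # 251
```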
repA2 of pWR501 is divergent from repA2 of R100
but is identical to that of R1 (an IncFII plasmid), as has been shown
for another S. flexneri serotype 5a virulence plasmid,
pWR110 (59); interestingly, the three sequences are
identical at the protein level. The RepA2 target site is in a region
that lacks homology, indicating that the RepA2 proteins and their
targets are plasmid specific and not group specific (41).
Of note, pWR110 was shown to be compatible with IncFII plasmids and
only weakly incompatible with IncFI plasmids (59). The pWR501 incompatibility determinant, located within the inc
gene, is identical to that of pWR110 (59); therefore, the
incompatibility profiles of the two plasmids would be predicted to be
the same. The inc genes of pWR110 (59) and
pWR501 (this study) differ slightly from the inc genes of
the IncFII plasmids R1 and R100 and the IncFI plasmids P307 and
ColV2-K94. Thus, while the replicon of pWR501 is most similar to that
of IncFII plasmids, it does not fall within the FII incompatibility
group, suggesting that the few differences in nucleotide sequence
within the inc gene may give rise to significant differences
in incompatibility.
The ori region of R100 is located within a 167-bp sequence
downstream from repA1 that contains approximately 75 bp of
high AT content; a similar high-AT-content sequence is observed in the
putative ori of pWR501, which lies within a 362-bp segment between the stop codon of repA1 and the start codon of
repA4 (sequence coordinates 210012 to 210373) (Fig. 6)
(63). Base pairs 93 to 258 of this 362-bp sequence are
95% identical to the 167-bp ori sequence of R100. A
dnaA box (TTATCCACA) is located within the ori sequence, and a G site, analogous to the single-strand
initiation site of phages, is located toward the 3' end of
repA4, as in R100. Within the G site are three blocks of
conserved nucleotides, one of which provides the starting point for
primer RNA synthesis (63); all three are conserved in
pWR501. pWR501 homology to the R100 replicon ends 80 bp upstream of
tir in R100 (Sampei and Mizobuchi, submitted for publication).
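Locating a short element such as the dnaA box requires checking both strands, because the box can lie on either one. The sketch below assumes exact matching only and uses an invented toy sequence; `find_motif_both_strands` is a hypothetical helper, and only the TTATCCACA box and the ori coordinates come from the text.

```python
def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif_both_strands(dna, motif):
    """Return (1-based start on the top strand, strand) for each exact match.

    A palindromic motif would be reported on both strands at the same position;
    TTATCCACA is not palindromic, so that case does not arise here.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        pos = dna.find(pattern)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = dna.find(pattern, pos + 1)
    return sorted(hits)

DNAA_BOX = "TTATCCACA"

# Invented toy sequence carrying one box on each strand.
toy = "GGC" + DNAA_BOX + "AATTTAAT" + reverse_complement(DNAA_BOX) + "CC"
print(find_motif_both_strands(toy, DNAA_BOX))  # [(4, '+'), (21, '-')]

# Inclusive coordinate arithmetic for the putative ori segment.
print(210373 - 210012 + 1)  # 362
```

Functional dnaA boxes often tolerate mismatches, so an exact-match scan like this one would miss degenerate copies.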
Apart from the essential replicon, pWR501 has multiple loci homologous
to sequences known to be involved in plasmid segregation and stable
maintenance: parA and parB of the P1
bacteriophage partitioning system (S0039 and S0040), stbB
and stbA of the R100 plasmid (S0206 and S0207),
ccdA and ccdB (also known as proteins H and G) of
the F plasmid (S0232 and S0233), and mvpA and
mvpT (S0259 and S0258). P1 ParA and ParB act on the
cis-acting element parS to promote faithful
segregation of the plasmid during the bacterial cell cycle
(14). The ParA proteins of pWR501 and P1 are 75%
identical and the ParB proteins are 59% identical over most of their
lengths, and an AT-rich parS site with an inverted imperfect
20-bp repeat structure is present immediately downstream of the
parB stop codon. Plasmid R100 StbA and StbB, along with an
essential upstream cis-acting element, mediate stable
plasmid inheritance (61). StbA and StbB of pWR501 are 43 and 29% identical to R100 StbA and StbB over more than 80% of the
target sequences, although pWR501 StbA is predicted to be twice as long
as R100 StbA. An AT-rich sequence immediately upstream of
stbA may constitute a cis-acting site. The
redundancy of plasmid segregation and stable maintenance systems and
their distribution around the plasmid may enhance the stable
inheritance of a single-copy plasmid, such as pWR501, over many
generations. In addition, the redundancy may suggest not only that
maintenance of the plasmid is important to Shigella
virulence but that maintenance of plasmid derivatives that carry large
deletions is important. The Shigella virulence plasmid is
known to spontaneously delete large segments (34). Because the
segregation and maintenance loci are distributed around the plasmid,
a derivative that has lost a segment containing one or more of these
loci might still be stably maintained, which may permit maintenance of
virulence despite loss of some sequence. Toxin-antitoxin systems, which encode a stable
cytotoxin and an unstable antitoxin, mediate postsegregational killing
of daughter cells that lack plasmid. pWR501 contains one recently
described toxin-antitoxin system (the mvp locus, S0258-S0259
[55]), one that has been characterized on F plasmid
(S0232-S0233), and remnants of a third (S0205). S0232 and S0233 encode
homologs of F plasmid CcdB (toxin) and CcdA (antitoxin) (1). Immediately downstream of stbB, S0205
encodes a protein homologous to E. coli RelB, the toxin
encoded by the relBE toxin-antitoxin locus
(25).
ColE1 plasmids resolve plasmid multimers into monomers by site-specific
recombination involving the cer site (58), in
conjunction with chromosomally encoded argR, pepC, and
xerC. pWR501 contains a 129-bp sequence (sequence
coordinates 175196 to 175067) that is similar to bp 111 to 240 of the
284-bp ColE1 cer site, and therefore contains the ArgR
binding site, but lacks the right arm of the recombination site.
Plasmid R1 has a significantly truncated cer site (only 44 bp long) that is thought to function by recombination with a related
site in the terminus of the E. coli chromosome
(10), which suggests that the pWR501 site may also be
functional. Of note, a homolog of ColE1 mob9 protein is encoded within the pWR501 cer site (S0238).
Transfer genes.
A block of genes located between sequence
coordinates 189271 and 196816 is similar to known transfer genes.
These include ORFs with significant homology to a truncation of
traD and the complete sequences of traI, traX,
and finO. In pWR501, this block has the same genetic
organization as the corresponding genes in plasmid R100
(74), including two downstream ORFs of unknown function,
yigA and yigB, followed by the replicon, raising
the possibility that the entire region was acquired by horizontal transfer from R100. The remainder of the large block of transfer genes
that are located upstream in R100 and other conjugative plasmids are
absent on pWR501, and the traI gene has an internal frameshift. Whether pWR501 once had the full complement of transfer genes and subsequently lost most of them or whether only a portion of
the transfer genes were acquired cannot be determined. Experimental data have shown that the virulence plasmid is not capable of
self-transfer by conjugation but can be conjugated in the presence of
conjugative plasmids (52).
Evolution of the Shigella virulence plasmid.
Recent genetic analyses suggest that shigellae do not constitute a
distinct genus as traditionally believed but rather are within the
genus of E. coli, much like the pathogenic E. coli (47). These analyses indicate that
Shigella emerged from E. coli seven or eight
independent times during evolution, leading to three clusters of
Shigella, each of which contains serotypes from multiple
traditional species, and four or five additional forms, each of which
contains one traditional serotype (47). The three main
Shigella clusters are estimated to have evolved 35,000 to
270,000 years ago, which predates the development of agriculture and
makes shigellosis one of the early infectious diseases of humans
(47).
The defining event each time Shigella arose was almost
certainly the acquisition of an historical precursor of the current-day virulence plasmid. The data also suggest that the loss of specific catabolic pathways (inability to utilize lactose and mucate and to
decarboxylate lysine), loss of motility, and expansion of O-antigen diversity that are characteristic of Shigella strains
occurred more recently than the acquisition of the plasmid
(47). In addition, curli loci have been insertionally
inactivated in Shigella (51). Since the plasmid
was acquired at distinct times, one would predict that differences
reflecting the evolution of the plasmid could be obtained by genetic
comparison of virulence plasmids of the seven different
Shigella evolutionary groups. Subsequent to the acquisition
of the virulence plasmid, divergence of Shigella clones from
E. coli would involve clonal divergence (accumulation of mutations by base substitution), horizontal transfer of genetic material from other species, and loss of gene sequences that interfere with pathogenicity (35, 42).
Certain horizontal gene transfer events have been key to the evolution
of Shigella. A quintessential feature of Shigella
is its ability to invade mammalian cells and access the cell cytoplasm, defining a niche unique among enteric gram-negative bacteria, with the
exception of enteroinvasive E. coli. Thus, the acquisition and evolution of the ipa-mxi-spa pathogenicity island, which
encodes all of the genes required for cell invasion and phagolysosomal lysis, permitted a major alteration in pathogenesis (24).
Likewise, the acquisition of virG (icsA), which
mediates actin assembly on Shigella, and of virF and
virB, the regulators of the virG and ipa-mxi-spa loci, was key to the emergence of
Shigella. Since all Shigella serotypes contain
these loci, they were probably all present on the prototypic virulence plasmid.
It is not known which ORFs were acquired subsequent to the acquisition
of the plasmid by the first Shigella strain. One might predict that access to an intracellular niche would present the organism with selective pressure to acquire certain factors that would
not have provided selective advantage previously. These might include
factors that permit resistance to the cellular immune response of the
host or utilization of nutrients present in the host cytoplasm. Of
note, the cellular immune response is ineffective against
Shigella in animal models (72), and
Shigella-specific cytotoxic T lymphocytes have not been
isolated from convalescent individuals. In addition, factors that
permit the bacterium to optimize its lifestyle in the human colon may
also have been acquired after acquisition of the prototypic virulence
plasmid. An example of this is the acquisition by horizontal transfer
of O-antigen genes, such as those present on the virulence plasmid of
S. sonnei, and subsequent inactivation of native O-antigen
genes (30). Serotypic diversity due to the variations in O
antigen is seen among Shigella strains. Such diversity
likely facilitates evasion of the host humoral immune response. Further
analysis of the function of many ORFs of unknown function on pWR501 as
well as comparative analysis of virulence plasmids from other
Shigella serotypes will allow a more complete
characterization of the evolution of the plasmid.
 |
ACKNOWLEDGMENTS |
We thank Guy Plunkett III for assistance with DNA analysis and
helpful discussions, Vessela Ivanova for assistance in the isolation of
DNA and analysis of the replicon, Sara Klink, George F. Mayhew, Guy
Plunkett III, and Helen J. Wing for help with confirming the sequence
assembly, Nicole Perna for critical reading of the manuscript, Luther
Lindler for many helpful discussions, and the sequencing teams of the
Genome Center of Wisconsin for excellent technical work.
The sequencing work was supported by NIH/NIAID, and the sequence
annotation was supported by NIH grant AI43562 (M.B.G.).
 |
ADDENDUM IN PROOF |
Buchrieser et al. (C. Buchrieser, P. Glaser, C. Rusniok, H. Nedjeri, H. d'Hauteville, F. Kunst, P. Sansonetti, and C. Parsot, Mol.
Microbiol. 38:760-771, 2000) have also recently sequenced and analyzed the Shigella flexneri virulence plasmid.
 |
FOOTNOTES |
*
Corresponding author. Mailing address for Malabi M. Venkatesan: Department of Enteric Infections, Division of Communicable Diseases and Immunology, Walter Reed Army Institute of Research, 503 Robert Forney Dr., Silver Spring, MD 20910. Phone: (301) 319-9764. Fax:
(301) 319-9801. E-mail:
Malabi.Venkatesan{at}NA.AMEDD.ARMY.MIL. Mailing
address for Marcia B. Goldberg: Infectious Disease Division, Massachusetts General Hospital, GRJ504, 55 Fruit St., Boston, MA
02114. Phone: (617) 432-3238. Fax: (617) 432-3259. E-mail: mgoldberg1{at}partners.org.
Editor:
V. J. DiRita
 |
REFERENCES |
1. Bahassi, E. M., M. H. O'Dea, N. Allali, J. Messens, M. Gellert, and M. Couturier. 1999. Interactions of CcdB with DNA gyrase. Inactivation of GyrA, poisoning of the gyrase-DNA complex, and the antidote action of CcdA. J. Biol. Chem. 274:10936-10944.
2. Benjelloun-Touimi, Z., M. S. Tahar, C. Montecucco, P. J. Sansonetti, and C. Parsot. 1998. SepA, the 110 kDa protein secreted by Shigella flexneri: two domain structure and proteolytic activity. Microbiology 144:1815-1822.
3. Blattner, F. R., G. Plunkett, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1462.
4. Bork, P. 1993. Hundreds of ankyrin-like repeats in functionally diverse proteins: mobile modules that cross phyla horizontally? Proteins 17:363-374.
5. Borodovsky, M., and J. McIninch. 1993. GeneMark: parallel gene recognition for both DNA strands. Comput. Chem. 17:123-133.
6. Bruand, C., and S. D. Ehrlich. 2000. UvrD-dependent replication of rolling-circle plasmids in Escherichia coli. Mol. Microbiol. 35:204-210.
7. Burland, V., D. L. Daniels, G. Plunkett, and F. R. Blattner. 1993. Genome sequencing on both strands: the Janus strategy. Nucleic Acids Res. 21:3385-3390.
8. Burland, V., Y. Shao, N. T. Perna, G. Plunkett, H. J. Sofia, and F. R. Blattner. 1998. The complete DNA sequence and analysis of the large virulence plasmid of Escherichia coli O157:H7. Nucleic Acids Res. 26:4196-4204.
9. Buysse, J. M., M. Venkatesan, J. A. Mills, and E. V. Oaks. 1990. Molecular characterization of a trans-acting, positive effector (ipaR) of invasion plasmid antigen synthesis in Shigella flexneri serotype 5. Microb. Pathog. 8:197-211.
10. Clerget, M. 1991. Site-specific recombination promoted by a short DNA segment of plasmid R1 and by a homologous segment in the terminus region of the Escherichia coli chromosome. New Biol. 3:780-788.
11. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, B. G. Barrell, et al. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537-544.
12. Coster, T., C. W. Hoge, L. VandeVerg, A. B. Hartman, E. V. Oaks, M. M. Venkatesan, D. Cohen, G. Robin, A. Fontaine-Thompson, P. J. Sansonetti, and T. L. Hale. 1999. Vaccination against shigellosis with attenuated Shigella flexneri 2a strain SC602. Infect. Immun. 67:3437-3443.
13. Cousineau, B., S. Lawrence, D. Smith, and M. Belfort. 2000. Retrotransposition of a bacterial group II intron. Nature 404:1018-1021.
14. Davis, M. A., L. Radnedge, K. A. Martin, F. Hayes, B. Youngren, and S. J. Austin. 1996. The P1 ParA protein and its ATPase activity play a direct role in the segregation of plasmid copies to daughter cells. Mol. Microbiol. 21:1029-1036.
15. Demers, B., P. J. Sansonetti, and C. Parsot. 1998. Induction of type III secretion in Shigella flexneri is associated with differential control of transcription of genes encoding secreted proteins. EMBO J. 17:2894-2903.
16. Edwards, C. J., D. J. Innes, D. M. Burns, and I. R. Beacham. 1993. UDP-sugar hydrolase isoenzymes in Salmonella enterica and Escherichia coli: silent alleles of ushA in related strains of group I Salmonella isolates, and of ushB in wild-type and K12 strains of E. coli, indicate recent and early silencing events, respectively. FEMS Microbiol. Lett. 114:293-298.
17. Egile, C., H. d'Hauteville, C. Parsot, and P. J. Sansonetti. 1997. SopA, the outer membrane protease responsible for polar localization of IcsA in Shigella flexneri. Mol. Microbiol. 23:1063-1073.
18. Fallarino, A., C. Mavarangelos, U. H. Stroeher, and P. A. Manning. 1997. Identification of additional genes required for O-antigen biosynthesis in Vibrio cholerae O1. J. Bacteriol. 179:2147-2153.
19. Fernandez-Prada, C. M., D. L. Hoover, B. D. Tall, A. B. Hartman, J. Kopelowitz, and M. M. Venkatesan. 2000. Shigella flexneri IpaH7.8 facilitates escape of virulent bacteria from the endocytic vacuoles of mouse and human macrophages. Infect. Immun. 68:3608-3619.
20. Fernandez-Prada, C. M., D. L. Hoover, B. D. Tall, and M. M. Venkatesan. 1997. Human monocyte-derived macrophages infected with virulent Shigella flexneri in vitro undergo a rapid cytolytic event similar to oncosis but not apoptosis. Infect. Immun. 65:1486-1496.
21. Galan, J. E., and A. Collmer. 1999. Type III secretion machines: bacterial devices for protein delivery into host cells. Science 284:1322-1328.
| 22.
|
Garcia-DelPortillo, F.,
M. G. Pucciarelli, and J. Casadesus.
1999.
DNA adenine methylase mutants of Salmonella typhimurium show defects in protein secretion, cell invasion, and M cell cytotoxicity.
Proc. Natl. Acad. Sci. USA
96:11578-11583[Abstract/Free Full Text].
23. Gomez-Duarte, O. G., and J. B. Kaper. 1995. A plasmid-encoded regulatory region activates chromosomal eaeA expression in enteropathogenic Escherichia coli. Infect. Immun. 63:1767-1776.
24. Groisman, E. A., and H. Ochman. 1996. Pathogenicity islands: bacterial evolution in quantum leaps. Cell 87:791-794.
25. Gronlund, H., and K. Gerdes. 1999. Toxin-antitoxin systems homologous with relBE of Escherichia coli plasmid P307 are ubiquitous in prokaryotes. J. Mol. Biol. 285:1401-1415.
26. Heithoff, D. M., R. L. Sinsheimer, D. A. Low, and M. J. Mahan. 1999. An essential role for DNA adenine methylation in bacterial virulence. Science 284:967-970.
27. Jackson, R. W., E. Athanassopoulos, G. Tsiamis, J. W. Mansfield, A. Sesma, D. L. Arnold, M. J. Gibbon, J. Murillo, J. D. Taylor, and A. Vivian. 1999. Identification of a pathogenicity island, which contains genes for virulence and avirulence, on a large native plasmid in the bean pathogen Pseudomonas syringae pathovar phaseolicola. Proc. Natl. Acad. Sci. USA 96:10875-10880.
28. Kajava, A. V. 1998. Structural diversity of leucine-rich repeat proteins. J. Mol. Biol. 277:519-527.
29. Kotloff, K. L., J. P. Winickoff, B. Ivanoff, J. D. Clemens, D. L. Swerdlow, P. J. Sansonetti, G. K. Adak, and M. M. Levine. 1999. Global burden of Shigella infections: implications for vaccine development and implementation of control strategies. Bull. W. H. O. 77:651-666.
30. Lai, V., L. Wang, and P. R. Reeves. 1998. Escherichia coli clone Sonnei (Shigella sonnei) had a chromosomal O-antigen gene cluster prior to gaining its current plasmid-borne O-antigen genes. J. Bacteriol. 180:2983-2986.
31. Mahillon, J., H. A. Kirkpatrick, H. L. Kijenski, C. A. Bloch, C. K. Rode, G. F. Mayhew, D. J. Rose, G. Plunkett, V. Burland, and F. R. Blattner. 1998. Subdivision of Escherichia coli K-12 genome for sequencing: manipulation and DNA sequence of transposable elements introducing unique restriction sites. Gene 223:47-54.
32. Mahillon, J., C. Leonard, and M. Chandler. 1999. IS elements as constituents of bacterial genomes. Res. Microbiol. 150:675-687.
33. Martinez-Abarca, F., S. Zekri, and N. Toro. 1998. Characterization and splicing in vivo of a Sinorhizobium meliloti group II intron associated with particular insertion sequences of the IS630-Tc1/IS3 retrotransposon superfamily. Mol. Microbiol. 28:1295-1306.
34. Maurelli, A. T., B. Blackmon, and R. Curtiss. 1984. Loss of pigmentation in Shigella flexneri 2a is correlated with loss of virulence and virulence-associated plasmid. Infect. Immun. 43:397-401.
35. Maurelli, A. T., R. E. Fernandez, C. A. Bloch, C. K. Rode, and A. Fasano. 1998. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc. Natl. Acad. Sci. USA 95:3943-3948.
36. Mills, J. A., M. Venkatesan, L. S. Baron, and J. M. Buysse. 1992. Spontaneous insertion of an IS1-like element into the virF gene is responsible for avirulence in opaque colonial variants of Shigella flexneri 2a. Infect. Immun. 60:175-182.
37. Nakata, N., T. Tobe, I. Fukuda, T. Suzuki, K. Komatsu, M. Yoshikawa, and C. Sasakawa. 1993. The absence of a surface protease, OmpT, determines the intercellular spreading ability of Shigella: the relationship between the ompT and kcpA loci. Mol. Microbiol. 9:459-468.
38. Nataro, J. P., J. Seriwatana, A. Fasano, D. R. Maneval, L. D. Guers, F. Noriega, F. Dubovsky, M. M. Levine, and J. G. Morris. 1995. Identification and cloning of a novel plasmid-encoded enterotoxin of enteroinvasive Escherichia coli and Shigella strains. Infect. Immun.
63:4721-4728[Abs |