ABSTRACT
Enteropathogenic Escherichia coli (EPEC) is a leading cause of moderate to severe diarrhea among young children in developing countries, and EPEC isolates can be subdivided into two groups. Typical EPEC (tEPEC) bacteria are characterized by the presence of both the locus of enterocyte effacement (LEE) and the plasmid-encoded bundle-forming pilus (BFP), which are involved in adherence and in translocation of type III effectors into host cells. Atypical EPEC (aEPEC) bacteria also contain the LEE but lack the BFP. In the current report, we describe the complete genome of outbreak-associated aEPEC isolate E110019, which carries four plasmids. Comparative genomic analysis demonstrated that the gene encoding the type III secreted effector EspT, an autotransporter gene, a hemolysin gene, and putative fimbrial genes are all carried on plasmids. Further investigation of 65 espT-containing E. coli genomes demonstrated that different espT alleles are associated with multiple plasmids that differ in their overall gene content from the E110019 espT-containing plasmid. EspT has previously been shown to contribute to the ability of E110019 to invade host cells. While other type III secreted effectors of E. coli have been identified on insertion elements and prophages of the chromosome, we demonstrated in the current study that espT is located on multiple distinct plasmids. These findings highlight a role for plasmids in the dissemination of a unique E. coli type III secreted effector that is involved in host invasion and severe diarrheal illness.
INTRODUCTION
Enteropathogenic Escherichia coli (EPEC) is a leading cause of moderate to severe diarrhea among children in developing countries (1, 2). EPEC is also among the earliest described and, in terms of pathogenic mechanism, most extensively characterized pathogenic variants (pathovars) or pathogenic types (pathotypes) of diarrheagenic E. coli (3, 4). EPEC isolates carry several canonical virulence features, including the locus of enterocyte effacement (LEE) region, which encodes a type III secretion system (T3SS) and the intimin adherence protein (5, 6). EPEC isolates also lack the Shiga toxin-encoding genes that are characteristic of enterohemorrhagic E. coli (EHEC) isolates such as E. coli O157:H7 (3, 4). EPEC isolates that also encode the bundle-forming pilus (BFP) (7–9) are subclassified as typical EPEC (tEPEC), while atypical EPEC (aEPEC) isolates lack the genes encoding the BFP (3, 4).
E. coli isolates identified as tEPEC primarily cause diarrhea in children and have been linked to more severe diarrheal outcomes than aEPEC isolates (2). In addition, aEPEC isolates have been found at similar prevalence among children with and without diarrhea, as well as among healthy adults (2, 10–12); however, aEPEC isolates have been associated with diarrheal outbreaks in some cases (13–17). One of the most notable diarrheal outbreaks attributed to an aEPEC isolate occurred in 1987 among children at a school in Finland; the outbreak then spread to adults and other family members, a feature that is not common among aEPEC isolates (14). Subsequent molecular studies and genetic characterization of E110019, the aEPEC archetype isolate from the Finland outbreak, demonstrated that it contains several unique virulence genes and phenotypes that are not present in other aEPEC isolates (18). Among the most notable of these features is the EspT effector, which was previously found to have a narrow distribution among EPEC and EHEC isolates (18–23) and is known to facilitate invasion of nonphagocytic host cells by inducing the formation of membrane ruffles; the bacteria then occupy a vacuole and form intracellular actin pedestals (18, 23). Although EspT is an important virulence feature that is rare among aEPEC isolates other than E110019, it is not known whether EspT played a significant role in the E110019-related outbreak. Other unique features of aEPEC isolate E110019 may also have contributed to its ability to cause more severe illness than other aEPEC isolates.
A draft genome assembly of aEPEC isolate E110019 was generated with earlier sequencing technology over a decade ago (19); since then, improvements in sequencing technologies have made complete genome assemblies more cost-effective and feasible to generate. A complete genome assembly allows a more comprehensive view of genome structure and content, especially for important historical reference isolates such as this one. Thus, we used long-read sequencing in the current study to generate a complete genome assembly of E. coli E110019, to provide additional insight into the unique genomic content of this important outbreak-associated aEPEC archetype isolate, and to identify notable features for investigation in future functional studies. We also describe all of the plasmids of E. coli E110019, including an espT-containing plasmid, and examine the distribution of this plasmid and of other putative espT-containing plasmids among diverse aEPEC isolates.
RESULTS AND DISCUSSION
Complete genome sequence of aEPEC archetype isolate E110019. The complete genome assembly of aEPEC isolate E110019 is 5.47 Mb with an overall GC content of 50.67% and contains six genomic molecules: a single chromosome, a phage-like molecule, and four circular plasmids (Table 1; see also Tables S1 through S4). The chromosome is 5.24 Mb with a GC content of 50.76%. The four plasmids of E. coli E110019 range in length from 5.68 to 66.17 kb and have GC contents of 41.60% to 52.44% (Table 1). Previously described E. coli virulence genes identified on the chromosome of E110019 include the LEE region and a suite of non-LEE T3SS effectors (Table 1). While the majority of the T3SS effectors were identified on the chromosome of E110019, the espT gene was identified on a 65-kb plasmid, pE110019_65 (Table 1). The pE110019_65 plasmid encodes two additional previously described E. coli virulence-associated factors, namely, the Pet autotransporter (encoded by pet) (24) and a hemolysin (encoded by hlyCABD) (25) (Table 1; see also Table S2 in the supplemental material). Plasmid pE110019_65 also contains genes with predicted roles in plasmid partitioning and toxin-antitoxin systems, which promote plasmid stability; however, it does not contain any known conjugative transfer genes involved in plasmid mobility (Table S2). The three other E110019 plasmids did not contain any previously described E110019 virulence genes; however, they did contain several putative virulence genes and/or known antibiotic resistance genes. None of these putative virulence genes has been functionally characterized with respect to the virulence mechanism of E110019. Among them is a predicted fimbrial gene cluster located on the 66-kb plasmid (pE110019_66); whether these genes produce functional fimbriae, and whether such fimbriae contribute to the virulence mechanism of aEPEC isolate E110019, is unknown (Table S1).
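Because the assembly-wide GC content is pooled over all six molecules, it behaves as a length-weighted average of the per-molecule values; an overall value (50.67%) below the chromosomal value (50.76%) therefore implies that the non-chromosomal molecules have, on average, lower GC content. The short Python sketch below illustrates that arithmetic on toy sequences. The function names are hypothetical, and this is not the annotation pipeline used in the study.

```python
from typing import Dict, Tuple


def gc_percent(seq: str) -> float:
    """G+C content (%) counted over unambiguous bases (A, C, G, T) only."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


def assembly_summary(
    molecules: Dict[str, str]
) -> Tuple[Dict[str, Tuple[int, float]], int, float]:
    """Return per-molecule (length, GC%), total length, and pooled GC%.

    The pooled GC% divides total G+C by total A/C/G/T across all molecules,
    so each molecule's GC% is weighted by its number of unambiguous bases.
    """
    per_molecule = {name: (len(seq), gc_percent(seq)) for name, seq in molecules.items()}
    pooled = "".join(molecules.values())
    return per_molecule, len(pooled), gc_percent(pooled)


# Toy example (hypothetical sequences, not E110019 data).
per_mol, total_len, pooled_gc = assembly_summary(
    {"chromosome": "GGCCGCAT", "plasmid": "ATATGC"}
)
print(per_mol, total_len, round(pooled_gc, 2))
```

With these toy inputs the pooled value (8 G+C bases out of 14, about 57.1%) lies between the chromosome (75%) and plasmid (about 33%) values, as a weighted average must.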
The 33-kb plasmid, pE110019_33, does not contain any known or predicted virulence genes but does carry genes for conjugative transfer (Table S3). The smallest plasmid of E110019, pE110019_5, does not contain any known or putative virulence genes, but it carries two different genes that each encode a TEM β-lactamase (Table 1; see also Table S4).
Table 1. Characteristics of the complete E110019 genome assembly
Phylogenomic similarity of E110019 and other espT-containing E. coli isolates. Because previous studies have demonstrated that EspT is a unique effector with a narrow distribution among EPEC isolates and that it plays an important role in the ability of E110019 to invade host cells (18, 21, 23, 26), we further investigated the phylogenetic relatedness of the complete E110019 genome to all other publicly available E. coli genomes that carry espT. In silico detection of espT among the 12,253 assembled E. coli genomes publicly available in GenBank as of July 2018 identified only 65 genomes in addition to E110019 that contained espT (Fig. 1; see also Table S5). All of the espT-containing E. coli genome assemblies also carry LEE genes, in particular, the T3SS structural genes, which are required for secretion of T3SS effectors such as EspT. The majority of the espT-containing genomes (93.85%, 61/65) contained neither BFP genes nor Shiga toxin genes and would therefore be molecularly classified as aEPEC (Table S5). The remaining four espT-containing genomes contained BFP genes but lacked Shiga toxin genes, indicating that they are tEPEC genomes (Table S5). The extremely low prevalence of espT among sequenced E. coli genomes (0.54%, 66/12,253) is consistent with the 1.8% prevalence of espT reported among EPEC and EHEC isolates in a previous PCR-based analysis (21).
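The molecular classification above reduces to a presence/absence rule over three marker sets (LEE, BFP, and Shiga toxin genes), and the prevalence figures are simple proportions. A minimal Python sketch of both follows. The function names are hypothetical, and labeling LEE-positive, stx-positive genomes as EHEC is a simplification assumed here for illustration.

```python
def classify_pathotype(has_lee: bool, has_bfp: bool, has_stx: bool) -> str:
    """Simplified molecular pathotype call from marker presence/absence.

    LEE+ stx- genomes are EPEC, split by BFP into typical (tEPEC) and
    atypical (aEPEC). LEE+ stx+ is labeled EHEC here as a simplification.
    """
    if not has_lee:
        return "not LEE-positive"  # outside the EPEC/EHEC scheme sketched here
    if has_stx:
        return "EHEC"
    return "tEPEC" if has_bfp else "aEPEC"


def prevalence_percent(hits: int, total: int, ndigits: int = 2) -> float:
    """Percentage of `total` represented by `hits`, rounded for reporting."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, ndigits)


# Counts taken from the text above.
print(classify_pathotype(has_lee=True, has_bfp=False, has_stx=False))  # aEPEC
print(prevalence_percent(61, 65))      # 93.85 (genomes classified as aEPEC)
print(prevalence_percent(66, 12_253))  # 0.54 (espT prevalence in GenBank)
```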
Fig. 1. Phylogenomic analysis of aEPEC isolate E110019 and other espT-containing E. coli isolates. The phylogeny was constructed from 175,207 conserved SNP sites per genome that were identified relative to the reference genome of E. coli isolate IAI39 (GenBank accession no. NC_011750.1). The scale bar indicates 0.01 nucleotide substitutions per site. Bootstrap values of ≥80 are indicated by a circle at each node. The presence of espT in each genome is indicated by an orange square next to the genome name, while genomes that contain BFP genes and are classified as tEPEC have a purple square (see inset key). E. coli phylogroups (A, B1, B2, D, E, and F) are labeled on the interior of the phylogeny. The subclade that contains the E. coli E110019 genome is indicated with red labels, while a closely related clade (blue labels) carries a divergent espT allele located on a plasmid distinct from that of E. coli E110019.
Phylogenomic analysis of E110019 and the additional espT-containing EPEC genomes, together with a collection of 48 diverse E. coli and Shigella reference genomes, demonstrated that the espT-containing EPEC genomes are genomically diverse (Fig. 1; see also Table S5). More than half (54.55%, 36/66) of the espT-containing genomes, including that of E110019, were identified in phylogroup B1; however, espT-containing EPEC genomes were also identified in phylogroups B2 (16.67%, 11/66), E (15.15%, 10/66), A (12.12%, 8/66), and D (1.52%, 1/66) (Fig. 1; see also Table S5). The E110019 genome was identified in a phylogenomic lineage with 16 additional espT-containing EPEC genomes, which could be further separated into two subgroups based on their espT gene sequences (Fig. 1, red and blue genome labels). The four espT-containing tEPEC genomes were identified in phylogroup B2 (Fig. 1), and two of them (EPEC11 and MOD1-EC1392) were identified in the EPEC1 lineage, which contains tEPEC archetype isolate E2348/69 (Fig. 1). In contrast, the espT-containing aEPEC genomes were distributed across a greater number of phylogroups (A, B1, B2, D, and E) (Fig. 1). Two of the aEPEC genomes (P175_4 and C488-07) grouped with another archetype tEPEC isolate, B171, in the EPEC2 lineage of phylogroup B1 (Fig. 1), which may indicate that they were once tEPEC and subsequently lost the BFP-encoding plasmid; however, this cannot be determined from the genome sequences alone. Interestingly, although none of the espT-containing genomes carried Shiga toxin genes, nine were closely related to reference EHEC genomes (Fig. 1). One of the espT-containing aEPEC genomes (403308_aEPEC) grouped with the non-O157 EHEC reference genomes (11368 and 11128) in phylogroup B1 (Fig. 1).
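The phylogroup percentages above are proportions of the 66 espT-containing genomes (E110019 plus the 65 others). The sketch below tallies them from per-genome phylogroup labels; the label list simply reproduces the counts reported in the text, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phylogroup_breakdown(labels: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each phylogroup to (count, percentage of all genomes), largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: (n, round(100.0 * n / total, 2)) for group, n in counts.most_common()}


# One label per espT-containing genome, using the counts reported above.
labels = ["B1"] * 36 + ["B2"] * 11 + ["E"] * 10 + ["A"] * 8 + ["D"] * 1
for group, (n, pct) in phylogroup_breakdown(labels).items():
    print(f"{group}: {n}/{len(labels)} ({pct}%)")
```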
The other eight espT-containing genomes that exhibited phylogenomic similarity to EHEC grouped with aEPEC O55:H7 isolate CB9615, which is closely related to, and considered a precursor of, EHEC O157:H7 (27–31) (Fig. 1). All of the espT-containing aEPEC genomes that grouped with CB9615 were bioinformatically predicted to have the O55:H7 serotype. These genomic findings demonstrate that espT can be acquired by genomically diverse E. coli isolates that carry the chromosomally located LEE-encoded T3SS, a defining feature of EPEC and EHEC isolates. Although these additional espT-containing E. coli isolates are primarily aEPEC, 13 were tEPEC or closely related to previously described EHEC genomes, suggesting that this T3SS effector can also be acquired by other diarrheagenic E. coli that are typically noninvasive.
Dissemination of espT via mobile elements. Following identification of espT on a plasmid, we further investigated the association of espT with mobile elements in the E110019 genome and the 65 other espT-containing EPEC genomes (Fig. 2). Annotation of the espT-containing plasmid pE110019_65 showed that espT is located in a region of the plasmid that contains additional known virulence genes, including the pet gene encoding a serine protease autotransporter (24) and the hemolysin genes hlyCABD (25) (Fig. 2A). The espT gene and the additional virulence genes in this region are flanked on each side by predicted insertion sequence element genes (Fig. 2A). In silico detection of the pE110019_65 protein-coding genes in the other espT-containing genomes demonstrated that most of this plasmid was present only among the espT-containing genomes that are most phylogenomically similar to the E110019 genome (Fig. 1, red labels; see also Fig. 2, red labels). The pet gene was detected only in E110019 and these closely related genomes, while the hemolysin genes were present in nine additional espT-containing genomes that lacked other portions of the pE110019_65 plasmid (Fig. 2B). Together, the annotation and the limited distribution of these additional virulence genes indicate that this virulence-encoding region of pE110019_65 is modular and that only E110019 and seven closely related genomes carry an identical virulence region and similar overall plasmid content (Fig. 2B). The espT-containing genomes of the other E110019 lineage subgroup (Fig. 1, blue labels) contained only espT and not the additional virulence genes or other genes of pE110019_65 (Fig. 2B).
Fig. 2. Diagram of the virulence gene regions of E. coli E110019 espT plasmid pE110019_65 and in silico detection of pE110019_65 genes in other espT-containing E. coli genomes. (A) Diagram illustrating the orientation and sequence length of the protein-coding genes of the pE110019_65 plasmid. Each arrow indicates a different protein-coding gene, and colors indicate predicted protein functions as follows: red, virulence; yellow, mobile elements; gray, hypothetical/unknown function. Virulence genes of interest are enclosed within black rectangles and labeled with the gene name (below, in bold type). (B) Each column of the heat map represents a different plasmid gene of pE110019_65, ranging from gene E110019_5249 to gene E110019_5332 (from left to right across the heat map; see Table S3). Selected genes of interest are indicated by labels at the bottom of the heat map and enclosed within a red rectangle. The colors of the heat map indicate genes that were detected with significant similarity values (light green) or with divergent similarity values (blue-green) or that were absent (dark blue) in each of the genomes analyzed. Rows represent individual espT-containing genomes, which are categorized on the left of the heat map by phylogroup (see inset key). Red genome labels on the right side of the heat map indicate E. coli E110019 and other related genomes within the same clade in the phylogenomic analysis, while blue genome labels indicate genomes from an adjacent clade that had divergent espT sequences and a different espT-containing plasmid.
The espT gene sequences identified in the 65 other E. coli genomes exhibited 86% to 100% nucleotide (nt) similarity to the reference espT sequence from E110019 (see Fig. S1 in the supplemental material). The espT-like regions identified in 11 of the genomes were truncated, lacking the first 48 bp of the E110019 espT sequence (Fig. S1). Phylogenetic analysis of the espT nucleotide sequences showed that in some instances the espT genes formed groups similar to those observed in the whole-genome phylogenomic analysis (Fig. 1; see also Fig. S1). However, many espT genes from genomes of different phylogroups also grouped together, suggesting that espT alleles were horizontally acquired by these EPEC isolates at different times (Fig. S1). The two subgroups of genomes most closely related to the E110019 genome in the phylogenomic analysis (Fig. 1, red and blue labels) formed two separate groups in the espT gene phylogeny, indicating that although these EPEC genomes are related, they carry distinct espT alleles (Fig. S1). These findings suggest that espT and the chromosome have evolved independently in many of the EPEC isolates, unlike chromosomally encoded effectors, including both LEE-encoded and non-LEE-encoded effectors, which in previous studies appeared to be conserved within phylogenomic groups (10, 32).
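Allele comparisons such as the 86% to 100% identity range and the 48-bp 5′ truncation above depend on how identity is computed from an alignment. The sketch below assumes the two espT sequences are already aligned (gaps written as '-') and uses one common definition, identity over columns where both sequences have a base. It is illustrative only, not the exact method behind Fig. S1, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += int(a == b)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def leading_truncation(aligned_query: str) -> int:
    """Length of the leading gap run in the query, i.e. missing 5' reference positions."""
    return len(aligned_query) - len(aligned_query.lstrip("-"))


# Hypothetical toy alignment: the query lacks the first 3 bases of the reference.
reference = "ATGAAACCCGGG"
query = "---AAACCAGGG"
print(round(percent_identity(reference, query), 1))  # 8 of 9 compared bases match
print(leading_truncation(query))                      # 3
```

For a query missing the first 48 bp of espT, aligned across the full reference, `leading_truncation` would return 48.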
The espT genes were identified on contigs that ranged in length from 741 bp to 68,917 bp (Table S5). Plasmid markers were identified on the same contig as espT in 21.54% (14/65) of the additional espT-containing genomes, all of which were aEPEC genomes (Table S5). Analysis of the additional predicted gene content of the espT-containing contigs in the two genomes with the longest plasmid-like contigs further suggested that espT is carried on a plasmid in these genomes. For instance, espT of aEPEC isolate KTE142 was identified on a 47,848-bp contig containing the IncFII(AY458016) and IncFIB(AP001918) plasmid replicon markers (Table S5). IncFIB(AP001918) was also identified on the espT plasmid of E110019, pE110019_65 (Table 1). In addition, the espT-containing contig of KTE142 had several regions of >3 kb with >90% nucleotide identity to pE110019_65, which appear in the heat map as the pE110019_65 regions detected in the KTE142 genome (Fig. 2B). The KTE142 genome was identified in the subgroup adjacent to E110019 but contained a divergent espT sequence (87% nt identity to that of E110019) (Fig. 1, blue genome labels). Interestingly, the espT contig of KTE142 also contains bfpA, which encodes the major subunit of the BFP; however, the additional genes required for BFP production (bfpB through bfpL) were truncated or missing from this contig (7–9). A similar 31,626-bp contig containing bfpA and espT was identified in the genome of aEPEC isolate IAL4511, which is in the same phylogenomic group as KTE142 (Fig. 1, blue genome labels). In contrast, the 6,891-bp espT-containing contig of aEPEC isolate P219-2 did not contain any detectable common plasmid-typing markers, although it did contain numerous plasmid-associated genes, including those with predicted functions in replication, partitioning, and conjugative transfer (GenBank accession no. PJEI00000000) (Table S5). This contig also differed from the other espT plasmid contigs in that it contained a truncated version of espT (Fig. S1) and exhibited little nucleotide similarity to the espT-containing plasmid of E. coli E110019 (Fig. 2B). Thus, comparison of additional plasmid-like espT-containing contigs suggests that some aEPEC isolates acquired espT via a truncated version of the tEPEC virulence plasmid, whereas other E. coli isolates acquired espT via a different plasmid that, in the case of aEPEC isolate P219-2 (GenBank accession no. PJEI00000000), carries genes for conjugative transfer and shares little overall similarity with the espT-containing plasmid of E. coli E110019 (Fig. 2). Further investigation is necessary to verify that espT is carried on these plasmids and to determine the host range of these different espT-containing plasmids.
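The contig-level evidence above amounts to asking, for each genome, whether the contig that carries espT also carries a plasmid replicon marker. A minimal sketch of that check follows. It assumes each genome is summarized as a mapping from contig name to the set of detected features, and it assumes PlasmidFinder-style replicon names beginning with "Inc" or "Col". Contig names, the placeholder feature name, and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple

Genome = Dict[str, Set[str]]  # contig name -> detected gene/marker names


def has_colocated_replicon(
    genome: Genome,
    target: str = "espT",
    marker_prefixes: Tuple[str, ...] = ("Inc", "Col"),
) -> bool:
    """True if any contig carrying `target` also carries a replicon marker."""
    for features in genome.values():
        if target in features and any(f.startswith(marker_prefixes) for f in features):
            return True
    return False


# Hypothetical summaries loosely modeled on the KTE142 and P219-2 descriptions.
genomes = {
    "KTE142": {"contig_A": {"espT", "bfpA", "IncFII(AY458016)", "IncFIB(AP001918)"}},
    "P219-2": {"contig_B": {"espT", "hypothetical_conjugation_gene"}},
}
flagged = [name for name, g in genomes.items() if has_colocated_replicon(g)]
print(flagged)  # ['KTE142']
```

As the P219-2 case illustrates, a marker-based check can miss plasmid-borne espT when no typing marker is detected, which is why the gene content of such contigs was also examined.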
Distribution of the additional E110019 plasmids. We also investigated the distribution of the other E110019 plasmids among the espT-containing E. coli genomes to determine whether any of them might have cotransferred with the espT-containing plasmid or whether they are unique to E110019 (Fig. S2 to S4). The largest E110019 plasmid, pE110019_66, was detected in diverse genomes of phylogroups A, B1, B2, and E (Fig. S2), whereas espT-containing plasmid pE110019_65 was detected in 80% (8/10) of the genomes of the lineage subgroup that contains E110019 (Fig. 2B, red labels). The fimbrial plasmid pE110019_66 was identified in only three of the genomes in the E110019-containing subgroup (Fig. S2, red labels). Moreover, the pE110019_66 fimbrial genes were not identified in any of the other espT-containing genomes, indicating that these putative fimbrial genes are uniquely associated with aEPEC isolate E110019 rather than with the other espT-containing aEPEC isolates. The fimbrial genes were detected in 48 of the E. coli genomes that did not contain espT. These additional fimbrial-gene-containing E. coli genomes did not contain any of the genes encoding the LEE, BFP, or Shiga toxin, and further investigation is needed to determine the likely origin and distribution of these genes among other groups of E. coli isolates. Plasmid pE110019_33, which did not contain any known or putative virulence genes, was identified in only one of the other espT-containing genomes, a phylogroup B2 genome rather than one closely related to E110019 (Fig. S3). The smallest plasmid of aEPEC isolate E110019, pE110019_5, was present in its entirety only in the genome of E. coli E110019 (Fig. S4).
Overall, the narrow range of distribution of the additional plasmids of aEPEC isolate E110019 further highlights that this outbreak-associated isolate contains unique virulence-associated genomic content not present in the other E. coli isolates that have been genomically characterized to date. Further investigation is necessary to determine whether any of these additional plasmid-encoded features are involved in the unique virulence mechanisms of aEPEC isolate E110019.
Conclusions. By generating the complete genome assembly of aEPEC archetype isolate E110019, we identified several additional putative virulence factors, including a novel fimbrial gene cluster in E. coli, which may contribute to the unique virulence mechanisms of this important outbreak-associated isolate. We also demonstrate that, in contrast to previously described T3SS effectors, which have been identified on insertion elements or prophages located on the chromosome of E. coli isolates, the espT gene of E. coli E110019 is located on a plasmid. The entire espT-containing plasmid of E. coli E110019 was detected in only seven additional EPEC genomes; however, espT also appears to be carried on other plasmids in additional EPEC genomes. These findings highlight another role for plasmids in disseminating virulence-associated genes among diverse EPEC isolates and provide additional insight into the unique genomic content and potential virulence mechanisms of outbreak-associated aEPEC archetype isolate E110019.
MATERIALS AND METHODS
Genome sequencing and assembly. Genomic DNA of EPEC isolate E110019 was isolated by alkaline lysis extraction as previously described (33). The DNA was used to generate an 18-kb sequencing library, which was sequenced on a Pacific Biosciences (PacBio) RS II platform with P6-C4 chemistry in a single SMRT cell using standard methods at the Institute for Genome Sciences Genomics Resource Center (http://www.igs.umaryland.edu/resources/grc/) as previously described (34). PacBio raw data were corrected and assembled using the HGAP assembler (SMRTAnalysis 2.3.0) (35) with default parameters, and Minimus2 was used for circularization (36). The final circularized assembly was polished using Quiver (SMRTAnalysis 2.3.0) (35).
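Circularizing a long-read contig generally means recognizing that its two ends overlap (the assembler has traversed the circular molecule slightly more than once) and trimming the redundant copy. The Python sketch below shows that idea using exact string matching only. It is a simplified stand-in rather than a reimplementation of the alignment-based Minimus2 procedure, and the function name and `min_overlap` parameter are hypothetical.

```python
def trim_terminal_overlap(contig: str, min_overlap: int = 20) -> str:
    """Remove a redundant end from a contig whose end repeats its start.

    Searches for the longest suffix that exactly equals a prefix (at least
    `min_overlap` bases, at most half the contig) and drops that suffix,
    leaving a single copy of the overlapping sequence. Returns the contig
    unchanged when no such overlap exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return contig[:-k]
    return contig


# Toy example: an 8-bp copy of the start has been appended to the end.
core = "ATGCGTACGTTAGCCATGAC"
print(trim_terminal_overlap(core + core[:8], min_overlap=5) == core)  # True
```

Real reads carry errors, so practical tools align the contig ends rather than requiring an exact match.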
In silico multilocus sequence typing, plasmid typing, serotyping, antibiotic resistance gene detection, and virulence gene detection. The sequence type (ST) of each E. coli genome assembly was determined using the multilocus sequence typing (MLST) scheme developed by Wirth et al. (37) as previously described (38). The sequences were queried against the BIGSdb database (39) to obtain allele numbers and an ST for each of the E. coli genomes analyzed (see Table S1 in the supplemental material).
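In the Wirth et al. scheme, an ST is assigned from the combination of allele numbers at seven housekeeping loci (adk, fumC, gyrB, icd, mdh, purA, and recA). The lookup step can be sketched as an exact match of the allele profile against a table of known profiles. The profile table and ST numbers below are placeholders rather than real database entries, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def sequence_type(
    alleles: Dict[str, int], profiles: Dict[Tuple[int, ...], int]
) -> Optional[int]:
    """Return the ST whose seven-locus allele profile matches exactly.

    Returns None if any locus lacks an allele call or if the profile is not
    in the table (a potentially novel ST).
    """
    if any(locus not in alleles for locus in LOCI):
        return None
    return profiles.get(tuple(alleles[locus] for locus in LOCI))


# Placeholder profile table: allele numbers and STs are illustrative only.
profiles = {(1, 1, 1, 1, 1, 1, 1): 9001, (2, 1, 1, 1, 1, 1, 1): 9002}
calls = dict(zip(LOCI, (2, 1, 1, 1, 1, 1, 1)))
print(sequence_type(calls, profiles))  # 9002
```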
Plasmid replicons were detected in the E. coli E110019 genome assembly using PlasmidFinder v.1.3 (40) with the default 95% nucleotide identity threshold. The molecular serotype of each E. coli genome was predicted using SerotypeFinder v.2.0 (https://cge.cbs.dtu.dk/services/SerotypeFinder/) with the default settings of an 85% nucleotide identity threshold and a 60% minimum alignment length (41). Antibiotic resistance genes were detected in the E. coli E110019 genome assembly using the Resistance Gene Identifier (RGI) and the Comprehensive Antibiotic Resistance Database (CARD) v.3.0.0, with the perfect or strict identification criteria (42). Previously described E. coli virulence genes, including espT, were detected in E. coli E110019 and the other genomes analyzed using the BLASTN BLAST score ratio (BSR) approach as previously described (10, 32, 43–45).
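The detection settings above combine a minimum percent identity with, for SerotypeFinder, a minimum fraction of the reference gene covered by the alignment. The sketch below applies such cutoffs to BLAST-style hits. The `Hit` fields, function name, and gene names are hypothetical, and because the text does not state a coverage cutoff for PlasmidFinder, the sketch applies only the stated 95% identity threshold for that tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    percent_identity: float  # identity of the aligned region (%)
    aligned_length: int      # bases of the reference gene covered by the alignment
    reference_length: int    # full length of the reference gene


def passes(hit: Hit, min_identity: float, min_coverage: float = 0.0) -> bool:
    """True if the hit meets both the identity and the reference-coverage cutoffs (%)."""
    coverage = 100.0 * hit.aligned_length / hit.reference_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


# Cutoffs stated in the text.
PLASMIDFINDER = {"min_identity": 95.0}  # coverage cutoff not stated
SEROTYPEFINDER = {"min_identity": 85.0, "min_coverage": 60.0}

hits = [
    Hit("geneA", 90.0, 700, 1000),  # 90% identity, 70% coverage
    Hit("geneB", 90.0, 500, 1000),  # 90% identity, 50% coverage
]
print([h.gene for h in hits if passes(h, **SEROTYPEFINDER)])  # ['geneA']
print([h.gene for h in hits if passes(h, **PLASMIDFINDER)])   # []
```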
Phylogenomic analysis. The E. coli E110019 genome was included in a phylogenomic analysis with the 65 other espT-containing E. coli genome assemblies available in GenBank as of July 2018 and a collection of 48 diverse E. coli and Shigella reference genomes (Table S5). The genomes were analyzed using the single nucleotide polymorphism (SNP)-based In Silico Genotyper (ISG) software as previously described (46, 47) to identify conserved and parsimony-informative SNP sites across all genomes relative to the genome of E. coli isolate IAI39 (GenBank accession number NC_011750.1) of phylogroup F. A total of 175,781 SNP sites were used to infer a maximum-likelihood phylogeny with RAxML v7.2.8 (48), using the GTR model of nucleotide substitution, the GAMMA model of rate heterogeneity, and 100 bootstrap replicates. The phylogeny was midpoint rooted and labeled using the Interactive Tree of Life (iTOL) v.4 (49).
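The SNP matrix above is restricted to sites that are both conserved (called in every genome) and parsimony informative. A site is conventionally parsimony informative when at least two different bases each occur in at least two genomes, so that the site can favor one tree topology over another. The sketch below applies that filter to a toy alignment; it assumes sequences are already aligned to reference coordinates and treats any non-ACGT character as a missing call. It is not the ISG implementation, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def informative_sites(alignment: Sequence[str]) -> List[int]:
    """Indices of columns that are called in every genome (A/C/G/T only)
    and parsimony informative (>= 2 bases each present in >= 2 genomes)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        if any(base not in "ACGT" for base in column):
            continue  # missing or ambiguous call in at least one genome
        shared_states = sum(1 for n in Counter(column).values() if n >= 2)
        if shared_states >= 2:
            sites.append(i)
    return sites


toy = ["ACGTA", "ACGTT", "AGGAT", "AGGNA"]
print(informative_sites(toy))  # [1, 4]
```

Column 3 is excluded because one genome has an N there, even though it is variable.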
In silico detection of plasmid genes. The predicted protein-coding genes of the E. coli E110019 plasmids (pE110019_66, pE110019_65, pE110019_33, and pE110019_5) were detected in each of the genomes analyzed in this study using BLASTN-based large-scale BLAST score ratio (LS-BSR) analysis as previously described (10, 50). Clustered heat maps of the BSR values were generated using the heatmap.2 function of gplots v.3.0.1 in R v.3.4.1. The genomes in each heat map were clustered using the default complete-linkage method on Euclidean distances.
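A BLAST score ratio (BSR) normalizes the bit score of a query gene against a target genome by the bit score of the query against itself, giving values from 0 (no hit) to about 1 (identical). The sketch below builds a genome-by-gene BSR matrix, bins values into the three heat-map categories, and computes the Euclidean distance between two genome profiles, which is the distance the complete-linkage clustering operates on. The 0.8 and 0.4 cutoffs are assumptions (commonly used in LS-BSR studies but not stated in this section), and all names and scores are hypothetical.

```python
import math
from typing import Dict, List

Scores = Dict[str, float]  # gene name -> bit score


def bsr_matrix(
    self_scores: Scores, hits: Dict[str, Scores], genomes: List[str]
) -> Dict[str, List[float]]:
    """Rows are genomes, columns are genes (sorted by name).

    BSR = bit score of the gene vs the genome / bit score of the gene vs itself;
    a gene with no hit in a genome gets a BSR of 0.
    """
    genes = sorted(self_scores)
    return {
        genome: [hits.get(genome, {}).get(gene, 0.0) / self_scores[gene] for gene in genes]
        for genome in genomes
    }


def categorize(bsr: float, conserved: float = 0.8, absent: float = 0.4) -> str:
    """Assumed heat-map bins: conserved (>= 0.8), divergent (0.4 to < 0.8), absent (< 0.4)."""
    if bsr >= conserved:
        return "conserved"
    return "divergent" if bsr >= absent else "absent"


# Hypothetical bit scores for two plasmid genes in two genomes.
self_scores = {"espT": 200.0, "pet": 400.0}
hits = {"E110019": {"espT": 200.0, "pet": 400.0}, "genome_X": {"espT": 120.0}}
matrix = bsr_matrix(self_scores, hits, ["E110019", "genome_X"])
print(matrix)  # {'E110019': [1.0, 1.0], 'genome_X': [0.6, 0.0]}
print([categorize(v) for v in matrix["genome_X"]])  # ['divergent', 'absent']
print(math.dist(matrix["E110019"], matrix["genome_X"]))  # Euclidean distance between rows
```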
Data availability. The complete genome assembly of E. coli E110019 was deposited in GenBank under the accession numbers listed in Table 1.
ACKNOWLEDGMENTS
We thank Jane Michalski for laboratory assistance as well as Rodrigo T. Hernandes and Luís F. dos Santos for access to aEPEC isolate genomes from Brazil.
This project was funded in part by federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under grant number U19 AI110820.
FOOTNOTES
- Received 24 May 2019.
- Returned for modification 29 June 2019.
- Accepted 18 July 2019.
- Accepted manuscript posted online 29 July 2019.
Supplemental material for this article may be found at https://doi.org/10.1128/IAI.00412-19.
- Copyright © 2019 American Society for Microbiology.