Phylogenetic distribution of sortase homologs and CWS-containing proteins

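Species | Source(a) | No. of homologs in sortase subfamily(b) | CWS substrates(c)
Sortase subfamily columns: A | B | 3 | 4 | 5 | Unclassified(d); CWS substrate columns: No. in genome | No. assigned
Unclassified(d) columns: Clustered | Not clustered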
Actinobacteria (high-G+C gram-positive bacteria)
    Corynebacterium diphtheriaeSanger511613
    Corynebacterium efficiens514184
    Corynebacterium glutamicum31111
    Tropheryma whipplei Twist9111
    Tropheryma whipplei TW08/279111
    Streptomyces avermitilis3041313
    Streptomyces coelicolor8251313
    Thermobifida fuscaDOE111
    Bifidobacterium longum DJ010ADOE21104
    Bifidobacterium longum NCC27056221135
Chloroflexi (green nonsulfur bacteria)
    Chloroflexus aurantiacusDOE440
Firmicutes (gram-positive bacteria)
        Bacillus anthracis A20123311195
        Bacillus anthracis Ames58111104
        Bacillus anthracis Kruger B TIGR111114
        Bacillus anthracis Western NA TIGR111114
        Bacillus cereus ATCC 10987TIGR11411911
        Bacillus cereus ATCC 14579331112139
        Bacillus halodurans6812388
        Bacillus subtilis391122
        Geobacillus stearothermophilusOU111
        Oceanobacillus iheyensis69233
        Listeria innocua24113535
        Listeria monocytogenes 4bTIGR114040
        Listeria monocytogenes EGD-e24113838
        Staphylococcus aureus COLTIGR112121
        Staphylococcus aureus MRSA252Sanger112020
        Staphylococcus aureus MSSA476Sanger112222
        Staphylococcus aureus MW24112222
        Staphylococcus aureus Mu5040112020
        Staphylococcus aureus N31540112121
        Staphylococcus aureus NCTC 8325OU111818
        Staphylococcus epidermidis RP62ATIGR11111
        Staphylococcus epidermidis ATCC 12228CNHGC11118
        Enterococcus faecalis561112913
        Enterococcus faeciumDOE52157
        Lactobacillus gasseriDOE11313
        Lactobacillus plantarum3811010
        Pediococcus pentosaceusDOE133
        Leuconostoc mesenteroidesDOE230
        Oenococcus oeni MCWDOE111
        Lactococcus lactis subsp. lactis131183
        Streptococcus agalactiae 2603V/R70152311
        Streptococcus agalactiae A909TIGR14199
        Streptococcus agalactiae NEM31625143514
        Streptococcus equiSanger112715
        Streptococcus gordoniiTIGR12222
        Streptococcus mitisTIGR11414
        Streptococcus mutans1166
        Streptococcus pneumoniae R62911414
        Streptococcus pneumoniae TIGR471131511
        Streptococcus pneumoniae 23FSanger11313
        Streptococcus pneumoniae 670-6BTIGR13119
        Streptococcus pyogenes M1 GAS221111510
        Streptococcus pyogenes MGAS31510111613
        Streptococcus pyogenes MGAS823265111512
        Streptococcus pyogenes ManfredoSanger111514
        Streptococcus pyogenes SSI-1GIRC111513
        Streptococcus sobrinusTIGR11212
        Streptococcus suisSanger14209
        Streptococcus uberisSanger199
        Clostridium acetobutylicum52122
        Clostridium botulinumSanger111
        Clostridium difficileSanger177
        Clostridium perfringens ATCC 13124TIGR111123
        Clostridium perfringens 1364111143
        Clostridium tetani15133
        Ruminococcus albusTIGR122
    Gram-positive, total(e) 42175413141714887684
  • a The reference for the genome sequence is given when available; otherwise, the source of the preliminary sequence data is given: CNHGC (Chinese National Human Genome Center), DOE (U.S. Department of Energy Joint Genome Institute), GIRC (Genome Information Research Center), OU (University of Oklahoma Advanced Center for Genome Technology), Sanger (Sanger Institute), or TIGR (The Institute for Genomic Research).

  • b Sortase homologs are clustered into subfamilies according to sequence homology, using BLAST profiles and hidden Markov models (HMMs).

  • c C-terminal sorting signal (CWS)-containing proteins. The first column is the total number of CWS-containing proteins identified as encoded in each respective genome, whereas the second column is the number of CWS-containing proteins that can be linked to a sortase homolog.

  • d The sortase homolog is not readily classified into a subfamily based on sequence homology. The first column, "Clustered," denotes unclassified sortases that can nevertheless be linked to a CWS-containing protein by genomic positioning. The second column, "Not clustered," denotes unclassified sortases that are not genomically adjacent to a CWS-containing protein.

  • e For proteobacteria (purple bacteria and relatives; gram negative), each of the following has one sortase homolog, with one CWS-containing protein encoded in the genome and one CWS-containing protein that can be linked to a sortase homolog (abbreviations for sources, given in parentheses, are explained in footnote a): Bradyrhizobium japonicum (36), Colwellia psychrerythraea (TIGR), Microbulbifer degradans (DOE), Shewanella oneidensis (27), and Shewanella putrefaciens (TIGR). This brings the total for both gram-negative and gram-positive bacteria to 892 encoded CWS-containing proteins and 689 CWS-containing proteins linked to a sortase homolog.
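
The combined totals quoted in footnote e can be checked with a short calculation. The sketch below is illustrative only: it assumes the gram-positive totals are 887 (in genome) and 684 (assigned), read from the last two entries of the "Gram-positive, total" row, whose digits are run together in this copy of the table. The names `PROTEOBACTERIA` and `combined_totals` are hypothetical and are not part of the original analysis.

```python
# Gram-positive totals: ASSUMED to be the final two entries of the
# "Gram-positive, total" row (the digits in that row are not delimited).
GRAM_POSITIVE_CWS_IN_GENOME = 887
GRAM_POSITIVE_CWS_ASSIGNED = 684

# Proteobacteria from footnote e. Each tuple is
# (CWS-containing proteins in genome, CWS-containing proteins linked to a sortase).
PROTEOBACTERIA = {
    "Bradyrhizobium japonicum": (1, 1),
    "Colwellia psychrerythraea": (1, 1),
    "Microbulbifer degradans": (1, 1),
    "Shewanella oneidensis": (1, 1),
    "Shewanella putrefaciens": (1, 1),
}


def combined_totals():
    """Return (CWS in genome, CWS assigned) for gram-positive plus gram-negative species."""
    in_genome = GRAM_POSITIVE_CWS_IN_GENOME + sum(g for g, _ in PROTEOBACTERIA.values())
    assigned = GRAM_POSITIVE_CWS_ASSIGNED + sum(a for _, a in PROTEOBACTERIA.values())
    return in_genome, assigned


if __name__ == "__main__":
    print(combined_totals())
```

Under these assumptions the result agrees with the 892 encoded and 689 linked CWS-containing proteins stated in footnote e.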