ABSTRACT
As a central component of innate immunity, complement activation is a critical mechanism of containment and clearance of microbial pathogens in advance of the development of acquired immunity. Several pathogens restrict complement activation through the acquisition of host proteins that regulate complement activation or through the production of their own complement regulatory molecules (M. K. Liszewski, M. K. Leung, R. Hauhart, R. M. Buller, P. Bertram, X. Wang, A. M. Rosengard, G. J. Kotwal, and J. P. Atkinson, J. Immunol. 176:3725-3734, 2006; J. Lubinski, L. Wang, D. Mastellos, A. Sahu, J. D. Lambris, and H. M. Friedman, J. Exp. Med. 190:1637-1646, 1999). The infectious stage of the protozoan parasite Trypanosoma cruzi produces a surface-anchored complement regulatory protein (CRP) that functions to inhibit alternative and classical pathway complement activation (K. A. Norris, B. Bradt, N. R. Cooper, and M. So, J. Immunol. 147:2240-2247, 1991). This study addresses the genomic complexity of the T. cruzi CRP and its relationship to the T. cruzi supergene family comprising active trans-sialidase (TS) and TS-like proteins. The TS superfamily consists of several functionally distinct subfamilies that share a characteristic sialidase domain at their amino termini. These TS families include active TS, adhesins, CRPs, and proteins of unknown functions (G. A. Cross and G. B. Takle, Annu. Rev. Microbiol. 47:385-411, 1993). A sequence comparison search of GenBank using BLASTP revealed several full-length paralogs of CRP. These proteins share significant homology at their amino termini and a strong spatial conservation of cysteine residues. Alternative pathway complement regulation was confirmed for CRP paralogs with 58% (low) and 83% (high) identity to AAB49414. CRPs are functionally similar to the microbial and mammalian proteins that regulate complement activation.
Sequence alignment of mammalian complement control proteins to CRP showed that these sequences are distinct, supporting a convergent evolutionary pathway. Finally, we show that a clonal line of T. cruzi expresses multiple unique copies of CRP that are differentially recognized by patient sera.
The complement system is a powerful immune mechanism, consisting of soluble protein markers, proteases, and membrane-bound receptors that identify and eliminate invading microorganisms and cellular debris. The complement cascade is activated through three distinct pathways: alternative, classical, and lectin. The activation of complement proteases in the absence of foreign material is detrimental to the host and, therefore, regulated tightly. Multiple different complement proteins monitor and control activation; these proteins are known as the regulators of complement activation (RCA). RCA family members include the soluble serum proteins factor H and C4-binding protein (C4bp) and the cell surface proteins decay accelerating factor (DAF), complement receptor 1 (CR1), CR2, and membrane cofactor protein (MCP) (13). Most RCA proteins inhibit the formation of and/or accelerate the decay of the C3 and/or C5 convertase and many act as cofactors for factor I-mediated cleavage of C3b and C4b (24). RCA family members and other complement proteins (i.e., factor B and C2) share a common structural motif known as the CCP module (also known as short consensus repeat units or sushi domains). CCP modules are approximately 60 amino acids long with short intervening sequences of two to seven amino acids. Each unit has four conserved cysteine residues with a unique disulfide-bridging pattern (I/III and II/IV) as well as an almost invariant tryptophan residue and several conserved glycine, phenylalanine, and tyrosine residues (6). Individual RCA family members may have as few as four CCP modules (DAF) or as many as 30 (CR1), and primary sequence identity between CCP modules ranges from 20% to 99% (13). RCA proteins are composed primarily or entirely of CCP modules, with C3b and/or C4b binding sites localized mainly in the amino termini of these proteins (13).
Many microbes have evolved mechanisms that block complement activation and/or disrupt progression to the lytic pathway in a manner similar to that of RCA proteins (3, 11, 15, 17). Several enveloped viruses, such as human immunodeficiency virus, passively sequester host RCA into the envelopes of newly emerging viruses, thereby usurping the host's ability to regulate complement (4). Members of the Poxviridae and several members of the gammaherpesviruses produce proteins that are both structurally and functionally homologous to the complement control proteins (4). Other viruses of the Herpesviridae produce proteins that are functionally equivalent but have no structural similarity to RCA family members (11).
The protozoan parasite Trypanosoma cruzi produces a glycophosphatidylinositol (GPI)-anchored complement regulatory protein (CRP) that is functionally similar to human DAF, but lacks sequence identity or similarity to DAF or other mammalian RCA (18). The T. cruzi genome contains multiple copies of crp, and the proteins encoded by these genes share sequence similarity with members of the T. cruzi trans-sialidase (TS) superfamily. Within the TS superfamily, the CRPs form one of three functionally distinct subfamilies that lack TS activity (7). The T. cruzi genome contains many repetitive sequences, and the sequence similarity between TS superfamily members has made it difficult to ascertain an accurate copy number for each family. In this study, we analyzed the sequence diversity of the CRP family and determined its relationship to the TS superfamily and the RCA. We show that multiple members of the CRP family are capable of alternative pathway (AP) complement regulatory activity. We also suggest a method, based on sequence identity and spacing between conserved cysteine residues, to designate molecules labeled currently as putative TSs more accurately, according to their subfamilies.
MATERIALS AND METHODS
Compilation of websites utilized for sequence analysis. BLASTP (Blosum62 matrix) at www.ncbi.nlm.nih.gov/ was utilized to search GenBank for homologous copies of CRP. ClustalW at www.ebi.ac.uk/tools/clustalw/ was used to align paralogous copies of CRP. The SAPS tool at www.ebi.ac.uk/saps/index.html was used to determine the spacing between cysteine residues within individual proteins. BLAST 2 sequences at www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi was used to determine sequence identity between CRP and RCA family members and between CRP and glycoprotein C of herpes simplex viruses 1 and 2. The prediction algorithm NetNGlyc (version 1.0; Center for Biological Sequence Analysis, Technical University of Denmark [www.cbs.dtu.dk/services/NetNGlyc/]) was used to predict the potential for N-linked glycosylation among CRP paralogs and to determine the conservation of these sites between family members.
Parasites. T. cruzi trypomastigotes (Y strain) were passed in BALB/c mice as described previously (18) and used to initiate an infection in NIH 3T3 cells at a multiplicity of infection of 5 in Dulbecco's modified Eagle's medium buffered with 10 mM HEPES (pH 7.4) and supplemented with 5% fetal bovine serum, 5 mM l-glutamine, 0.2 mM sodium pyruvate, and 50 μg/ml gentamicin, all from Invitrogen. Tissue culture-derived trypomastigotes were recovered after 7 to 9 days from supernatants of infected cultures and used immediately for labeling.
Biosynthetic labeling and membrane preparation. Tissue culture-derived trypomastigotes were recovered from the supernatant of infected cells, washed twice in phosphate-buffered saline (PBS)-1% glucose, and resuspended at 10^8 cells per milliliter in labeling medium (ICN Biochemicals). [35S]methionine (Trans-Label; ICN Biochemicals) was added at 50 μCi/ml, and cells were incubated for 1 h at 37°C. After labeling, the cells were washed twice at 4°C in PBS-1% glucose. The final pellet was used to prepare trypomastigote membrane extracts as described previously (18).
C3b preparation and affinity purification of CRP. Human C3 was purified from fresh plasma, and C3b was prepared and coupled to Affi-Gel 10 (Bio-Rad Laboratories, Richmond, CA) as described previously (19). CRP was affinity purified from trypomastigote membrane extracts (19). Protein was stripped from the beads with 50 mM NaHCO3 (pH 10.5) by gentle agitation at room temperature for 20 min. The eluate was neutralized with a 1/10 volume of 1 M NaPO4 (pH 6.8).
Immunoprecipitation of CRP following C3b affinity purification, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), and fluorography. Sera from patients with chronic Chagas' disease were used at 1:20 for immunoprecipitation assays. A total of 60 μl of affinity-purified material (as described above) was used for each immunoprecipitation. Immunoprecipitations were carried out as described previously (19). Samples were analyzed by SDS-PAGE (8.5%), followed by fluorography as described previously (18).
Cloning of crp genes from high-similarity group (HSG) and low-similarity group (LSG). Multiple different copies of the crp gene were obtained by PCR. T. cruzi Y strain chromosomal DNA was used as the template, and oligonucleotides CRP-F (AACATGTCCCGTCATGTGTTT) and CRP-R (TCACACCTCAGCAGAGAACC) were used to target the crp gene family. The annealing temperature was 59°C, and extension by Platinum Taq high fidelity (Invitrogen Corporation, Carlsbad, CA) was for 3 min at 72°C. The MgSO4 concentration was 2 mM. A 3-kbp band was amplified, and these DNAs were cloned into pTOPO-XL (Invitrogen Corporation, Carlsbad, CA). Sequence analysis confirmed the isolation of several different crp genes.
Subcloning and sequence modifications of crp isolates. Two isolates designated CRP1.5 and CRP1.6 had 58% and 83% identity, respectively, to AAB49414 and were selected for further analysis. Eight amino acid codons at the 5′ end of each gene were replaced with eight amino acid codons of the rat preproinsulin signal sequence, as described previously (2), and the native GPI signal sequence was replaced with the GPI signal sequence from human DAF, as described previously (2). These constructs were subcloned into the eukaryotic expression vector pcDNA (Invitrogen Corporation, Carlsbad, CA) for further analysis.
Cell lines and medium. Freestyle 293 cells were purchased from Invitrogen Corporation, Carlsbad, CA. Cells were grown in Wheaton double-side-arm stirrer flasks on a magnetic stir plate at 121 rotations per minute in Freestyle 293 expression medium (Invitrogen Corporation, Carlsbad, CA) and maintained in a humidified atmosphere of 5% CO2 at 37°C.
Transfections, phosphatidylinositol-specific phospholipase C treatment of cells, and sample concentration. Transfections were carried out according to the manufacturer's instructions by using 55 ml of medium, 10^6 cells/ml, and 293fectin (Invitrogen Corporation, Carlsbad, CA). Forty-two hours after the start of the transfection, cells were placed into fresh medium containing 700 μU phosphatidylinositol-specific phospholipase C (Molecular Probes, Inc., Eugene, OR) and incubated for 6 h at 37°C. At the completion of the incubation, cultures were spun at 300 × g in a Beckman Coulter Allegra 25R centrifuge for 5 min and supernatants were transferred to fresh tubes. A total of 15 ml of supernatant from each transfection was concentrated to approximately 0.5 ml in iCON concentrators (7 ml/20 kDa; Pierce, Rockford, IL) at 4°C according to the manufacturer's recommendation. Buffer exchange was achieved by reconstituting samples twice to the original volume in 10 mM phosphate-25 mM NaCl and concentrating to 0.5 ml. Concentrated samples were stored at 4°C until used.
DAA. Alternative pathway C3 convertase decay acceleration activity (DAA) was determined by enzyme-linked immunosorbent assay as described previously (10), with the following modifications: sample concentrations were 500 μg/ml, triplicate wells were averaged, and the DAA was performed for 20 min. C3b and factors B, D, and H (positive control) were purchased from Complement Technology, Inc., Tyler, TX. Goat anti-factor B and horseradish peroxidase-conjugated rabbit anti-goat immunoglobulin were purchased from Calbiochem, San Diego, CA.
RESULTS
Comparative sequence analysis to determine the approximate number of crp genes in the T. cruzi genome. Previously, we cloned and sequenced a full-length copy of a T. cruzi crp (20) and demonstrated that this gene encodes a functional CRP (2). We used this CRP (GenBank accession number AAB49414) as the reference sequence to analyze the T. cruzi genome for paralogs.
The CL-Brener strain is a hybrid strain of T. cruzi that is well characterized experimentally and was used for the T. cruzi genome-sequencing project (9), and protein and DNA sequences from this project are accessible through GenBank. A GenBank BLASTP search, with CRP copy AAB49414 derived from T. cruzi Y strain, revealed 14 full-length CRP paralogs and 6 partial-length paralogs from the CL-Brener strain as well as 1 full-length paralog and 2 partial-length paralogs from the CL strain. All paralogs had E values of 0, indicating that their similarity to AAB49414 is extremely unlikely to have arisen by chance. Table 1 shows the sequence identities and similarities of the CL-Brener proteins relative to AAB49414. A full-length copy of AAB49414 was not present in this strain; instead, two partial protein sequences (XP_804086 and XP_805135) with 99% identity to AAB49414 were identified. Overlapping sequence between these partial protein sequences showed that they were unique proteins, suggesting that at least two very similar copies of this gene exist in the CL-Brener strain. Because the CL-Brener strain is a heterozygote hybrid at most loci (9), these two proteins might represent heterozygote copies of the same locus. AAA30196, CAA50255, and CAA50289 represent three protein sequences identified from the CL strain; these proteins were 100%, 98%, and 93% identical, respectively, to full-length proteins in the CL-Brener strain. Additionally, of the six CL-Brener partial protein sequences with E values of 0, three were amino terminus, partial-length sequences and three were carboxy terminus, partial-length sequences, but overlapping sequence showed they were six unique proteins. An additional partial-length protein sequence (XP_804894), with a significant E value, but not equal to 0, was too short to determine whether it was unique or the amino terminus of another partial-length CRP.
Whether any of these partial-length protein sequences represent true partial-gene copies or result from the inherent difficulty in sequencing this highly repetitive genome will require experimental confirmation. Nevertheless, our data suggest that 14 to 20 copies of the crp gene (representing 7 to 10 loci) are present in a single strain of T. cruzi and several paralogs are retained between strains.
Percent identity/similarity of CRP paralogs relative to AAB49414
Sequence diversity suggests two distinct groups of CRPs. The CRP paralogs fell into two distinct groups. Seven full-length copies with more than 80% identity and more than 86% similarity to AAB49414 were designated the HSG. Seven full-length copies with sequence identity between 54% and 62% and similarity between 64% and 71% to CRP (AAB49414) were designated the LSG. A ClustalW alignment of the HSG paralogs revealed shared sequence identity and similarity of approximately 60% and 84%, respectively, along the length of the polypeptides (Fig. 1). A similar alignment of the paralogs in the LSG showed strong identity at the amino termini, with shared identity and similarity of approximately 48% and 72%, respectively, and limited sequence identity (7%) and low similarity (33%) at the carboxy termini. Additionally, of the seven partial copies, three fell in the HSG and four were in the LSG. Interestingly, the bulk of the insertions or deletions were located in the carboxy termini of these proteins.
Sequence identity/similarity (I/S) of the HSG and the LSG of CRPs relative to other members of each group. Relative to the AAB49414 sequence, the sialidase domain is 400 amino acids, the central region is 275 amino acids, and the carboxy-terminal region is 279 amino acids.
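The grouping rule stated above can be written out as a minimal sketch. The function name and the "unassigned" label are illustrative assumptions; only the identity cutoffs (more than 80% for the HSG, 54% to 62% for the LSG, both relative to AAB49414) come from the text.

```python
def classify_crp_paralog(identity_pct):
    """Assign a paralog to a group from its percent identity to AAB49414.

    Cutoffs follow the text; the 'unassigned' label is a hypothetical
    placeholder for sequences that fall outside both ranges.
    """
    if identity_pct > 80:
        return "HSG"
    if 54 <= identity_pct <= 62:
        return "LSG"
    return "unassigned"


# Identity values reported in this study: CRP1.6 (83%), CRP1.5 (58%),
# and the active TS BAA09334 (31%).
for name, identity in [("CRP1.6", 83), ("CRP1.5", 58), ("BAA09334", 31)]:
    print(name, classify_crp_paralog(identity))
```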
Among the CRPs, cysteine residues are conserved and localized to the amino termini. The prototype CRP sequence, AAB49414, and the 14 full-length CRP paralogs from the CL-Brener strain shared a unique pattern of spacing between conserved cysteines (Fig. 2). Of 10 cysteines in AAB49414, 5 were absolutely conserved in all paralogs, with two additional cysteines conserved among the HSG paralogs. No cysteines were found in the carboxy-terminal third of AAB49414, and few cysteines were located in this region for any of the 14 paralogs.
Spacing between conserved cysteines is unique among TS superfamily subfamilies. Representative subfamilies of the TS superfamily show spacing between conserved cysteine residues. Bold reference sequences in the CRP family represent proteins from the LSG.
We also analyzed the primary amino acid sequences of several representative proteins from the various subfamilies of the TS superfamily. A unique pattern of spacing between conserved cysteine residues was found for each subfamily (Fig. 2). The TS and CRP subfamilies had the most similar pattern of spacing between conserved cysteines, and this result may indicate that these two subfamilies are most closely related.
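The spacing measurement used in this comparison can be illustrated with a short sketch. This is not the SAPS tool itself, which reports many other statistics; it is a simplified stand-in that lists the distances between consecutive cysteines in a single sequence. The function name and the example peptide are hypothetical.

```python
def cysteine_spacings(sequence):
    """Return the distances (in residues) between consecutive cysteines."""
    positions = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    # Pair each cysteine position with the next one and take the difference.
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


# Hypothetical peptide with cysteines at positions 2, 6, and 9 (1-based).
print(cysteine_spacings("ACDEFCGHC"))
```

Comparing the resulting lists between proteins, restricted to cysteines that align with one another, gives the kind of spacing pattern summarized in Fig. 2.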
Relationship of T. cruzi CRP to mammalian RCA and viral inhibitors of complement. CRP is functionally equivalent to DAF. Sequence alignment of CRP to DAF and other RCA family members (factor H, C4bp, MCP, CR1, and CR2) showed that CRP lacks primary sequence identity with the RCA. RCA proteins are composed primarily of CCP modules of about 60 amino acids in length. The spacing between cysteine residues in CCP modules is highly conserved and is probably an essential element in the structure of these modules. No CCP modules have been identified for T. cruzi CRP. Nevertheless, several cysteines are conserved among the putative CRP paralogs and the spacing between them is unique to this TS-like subfamily. CCP modules also share a conserved tryptophan (W) residue and well-conserved glycine (G), tyrosine (Y), and phenylalanine (F) residues. W is the only amino acid that is statistically more conserved within a protein family than C is (8). A ClustalW alignment of all 15 full-length CRP paralogs revealed that 87% of the W, 76% of the F, 60% of the Y, and 35% of the G amino acids were conserved throughout the family (see the supplemental material). In comparison, conservation values for methionines, arginines and prolines were 33%, 32% and 19%, respectively. Additionally C, W, F, and Y were absent or underrepresented in the carboxy-terminal third of these proteins. Other hydrophobic amino acids, such as isoleucine, leucine, and valine, are well represented in the carboxy termini of the CRPs, and the significance of this representation is not understood.
Viral homologs from the Poxviridae and Herpesviridae as well as from glycoprotein C of herpes simplex viruses 1 and 2 also lacked sequence identity or similarity to the CRP (data not shown).
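A per-residue conservation value of the kind quoted above can be computed from a multiple alignment as sketched below. The exact counting rule used for the reported percentages is not stated, so the definition here is an assumption: among alignment columns where the first (reference) sequence carries the residue, the percentage in which every aligned sequence carries the same residue. The function name and the toy alignment are hypothetical.

```python
def conservation_pct(alignment, residue):
    """Percentage of reference-sequence occurrences of `residue` that are
    identical in every row of the alignment (assumed definition)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    reference = alignment[0]
    columns = [i for i, aa in enumerate(reference) if aa == residue]
    if not columns:
        return 0.0
    conserved = sum(
        1 for i in columns if all(row[i] == residue for row in alignment)
    )
    return 100.0 * conserved / len(columns)


# Toy alignment of three hypothetical sequences ('-' marks a gap).
toy = ["WACW", "WGCF", "W-CW"]
print(conservation_pct(toy, "W"), conservation_pct(toy, "C"))
```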
Sites for asparagine-linked glycosylation are conserved among putative CRP paralogs. Previous work in our laboratory suggested that asparagine (N)-linked glycosylation may be important for CRP binding to C3b and C4b (19). Among the HSG members, four N-linked glycosylation sites were conserved, and two of these were also conserved among the LSG members (Fig. 3). Conserved sites for the addition of N-linked glycans were located exclusively within the amino-terminal two-thirds of these proteins, intimating a role of N-linked glycosylation in the structure and/or function of this part of the molecules.
Conserved N-linked glycosylation sites between LSG and HSG of CRP and regions of highest and lowest sequence identity between group members.
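Candidate N-linked glycosylation sites can be located with a simple scan for the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below is a swapped-in simplification and not the NetNGlyc predictor used in this study, which additionally scores each sequon for the likelihood of glycosylation. The function name and example peptide are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g., N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide: N-G-T and N-N-S qualify; N-P-S is excluded.
print(find_sequons("MNGTANPSNNS"))
```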
Relationship of CRP to active trans-sialidase. CRP (AAB49414) and the active TS (BAA09334) share 31% primary sequence identity. A ClustalW alignment of the 15 full-length CRP paralogs to this T. cruzi TS molecule revealed that 7 of 21 TS putative active-site residues (25) were conserved among all CRP paralogs. An alignment of CRP AAB49414 to the active TS (BAA09334) (Fig. 4) illustrates these conserved residues. The conserved putative active-site residues are shown for both sequences. Aspartic acid (Asp) boxes (SXDXGXTW) are common features among bacterial sialidases, trypanosomal sialidases, and trans-sialidases (7, 22). Asp boxes are often found among enzymes that have polysaccharides as substrates, and it has been suggested that they are important structural elements of these proteins (5). TS Asp boxes are shown in Fig. 4. No Asp boxes were found among CRP paralogs, supporting previous experimental data that state CRP does not have trans-sialidase activity (K. A. Norris, unpublished data).
Alignment of BAA09334 (active TS) and AAB49414 (functional CRP). Asp boxes and FRIP boxes are highlighted in gray. Putative TS active-site residues are bold and underlined. Conserved tryptophan and cysteine residues are bold and italicized. Asterisks indicate identity, colons indicate conservative substitution, and periods indicate semiconservative substitution between aligned proteins.
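A search for Asp boxes of the form SXDXGXTW, as described above, can be expressed as a regular-expression scan. This is a minimal sketch that matches the strict consensus only; degenerate Asp boxes that deviate from the consensus would be missed. The function name and the test peptides are hypothetical and are not taken from BAA09334 or AAB49414.

```python
import re

# S-X-D-X-G-X-T-W, where X is any residue; lookahead reports every start site.
ASP_BOX = re.compile(r"(?=S.D.G.TW)")


def find_asp_boxes(sequence):
    """Return 1-based start positions of strict-consensus Asp boxes."""
    return [match.start() + 1 for match in ASP_BOX.finditer(sequence.upper())]


# Hypothetical peptides: the first contains one Asp box, the second none.
print(find_asp_boxes("AAASGDAGKTWAAA"))
print(find_asp_boxes("AAASGEAGKTWAAA"))
```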
CRPs from both HSG and LSG retain complement regulatory activity. An enzyme-linked immunosorbent assay-based DAA was used to determine AP complement regulatory activity for the recombinant CRPs expressed in F293 cells. The assay was performed in triplicate in multiple different experiments. Figure 5 shows a representative experiment with different CRP paralogs, including representative clones from the HSG and the LSG. All paralogs were capable of accelerating the decay of the AP C3 convertase compared to that of protein derived from vector-transfected cells. A two-tailed Student's t test that assumed nonpaired equal variance was used to compare each test group to the negative control. P values were 0.011, 0.009, 0.0038, and 0.00024 for CRP, CRP1.5, CRP1.6, and factor H, respectively.
Decay acceleration of the AP convertase. CRP and CRP1.6 are members of the HSG, and CRP1.5 is a member of the LSG. Factor H serves as a positive control.
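The unpaired, equal-variance Student's t test used for these comparisons rests on a pooled variance estimate. The sketch below computes the t statistic and degrees of freedom with the Python standard library only; the function name is hypothetical and the triplicate values are placeholders, not data from this study.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for an unpaired, equal-variance t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Placeholder triplicates (not measured values).
t_value, dof = pooled_t_statistic([1, 2, 3], [4, 5, 6])
print(round(t_value, 4), dof)
```

The two-tailed P value is then the probability, under a t distribution with the returned degrees of freedom, of a statistic at least as extreme as |t|; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` performs the same test and returns the P value directly.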
Multiple different CRP molecules are synthesized by a single parasite strain and differentially recognized by Chagasic patient sera. Although multiple copies of the complete crp gene are contained in the T. cruzi genome, it is not clear how many functional copies of the protein are produced simultaneously. To address this question, [35S]methionine and cysteine-labeled CRP from a clonal line of T. cruzi (Y strain) was affinity purified with human complement protein C3b as described previously (20). Purified CRP was subjected to immunoprecipitation with several different chronic Chagasic sera, and proteins were analyzed by SDS-PAGE, followed by fluorography as described previously (21). Figure 6 shows the distinct migration patterns observed among the CRPs precipitated with different patient sera. These results suggest that multiple variants of the CRP are synthesized by a single strain, that these variants were capable of binding human C3b, and that different patients had predominant antibody reactivity to different variants.
Immunoprecipitation of C3b affinity-purified CRP in Chagasic patient sera reveals multiple variants expressed by a clonal strain of T. cruzi. CRP was affinity purified from a [35S]methionine- and cysteine-labeled TCT lysate and was immunoprecipitated with normal human sera (lane 1) or different Chagasic patient sera (lanes 2 to 8).
DISCUSSION
While rapid complement activation is an important effector mechanism of the innate and acquired immune systems, regulation of complement activation by the host is critical for preventing tissue damage from complement proteins. Proteins that restrict complement activation constitute an important immune evasion strategy used by many microbial pathogens (3). The protozoan parasite T. cruzi produces a 160-kDa CRP. CRP is functionally most similar to DAF: it inhibits the formation and accelerates the decay of both the AP and the classical pathway C3 convertases, and lacks cofactor activity for factor I-mediated cleavage of C3b and C4b (18, 19). DAF is a member of the RCA family, composed primarily of CCP modules, and mediates DAA via its interaction with Bb of the AP C3 convertase (10) and C2a of the classical pathway C3 convertase (14). The T. cruzi CRP lacks CCP modules, and the precise mechanism for complement regulation is not known; however, CRP binds to C3b and C4b via noncovalent interactions in C3b and C4b affinity purification procedures, suggesting that interaction with these molecules may be important for DAA. Herpes simplex virus gC also binds C3b, blocks properdin and C5 interaction with C3b, and accelerates the decay of the AP convertase. It does not, however, interact with C4b or accelerate the decay of the classical pathway convertase (16). The mammalian and viral complement control proteins, the herpes simplex virus gC, and the CRP of T. cruzi are significantly different families of proteins. It is clear from primary sequence analysis that the function of CRP evolved independently from mammalian and viral CCP as well as from gC of herpes simplex viruses 1 and 2 through convergent evolution.
Compared to mammalian CCP, CRPs are functionally most similar to DAF, but do not contain CCP modules or other repeating units. Nevertheless, several of the amino acids conserved among CCP modules are also the most conserved residues among CRPs: tryptophan (87%), phenylalanine (76%), tyrosine (60%), and cysteine (56%). As three of these four amino acids have very hydrophobic side chains and the fourth (tyrosine) is also somewhat hydrophobic, it is possible that they play an important role in maintaining a globular structure in the CRP that is typical of glycoproteins in an extracellular environment. Interestingly, these same amino acids were either absent or underrepresented in the carboxy-terminal third of these molecules, suggesting that like the TS and gp85, the structure/function resides in the amino termini of these proteins.
Multiple copies of the crp gene are found in the T. cruzi genome, but the antigenic diversity of the proteins encoded by these genes has not previously been addressed. Norris et al. showed that the immunization of mice with a DNA construct containing crp (AAB49414) protects against a lethal challenge with T. cruzi in a murine model of Chagas' disease (23). It is not currently known how well a single CRP will protect against different strains. A potentially effective strategy in the generation of a CRP subunit vaccine against T. cruzi is to target the immune response to the C3b binding region, thereby neutralizing the complement regulatory activity. An analysis of the sequence diversity and potential structural motifs of the CRP family will provide information regarding conserved regions that may be involved in complement regulatory activity, leading to improved vaccine design.
Using data from the T. cruzi genome project, we estimated the copy number of the CRP and analyzed the sequence diversity of the CRP family within a single parasite strain. AAB49414 is the only CRP whose function was verified experimentally prior to this report (2), and it was utilized as the prototype for a GenBank BLAST search. This search identified 14 full-length and 6 partial copies of CRP with E values of 0 from the CL-Brener strain. An additional short partial protein sequence, with sequence identity of 65% but with an E value of 8e−99, was also identified. Considering the difficulty encountered in sequencing this highly repetitive genome, it is possible that the partial protein sequences represent full-length copies of crp genes. The CRP paralogs fell into two distinct groups: an HSG comprising proteins with 80 to 99% identity to AAB49414 and an LSG comprising proteins with 54 to 62% identity to AAB49414. We showed that CRPs with 83% (high) and 58% (low) identity to CRP AAB49414 had AP complement regulatory activity, confirming that members of both groups are functional CRPs. These results suggest that complement regulatory activity may reside in the amino-terminal two-thirds of these molecules, where sequence identity and similarity are strongest.
Previously, Van Voorhis et al. reported that more than 750 copies of the FL-160 gene are present in the T. cruzi genome (26). Our analysis shows that FL-160 is a CRP paralog with sequence identity of more than 80% to CRP AAB49414, placing it in the HSG. The FL-160 copy number was based on slot blot and Southern blot hybridization experiments using whole genomic DNA and various partial gene probes (26). Some of the hybridization probes used were located within the sialidase domain, and it is likely that cross-hybridization with other TS superfamily genes contributed to the high estimation of copy number. Our analysis showed that the copy number of CRP was significantly less than originally estimated, with between 14 and 20 copies per genome. Because of the highly repetitive nature of the T. cruzi genome, however, it may never be sequenced completely (N. M. EL-Sayed, personal communication), making our approach to copy number determination an approximation as well. Nevertheless, our approach recognizes the contribution of the entire protein sequence instead of partial gene sequences and may therefore be a more accurate estimation.
Among extracellular proteins, cysteine residues often form disulfide bonds within and between polypeptides to stabilize the higher-order structure of a protein. Mutations that lead to changes in these residues can affect the structure and ultimately the function of a protein. According to the percent accepted mutation (PAM) matrix developed by Dayhoff and colleagues, cysteines are well conserved within a protein family (8). We used the EMBL-EBI statistical analysis of protein sequences tool to determine the spacing between conserved cysteines in each full-length CRP paralog from the start of the mature protein through the GPI addition sequence. The spacings between conserved cysteines were almost identical for both the LSG and the HSG. The spacing was different for other BLAST-identified TS superfamily members whose E values were not equal to zero (3e−127 or higher), and sequence identity was below 40%, suggesting that these proteins were not in the same functional family. Additionally, Wilson et al. showed that 40% sequence identity is the threshold above which functional variation is rare (27). All other TS superfamily members in GenBank with E values not equal to zero had sequence identities ranging from 28 to 39% relative to that of CRP. The group of TS superfamily proteins closest to the CRP in our BLASTP search had E values other than zero. BLASTP with one of these proteins showed more than 80 proteins with E values of 0. An analysis of several proteins from this family showed that they did not possess GPI anchor sequences, and many possessed signal anchors instead of cleavable signal peptides, suggesting that they belong to a different functional family (data not shown).
Interestingly, spacing between conserved cysteines was unique for each subfamily within the TS superfamily. Although a comprehensive analysis was not performed, it may be possible to place TS superfamily members into a given subfamily based on sequence identity (greater than 40%) and similar spacing between conserved cysteines. The benefit of this type of analysis would be that proteins currently labeled as “putative TS” in the database could be reassigned into the proper subfamily (e.g., CRP, gp85, etc.), and this would define proteins of the TS superfamily more accurately.
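The proposed assignment rule, sequence identity greater than 40% combined with similar spacing between conserved cysteines, could be applied as in the following sketch. How similar the spacings must be is not quantified in the text, so the `tolerance` parameter is an assumption, as are the function name and the spacing values in the example.

```python
def same_subfamily(identity_pct, query_spacings, reference_spacings, tolerance=0):
    """Return True if a query protein plausibly belongs to the reference
    subfamily: identity above 40% and matching conserved-cysteine spacings.

    `tolerance` (residues of allowed difference per spacing) is a
    hypothetical parameter; the text says only that spacing is 'similar'.
    """
    if identity_pct <= 40:
        return False
    if len(query_spacings) != len(reference_spacings):
        return False
    return all(
        abs(query - reference) <= tolerance
        for query, reference in zip(query_spacings, reference_spacings)
    )


# Hypothetical spacing patterns; identities mirror values quoted in the text.
print(same_subfamily(58, [10, 25, 40], [10, 26, 40], tolerance=2))
print(same_subfamily(31, [10, 25, 40], [10, 25, 40]))
```

A rule of this form is deliberately conservative: a protein that passes the identity threshold but shows a different cysteine pattern is left unassigned rather than forced into a subfamily.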
Previously, Norris and Schrimpf showed that the incubation of parasites in the presence of an N glycosylation inhibitor reduced CRP binding to C3b by 60 to 70% (19). This finding suggests that N-linked glycosylation plays a role in CRP binding to C3b. Four sites predicted to be N-linked glycosylated were conserved among the HSG of CRP paralogs, and two of these sites were also conserved among the LSG. These conserved sites were located in the amino-terminal two-thirds of the proteins, further supporting the hypothesis that C3b/C4b binding ability resides in this portion of the molecule.
Mutations often occur in regions of a protein where structure and/or function is unaffected. As the carboxy-terminal third of the putative CRPs contains the majority of the sequence diversity, including insertions and deletions, it is likely that this region does not play a role in C3b binding or in the restriction of complement activity. It is possible that the role of the carboxy terminus is similar to that of DAF, in that it extends the functional domain beyond the cell membrane, so that it can more easily recognize and interact with C3b (6). Support for this theory is twofold: (i) the carboxy terminus of CRP is rich in aliphatic, hydrophobic amino acids, such as I, V, and T, that are C-beta branched, and (ii) several sites for O-linked glycosylation are predicted for this region. Both of these qualities are consistent with a rigid region meant to extend the functional domain beyond the membrane (1, 12).
More than 750 genes in the T. cruzi genome encode proteins with sialidase domains in their amino termini, and most of these proteins lack TS activity. Tryptophan appears to be well conserved within the sialidase domains of all superfamily members, but the significance of this conservation is not known. Other features, such as the FRIP box and Asp boxes, are variably retained within each subfamily. The CRPs form a functionally distinct subfamily that is involved in complement regulation, and the copy number of this family is relatively small. One member each from the HSG and the LSG was cloned, and the recombinant proteins were shown to have AP complement regulatory activity comparable to that of recombinant CRP (AAB49414). Additionally, we showed that individual Chagasic patient sera differentially recognized various CRPs purified by C3b affinity chromatography from a single parasite strain, suggesting that more than one copy of the CRP is synthesized and capable of binding to human C3b. This finding is important because it suggests that individual patients may differentially recognize immunodominant epitopes on CRPs. Understanding the antigenic diversity among CRP family members, particularly within the C3b binding domain, will aid in the development of an effective vaccine that targets the immune response to this region and is immunogenic against many CRP variant proteins.
ACKNOWLEDGMENTS
We thank Elodie Ghedin for critically reviewing the manuscript.
This study was supported by National Institutes of Health grant RO1 AI32719 (K.A.N.).
FOOTNOTES
- Received 8 August 2007.
- Returned for modification 24 September 2007.
- Accepted 19 November 2007.
- Copyright © 2008 American Society for Microbiology