ABSTRACT
A total of 1,921 expressed sequence tags (ESTs) were obtained from bloodstream trypomastigotes of Trypanosoma carassii, a parasite of economic importance due to its high prevalence in fish farms. Analysis of the data set allowed us to identify a trans-sialidase (TS)-like gene and three ESTs coding for putative mucin-like genes. TS activity was detected in cell extracts of bloodstream trypomastigotes. We have also used the sequence information obtained to identify genes that have not been previously described in trypanosomatids. (Additional information on these ESTs can be found at http://genoma.unsam.edu.ar/projects/tca.)
Trypanosoma carassii infects a variety of cyprinid fish, such as carp, goldfish, crucian carp, and tench, as well as some members of noncyprinid families. The prevalence in nature is very high and may approach 100% in densely populated fish cultures.
Unlike the stercorarian trypanosome Trypanosoma cruzi, T. carassii does not appear to have intracellular stages within the infected host (16). However, T. carassii trypomastigotes show a carbohydrate-rich coat of glycosyl-phosphatidylinositol (GPI)-anchored mucin-like proteins, which are biochemically similar to the mucin coat present in T. cruzi (13, 17). The carbohydrate moiety of T. carassii mucins contains sialic acid, a monosaccharide that in other trypanosomes is transferred from host glycoconjugates to parasite surface molecules by trans-sialidase (TS) (9). TS is a modified sialidase, unique to a few trypanosomatids, that instead of hydrolyzing sialic acid is highly efficient in transferring the monosaccharide to an acceptor molecule. In T. cruzi, the main acceptors are mucins. Sialylated mucins are considered to be essential for the survival of the parasite in the mammalian host (20).
A small-scale expressed sequence tag (EST) sequencing project for the blood stage of T. carassii was undertaken. Among others, genes encoding proteins homologous to trypanosome TS and putative mucin core proteins were identified. We also detected, for the first time, TS activity in T. carassii. The sequence information obtained was used to search for genes that have not been described previously in other trypanosomatids.
Starting with poly(A)+ RNA obtained from blood trypomastigotes, we constructed an oriented cDNA library in the pSport1 vector (Amersham Pharmacia Biotech). Randomly selected clones were sequenced from the 5′ end to generate 1,921 5′ ESTs with >100 bp of good-quality, nonvector sequence. The average length for the whole data set was 390 bp. Sequences were processed and compared against those in different databases by using BLAST, essentially as described previously (1).
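The length criterion and the average-length figure quoted above can be illustrated with a minimal sketch. It assumes that vector and low-quality sequence have already been trimmed from each read; the function names and the toy reads are hypothetical and do not represent the pipeline actually used for this data set.

```python
def filter_ests(reads, min_len=100):
    """Keep reads with more than min_len bp of usable (already trimmed) sequence."""
    return {name: seq for name, seq in reads.items() if len(seq) > min_len}


def mean_length(reads):
    """Average read length in bp."""
    return sum(len(seq) for seq in reads.values()) / len(reads)


# Toy reads (hypothetical): 120 bp, 80 bp, and 400 bp.
reads = {
    "clone_a": "ACGT" * 30,
    "clone_b": "ACGT" * 20,
    "clone_c": "ACGT" * 100,
}

kept = filter_ests(reads)
print(sorted(kept), mean_length(kept))
```

In this toy example the 80-bp read is discarded and the average is computed over the two remaining reads.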
Since the trypomastigote cDNA library was not normalized, we expected a significant number of clones to come from abundant transcripts. As is evident from Table 1, a large fraction of the matches correspond to ribosomal proteins. In a table included in the supplemental material available online at http://genoma.unsam.edu.ar/projects/tca, we have merged matches to the same or similar database entries to shorten the list for clarity. In the supplemental material, we have also organized ESTs according to their pattern of hits against databases and provided the raw BLAST reports for all searches.
Table 1. BLAST matches to protein databases
Comparison of our data set against ESTs from other trypanosomatids (GenBank, January 2002) by using BLASTN gave only 178 significant matches. The low number of hits against other trypanosomatid ESTs is probably due to the low EST coverage for this group of organisms (about 60% of the 17,479 ESTs come from the T. cruzi genome project and are derived from the epimastigote stage only). Thus, according to this result, over 90% of the ESTs in our data set (1,745 sequences) would provide novel information about genes expressed in trypanosomatids.
The bloodstream forms of T. carassii have mucin-like surface molecules that are sialylated, as is the case in T. cruzi (13). However, the trypanosomes studied so far do not synthesize sialic acid de novo but acquire it from the medium by using the enzyme TS (9). Hence, we investigated whether TS activity was present in T. carassii trypomastigotes (Table 2). TS activity was assayed in extracts from lyophilized T. carassii trypomastigotes (∼10^9 cells), essentially as described in reference 19. The parasite extract contained an activity that was able to transfer sialic acid from sialyllactose to a lactose acceptor. Neuraminidase treatment of the purified reaction product prevented its interaction with a QAE-Sephadex resin and thus resulted in a decrease of radioactivity in this fraction (Table 2). This confirms that the negative charge of the reaction product is due to the transfer of a sialic acid residue from the donor to the labeled lactose acceptor, indicating the presence of a transferase activity in the parasite extracts. Thus, we conclude that the T. carassii trypomastigote extract has TS activity similar to that described in T. cruzi and T. brucei (14, 18).
Table 2. TS activity in extracts of T. carassii trypomastigotes
In our EST data set, we found one clone (05b16; GenBank accession no. BU096369) that showed significant similarity (E = 10^−16, 37% identity for the best hit) to members of the T. cruzi TS superfamily. Clone 05b16 contains part of the C-terminal lectin-like domain of the gene. Using the sequence information present in clone 05b16, we designed oligonucleotide primers to amplify the missing 5′ region and to confirm and extend the sequence of the 3′ region of the putative T. carassii TS gene. The translated full sequence of the tcats gene (GenBank accession no. AY142111) was compared with that of the gene coding for T. cruzi TS (Fig. 1 and supplemental material [http://genoma.unsam.edu.ar/projects/tca]). The deduced TcaTS protein is predicted to have a signal peptide cleavage site between residues 22 and 23 (15; [http://www.cbs.dtu.dk/services/SignalP]) and a GPI anchor signal (Fig. 2) in its C-terminal domain. Some residues critical for the catalytic activity are not conserved, like the ones corresponding to T. cruzi Tyr342 and Trp312, among others (4, 6) (see Fig. 1 and supplemental material [http://genoma.unsam.edu.ar/projects/tca]), suggesting that this gene might encode an enzymatically inactive product. The presence of TS genes encoding inactive products is not uncommon and is consistent with findings in other trypanosomes analyzed so far (namely T. cruzi and Trypanosoma brucei). In these cases, TS belongs to a large gene family that codes for both enzymatically active and inactive members (6, 14). Further work will indicate which member of this family is the tcats gene encoding an enzymatically active protein in T. carassii.
Fig. 1. Schematic comparison of the T. carassii TS-like deduced protein with a T. cruzi active TS. T. cruzi TS (TcTS) and the T. carassii TS-like (TcaTS) protein were aligned together with ClustalW. A portion of the alignment including the mature catalytic domain is provided in the supplemental material available online (http://genoma.unsam.edu.ar/projects/tca). Based on this alignment, we produced a schematic representation of the comparison (A). I, II, and III are conserved aspartic boxes. NA and TS are signature motifs of the neuraminidase (NA) and TS superfamilies, respectively (see reference 4 and references therein). The catalytic, lectin-like, and antigenic domains are marked for TcTS and are based on structural and experimental data (4). The antigenic domain of TcTS (SAPA) is composed of amino acid tandem repeats (3). (B) Antigenicity index plot. The Jameson-Wolf antigenicity index (12) was calculated by using the program Protean from the Macintosh Lasergene package (DNASTAR, Madison, Wis.). This is a composite index that combines, in a weighted manner, predictions of hydrophilicity (Hopp-Woods), surface probability (Emini), flexibility (Karplus-Schulz), and secondary structure (Chou-Fasman and Garnier-Robson) (12). (C) Amino acid sequence of the C-terminal extension showing a nearly perfect repetition (r1 and r2) and several smaller degenerate repetitive units (rI to rVIII).
Fig. 2. ESTs coding for the mucin-like genes of T. carassii. The sequences from clones 01j19 and 04f14 were confirmed by resequencing from both ends, and ambiguities were resolved. These sequences, together with the single-pass sequence of clone 03o12, were translated and aligned by using ClustalW. A schematic representation of the sequence structure of some members of the T. cruzi mucin TcMUC family (8) compared with the partial data from the T. carassii ESTs is shown in panel A. The aligned sequences are shown in panel B. Shown in boldface are residues with positive O glycosylation prediction. Shown in reverse type (white over dark background) are the predicted C-terminal GPI modification sites (21). Residues shown over a black background are those predicted based on a protozoan-trained algorithm. Residues shown over a gray background are those predicted with the algorithm trained on a metazoan learning set (21). Identity between residues is indicated by an asterisk, conservative changes are indicated by double dots, and less-conservative changes are indicated by single dots. (C) Comparison of the C-terminal end sequences of several GPI-anchored trypanosomatid proteins. The GPI modification site (ω site) was experimentally determined for the T. brucei and Leishmania proteins shown (10) as well as for the MUC-RA1-type mucin (2). A member of another T. cruzi mucin family, TcSMUG; a member of the TS family, SAPA; and the putative T. carassii TS are also included for comparison. Superscripts a and b indicate the GenBank accession number and reference number, respectively.
The TcaTS-like deduced protein contains a highly hydrophilic C-terminal extension after the conserved lectin-like domain. A C-terminal extension with similar characteristics, although not sequence related (SAPA; see Fig. 1A), is found in the T. cruzi TS that is expressed in blood trypomastigotes. SAPA (for shed acute-phase antigen) is a highly antigenic, immunodominant domain mainly composed of amino acid tandem repeats. These repeats were shown to be involved in delaying the clearance of TS from the blood of the host (3). Although the C-terminal domains of TcaTS and TcTS do not share significant sequence similarity, they do share several properties, as is evident from plotting the Jameson-Wolf antigenicity index (12) (Fig. 1B). The C-terminal portion of TcaTS does not contain the kind of perfect amino acid tandem repeats that the T. cruzi enzyme has, but it nonetheless contains some less-conserved repetitive units (Fig. 1C).
The presence of TS activity and a TS-like gene prompted us to search for sialic acid acceptors. In T. cruzi, the major acceptors are the surface mucins. Interestingly, the surface of T. carassii trypomastigotes appears to be biochemically similar to that of T. cruzi (13). We thus focused our search on ESTs that would encode mucin genes.
Mucin domains are regions very rich in threonine (Thr) and/or serine (Ser) and proline, which constitute the target sites for O glycosylation (11). Sequence similarity in this region is not usually high, because the only selective pressure is on maintaining a high number of O-glycosylation sites (23). T. cruzi mucins have two main regions: the O-glycosylated one and a C-terminal region that contains the GPI anchor signal (Fig. 2). The GPI anchor signal does not show sequence similarity when different species or even molecules from the same organism are compared (22), while the region immediately before the GPI anchor signal within the C-terminal region is the only one showing particular sequence features that appear to be conserved in different groups of the T. cruzi mucin family (7, 9) (Fig. 2C).
Since the presence of mucin-coding ESTs revealed by BLAST searches was not conclusive, we then searched for ESTs with the described bias in amino acid composition and the presence of an O-glycosylated region followed by a C-terminal region containing a GPI anchor signal. When we looked in our data set for sequences that meet these criteria, three ESTs (clones 01j19 [GenBank accession no. BU095637], 03o12 [GenBank accession no. BU096145], and 04f14 [GenBank accession no. BU096234]) were selected. The three ESTs overlap in their 3′ ends and have a high degree of similarity in this region, but are clearly distinct. Clones 01j19 and 04f14 were fully sequenced; for EST 03o12, we were not able to go back to the original clone. The predicted proteins showed a conserved C-terminal region containing most of the predicted O-glycosylated Thr and Ser in a cluster. The regions toward the N termini differ, but still contain a high proportion of interspersed hydroxyamino acids (11 of 36 for 03o12, 19 of 64 for 04f14, and 7 of 36 for 01j19). There are 14 sites for 01j19, 27 sites for 04f14, and 34 sites for 03o12 with positive prediction for O glycosylation (Fig. 2). The C-terminal end of the three clones complies with the GPI anchor requirements (21), and the putative ω, ω+1, and ω+2 sites are very similar to those of other kinetoplastid proteins in which the GPI anchor addition site was experimentally determined (10) (Fig. 2), suggesting that the predicted proteins are attached to the surface through a GPI anchor.
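The kind of composition-based screen described above can be sketched as follows. This is only an illustration under stated assumptions: the window size, the 50% threshold, the function names, and the toy sequence are hypothetical and are not the criteria or data used in this study, and the sketch does not attempt O-glycosylation or GPI anchor prediction.

```python
def hydroxy_fraction(protein):
    """Return (number of Ser + Thr residues, sequence length)."""
    count = sum(protein.count(residue) for residue in "ST")
    return count, len(protein)


def stp_rich_windows(protein, window=20, min_fraction=0.5):
    """Start positions of windows in which Ser/Thr/Pro reach min_fraction."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        stp = sum(segment.count(residue) for residue in "STP")
        if stp / window >= min_fraction:
            hits.append(start)
    return hits


# Toy translated sequence (hypothetical): a Thr-rich block flanked by
# an unbiased N-terminal stretch and a hydrophobic C-terminal tail.
toy = "MKLAVGDE" + "TTPSTTAPTTSPTTTEAPTT" + "GSAAALLVLAAVAL"

print(hydroxy_fraction(toy))   # (14, 42)
print(stp_rich_windows(toy))
```

Candidates flagged by such a screen would still need to be inspected for an O-glycosylated region followed by a GPI anchor signal, as done for the three clones described above.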
Although the N-terminal end of these molecules is lacking, the sequence structure observed strongly resembles that of a particular group of T. cruzi apo mucins that have highly divergent N termini rich in Ser/Thr and Pro, followed by two or three Thr runs before a GPI anchor signal. Furthermore, this particular group in the T. cruzi mucin superfamily shows a characteristic amino acid motif consisting of the tripeptide GluAlaPro (EAP) at the end of each Thr run (8) (Fig. 2). The T. carassii mucin-like proteins show a similar motif that is present within a degenerate (not perfect) Thr run. This kind of EAP motif, although with some degeneracy, is present in most of the mucin domains throughout the eukaryotes. Thus, although the sequence similarity is not significant, the presence of the described sequence features supports the mucin-like characteristics of the three EST products described herein.
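A simple pattern search gives a sense of how a Thr run ending in EAP might be located in a translated EST. The two regular expressions below are assumptions made for illustration only (a strict run of at least three Thr, and a loosened run that also admits Ser and Ala); they are not the definition of the motif used in reference 8 or in this work, and the toy sequence is hypothetical.

```python
import re

# Strict form: at least three consecutive Thr followed directly by EAP.
STRICT = re.compile(r"T{3,}EAP")
# Loosened form (assumption): a run of at least four Thr/Ser/Ala before EAP,
# as a crude stand-in for a degenerate Thr run.
DEGENERATE = re.compile(r"[TSA]{4,}EAP")


def find_motifs(protein, pattern):
    """Return (start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


toy = "GSTTTTEAPAATSTTAEAPGG"  # hypothetical sequence

strict_hits = find_motifs(toy, STRICT)
degenerate_hits = find_motifs(toy, DEGENERATE)
print(strict_hits)
print(degenerate_hits)
```

In the toy sequence, only the first run satisfies the strict pattern, while the loosened pattern also picks up the second, interrupted run.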
We were also interested in mining the sequence information obtained to discover new genes for trypanosomatid parasites. As summarized in Table 2 in the supplemental material online, ESTs showing significant similarity to proteins in online databases were divided into those from trypanosomatids and those from other organisms. A detailed analysis of the putative genes identified is left to interested researchers in the field. However, it is worth mentioning the finding of a group of clones that showed significant similarity to different proteins that interact with DNA or that carry DNA binding domains. Clone 08d1 (GenBank accession no. BU097020) is similar to a small helix-turn-helix DNA binding protein, clone 06d4 (GenBank accession no. BU096646) is similar to mammalian transcription factor BTF3, clone 03o4 (GenBank accession no. BU097036) is similar to a protein containing a cold shock domain that is highly similar to the RNP-1 RNA binding motif, and clone 08e1 (GenBank accession no. BU096153) is similar to a nucleosomal assembly protein. We highlight these matches because of the common agreement on the lack of transcriptional regulation in trypanosomatids (5). RNA polymerase II promoters for protein-coding genes have proved elusive, and no classic transcription factors have been described in these organisms. Interestingly, the ESTs described above show similarity to DNA binding proteins or are related to DNA binding. It is possible that, at least in a minority of cases, transcriptional regulation of some sort occurs and that these or other proteins are involved.
Nucleotide sequence accession number.
All EST sequence data have been deposited in the dbEST division of GenBank, under accession no. BU095516 to BU097436.
ACKNOWLEDGMENTS
Fernán Agüero and Vanina Campo contributed equally to this work.
We thank Rodrigo Pavón, Fernanda Peri, and Diego Rey Serantes for technical assistance.
This work was supported by grants from the World Bank/UNDP/WHO Special Program for Research and Training in Tropical Diseases (TDR), the Human Frontier Science Program, and the Agencia Nacional de Promoción Científica y Tecnológica and the Ministerio de Salud, Argentina. The research of A.C.F. was supported in part by an International Research Scholars Grant from the Howard Hughes Medical Institute and a fellowship from the John Simon Guggenheim Memorial Foundation. A.C.F. and D.O.S. are members of the Carrera del Investigador Científico, CONICET.
FOOTNOTES
- Received 30 May 2002.
- Returned for modification 27 July 2002.
- Accepted 8 September 2002.
- Copyright © 2002 American Society for Microbiology