ABSTRACT
Larvae of Toxocara canis, a nematode parasite of dogs, infect humans, causing visceral and ocular larva migrans. In noncanid hosts, larvae neither grow nor differentiate but endure in a state of arrested development. Reasoning that parasite protein production is orientated to immune evasion, we undertook a random sequencing project from a larval cDNA library to characterize the most highly expressed transcripts. In all, 266 clones were sequenced, most from both 3′ and 5′ ends, and similarity searches against GenBank protein and dbEST nucleotide databases were conducted. Cluster analyses showed that 128 distinct gene products had been found, all but 3 of which represented newly identified genes. Ninety-five genes were represented by a single clone, but seven transcripts were present at high frequencies, each composing >2% of all clones sequenced. These high-abundance transcripts include a mucin and a C-type lectin, which are both major excretory-secretory antigens released by parasites. Four highly expressed novel gene transcripts, termed ant (abundant novel transcript) genes, were found. Together, these four genes comprised 18% of all cDNA clones isolated, but no similar sequences occur in the Caenorhabditis elegans genome. While the coding regions of the four genes are dissimilar, their 3′ untranslated tracts have significant homology in nucleotide sequence. The discovery of these abundant, parasite-specific genes of newly identified lectins and mucins, as well as a range of conserved and novel proteins, provides defined candidates for future analysis of the molecular basis of immune evasion by T. canis.
Toxocara canis is a common nematode parasite of dogs and related species. Adult worms live in the gastrointestinal tract, releasing eggs which enter the environment by the fecal route. Within the eggs, larval T. canis develop over a 2- to 3-week period (27). Embryonated eggs are then infective if ingested by a new host, as larvae hatch in the stomach and penetrate the epithelial layer. T. canis larvae show a remarkable lack of host specificity, infecting a wide range of taxa, including humans (27, 28). In the definitive (canid) host, larvae may mature by migrating to the intestine and developing to the adult stage; such maturation is favored in pups and in reproducing females (40). In most other dogs and in all paratenic hosts such as humans, development is arrested at the larval stage.
The arrested stage is remarkable for surviving for as long as 9 years in vivo (7), without reproduction or differentiation and without succumbing to attack by the host immune system. This diapausal state is mirrored in vitro, where larvae survive for many months in serum-free medium, secreting quantities of excretory-secretory antigens which have been characterized in biochemical (5, 50, 58) and immunological (44, 45, 56) terms.
We hypothesized that in the absence of cell division, tissue growth, or gametogenesis, a significant proportion of larval protein production, and therefore mRNA, is likely to be directed at immune evasion. To characterize abundant mRNAs, we conducted a small-scale expressed sequence tag (EST) project. EST sequencing was pioneered for mammalian cells (1, 2) and Caenorhabditis elegans (49) and is an important component of major parasite sequencing projects (15, 18, 22, 23, 46, 64). By this means, we have identified a series of abundantly expressed genes from larval T. canis, among which are likely to be important mediators of parasite immune evasion.
MATERIALS AND METHODS
cDNA library. T. canis larvae were hatched and maintained in vitro as previously described (20, 50) for a period of 6 months, with weekly changes of serum-free medium. From 200,000 cultured T. canis larvae, using a single-step guanidine-phenol-chloroform extraction, 265 μg of total RNA was recovered, from which 6 μg of poly(A)+ RNA was isolated by oligo(dT) chromatography. cDNA synthesized from this mRNA was unidirectionally cloned into the Uni-Zap XR phage vector, using packaging extracts from Stratagene. The amplified library contained 1.9 × 10⁹ PFU/ml with 91% recombinants. The possibility of host contamination was essentially eliminated because eggs were first incubated in vitro in formalin, and once hatched, larvae were cultured in serum-free RPMI 1640 medium.
Isolation of cDNA clones and in vivo excision of phagemids. Single phage clones were randomly picked from the plated-out cDNA library. Phagemids were rescued in vivo by using ExAssist helper phage and nonsuppressing Escherichia coli SOLR (Stratagene).
Selection and maintenance of clones. Plasmids were prepared from overnight cultures by using a Qiaprep Spin Miniprep kit (Qiagen) and stored at −20°C. Insert sizes were determined by PCR using vector primers (M13 reverse and M13 forward primers or T3 and T7 primers). In the few cases where insert sizes could not be determined by PCR, restriction digestion of purified recombinant plasmids was performed with XbaI and XhoI (Promega). All clones are available to the research community on request.
Sequencing. Plasmid templates were sequenced by using dye terminator cycle sequencing chemistry with Amplitaq DNA polymerase (FS enzyme) on an ABI 377 automated sequencer (Applied Biosystems, Foster City, Calif.). Both 5′ and 3′ ends were sequenced, except in some cases where 5′ sequences gave high-quality sequence through the poly(A) tail or where the 5′ sequence showed unequivocal identity with a previously characterized T. canis clone.
Analysis. SeqEd version 1.0.3 (Applied Biosystems) was used to edit out vector sequence and flanking sequences. Sequences were aligned by using AssemblyLign and MacVector 6.0 software (Oxford Molecular). Nonredundant database searches used ungapped BLAST (basic local alignment search tool) (3) on the National Center for Biotechnology Information server. Nucleotide sequences were subjected to BLASTX searches against the GenBank nr (nonredundant) protein database. Nucleotide sequence searches used BLASTN (on both nr and dbEST databases), and deduced protein sequence from any unambiguous open reading frame was used to search with TBLASTN (against the nr nucleotide database).
Nomenclature. Gene naming follows the convention for nematodes (11, 14) of a three-letter name and a number, and a prefix indicating the source organism, in this case Tc. Genes are denoted in italics; proteins are capitalized. Three sets of interim gene names were used where no functional homology exists: Tc-not (novel transcript), Tc-ant (abundant novel transcript, where ≥5 clones of 263 are identical), and Tc-huf (homolog of unknown function, where similar sequences have been found in other nematode species). For interim gene designations, the number of the EST clone first sequenced is retained; for cDNAs assigned functional names, clones are generally numbered sequentially (e.g., Tc-ctl-1 and -2 for C-type lectins), except where numbering is significant in other organisms, principally for ribosomal proteins (e.g., Tc-rps-4, -5, and -9 to conform with established nomenclature).
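The naming rules above can be illustrated with a short sketch. The function names (gene_name, protein_name, interim_number) are hypothetical helpers introduced here for illustration only; the three-digit zero padding is inferred from designations such as ant-003 and is an assumption rather than a stated rule.

```python
def interim_number(est_clone: int) -> str:
    """Interim names keep the number of the first EST clone sequenced.

    Zero padding to three digits is an assumption based on names like ant-003.
    """
    return f"{est_clone:03d}"


def gene_name(stem: str, number, species_prefix: str = "Tc") -> str:
    """Build a gene name: organism prefix, three-letter name, then a number."""
    if len(stem) != 3 or not stem.isalpha():
        raise ValueError("gene stem must be a three-letter name")
    return f"{species_prefix}-{stem.lower()}-{number}"


def protein_name(gene: str) -> str:
    """Proteins are capitalized; the organism prefix is left unchanged."""
    prefix, rest = gene.split("-", 1)
    return f"{prefix}-{rest.upper()}"


print(gene_name("ctl", 1))                    # functional name, numbered sequentially
print(gene_name("ant", interim_number(3)))    # interim name, keeps EST clone number
print(protein_name("Tc-peb-1"))               # protein form of a gene name
```

Italicization of gene names is a typesetting matter and is not represented in this plain-text sketch.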
Database deposition. All sequences have been deposited in NCBI dbEST (17) with separate entries for 5′ and 3′ reads.
RESULTS AND DISCUSSION
Sequencing and similarity searches. A total of 261 clones, containing inserts ranging from 128 to 2,050 bp, were taken for analysis once nonrecombinant and chimeric clones were discarded. All were sequenced from the 5′ end, and 218 were also sequenced from the 3′ end. Most similarities were found, or were stronger, with the 5′ sequences, but a significant minority of similarity relationships were found only with 3′ sequence reads. In general, a probability value of 10⁻⁶ was sought as a minimum degree of similarity.
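As an illustration of the similarity cutoff described above, the following minimal sketch filters a list of database hits by probability value. The record layout (dictionaries with "subject" and "p_value" keys) and the example hits are invented placeholders, not data from this study, and the sketch does not reproduce the output format of any BLAST program.

```python
P_VALUE_CUTOFF = 1e-6  # threshold quoted in the text


def significant_hits(hits, cutoff=P_VALUE_CUTOFF):
    """Return hits at or below the cutoff, most significant first."""
    kept = [hit for hit in hits if hit["p_value"] <= cutoff]
    return sorted(kept, key=lambda hit: hit["p_value"])


# Placeholder hit records for one EST (hypothetical values).
example_hits = [
    {"subject": "hypothetical protein A", "p_value": 3e-4},
    {"subject": "hypothetical protein B", "p_value": 2e-15},
    {"subject": "hypothetical protein C", "p_value": 1e-6},
]

for hit in significant_hits(example_hits):
    print(hit["subject"], hit["p_value"])
```

Because the text describes the cutoff as a general guide rather than a strict rule, a real analysis would probably still review borderline hits by eye.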
Clustering analysis. EST sequences were clustered on the basis of nucleotide identity. It was noted that in some of the larger clusters, identity of the 3′ sequences was critical to correctly group differentially truncated clones with nonoverlapping 5′ sequences.
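The role of 3′ reads in grouping truncated clones can be sketched with a simple single-linkage clustering. This is not the AssemblyLign/MacVector procedure used in the study: shared exact k-mers stand in for alignment identity, all reads are assumed to be given in the same strand orientation, and the clone identifiers and sequences are invented for illustration.

```python
from itertools import combinations


def kmers(seq, k):
    """All exact substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def share_identity(read_a, read_b, k):
    """Crude stand-in for nucleotide identity: at least one shared k-mer."""
    if not read_a or not read_b:
        return False
    return bool(kmers(read_a, k) & kmers(read_b, k))


def cluster_clones(clones, k=8):
    """Single-linkage clustering of clones.

    clones maps a clone id to {"five": read or None, "three": read or None}.
    Two clones are linked when their 5' reads or their 3' reads share identity.
    """
    parent = {clone: clone for clone in clones}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(clones, 2):
        if any(share_identity(clones[a][end], clones[b][end], k)
               for end in ("five", "three")):
            parent[find(a)] = find(b)

    groups = {}
    for clone in clones:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted(sorted(group) for group in groups.values())


# Invented example: A and B are differently truncated at the 5' end
# but carry the same 3' end; C is unrelated.
clones = {
    "A": {"five": "ATGGCCAAGTTTCCGA", "three": "GGGTTTAAACCCGGGTTT"},
    "B": {"five": "CCTTAGGACTGACTTA", "three": "GGGTTTAAACCCGGGTTT"},
    "C": {"five": "TTTTTTTTCACACACA", "three": "ACGTACGTACGTACGT"},
}
print(cluster_clones(clones))
```

In this toy example, dropping the 3′ reads leaves A and B in separate clusters, which mirrors the observation that 5′ reads alone can fail to group truncated clones.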
As a result of these analyses, a total of 128 distinct gene products were identified. Of these, only three, mucin 1 (Tc-muc-1 [25]), phosphatidylethanolamine-binding protein (Tc-peb-1 [26]), and the 60S ribosomal protein L3 (Tc-rpl-3 [51]), have been previously characterized. Approximately 50% (65/128) of the total number of genes have informative similarity to genes of known function from other species, a further 16 genes have database homologs of unknown function, and 47 genes (37%) show no similarity to known genes; among this last class, designated novel genes, 4 were classified as abundant transcripts (see below).
Abundant transcripts. A small number of transcripts are heavily represented in the library. Just 8 transcripts (6.2% of genes) account for 102 clones (39.1%) sequenced, while the 20 most abundant (all those isolated three or more times) accounted for 143 clones (54.8%). Table 1 presents the transcripts characterized in order of frequency, with the most highly sampled clone being cytochrome c oxidase subunit II (25/261 = 9.6% of clones), which is a mitochondrial DNA-encoded gene in other nematodes (36, 52). The second most common clone is a C-type lectin (16/261 = 6.1%) which in work to be published elsewhere we demonstrate encodes the major surface and secreted glycoprotein of T. canis larvae, TES-32 (42).
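The frequencies quoted above follow from simple proportions of the 261 analyzed clones. The sketch below recomputes them and ranks transcripts by clone count; the function names are illustrative, and only counts stated in the text are used.

```python
TOTAL_CLONES = 261  # clones taken for analysis


def percent_of_library(n_clones, total=TOTAL_CLONES):
    """Share of the library represented by n_clones, to one decimal place."""
    return round(100 * n_clones / total, 1)


def rank_by_frequency(clone_counts, total=TOTAL_CLONES):
    """Return (transcript, clones, percent) tuples, most abundant first."""
    ranked = sorted(clone_counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, percent_of_library(n, total)) for name, n in ranked]


# Clone counts quoted in the text for the two most frequent transcripts.
counts = {
    "C-type lectin": 16,
    "cytochrome c oxidase subunit II": 25,
}
for name, n, pct in rank_by_frequency(counts):
    print(f"{name}: {n}/{TOTAL_CLONES} = {pct}%")

print(percent_of_library(102))  # eight most abundant transcripts combined
print(percent_of_library(143))  # twenty most abundant transcripts combined
```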
Table 1. Frequency of transcripts
Abundant novel transcripts. Unexpectedly, four more abundant clones are all novel, with no similarities in the database or, in the coding regions, to each other. These transcripts each represent between 2.7 and 5.7% of ESTs and together account for more than 18% of the library. They have consequently been designated ant genes and retain the number of the EST clone from which they were first isolated (ant-003, -005, -030, and -034).
The four ant genes differ in length and composition, but all are 1.6 kb or more in length. The characterization of their full-length sequences, and of the protein products encoded by these genes, is currently under way, as none of the ESTs isolated include 5′ methionine start codons. Figure 1 presents a map of the EST clones isolated for each ant gene. From this it can be seen that 3′ sequencing proved essential in identifying all members of the cluster, because 5′ sequences do not in all cases overlap. Because the multiple clones all have different 5′ termini, the abundance of these transcripts appears not to be an artifactual amplification of single clones in the construction of the library.
Fig. 1. Alignments of clones corresponding to ant-003, -005, -030, and -034. Segments sequenced in single-pass reactions are lightly shaded; unsequenced portions are shown unshaded in broken lines. Darker shading corresponds to 3′ UTRs. The position and codon of the stop signal are indicated for each transcript. Numbers to the left of the bars indicate the approximate start of the cDNA relative to the longest clone. No start codons have yet been identified for any of these genes, and hence numbering is provisional. Note that without 3′ sequence data, the longer EST clones belonging to ant-005 and ant-034 would not have been recognized.
Homology in 3′ UTRs of the four ant genes. Although ant-003, -005, -030, and -034 showed no similarities between coding regions, the 3′ untranslated regions (UTRs) of all four genes bore significant levels of identity. It is notable also that none of these 3′ ends contain consensus polyadenylation motifs such as AATAAA or similar sequence. This is not unprecedented among nematode genes: a recent survey of C. elegans 3′ UTRs found that 7% of mRNAs bore no identifiable polyadenylation signal (16).
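A check for consensus polyadenylation motifs of the kind described above can be sketched as a plain substring scan. The function name and the example sequences are invented for illustration; only AATAAA is taken from the text, and any variant motifs passed in (ATTAAA is used below as one commonly cited variant) are an assumption about what "similar sequence" might cover.

```python
def find_polya_signals(utr, motifs=("AATAAA",)):
    """Return sorted (position, motif) pairs for each motif occurrence.

    Positions are 0-based. RNA input (U) is converted to DNA letters (T).
    """
    sequence = utr.upper().replace("U", "T")
    hits = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = sequence.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


# Invented sequences, not taken from the ant genes.
print(find_polya_signals("GGCAATAAAGTTTC"))
print(find_polya_signals("GGCCTTGGCC"))
print(find_polya_signals("ccattaaagg", motifs=("AATAAA", "ATTAAA")))
```

An empty result for every candidate motif corresponds to the situation reported for the four ant 3′ ends.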
An alignment of the 3′ coding regions and UTRs of these four transcripts shows little identity in coding region nucleotides (Fig. 2A) or amino acids (not shown) but similar sequences in all four genes immediately after the stop codon (Fig. 2B). No similar sequences were found in any other T. canis genes or in genes from other organisms such as C. elegans. In C. elegans, there are examples of 3′ UTR motifs associated with repression of translation in genes such as tra-2 and fem-3 in germ line cell differentiation (4, 68). This is unlikely to be a useful parallel for the T. canis ant genes, as there is no similarity in 3′ UTR sequence between the two species and there is no gonadal development in the larval stage of T. canis. Translational suppression of the ant genes would be a surprising event in light of their unusually high level of transcription.
Fig. 2. Alignment of 250-nucleotide (nt) 3′-terminal coding sequences (A) and complete 3′ UTRs (B) of the four ant genes; identical settings were used for both alignments in MacVector ClustalW alignment (gap penalty = 10.0, extend gap penalty = 5.0). The stop codons at the end of the coding sequence and the beginning of the UTR are underlined. Bases identical in three or four of the sequences are boxed and shaded. The polyadenylated tail is shown boxed without shading.
Homologs of genes of unknown function. Sixteen clones showed significant similarities to known nematode sequences with no assigned function (Table 2). These were all designated huf genes, retaining the number of the corresponding EST clone. One clone, Tc-huf-001, contains a tandem array of four blocks of 36 amino acids each containing six cysteines in identical alignment. This motif, termed the NC6 (26) or SXC (12) domain, is found in T. canis and C. elegans proteins, particularly those associated with nematode surfaces, but the function of the array is not known.
Table 2. Homologs to proteins of unknown function in other nematodes
In addition, we found three genes which show similarity to Ancylostoma secreted protein (ASP), which is associated with larval activation and development in other nematode species (10, 30, 31, 61). However, the T. canis stage from which the cDNA library was made is developmentally arrested and is not analogous to activated hookworm larvae. This gene family shows similarity to hymenopteran venom allergens and, in common with members reported for Brugia malayi and Onchocerca volvulus, has been designated vah (venom allergen homolog). Tc-vah-1, previously reported as Tc-CRISP (cysteine-rich secreted protein) by Liu (39a), is remarkable for containing two NC6/SXC domains in tandem with a VAH homology unit. Further characterization of these clones, Tc-vah-1, -2, and -3, is in progress (51a).
Other novel genes. A total of 43 additional transcripts were found to have no significant similarities to any other sequences deposited in GenBank nr protein and dbEST databases. These were present at between one and four copies in the 261-member data set from T. canis and are designated not genes. Further studies on selected clones, such as not-018, for which antibodies to the protein product strongly recognize T. canis larval excretory-secretory antigens (61a), are ongoing.
Homologs of known genes. Sixty-five genes with database homologs were found; these are listed in Table 3. There are 23 metabolic and respiratory enzymes but remarkably few structural proteins (only actin, calponin, and α-tubulin) and no DNA replication proteins, consistent with the arrested state of the larval parasite. The various categories of genes are described below.
Table 3. Homology of T. canis ESTs with known genes
Mucins. A mucin gene, Tc-muc-1, has previously been reported to be abundantly expressed by T. canis larvae (25). Consistent with this, seven clones of Tc-muc-1 were recorded (2.7% of the library). Three new mucin genes were found among the EST clones, each of which contains similar mucin domains and flanking NC6/SXC domains (26). The mucins differ in the composition of repeat motifs, and in the number and positions of NC6/SXC domains, and work in progress indicates that all are members of the TES-120 glycoprotein family associated with the parasite surface coat (40a).
C-type lectins. One of the most abundant transcripts (16/261) was found to correspond to peptide sequence obtained from TES-32, a prominent secreted glycoprotein. These ESTs showed weak similarity to C-type lectins, but the homology was firmly established from full-length sequence. A detailed analysis of the functional lectin properties of TES-32/Tc-ctl-1 has been submitted for publication (42). Two variants of this sequence, designated Tc-ctl-2 and Tc-ctl-3, were also noted.
Proteases. Proteolytic enzymes have been prominent in most studies of parasitic helminths at the biochemical (32) and molecular (43) levels. Three transcripts, with strong similarities to cathepsin L (41), cathepsin Z (43), and asparaginyl endopeptidase (19), were each present as single copies. Full-length sequences of the cathepsins L (41) and Z (22a) have been determined. A protease inhibitor similar to the aspartyl protease inhibitor of Ascaris and Brugia (Bm33) (21) was also isolated as a single clone.
Transporters and receptors. One clone homologous to the acetylcholine receptors reported from other nematodes (24) has been isolated. A relatively frequent transcript (5/263) encodes Tc-PEB-1, a previously identified phosphatidylethanolamine-binding protein (26) which is present in parasite secretions as a 26-kDa protein (TES-26).
Structural proteins. Few structural proteins were identified in the EST data set, probably reflecting the arrested state of this parasite stage. Actin (67) and calponin (33) have been sequenced from other parasitic nematodes. For tubulins, most attention has focused on the β-tubulins (29), with which benzimidazole resistance is associated. Tc-tua-1 has strong similarity to α-tubulins from Haemonchus contortus (37) and C. elegans.
Protein synthesis and modification. Two genes essential for protein synthesis (elongation factors 1a and 1b) were identified, as well as peptidyl-glycine alpha-hydroxylating monooxygenase, which modifies the C termini of peptides. Protein disulfide isomerase is a well-known requisite for protein folding and correct formation of disulfide bonds. In C. elegans, the pdi gene is found in an operon with one cyclophilin gene (55); a homolog from O. volvulus has also been characterized (65). Peptidyl-prolyl cis-trans isomerase is similarly essential for correct protein conformation; the Tc-ppi-1 gene product is related to the FK506-binding proteins of mammals and not to the multigene cyclophilin family of peptidyl-prolyl cis-trans isomerases described for C. elegans (55, 57).
Glycolysis, respiration, and other metabolic and citric acid cycle enzymes. Some 18 distinct metabolic enzymes had very high levels of similarity to T. canis ESTs and represent the major metabolic pathways of glycolysis and aerobic respiration, as well as essential processes such as amino acid synthesis and degradation. Particularly prominent among these is the mitochondrial cytochrome c oxidase subunit II (25/261 = 9.6% of all clones). As mitochondrial mRNAs are not polyadenylated, the presence of mitochondrial DNA-encoded sequences in the cDNA library is difficult to interpret quantitatively.
Antioxidants. Oxidative stress is highly detrimental to both parasitic (54) and free-living (34) nematodes. In tissue-dwelling parasites, reactive oxygen intermediates from aggressive granulocytes may be countered by expression of antioxidants such as glutathione peroxidase and/or superoxide dismutase (SOD). Previous work characterized a SOD gene (Tc-sod-1) from T. canis larvae and showed that no glutathione peroxidase activity or gene sequence was detectable in this parasite (47). The EST data set contained seven clones encoding SOD isoforms, all quite distinct from Tc-sod-1 (approximately 66% divergence in protein sequence) and more similar to C. elegans gene F55H2.1. The seven clones represent four distinct isoforms each showing 10 to 20% divergence in nucleotide and deduced amino acid sequence, including two-codon insertions/deletions. This level of divergence and the presence of triplet insertions/deletions led us to designate these four isoforms as separate genes, Tc-sod-2, -3, -4, and -5. Confirmation of this assignation is under way.
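Percent divergence between two isoforms can be computed from a pairwise alignment in several ways; the text does not state which was used. The sketch below shows one simple convention, assumed here for illustration: mismatches and gap-versus-residue columns both count as differences, and columns that are gaps in both sequences are ignored. The aligned sequences are invented.

```python
def percent_divergence(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns that differ.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * differing / compared


# Invented aligned fragments: one substitution in ten columns.
print(percent_divergence("ATGGCCAAGT", "ATGGCTAAGT"))
# A one-codon (triplet) gap counted as three differing columns.
print(percent_divergence("ATG---AAGT", "ATGGCCAAGT"))
```

Other conventions, such as excluding gapped columns entirely, would give lower values for sequences that differ by insertions/deletions.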
Ribosomal proteins. Twenty ESTs (comprising 13 different genes; 8.4%) encode ribosomal proteins. This is close to the range found with EST projects for other nematodes; for example, 8.5% (1,339/15,811) of B. malayi ESTs deposited are ribosomal (12a), as are 5.0% (11/218) of Necator americanus ESTs (12a).
Other proteins. Some ESTs had homology to mammalian proteins for which a function in nematodes is not obvious. Five particularly interesting findings were noted.
(i) Granulin/epithelin precursor (Tc-gep-1). Granulins are synthesized as large (500- to 600-amino-acid) precursors from which are derived seven small (∼60-amino-acid) 12-cysteine peptides with growth factor-like activity (8, 9, 69). Mammalian and fish kidney epithelial cells are rich sources of these peptides (8, 60), as are human and rodent leukocytes, suggesting that granulins may fulfill cytokine-like functions (6). If this is so, the synthesis of a granulin homolog by T. canis larvae may be important in the interaction between parasite and the host immune system.
(ii) Lupus autoantigen homolog (Tc-lah-1). The lupus autoantigen (also known as Sjögren syndrome type B antigen) is a highly conserved ribonucleoprotein which is a target of autoantibodies in systemic lupus erythematosus (39). As autoimmune responses can be initiated by infectious agents (53), the expression of this homolog by T. canis may be significant. Similarly, O. volvulus expresses the RAL-1 product, which is homologous to the Sjögren syndrome type A antigen (48).
(iii) Olfactomedin (Tc-ofm-1). A T. canis member of the olfactomedin gene family was found. The prototype gene encodes a 57-kDa glycoprotein in the extracellular mucus matrix of the olfactory neuroepithelium of frogs (66), but additional homologs are widely expressed in mammalian brain (35). Tc-ofm-1 shows maximum similarity to a C. elegans homolog. As T. canis larvae produce high levels of mucins, including those constituting the surface coat (25, 59), Tc-OFM-1 protein may be involved with mucus layers in this parasite.
(iv) PC4/TIS7/“interferon-related protein” (Tc-pth-1). A T. canis gene which is similar to a mammalian product described as PC4 or TIS7, induced in cell lines by activators such as nerve growth factor or phorbol esters (63), has been isolated. The same mammalian gene has also been designated interferon-related protein, due to an erroneous deposition of this sequence in 1985 as murine beta interferon (accession no. J00424). We have named the T. canis gene, which bears no similarity to mammalian interferons, pth-1 (PC4/TIS7 homolog).
(v) Tubby-like protein (Tc-tlp-1). Recently, a new multigene family related to the mouse gene tubby has been recognized (38). Mutations in tubby in the mouse result in obesity and degenerative changes in adult life; however, the existence of conserved homologs in nematodes (including C. elegans) and in plants indicates that this gene encodes a protein which fulfills a fundamental, and as yet unrecognized, function in all higher organisms.
Conclusion. The analysis of expressed genes that we present here has achieved its aims of identifying a number of abundant transcripts, one of which (ctl-1) corresponds to a major secreted product and four of which (ant-003, -005, -030, and -034) represent novel gene sequences. We have also made the first steps toward a comprehensive gene catalogue of a biologically intriguing and clinically significant parasite, providing a resource and springboard for future studies.
The outcome of this study supports the proposition that the EST strategy is highly applicable to many metazoan organisms which have relatively large genome sizes (≥10⁸ bp) (13, 14), especially where interest is focused on genes expressed at moderate to high levels. In addition, genes restricted to a life cycle stage can be identified in this manner. Further evidence for the success of this approach is seen in the Filarial Genome Project, which in 3 years has deposited in dbEST over 15,000 sequences, which are estimated to represent some 5,000 separate gene transcripts, or around 33% of the total gene complement of B. malayi (15, 62). The availability of the full genome sequence of C. elegans lends an exceptional opportunity to compare free-living and parasite gene sequences and structure, with the potential to identify adaptations requisite for parasitism at the molecular level. Along their evolutionary path, parasitic species must have developed a myriad of immune evasion mechanisms, and we anticipate that our study and others like it will be instrumental in identifying the novel immune evasion genes upon which parasite survival depends.
ACKNOWLEDGMENTS
We thank the Medical Research Council for project grant support.
We thank Mark Blaxter for detailed critical comments on the manuscript.
Notes
Editor: J. M. Mansfield
FOOTNOTES
- Received 2 March 1999.
- Returned for modification 13 May 1999.
- Accepted 1 June 1999.
- Copyright © 1999 American Society for Microbiology