Infection and Immunity, August 1999, p. 3960-3969, Vol. 67, No. 8
0019-9567/99/$04.00+0
Copyright © 1999, American Society for Microbiology. All rights reserved.

A Random Survey of the Cryptosporidium parvum Genome

Chang Liu, Vladimir Vigdorovich, Vivek Kapur, and Mitchell S. Abrahamsen*

Department of Veterinary PathoBiology, University of Minnesota, St. Paul, Minnesota

Received 29 January 1999/Returned for modification 31 March 1999/Accepted 25 May 1999


    ABSTRACT

Cryptosporidium parvum is an obligate intracellular pathogen responsible for widespread infections in humans and animals. The inability to obtain purified samples of this organism's various developmental stages has limited the understanding of the biochemical mechanisms important for C. parvum development or host-parasite interaction. To identify C. parvum genes independent of their developmental expression, a random sequence analysis of the 10.4-megabase genome of C. parvum was undertaken. Total genomic DNA was sheared by nebulization, and fragments between 800 and 1,500 bp were gel purified and cloned into a plasmid vector. A total of 442 clones were randomly selected and subjected to automated sequencing by using one or two primers flanking the cloning site. In this way, 654 genomic survey sequences (GSSs) were generated, corresponding to >320 kb of genomic sequence. These sequences were assembled into 408 contigs containing >250 kb of unique sequence, representing ~2.5% of the C. parvum genome. Comparison of the GSSs with sequences in the public DNA and protein databases revealed that 107 contigs (26%) displayed similarity to previously identified proteins and rRNA and tRNA genes. These included putative genes involved in the glycolytic pathway, DNA, RNA, and protein metabolism, and signal transduction pathways. The repetitive sequence elements identified included a telomere-like sequence containing hexamer repeats, 57 microsatellite-like elements composed of dinucleotide or trinucleotide repeats, and a direct repeat sequence. This study demonstrates that large-scale genomic sequencing is an efficient approach to analyze the organizational characteristics and information content of the C. parvum genome.


    INTRODUCTION

Cryptosporidium parvum has emerged as a well-recognized cause of acute gastrointestinal disease in humans and animals throughout the world and is associated with a substantial degree of morbidity in patients with AIDS (15). C. parvum belongs to the phylum Apicomplexa and is one of several genera that are referred to as coccidia. The parasite primarily infects the microvillous border of the intestinal epithelium and to a lesser extent the extraintestinal epithelium (10). The life cycle of C. parvum resembles that of other coccidia and includes multiple asexual and sexual developmental stages.

Despite the medical and veterinary importance of C. parvum, studies of this organism at the genetic level have only begun in recent years and are still in their infancy. Although a relatively small number of basic metabolic and structural genes as well as several genes encoding immunogenic antigens have been identified (10), little is known about the basic cellular and molecular biology of this pathogen in terms of virulence factors, genome structure, or developmental biology. This is largely due to the inability to obtain purified samples of the various developmental stages of the parasite for biochemical studies. The relatively small size and simple organization of the 10.4-megabase (Mb) C. parvum genome, which is composed of eight chromosomes ranging from 1.04 to 1.5 Mb, however, balance these disadvantages (3). Since the genomic DNA sequence encodes all of the heritable information responsible for parasite development, disease pathogenesis, virulence, species permissiveness, and immune resistance, a comprehensive knowledge of the C. parvum genome will provide the necessary information required for targeted research into disease prevention and treatment.

Over the past few years, large-scale sequencing of randomly selected cDNA or fragments of genomic DNA has proven to be an efficient approach for expanding the understanding of the biology of an organism, including many pathogenic protozoa (6, 8, 21, 32, 36). Recently, a large-scale expressed sequence tag (EST) sequencing project was undertaken for C. parvum sporozoites (5). Due to the inability to obtain purified samples of other developmental stages, in particular, the intracellular stages, the ongoing C. parvum EST approach is limited to the discovery of genes that are expressed in sporozoites. Considering the absolute dependence of C. parvum development on the mammalian host cell, many unique biochemical pathways and molecular mechanisms involved in host-parasite interaction and pathogenesis are not likely to be identified by the ongoing sporozoite EST project.

In order to identify C. parvum genes independent of their developmental expression, we conducted large-scale sequencing of random C. parvum genomic segments. In this report, we describe the identification of 654 genomic survey sequences (GSSs) obtained by the random sequencing of clones from a small-insert C. parvum total genomic DNA library. The relatively high number of GSSs with similarity to previously characterized genes from other organisms implies that genomic sequencing is an efficient method for gene discovery in C. parvum. Furthermore, the putative C. parvum genes and repetitive elements identified here lay the foundation for studies directed toward understanding the biology of C. parvum and for the development of strategies for subspecies differentiation and epidemiological surveillance of the parasite.


    MATERIALS AND METHODS

DNA preparation. C. parvum oocysts (Iowa isolate; originally obtained from C. Sterling, University of Arizona, Tucson) were sterilized by incubation in Clorox (3 × 10⁷ oocysts/ml; sodium hypochlorite, 5.25%; dilution rate, 1:3) for 7 min on ice. The oocysts were washed five times in phosphate-buffered saline (PBS) by centrifugation at 3,500 × g for 10 min at 4°C. The oocysts were resuspended in PBS at a concentration of 10⁸ oocysts/ml. An equal volume of 2× excystation medium (0.05 g of trypsin and 0.15 g of sodium taurocholate in 5 ml of Hanks' balanced salt solution [pH 7.2 to 7.4]) was added, and the oocysts were incubated at 37°C for 1 h. The unexcysted oocysts and sporozoites were washed three times in PBS by centrifugation. The pelleted oocysts and sporozoites were suspended in 400 µl of DNA lysis solution (120 mM NaCl, 0.1 M EDTA, 25 mM Tris base, 1% Sarkosyl), and the suspension was subjected to three freeze-thaw cycles with liquid nitrogen and a 70°C water bath. The lysate was incubated with proteinase K (1 mg/ml) for 2 h at 37°C, followed by phenol-chloroform extraction and ethanol precipitation by using standard methods (29). The DNA precipitate was resuspended in 0.5 ml of TE (10 mM Tris base, 1 mM EDTA [pH 8.0]) and treated with RNase A (1 mg/ml) for 1 h at 37°C. The DNA sample was extracted with phenol-chloroform, precipitated, and resuspended in TE as described above.

Library construction. Total genomic DNA (100 µg) was randomly sheared by using a gas-driven nebulizer as previously described (28), blunted with Escherichia coli DNA polymerase, and phosphorylated with T4 polynucleotide kinase. The DNA fragments were fractionated by electrophoresis, and fragments between 800 and 1,500 bp were excised from the agarose gel and purified with QIAEX II kits (Qiagen, Chatsworth, Calif.). The purified DNA fragments were cloned into the SmaI site of the pBluescript II SK (+) vector (Stratagene, La Jolla, Calif.).

Sequencing and analysis. Randomly selected clones from the unamplified library were grown overnight, and plasmid DNA was purified with SNAP kits (Invitrogen, Carlsbad, Calif.) or Qiagen plasmid minikits (Qiagen). DNA sequencing was performed at the Advanced Genetics Analysis Center (College of Veterinary Medicine, University of Minnesota) by using dye termination cycle sequencing technology with AmpliTaq DNA polymerase (Perkin-Elmer, Foster City, Calif.) and was analyzed on an ABI fluorescence automated sequencer (PE Applied Biosystems, Foster City, Calif.). Sequence data were edited with EditSeq (DNASTAR, Inc., Madison, Wis.) to remove vector sequence and/or sequence of low reliability. Contig assembly and statistical analysis were performed by using SeqMan (DNASTAR). Public databases, including GenBank (release 105.0), EMBL (release 53.0), PIR (release 55.0), SWISS-PROT (release 35.0), PROSITE (release 14.0), and Profile Library, were searched for similarity to known sequences or motifs by using NETBLAST, MOTIFS, and PROFILESCAN (GCG Wisconsin Package, version 9.1; Genetics Computer Group [GCG], Madison, Wis.). Previously identified C. parvum sequences in GenBank were searched and retrieved with STRINGSEARCH and FETCH (GCG). These sequences were compiled into a local database by using GCGTOBLAST (GCG) and were searched for similarities to our C. parvum GSSs by using BLAST (GCG). The mono-, di-, and trinucleotide compositions were calculated with COMPOSITION (GCG). Direct repeats and simple sequence repeats were identified with FINDPATTERNS (GCG).

Nucleotide sequence accession numbers. Nucleotide sequences reported in this paper are available in the GenBank database under accession no. AQ023473 to AQ024123.


    RESULTS AND DISCUSSION

Characteristics of sequencing data. To generate a uniformly distributed, representative sequencing template library, high-molecular-weight C. parvum genomic DNA was mechanically sheared by nebulization as previously described (28). The sheared DNA was separated by gel electrophoresis, and fragments with a size distribution from 800 to 1,500 bp were purified and used to construct the genomic library in the vector pBluescript II SK (+). Automated DNA sequencing was performed on a total of 442 random clones. Among them, 212 clones were sequenced with primers flanking each side of the cloning site (T3 and T7 primers). The remaining 230 clones were sequenced with only one flanking primer (T3). A total of 324,076 bp of genomic sequence was generated. To identify overlapping sequences, all sequences were subjected to contig assembly. This analysis generated 408 contigs containing 256,935 bp of unique genomic sequence, representing ~2.5% of the estimated 10.4-Mb C. parvum genome. Most of the nonunique sequence resulted from overlap between the two reads obtained from clones sequenced with both flanking primers. The 408 individual contigs were generated from 442 random clones (~92%), indicating that the great majority of clones contained unique sequences. To assess the quality of our sequence data, GSSs matching previous C. parvum database entries were aligned with their corresponding database entries. The accuracy of our sequences, indicated by the percentage of identical nucleotides between the aligned sequences, was greater than 99% (data not shown). Other than the vector sequences used to construct the genomic library, no contaminating bacterial or bovine sequences were found among the generated GSSs. This is likely due to the harsh chemical treatment and extensive washing of the oocysts prior to DNA isolation, which greatly reduced the chance of contaminating the C. parvum genomic DNA library with host or other microbial DNA fragments.
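The clone, read, and coverage figures above can be cross-checked with simple arithmetic. The minimal Python sketch below uses only numbers stated in the text (the function name `survey_summary` is illustrative): 212 clones read from both ends plus 230 read from one end account for 442 clones and 654 reads, and 256,935 bp of unique sequence is about 2.5% of a 10.4-Mb genome.

```python
# Read and coverage bookkeeping for the GSS survey, using figures from the text.
GENOME_SIZE_BP = 10_400_000  # estimated C. parvum genome size (10.4 Mb)


def survey_summary(two_primer_clones, one_primer_clones, unique_bp, contigs):
    """Return (reads, clones, genome fraction, contigs per clone)."""
    clones = two_primer_clones + one_primer_clones
    # A clone sequenced from both ends (T3 and T7) contributes two reads.
    reads = 2 * two_primer_clones + one_primer_clones
    genome_fraction = unique_bp / GENOME_SIZE_BP
    contigs_per_clone = contigs / clones
    return reads, clones, genome_fraction, contigs_per_clone


reads, clones, frac, cpc = survey_summary(212, 230, 256_935, 408)
print(reads, clones, f"{frac:.1%}", f"{cpc:.1%}")
```

Keeping the clone and read counts as separate outputs makes the distinction between clones (442) and GSSs (654) explicit.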

Identification of putative genes. Database searching with the GSSs was performed by using the program NETBLAST against the nonredundant GenBank, PDB, SWISS-PROT, and PIR (1) databases. This analysis revealed that 134 GSSs, corresponding to 107 individual contigs (26%), displayed significant similarity (smallest probability, P ≤ 10⁻⁵) to sequences present in the databases. Among them, 129 GSSs displayed similarity to known protein sequences, one displayed significant similarity to telomeric sequences of several eukaryotes (CpGR254), and two (CpGR12A and CpGR12B) contained sequences representing C. parvum rRNA genes (GenBank accession no. AF040725). Seven of the GSSs represented previously characterized C. parvum sequences (precursor of oocyst wall [GenBank Z22537], tubulin beta chain [PIR A25342], C. parvum DNA segment B [GenBank M59420], C. parvum open reading frame [ORF] 2 gene [GenBank U18112], thrombospondin-related adhesive protein [TRAP] [GenBank AF017267], elongation factor [GenBank U71180], and protein disulfide isomerase [GenBank U48261]). In addition, searching the GSSs with the program tRNAscan-SE (20) identified one GSS (CpGR309B) that was highly similar to the isoleucine tRNA gene of Thiobacillus ferrooxidans (GenBank U18089).

The GSSs that displayed significant similarities to database entries were grouped based on the biological roles of their matches (Table 1) by using the classification system developed by Riley (27). The distribution of putative genes in different functional groups is shown in Fig. 1. Genes involved in macromolecular and small-molecule biosynthesis are well represented in the C. parvum genome, as are genes potentially involved in cellular signaling, energy production, and the regulation of mRNA and protein expression. Of special interest are genes potentially involved in parasite survival, pathogenesis, and host-parasite interaction. Below, we describe several groups of proteins that fall into these categories and that provide new insights into C. parvum biology.

                              
TABLE 1.   C. parvum GSSs matched to known sequences from C. parvum and other organisms in public databasesᵃ


FIG. 1.   Functional classification of C. parvum GSSs, showing the proportions of predicted genes according to their putative biological functions. GSSs having a P value of ≤ 10⁻⁵ were classified into 12 functional categories.

The deduced amino acid sequence of CpGR24B displayed significant similarity (P = 2.8e-53) to members of the prohibitin gene family (Fig. 2). In mammals, the prohibitin gene product has been shown to negatively regulate cell proliferation (22). In addition to gene structure, the function of this protein is conserved across many lower and higher eukaryotes. For example, the Pneumocystis carinii prohibitin gene expressed in human fibroblasts has been shown to arrest the cell cycle in the G1 phase (23). The similarity between CpGR24B and members of the prohibitin family suggests that this GSS represents a portion of a C. parvum gene which may function in controlling C. parvum proliferation and development. It is interesting to note that in yeast, prohibitin has been found to be localized within the inner mitochondrial membrane and appears to play a role in mitochondrial inheritance and regulation of mitochondrial morphology (2). However, there is no evidence for the existence of mitochondria in C. parvum, suggesting that not all prohibitin functions are conserved.


FIG. 2.   Multiple sequence alignment of the deduced amino acid sequences of CpGR24B and members of the prohibitin family. The origins and accession numbers of the prohibitin sequences used in this alignment are as follows: Arabidopsis thaliana (At), U69155; Nicotiana tabacum (Nt), U69154; C. parvum (Cp), AQ023505; Homo sapiens (Hs), S85655; Rattus norvegicus (Rn), M61219; Toxocara canis (Tc), U97204; S. cerevisiae (Sc), U16737; Trypanosoma brucei (Tb), AF049901. The amino acid numbers for each sequence are indicated on the right. In the sequence alignment, identical residues are shown with a black background, and similar residues are shown with a gray background.

A total of 21 GSSs displayed limited similarity to proteins involved in cell signaling pathways, including protein ligands, cell surface receptors and their associated proteins, and protein kinases and phosphatases. For example, CpGR231B displays limited similarity to the shk1 kinase-binding protein (P = 4.0e-10). This kinase is an essential component of the Ras- and Cdc42-dependent signaling cascade, which has been demonstrated to be required for cell viability, normal morphology, and mitogen-activated protein kinase-mediated signal response in fission yeast (11). Although these GSSs are not conserved to the extent that CpGR24B is within the prohibitin gene family, their similarity to proteins involved in intracellular signaling provides evidence that signal transduction pathways in C. parvum are similar to those used by other eukaryotic organisms. These putative C. parvum proteins may be involved in the coordination of complex host-parasite interactions, signaling with other parasites, and the regulation of parasite growth and differentiation in response to external signals.

In addition to the GSSs that displayed similarity to known genes involved in responding to extracellular signals, several GSSs displayed limited similarity to genes involved in cell adhesion and/or recognition. CpGR176A and CpGR260A displayed similarity (P = 4.3e-6 and P = 5.1e-14, respectively) to the F-spondin precursor gene of amphibians and mammals, which plays a role in cell signaling and adhesion (18). CpGR176A and CpGR260A are also similar, but not identical, to the previously identified C. parvum family of TRAPs (GenBank accession no. AF017267, AF073838, AF033828, X77587, and U42213) and to the phylogenetically related sporozoan proteins, including the Eimeria maxima EM100 antigen (M99058), the Eimeria tenella Etp100 protein (AF032905), the Toxoplasma gondii MIC2 microneme protein (U62660), and the Plasmodium falciparum circumsporozoite protein-TRAP-related protein (U34363). Members of the TRAP family of proteins have been shown to be localized in the apical end of C. parvum sporozoites and are structurally related to the micronemal proteins of Eimeria and Toxoplasma, which are involved in host cell attachment and/or invasion (33). Another GSS, CpGR17B, displayed similarity to human tyrosyl-tRNA synthetase (P = 1.3e-14) and to endothelial monocyte-activating protein II (EMAP II) (P = 7.6e-11). Recently, an EMAP II-like domain has been found at the carboxyl-terminal end of human tyrosyl-tRNA synthetase. Human tyrosyl-tRNA synthetase is secreted as cells undergo programmed cell death (apoptosis) and is cleaved into two cytokines, one of which is the EMAP II-like molecule (37). EMAP II is a multifunctional tumor-derived cytokine that has been shown to activate endothelial cells, resulting in the elevation of cytosolic free calcium concentration, release of von Willebrand factor, induction of tissue factor, and expression of adhesion molecules such as E-selectin and P-selectin (17). In addition, mononuclear phagocytes exposed to EMAP II show induction of tumor necrosis factor alpha (TNF-alpha) and tissue factor.

The above-mentioned database search could identify only C. parvum genes that are similar to sequences currently present in the public databases. Consequently, GSSs representing unique C. parvum genes, or genes that have not been characterized in other organisms, would not be identified. To estimate the actual coding capacity of the C. parvum genome, all sequences were analyzed for the presence of ORFs by using the program ORF Finder (25). In a random sequence with equal base frequencies, 3 of the 64 possible codons are stop codons, so the probability of a long stretch of codons without a stop decreases rapidly with length; the longer an ORF is, the more likely it is to represent a coding sequence. This analysis revealed that 615 of the 654 GSSs (94%) had the potential to encode proteins when an ORF longer than 100 amino acids was considered to be a coding sequence (25). The high percentage of potential coding sequences in our GSSs suggests that the C. parvum genome has a high gene density with little intergenic spacing.
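The stop-codon argument can be made concrete. The sketch below is a simplified six-frame scan, not the ORF Finder algorithm: it assumes an ORF is any stop-free run of codons (no start-codon requirement, which suits short, partial genomic fragments) and applies the >100-codon threshold from the text. It also computes the chance that 100 random codons contain no stop codon, both for equal base frequencies and for the base composition of the random GSSs reported later in this section. The AT-rich composition makes stop codons (TAA, TAG, TGA) more frequent, so long ORFs are even less likely to arise by chance.

```python
# Simplified six-frame scan for stop-free codon runs (a stand-in for ORF Finder).
# Assumption: an "ORF" is any run of codons between stop codons; no ATG required.
STOP_CODONS = ("TAA", "TAG", "TGA")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_frame(seq):
    """Length, in codons, of the longest stop-free run in any of the six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    best = max(best, run)
                    run = 0
                else:
                    run += 1
            best = max(best, run)
    return best


def has_coding_potential(seq, min_codons=100):
    """Apply the >100-codon threshold used in the text."""
    return longest_open_frame(seq) > min_codons


def chance_of_open_run(n_codons, base_freq):
    """Probability that n independent random codons contain no stop codon."""
    p_stop = sum(base_freq[c[0]] * base_freq[c[1]] * base_freq[c[2]] for c in STOP_CODONS)
    return (1 - p_stop) ** n_codons


uniform = {b: 0.25 for b in "ACGT"}
gss_like = {"A": 0.337, "C": 0.161, "G": 0.159, "T": 0.343}  # random-GSS composition from the text
print(chance_of_open_run(100, uniform))   # about 0.008
print(chance_of_open_run(100, gss_like))  # smaller still: AT-rich DNA is richer in stop codons
```

Because GSSs are fragments of roughly 500 bp, an ORF that runs off either end is still counted here; a stricter definition requiring an ATG start would give lower counts.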

In order to further characterize potential C. parvum genes, GSSs which contained ORFs that did not display a high degree of similarity to those in the databases were further analyzed by using the programs MOTIFS and PROFILESCAN (GCG) to determine the presence of functional protein motifs. In our analysis, only motifs with a low false-positive rate and within ORFs of >100 amino acids were characterized. This search resulted in the identification of 11 functional protein motifs or profiles (Table 2). These included ATP or GTP binding domains, signatures for transport proteins, and surface receptors. This analysis suggests that these GSSs may represent additional C. parvum genes.

                              
TABLE 2.   Protein motifs and profiles identified in C. parvum GSSs

Identification of repetitive sequences. Microsatellite DNA sequences, also called simple sequence repeats or simple tandem repeats, are ubiquitous elements of eukaryotic genomes. The function of these repeats is not well understood, despite a number of hypotheses that have been proposed, including modulation of gene regulation, sites of frequent recombination, and formation of left-handed DNA conformation (or Z-DNA) (13). These tandem repeats of 1- to 5-bp motifs have been found to be distributed throughout eukaryotic genomes and have been demonstrated to be useful markers for the rapid and sensitive genetic fingerprinting of an organism (34).

In order to identify potential genetic markers for strain typing and tracking of C. parvum, we analyzed the nature and frequency of microsatellite DNA sequences including all possible dinucleotide and trinucleotide repeats in the C. parvum GSSs. The GSSs containing a di- or trinucleotide repeat and the number of repeats they contained are listed in Table 3 and Table 4. Among the 57 GSSs found to contain microsatellite-like elements, the most abundant dinucleotide repeats included (TT)n, (AA)n, (TA)n, and (AT)n. Similarly, (AAT)n, (TAA)n, (TAT)n, (ATA)n, (TTA)n, and (ATT)n constitute the most abundant trinucleotide repeats. A potential role for several of these microsatellite DNA sequences as genetic markers is currently being investigated.
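A tandem-repeat scan of this kind can be sketched with a back-referencing regular expression. This Python example is a stand-in for the FINDPATTERNS search that was actually used, and the copy-number thresholds are hypothetical, since the text does not state the minimum repeat length that was scored. Like Tables 3 and 4, it treats runs such as (AA)n as dinucleotide repeats.

```python
import re
from collections import Counter


def find_microsatellites(seq, unit_len, min_copies):
    """Yield (motif, copies, start) for tandem runs of a unit_len-bp motif.

    The regex captures one unit and requires it to repeat (back-reference \\1)
    at least min_copies - 1 more times; matches are non-overlapping, left to right.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    for m in pattern.finditer(seq.upper()):
        yield m.group(1), len(m.group(0)) // unit_len, m.start()


def tally_motifs(seqs, unit_len, min_copies):
    """Count how many tandem runs of each motif occur across a set of sequences."""
    counts = Counter()
    for seq in seqs:
        for motif, _copies, _start in find_microsatellites(seq, unit_len, min_copies):
            counts[motif] += 1
    return counts


# Hypothetical thresholds: at least 5 dinucleotide or 4 trinucleotide copies.
example = "GC" + "AT" * 6 + "GGC" + "AAT" * 5 + "C"
print(list(find_microsatellites(example, 2, 5)))
print(list(find_microsatellites(example, 3, 4)))
```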

                              
TABLE 3.   Numbers of simple dinucleotide repeats identified in GSSsᵃ


                              
TABLE 4.   Numbers of simple trinucleotide repeats identified in GSSsᵃ

To further characterize structural features of the C. parvum genome, we examined the GSSs for the presence of additional repetitive sequences. This analysis identified two GSSs, CpGR265A and CpGR254, that contained complex repetitive sequences. CpGR265A contained multiple direct repeats of 14 bp with a consensus sequence 5' TCTCTTTCAATYCT 3'. Twenty-five copies of the direct repeat were present within 512 bp of sequence. Database searching revealed no significant identity with any other sequences. Similarly, CpGR254 contained 48 copies of an imperfect direct repeat sequence T(2-12)AG(3-5). This basic repeat unit was similar in base composition and structure to telomeric sequences characterized from other lower and higher eukaryotes (14). Further characterization demonstrated that this repetitive sequence represents a portion of a C. parvum telomeric DNA sequence (19).
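Scanning for the two repeat types described above can be sketched by turning the consensus into a regular expression, with the IUPAC code Y expanded to C or T. This is an illustrative Python sketch, not the FINDPATTERNS procedure: it counts only exact, non-overlapping matches to the 14-bp consensus, so imperfect copies would be missed, whereas the variable-length T(2-12)AG(3-5) pattern tolerates the length variation described for CpGR254.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus (e.g. Y = C or T) into an exact-match regex."""
    return re.compile("".join(IUPAC[b] for b in consensus.upper()))


# Consensus of the 14-bp direct repeat in CpGR265A, from the text.
DIRECT_REPEAT = consensus_to_regex("TCTCTTTCAATYCT")
# Telomere-like unit T(2-12)AG(3-5) described for CpGR254.
TELOMERE_LIKE = re.compile(r"T{2,12}AG{3,5}")


def count_units(pattern, seq):
    """Count non-overlapping exact matches; mismatched copies are not counted."""
    return len(pattern.findall(seq.upper()))
```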

Analysis of nucleotide compositions. To investigate the correlation between the nucleotide composition of a sequence and its coding potential in the C. parvum genome, the nucleotide compositions of the GSSs were calculated with the program COMPOSITION (GCG) and compared with those of known C. parvum coding sequences. The coding sequences used in this study (accession nos. AF001211, U24082, U90628, L31806, AF013984, U34390, U95995, AF017267, U41365, U95996, S76665, U48261, U35027, S76666, U11761, U48717, U35028, U18120, U65981, U42213, U21667, U71181, L08612, U22892, U83169, and M86241) were retrieved from GenBank with the FETCH program (GCG). The overall AT contents were 62.4% for the coding sequences and 68.0% for the random sequences, suggesting that there is no bias against AT content in the coding region as has been found for other eukaryotes (24). However, an interesting discrepancy was observed when the frequencies of individual nucleotides in the coding and the random sequences were compared. The frequencies of A (33.1%) and C (17.0%) in the coding sequences were nearly identical to those in the random sequences (A, 33.7%; C, 16.1%). In contrast, the frequency of G in coding sequences (20.6%) is about 30% greater than that in the random sequences (15.9%). This was offset by a corresponding decrease in the frequency of T (29.3% in the coding sequences and 34.3% in the random sequences). A bias in GC content has previously been reported in C. parvum (10). Our analysis demonstrates that this bias is due to a preference for G in the coding sequences. The relevance, if any, of this finding is not clear.
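The composition comparison reduces to pooled base counts and a few ratios. The sketch below (function names are illustrative) shows how frequencies could be pooled across a set of sequences, then recomputes the AT contents and the relative G excess and T deficit directly from the percentages reported above.

```python
from collections import Counter


def base_composition(seqs):
    """Fraction of A, C, G and T pooled over a set of sequences (other symbols ignored)."""
    counts = Counter()
    for seq in seqs:
        counts.update(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def at_content(comp):
    return comp["A"] + comp["T"]


# Percentages reported in the text.
coding = {"A": 33.1, "C": 17.0, "G": 20.6, "T": 29.3}
random_gss = {"A": 33.7, "C": 16.1, "G": 15.9, "T": 34.3}

g_excess = (coding["G"] - random_gss["G"]) / random_gss["G"]
t_deficit = (random_gss["T"] - coding["T"]) / random_gss["T"]
print(f"AT content: coding {at_content(coding):.1f}%, random {at_content(random_gss):.1f}%")
print(f"G relative excess in coding: {g_excess:.0%}; T relative deficit: {t_deficit:.0%}")
```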

To investigate the presence of dinucleotide bias in the C. parvum genome, the dinucleotide preference (DiP) of the random genomic sequences was calculated by dividing the observed frequency of each dinucleotide by its expected frequency (Fig. 3). Dinucleotides CG (0.54), AC (0.76), GT (0.76), and TA (0.78) are markedly disfavored in the C. parvum genome. The low dinucleotide frequency (DiF) of CG and TA is consistent with the underrepresentation of both of these dinucleotides in the genes of Drosophila and of a wide range of bacteria, yeast, primates, and other apicomplexans (7). A low DiF of CG was previously observed in C. parvum on the basis of four sequences (4).


FIG. 3.   DiP in C. parvum GSSs. The observed DiF in C. parvum GSSs was calculated with the COMPOSITION program (GCG). The expected DiF was calculated by multiplying the observed frequencies of the two mononucleotides that constitute the dinucleotide. The DiP for a dinucleotide was calculated as the ratio of its observed DiF to its expected DiF. The DiPs were plotted against their corresponding dinucleotide sequences. The actual value for each DiP is indicated on the right of each column.
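The DiP calculation in the Fig. 3 legend can be written out as a short function. In this Python sketch (an illustration, not the COMPOSITION program), overlapping dinucleotides are counted within each sequence but not across sequence boundaries, the expected frequency is the product of the two observed mononucleotide frequencies, and a DiP below 1 means the dinucleotide is underrepresented.

```python
from collections import Counter
from itertools import product


def dinucleotide_preference(seqs):
    """Observed/expected frequency for each of the 16 dinucleotides."""
    mono, di = Counter(), Counter()
    for s in seqs:
        s = s.upper()
        mono.update(b for b in s if b in "ACGT")
        # Overlapping dinucleotides within this sequence only; skip ambiguous bases.
        di.update(s[i:i + 2] for i in range(len(s) - 1)
                  if s[i] in "ACGT" and s[i + 1] in "ACGT")
    n_mono, n_di = sum(mono.values()), sum(di.values())
    dip = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)  # product of mononucleotide frequencies
        observed = di[x + y] / n_di
        dip[x + y] = observed / expected if expected else float("nan")
    return dip
```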

Comparison of different sequencing approaches. The efficiencies of C. parvum gene discovery with random genomic sequencing and EST sequencing were compared to determine the usefulness of each approach. The GSSs generated in this study were compared to 567 C. parvum sporozoite ESTs (5). The average sequence length of the GSSs (496 bp) is longer than that of the ESTs (476 bp), which may reflect the shorter inserts of the cDNA library compared with those of the genomic library. A total of 384 unique ESTs were generated, at a redundancy rate of 32.3%, which is substantially higher than that of the GSS project (~8%; 408 individual contigs generated from 442 random clones). However, because ESTs are derived from expressed sequences, all 384 unique ESTs are assumed to represent expressed C. parvum genes regardless of whether they match database entries. Among the unique ESTs, 37% (142 of 384) displayed significant similarity with sequences in the current databases. In contrast, 26% of the individual genomic contigs (107 of 408) displayed significant similarity with sequences in the current databases. This difference is not unexpected, as GSSs do not necessarily represent coding sequences. In general, the characteristics of the C. parvum GSS and EST projects are comparable with those of projects conducted on other organisms in terms of total sequence length, percentage of sequences with database matches, and redundancy rate (6, 8).
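The redundancy and database-match rates compared above follow from simple ratios. This sketch reproduces them from the counts given in the text. The GSS denominator assumes 442 clones (212 sequenced from both ends plus 230 from one end); dividing by 432 instead would give a GSS redundancy of about 5.6%.

```python
def redundancy(unique, total):
    """Fraction of reads or clones that did not yield new, unique sequence."""
    return 1 - unique / total


def match_rate(matched, unique):
    """Fraction of unique sequences with a significant database match."""
    return matched / unique


projects = {
    "EST": {"total": 567, "unique": 384, "matched": 142},
    "GSS": {"total": 442, "unique": 408, "matched": 107},  # clones -> contigs (assumed 442)
}
for name, d in projects.items():
    print(f"{name}: redundancy {redundancy(d['unique'], d['total']):.1%}, "
          f"database matches {match_rate(d['matched'], d['unique']):.0%}")
```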

In order to examine the redundancy of sequence data generated between the ESTs and GSSs, the ESTs were compiled into a local database and searched with our GSSs. Forty-eight of the 654 GSSs (7.3%) matched sequences present in 33 of the C. parvum ESTs (33 of 568 [5.8%]). Eighteen of these 33 EST sequences matched database entries. Among the 18 sequences, five sequences encoded rRNA and proteins, eight sequences encoded proteins with known functions, and five sequences encoded hypothetical proteins.
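Searching one sequence set against another compiled as a local database was done with BLAST. As a rough illustration of the idea only, the sketch below swaps in a much simpler exact k-mer index: a GSS is reported as matching an EST if the two share a minimum number of identical k-mers on either strand. The values k = 20 and min_shared = 3 are arbitrary assumptions, and unlike BLAST this approach neither tolerates mismatches nor scores alignments.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_kmer_index(named_seqs, k=20):
    """Map every k-mer to the names of the indexed sequences that contain it."""
    index = defaultdict(set)
    for name, seq in named_seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def shared_hits(query, index, k=20, min_shared=3):
    """Names of indexed sequences sharing >= min_shared distinct k-mers with either strand of query."""
    query = query.upper()
    shared = defaultdict(set)
    for strand in (query, query.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            for name in index.get(kmer, ()):  # .get avoids adding empty entries
                shared[name].add(kmer)
    return {name for name, kmers in shared.items() if len(kmers) >= min_shared}
```

In use, the ESTs would be indexed once with `build_kmer_index` and each GSS passed to `shared_hits`; the number of GSSs with a nonempty result corresponds to the kind of overlap count reported above.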

During the course of our study, another C. parvum GSS project (5) and a C. parvum sequence tagged site project (26) were initiated. To determine the redundancy of sequences generated from different sources with the same genomic DNA sequencing approach, sequences generated from these projects were retrieved from GenBank, compiled into separate local databases, and searched with our GSSs. One hundred twenty-nine of our C. parvum GSSs (19%) matched 134 GSSs (8.9%) retrieved from the public databases. Eight (1.2%) of our GSSs matched seven (7 of 149 [4.7%]) retrieved C. parvum sequence tagged sites. The above analysis indicates that currently there are relatively few redundancies among C. parvum sequences generated by different approaches or from different sources using the same genomic DNA sequencing approach. This will change as more sequences become available.

Concluding comments. In this study, we employed a random genomic sequencing approach to conduct a general survey of the organizational characteristics and informational content of the 10.4-Mb C. parvum genome. Of the 408 assembled contigs, 107 displayed significant similarity with gene sequences currently in the public databases. These 107 putative C. parvum genes were identified from a total of 256,935 bp of unique genomic sequence, which predicts a minimum gene density of approximately 1 gene/2,500 bp of genomic sequence. In related work, we have obtained more than 15 kb of contiguous DNA sequence from the smallest C. parvum chromosome, within which eight expressed ORFs were identified (unpublished data). The gene density at this locus (8 genes/15 kb) is approximately 1 gene/2,000 bp, consistent with that predicted from the GSS data. This predicted gene density suggests that the 10.4-Mb C. parvum genome may contain ~4,000 to 5,000 genes, comparable to the coding capacity of Saccharomyces cerevisiae, which has a genome size of 13.5 Mb and contains 5,800 genes (12). Other data (16, 30, 31) and analysis of the transcript sizes of the eight ORFs on the smallest C. parvum chromosome (unpublished data) suggest that the average size of a C. parvum transcript (untranslated and coding sequences) is 1,000 to 2,000 bases. Together, these data predict that approximately 50 to 75% of the C. parvum genome (estimated as the number of genes times the average transcript length) is transcribed into RNA. Because the GSS analysis could not identify genes without database matches, the above estimate may understate the actual coding capacity of the C. parvum genome. Indeed, the ORF analysis of nonmatching GSSs indicates that many of these sequences likely represent additional C. parvum genes.
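The gene-density extrapolation is straightforward arithmetic. The sketch below recomputes the gene spacings and genome-wide gene counts from the figures in the text. The transcribed-fraction call uses 4,500 genes and 1,500 bases purely as illustrative midpoints of the stated ranges, since the text does not say which values were combined to obtain 50 to 75%.

```python
GENOME_BP = 10_400_000  # estimated C. parvum genome size


def bp_per_gene(span_bp, genes):
    """Average spacing: base pairs of sequence per identified gene."""
    return span_bp / genes


def predicted_gene_count(spacing_bp, genome_bp=GENOME_BP):
    """Genome-wide gene count implied by a given average spacing."""
    return genome_bp / spacing_bp


def transcribed_fraction(n_genes, mean_transcript_bp, genome_bp=GENOME_BP):
    """Number of genes times mean transcript length, as a fraction of the genome."""
    return n_genes * mean_transcript_bp / genome_bp


print(round(bp_per_gene(256_935, 107)))  # spacing implied by the GSS matches
print(round(bp_per_gene(15_000, 8)))     # spacing at the 15-kb locus
print([round(predicted_gene_count(s)) for s in (2_500, 2_000)])
print(f"{transcribed_fraction(4_500, 1_500):.0%}")  # illustrative midpoints only
```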

Repetitive sequences are present in eukaryotic genomes at widely varying frequencies (9). Of the 408 contigs generated in this study, only two contained direct repeat sequences, one of which represented a telomeric sequence (19). This suggests that repetitive sequences may comprise <0.5% of the C. parvum genome (2 of 408 contigs, or ~0.49%). This percentage is substantially lower than that reported in similar studies of other organisms. In addition to direct repeat sequences, diverse microsatellite sequences were identified in this study; these constitute less than 1% (2,308 of ~250,000 bp) of the C. parvum genomic sequence characterized. The paucity of repetitive sequences is consistent with the notion that a large percentage of the C. parvum genome contains coding sequences.
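The microsatellite figures rest on scanning assembled sequence for short tandem repeats. The Python sketch below shows one simple way to do this with the standard `re` module: it reports perfect dinucleotide and trinucleotide repeats and the fraction of bases they cover. The five-copy minimum, the function names, and the toy sequence are illustrative assumptions; the criteria the authors used to call microsatellites are not given in this section.

```python
import re


def find_microsatellites(seq: str, unit_lengths=(2, 3), min_units: int = 5):
    """Return sorted (start, end, unit) tuples for perfect tandem repeats.

    Homopolymer units such as "AA" or "AAA" are excluded by a negative
    lookahead. min_units is an illustrative threshold, not the study's.
    """
    seq = seq.upper()
    hits = []
    for k in unit_lengths:
        # Group 1 captures a k-base unit that is not a single repeated base;
        # \1{min_units-1,} then requires at least min_units copies in a row.
        pattern = re.compile(
            r"((?!(.)\2{%d})[ACGT]{%d})\1{%d,}" % (k - 1, k, min_units - 1)
        )
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), m.group(1)))
    return sorted(hits)


def covered_fraction(seq_len: int, hits) -> float:
    """Fraction of bases inside at least one repeat (overlaps counted once)."""
    covered = set()
    for start, end, _ in hits:
        covered.update(range(start, end))
    return len(covered) / seq_len


# Toy 60-bp sequence: a (CA)8 repeat, an (AAT)6 repeat, and a poly(A) run.
seq = "GGTC" + "CA" * 8 + "TTG" + "AAT" * 6 + "GCGTA" + "A" * 12 + "CG"
hits = find_microsatellites(seq)
print(hits)                                      # [(4, 20, 'CA'), (23, 41, 'AAT')]
print(round(covered_fraction(len(seq), hits), 3))  # 0.567
```

The lookahead keeps the 12-base poly(A) tract from being reported as an "AA" or "AAA" repeat. A production tool would also tolerate imperfect repeats, which this sketch does not.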


    ACKNOWLEDGMENTS

We thank Bruce A. Roe (University of Oklahoma) for help in initiating this project. We are also grateful to Alison A. Schroeder and Cheryl A. Lancto for technical assistance, Yuan Wang for batch submission of GSSs to GenBank, and Elizabeth Shoop of the Computational Biology Center (University of Minnesota) for the analysis and web publication of GSSs.

This work was supported in part by grants from the NIH (AI-35479) and the Minnesota Agricultural Experiment Station to M.S.A.


    FOOTNOTES

* Corresponding author. Mailing address: Department of Veterinary PathoBiology, University of Minnesota, 1988 Fitch Ave., St. Paul, MN 55108. Phone: (612) 624-1244. Fax: (612) 625-0204. E-mail: abrah025@tc.umn.edu.

Editor: J. M. Mansfield


    REFERENCES

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
2. Berger, K. H., and M. P. Yaffe. 1998. Prohibitin family members interact genetically with mitochondrial inheritance components in Saccharomyces cerevisiae. Mol. Cell. Biol. 18:4043-4052.
3. Blunt, D. S., N. V. Khramtsov, S. J. Upton, and B. A. Montelone. 1997. Molecular karyotype analysis of Cryptosporidium parvum: evidence for eight chromosomes and a low-molecular-size molecule. Clin. Diagn. Lab. Immunol. 4:11-13.
4. Char, S., P. Kelly, A. Naeem, and M. J. Farthing. 1996. Codon usage in Cryptosporidium parvum differs from that in other Eimeriorina. Parasitology 112:357-362.
5. Cryptosporidium parvum sequence tag home page. 2 November 1998, posting date. [Online.] http://medsfgh.ucsf.edu/id/CpTags/home.html. [10 April 1999, last date accessed.]
6. Dame, J. B., D. E. Arnot, P. F. Bourke, D. Chakrabarti, Z. Christodoulou, R. L. Coppel, A. F. Cowman, A. G. Craig, K. Fischer, J. Foster, N. Goodman, K. Hinterberg, A. A. Holder, D. C. Holt, D. J. Kemp, M. Lanzer, A. Lim, C. I. Newbold, J. V. Ravetch, G. R. Reddy, J. Rubio, S. M. Schuster, X. Z. Su, J. K. Thompson, E. B. Werner, et al. 1996. Current status of the Plasmodium falciparum genome project. Mol. Biochem. Parasitol. 79:1-12.
7. Ellis, J., H. Griffin, D. Morrison, and A. M. Johnson. 1993. Analysis of dinucleotide frequency and codon usage in the phylum Apicomplexa. Gene 126:163-170.
8. El-Sayed, N. M., and J. E. Donelson. 1997. A survey of the Trypanosoma brucei rhodesiense genome using shotgun sequencing. Mol. Biochem. Parasitol. 84:167-178.
9. Epplen, J. T., W. Maueler, and C. Epplen. 1994. Exploiting the informativity of 'meaningless' simple repetitive DNA from indirect gene diagnosis to multilocus genome scanning. Biol. Chem. Hoppe-Seyler 375:795-801.
10. Fayer, R. 1997. Cryptosporidium and cryptosporidiosis. CRC Press, Inc., Boca Raton, Fla.
11. Gilbreth, M., P. Yang, D. Wang, J. Frost, A. Polverino, M. H. Cobb, and S. Marcus. 1996. The highly conserved skb1 gene encodes a protein that interacts with Shk1, a fission yeast Ste20/PAK homolog. Proc. Natl. Acad. Sci. USA 93:13802-13807.
12. Goffeau, A., B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon, H. Feldmann, F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston, E. J. Louis, H. W. Mewes, Y. Murakami, P. Philippsen, H. Tettelin, and S. G. Oliver. 1996. Life with 6000 genes. Science 274:563-567.
13. Hamada, H., M. Seidman, B. H. Howard, and C. M. Gorman. 1984. Enhanced gene expression by the poly(dT-dG) · poly(dC-dA) sequence. Mol. Cell. Biol. 4:2622-2630.
14. Henderson, E. 1995. Telomere DNA structure, p. 11-34. In E. H. Blackburn and C. W. Greider (ed.), Telomeres. Cold Spring Harbor Laboratory Press, Plainview, N.Y.
15. Hoepelman, A. I. 1996. Current therapeutic approaches to cryptosporidiosis in immunocompromised patients. J. Antimicrob. Chemother. 37:871-880.
16. Jenkins, M. C., R. Fayer, M. Tilley, and S. J. Upton. 1993. Cloning and expression of a cDNA encoding epitopes shared by 15- and 60-kilodalton proteins of Cryptosporidium parvum sporozoites. Infect. Immun. 61:2377-2382.
17. Kao, J., K. Houck, Y. Fan, I. Haehnel, S. K. Libutti, M. L. Kayton, T. Grikscheit, J. Chabot, R. Nowygrod, S. Greenberg, et al. 1994. Characterization of a novel tumor-derived cytokine. Endothelial-monocyte activating polypeptide II. J. Biol. Chem. 269:25106-25119.
18. Klar, A., M. Baldassare, and T. M. Jessell. 1992. F-spondin: a gene expressed at high levels in the floor plate encodes a secreted protein that promotes neural cell adhesion and neurite extension. Cell 69:95-110.
19. Liu, C., A. A. Schroeder, V. Kapur, and M. S. Abrahamsen. 1998. Telomeric sequences of Cryptosporidium parvum. Mol. Biochem. Parasitol. 94:291-296.
20. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955-964.
21. Manger, I. D., A. Hehl, S. Parmley, L. D. Sibley, M. Marra, L. Hillier, R. Waterston, and J. C. Boothroyd. 1998. Expressed sequence tag analysis of the bradyzoite stage of Toxoplasma gondii: identification of developmentally regulated genes. Infect. Immun. 66:1632-1637.
22. McClung, J. K., E. R. Jupe, X. T. Liu, and R. T. Dell'Orco. 1995. Prohibitin: potential role in senescence, development, and tumor suppression. Exp. Gerontol. 30:99-124.
23. Narasimhan, S., M. Armstrong, J. K. McClung, F. F. Richards, and E. K. Spicer. 1997. Prohibitin, a putative negative control element present in Pneumocystis carinii. Infect. Immun. 65:5125-5130.
24. Oliver, J. L., and A. Marin. 1996. A relationship between GC content and coding-sequence length. J. Mol. Evol. 43:216-223.
25. ORF Finder home page. 26 February 1999, posting date. [Online.] http://www.ncbi.nlm.nih.gov/gorf/gorf.html. [10 April 1999, last date accessed.]
26. Piper, M. B., A. T. Bankier, and P. H. Dear. 1998. A HAPPY map of Cryptosporidium parvum. Genome Res. 8:1299-1307.
27. Riley, M. 1993. Functions of the gene products of Escherichia coli. Microbiol. Rev. 57:862-952.
28. Roe, B. A., J. S. Crabtree, and A. S. Khan. 1996. DNA isolation and sequencing. John Wiley & Sons, New York, N.Y.
29. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
30. Schroeder, A. A., A. M. Brown, and M. S. Abrahamsen. 1998. Identification and cloning of a developmentally regulated Cryptosporidium parvum gene by differential mRNA display PCR. Gene 216:327-334.
31. Schroeder, A. A., C. E. Lawrence, and M. S. Abrahamsen. 1999. Differential mRNA display cloning and characterization of a Cryptosporidium parvum gene expressed during intracellular development. J. Parasitol. 85:213-220.
32. Smith, M. W., S. B. Aley, M. Sogin, F. D. Gillin, and G. A. Evans. 1998. Sequence survey of the Giardia lamblia genome. Mol. Biochem. Parasitol. 95:267-280.
33. Spano, F., L. Putignani, S. Naitza, C. Puri, S. Wright, and A. Crisanti. 1998. Molecular cloning and expression analysis of a Cryptosporidium parvum gene encoding a new member of the thrombospondin family. Mol. Biochem. Parasitol. 92:147-162.
34. Tautz, D., and C. Schlotterer. 1994. Simple sequences. Curr. Opin. Genet. Dev. 4:832-837.
35. University of Minnesota Cryptosporidium parvum genomic survey sequences home page. 21 June 1998, posting date. [Online.] http://www.cbc.umn.edu/ResearchProjects/Cp/. [10 April 1999, last date accessed.]
36. Verdun, R. E., N. Di Paolo, T. P. Urmenyi, E. Rondinelli, A. C. C. Frasch, and D. O. Sanchez. 1998. Gene discovery through expressed sequence tag sequence in Trypanosoma cruzi. Infect. Immun. 66:5393-5398.
37. Wakasugi, K., and P. Schimmel. 1999. Two distinct cytokines released from a human aminoacyl-tRNA synthetase. Science 284:147-151.





