Materials
- Contigs 47 and 21: the sequence segments from Drosophila elegans that were annotated.
- UCSC Genome Browser: displays the output of the gene prediction programs and the RNA sequence data in a format that is easy to understand.
- Gene Record Finder: provides the number of exons and the general location of each gene.
- BLAST: aligns the exons listed in the Gene Record Finder against the contig to find the general location of the gene on the contig.
- Gene Model Checker: takes the finished annotation and verifies that all conditions and restrictions are met. It also provides several helpful tools for evaluating finished annotations.
- RaptorX: protein structure prediction software, used here to help predict the function of unknown genes.
Methods
Contig 47 of Drosophila elegans was chosen from the projects offered by the Genomics Education Partnership (the group that organizes this research on fruit flies). The UCSC Genome Browser was then used to identify the genes located within the contig: NfI, Syt7, and Rad23. Next, the gene NfI was crude-mapped by looking up the Drosophila melanogaster version of the gene in the Gene Record Finder. The protein-coding sequence of each exon listed in the Gene Record Finder was then BLASTed against the whole contig to determine the general location of each exon within the contig.
Above, the colored boxes are from the Gene Record Finder, and the long sequence of letters is the BLAST alignment.
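The crude-mapping step can be illustrated with a short script. This is only a sketch, not the procedure used in this project: it assumes the BLAST results were saved in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The exon names, contig name, and coordinates below are made up for illustration.

```python
from typing import Dict, Tuple


def best_hits(tabular: str) -> Dict[str, Tuple[int, int, str]]:
    """Return the highest-scoring hit per query exon as (start, end, strand)."""
    best = {}
    for line in tabular.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column row
        query = fields[0]
        sstart, send = int(fields[8]), int(fields[9])
        bits = float(fields[11])
        if query not in best or bits > best[query][0]:
            # A subject start larger than the subject end means a minus-strand hit.
            strand = "+" if sstart <= send else "-"
            best[query] = (bits, min(sstart, send), max(sstart, send), strand)
    return {q: (s, e, strand) for q, (_, s, e, strand) in best.items()}


# Hypothetical rows: two hits for exon_1 and one minus-strand hit for exon_2.
example_rows = [
    ("exon_1", "contig47", "85.0", "40", "6", "0", "1", "40", "1201", "1320", "1e-15", "70.1"),
    ("exon_1", "contig47", "40.0", "20", "12", "0", "5", "24", "9001", "9060", "0.5", "20.3"),
    ("exon_2", "contig47", "90.0", "50", "5", "0", "1", "50", "2650", "2501", "1e-20", "95.0"),
]
example_table = "\n".join("\t".join(row) for row in example_rows)

for exon, (start, end, strand) in best_hits(example_table).items():
    print(exon, start, end, strand)
```

Keeping only the best-scoring hit per exon gives the approximate exon locations; the exact boundaries still have to be refined by hand in the genome browser.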
Next, the UCSC Genome Browser was used to locate the beginning and end of each exon in the NfI gene and in each isoform of that gene. The output of several gene prediction programs, the RNA sequence data, and the possible splice donor and acceptor sites were used as evidence to decide the exon boundaries.
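As an illustration of the splice site evidence, the sketch below tests whether each intron between proposed exons begins with GT (the usual donor site) and ends with AG (the usual acceptor site). It assumes a plus-strand gene and 1-based, inclusive exon coordinates. The toy sequence and the function names are hypothetical, and real genes occasionally use non-canonical sites that this simple check would flag.

```python
from typing import List, Tuple


def donor_ok(seq: str, exon_end: int) -> bool:
    """True if the two bases right after the exon (start of the intron) are GT."""
    return seq[exon_end:exon_end + 2].upper() == "GT"


def acceptor_ok(seq: str, exon_start: int) -> bool:
    """True if the two bases right before the exon (end of the intron) are AG."""
    return seq[exon_start - 3:exon_start - 1].upper() == "AG"


def check_junctions(seq: str, exons: List[Tuple[int, int]]) -> List[Tuple[bool, bool]]:
    """Return (donor_ok, acceptor_ok) for each intron between consecutive exons."""
    results = []
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        results.append((donor_ok(seq, end), acceptor_ok(seq, next_start)))
    return results


# Toy sequence: exon 1 (bases 1-6), intron (7-18), exon 2 (19-24).
toy_seq = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(check_junctions(toy_seq, [(1, 6), (19, 24)]))
```

A result of `(True, True)` for an intron means that both of its boundaries are consistent with the canonical splice signals.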
When all of the isoforms were finished, the Gene Model Checker was used to verify that each annotated gene model was correct. A dot plot comparison, an exon comparison, and a protein alignment comparison were used to judge whether the model was accurate. These results were recorded. This process was repeated for Syt7 and Rad23. Then, the entire process was repeated for contig 21 of D. elegans.
Here are the final checks that the Gene Model Checker performs to determine whether the annotated sequence is acceptable.
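The individual checks are not listed in the text, so the sketch below only illustrates the general kind of test a gene model must pass. It assumes a plus-strand gene with 1-based, inclusive coding exon coordinates, and it checks that the joined coding sequence starts with ATG, has a length divisible by three, ends with a stop codon, and contains no earlier in-frame stop codon. It is not the Gene Model Checker's own code, and the toy sequence is made up.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str, exons: List[Tuple[int, int]]) -> Dict[str, bool]:
    """Run basic sanity checks on the coding sequence built from the exons."""
    # Join the exons (1-based, inclusive coordinates) into one coding sequence.
    cds = "".join(seq[start - 1:end] for start, end in exons).upper()
    in_frame = len(cds) % 3 == 0
    # Split into complete codons, ignoring any leftover bases at the end.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return {
        "starts_with_ATG": cds.startswith("ATG"),
        "length_multiple_of_3": in_frame,
        "ends_with_stop": in_frame and bool(codons) and codons[-1] in STOP_CODONS,
        "no_internal_stop": all(c not in STOP_CODONS for c in codons[:-1]),
    }


# Toy contig: exon 1 (bases 1-6), intron (7-18), exon 2 (19-24).
toy_contig = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
for name, passed in check_cds(toy_contig, [(1, 6), (19, 24)]).items():
    print(name, "pass" if passed else "FAIL")
```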
Next, using a protein function prediction program, the function was determined for each protein, and these results were recorded. Through all of these steps, it was determined whether more important genes showed greater conservation. A gene's importance was judged by how necessary it is for body function. However, the function of gene CG11148 was unknown, so RaptorX was used to predict a possible function for its protein. Finally, all of the results were recorded.
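The text does not state how conservation was measured. One common choice, assumed here purely for illustration, is the percent identity between the D. melanogaster protein and the annotated D. elegans protein after alignment. The sketch below takes two already-aligned sequences of equal length, with "-" marking gaps, and counts identical residues over the columns where neither sequence has a gap. The sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned protein fragments.
melanogaster = "MKT-LLVA"
elegans = "MKSALL-A"
print(round(percent_identity(melanogaster, elegans), 1))
```

Under this assumption, a higher percent identity for a gene would indicate greater conservation between the two species. Other definitions (for example, counting gap columns as mismatches) would give different numbers.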