dc.contributor.author: Cui, Xuefeng
dc.date.accessioned: 2007-01-18 18:24:32 (GMT)
dc.date.available: 2007-01-18 18:24:32 (GMT)
dc.date.issued: 2007-01-18T18:24:32Z
dc.date.submitted: 2007-01-11
dc.identifier.uri: http://hdl.handle.net/10012/2652
dc.description.abstract: The homology search problem and the gene finding problem are two fundamental problems in bioinformatics. The homology search problem is to find the homologous regions of two biological sequences; the gene finding problem is to find all the genes in both strands of a genomic sequence. Recently, gene finding research has demonstrated that homology search results can be used to improve the accuracy of gene finding. By combining the two problems, we define a new problem called the homologous gene finding problem: to find homologous genes of a query gene in a target genomic sequence. We present a new homologous gene finding algorithm in this thesis. We borrow ideas from gene mapping and alignment algorithms, and apply existing seed-based homology search algorithms and hidden Markov model-based (HMM-based) gene finding algorithms to solve the homologous gene finding problem. After we find high-scoring segment pairs (HSPs) between the query gene and the target genomic sequence, we locate target regions that we believe contain a gene homologous to the query gene. Then, we extend existing HMM-based gene finding algorithms to find homologous gene candidates. To improve the accuracy of homologous gene finding, we train an HMM to be biased toward the query gene. To further improve the accuracy, we also introduce a new coding sequence (CDS) length penalty that measures how much the CDS lengths of the query gene and its homologous gene differ. We use the new CDS length penalty together with our enhanced Viterbi algorithm and our flexible finish condition to improve the speed of homologous gene finding without harming the accuracy. Finally, we use protein alignment to pick and rank the best homologous gene candidates. In this thesis, we also describe several experiments to evaluate and support our homologous gene finding algorithm. [en]
dc.format.extent: 470048 bytes
dc.format.mimetype: application/pdf
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Homology [en]
dc.subject: Gene [en]
dc.title: Homologous Gene Finding with a Hidden Markov Model [en]
dc.type: Master Thesis [en]
dc.pending: false [en]
dc.subject.program: Computer Science [en]
uws-etd.degree.department: School of Computer Science [en]
uws-etd.degree: Master of Mathematics [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]



UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
