Homologous Gene Finding with a Hidden Markov Model

Cui, Xuefeng

dc.comment.hidden	I submitted once on Dec 20 but have not yet received any feedback. This copy is identical to the previous one except that it is 1.4-spaced instead of 2.0-spaced.	en
dc.contributor.author	Cui, Xuefeng
dc.date.accessioned	2007-01-18T18:24:32Z
dc.date.available	2007-01-18T18:24:32Z
dc.date.issued	2007-01-18T18:24:32Z
dc.date.submitted	2007-01-11
dc.description.abstract	The homology search problem and the gene finding problem are two fundamental problems in bioinformatics. The homology search problem is to find the homologous regions of two biological sequences; the gene finding problem is to find all the genes on both strands of a genomic sequence. Recent gene finding research has demonstrated that homology search results can improve the accuracy of gene finding. By combining the two problems, we define a new problem, the homologous gene finding problem: find the homologous genes of a query gene in a target genomic sequence. In this thesis, we present a new algorithm for this problem. We borrow ideas from gene mapping and alignment algorithms, and apply existing seed-based homology search algorithms and hidden Markov model-based (HMM-based) gene finding algorithms. After finding high-scoring segment pairs (HSPs) between the query gene and the target genomic sequence, we locate target regions that we believe contain a gene homologous to the query gene. We then extend existing HMM-based gene finding algorithms to find homologous gene candidates. To improve accuracy, we train an HMM to be biased toward the query gene. We also introduce a new coding sequence (CDS) length penalty, a measure of how much the CDS lengths of the query gene and its homologous gene differ, to further improve accuracy. We use the new CDS length penalty together with our enhanced Viterbi algorithm and our flexible finish condition to improve the speed of homologous gene finding without harming accuracy. Finally, we use protein alignment to pick and rank the best homologous gene candidates. We also describe several experiments that evaluate and support our homologous gene finding algorithm.	en
dc.format.extent	470048 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10012/2652
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	Homology	en
dc.subject	Gene	en
dc.subject.program	Computer Science	en
dc.title	Homologous Gene Finding with a Hidden Markov Model	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: main.pdf
Size: 459.03 KB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science