Evidence Combination in Hidden Markov Models for Gene Prediction

Brejova, Bronislava

dc.contributor.author	Brejova, Bronislava	en
dc.date.accessioned	2006-08-22T14:28:50Z
dc.date.available	2006-08-22T14:28:50Z
dc.date.issued	2005	en
dc.date.submitted	2005	en
dc.description.abstract	This thesis introduces new techniques for finding genes in genomic sequences. Genes are regions of a genome encoding proteins of an organism. Identification of genes in a genome is an important step in the annotation process after a new genome is sequenced. The prediction accuracy of gene finding can be greatly improved by using experimental evidence. This evidence includes homologies between the genome and databases of known proteins, or evolutionary conservation of genomic sequence in different species.

We propose a flexible framework to incorporate several different sources of such evidence into a gene finder based on a hidden Markov model. Various sources of evidence are expressed as partial probabilistic statements about the annotation of positions in the sequence, and these are combined with the hidden Markov model to obtain the final gene prediction. The opportunity to use partial statements allows us to handle missing information transparently and to cope with the heterogeneous character of individual sources of evidence. On the other hand, this feature makes the combination step more difficult. We present a new method for combining partial probabilistic statements and prove that it is an extension of existing methods for combining complete probability statements. We evaluate the performance of our system and its individual components on data from the human and fruit fly genomes.

The use of sequence evolutionary conservation as a source of evidence in gene finding requires efficient and sensitive tools for finding similar regions in very long sequences. We present a method for improving the sensitivity of existing tools for this task by careful modeling of sequence properties. In particular, we build a hidden Markov model representing a typical homology between two protein coding regions and then use this model to optimize a component of a heuristic algorithm called a spaced seed. The seeds that we discover significantly improve the accuracy and running time of similarity search in protein coding regions, and are directly applicable to our gene finder.	en
dc.format	application/pdf	en
dc.format.extent	2466556 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10012/1036
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.rights	Copyright: 2005, Brejova, Bronislava. All rights reserved.	en
dc.subject	Computer Science	en
dc.subject	Gene finding	en
dc.subject	hidden Markov model	en
dc.subject	sequence alignment	en
dc.title	Evidence Combination in Hidden Markov Models for Gene Prediction	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: bbrejova2005.pdf
Size: 2.35 MB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science