Enhancements to Hidden Markov Models for Gene Finding and Other Biological Applications

dc.contributor.authorVinar, Tomasen
dc.date.accessioned2006-08-22T14:21:16Z
dc.date.available2006-08-22T14:21:16Z
dc.date.issued2005en
dc.date.submitted2005en
dc.description.abstractIn this thesis, we present enhancements of hidden Markov models for the problem of finding genes in DNA sequences. Genes are the parts of DNA that serve as a template for synthesis of proteins. Thus, gene finding is a crucial step in the analysis of DNA sequencing data. <br /><br /> Hidden Markov models are a key tool used in gene finding. Yhis thesis presents three methods for extending the capabilities of hidden Markov models to better capture the statistical properties of DNA sequences. In all three, we encounter limiting factors that lead to trade-offs between the model accuracy and those limiting factors. <br /><br /> First, we build better models for recognizing biological signals in DNA sequences. Our new models capture non-adjacent dependencies within these signals. In this case, the main limiting factor is the amount of training data: more training data allows more complex models. Second, we design methods for better representation of length distributions in hidden Markov models, where we balance the accuracy of the representation against the running time needed to find genes in novel sequences. Finally, we show that creating hidden Markov models with complex topologies may be detrimental to the prediction accuracy, unless we use more complex prediction algorithms. However, such algorithms require longer running time, and in many cases the prediction problem is NP-hard. For gene finding this means that incorporating some of the prior biological knowledge into the model would require impractical running times. However, we also demonstrate that our methods can be used for solving other biological problems, where input sequences are short. <br /><br /> As a model example to evaluate our methods, we built a gene finder ExonHunter that outperforms programs commonly used in genome projects.en
dc.formatapplication/pdfen
dc.format.extent1509842 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10012/1191
dc.language.isoenen
dc.pendingfalseen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2005, Vinar, Tomas. All rights reserved.en
dc.subjectComputer Scienceen
dc.subjectgene findingen
dc.subjecthidden Markov modelsen
dc.subjectprobabilistic modelingen
dc.titleEnhancements to Hidden Markov Models for Gene Finding and Other Biological Applicationsen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentSchool of Computer Scienceen
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
tvinar2005.pdf
Size:
1.44 MB
Format:
Adobe Portable Document Format