dc.contributor.author [en]: Vinar, Tomas
dc.date.accessioned: 2006-08-22 14:21:16 (GMT)
dc.date.available: 2006-08-22 14:21:16 (GMT)
dc.date.issued [en]: 2005
dc.date.submitted [en]: 2005
dc.identifier.uri: http://hdl.handle.net/10012/1191

dc.description.abstract [en]:
In this thesis, we present enhancements of hidden Markov models for the problem of finding genes in DNA sequences. Genes are the parts of DNA that serve as a template for the synthesis of proteins. Thus, gene finding is a crucial step in the analysis of DNA sequencing data.

Hidden Markov models are a key tool used in gene finding. This thesis presents three methods for extending the capabilities of hidden Markov models to better capture the statistical properties of DNA sequences. In all three, we encounter limiting factors that force trade-offs between model accuracy and those factors.

First, we build better models for recognizing biological signals in DNA sequences. Our new models capture non-adjacent dependencies within these signals. In this case, the main limiting factor is the amount of training data: more training data allows more complex models. Second, we design methods for better representation of length distributions in hidden Markov models, where we balance the accuracy of the representation against the running time needed to find genes in novel sequences. Finally, we show that creating hidden Markov models with complex topologies may be detrimental to prediction accuracy unless we use more complex prediction algorithms. However, such algorithms require longer running times, and in many cases the prediction problem is NP-hard. For gene finding, this means that incorporating some prior biological knowledge into the model would require impractical running times. Nevertheless, we also demonstrate that our methods can be used to solve other biological problems in which the input sequences are short.

As a model example to evaluate our methods, we built a gene finder, ExonHunter, that outperforms programs commonly used in genome projects.

dc.format [en]: application/pdf
dc.format.extent: 1509842 bytes
dc.format.mimetype: application/pdf
dc.language.iso [en]: en
dc.publisher [en]: University of Waterloo
dc.rights [en]: Copyright: 2005, Vinar, Tomas. All rights reserved.
dc.subject [en]: Computer Science
dc.subject [en]: gene finding
dc.subject [en]: hidden Markov models
dc.subject [en]: probabilistic modeling
dc.title [en]: Enhancements to Hidden Markov Models for Gene Finding and Other Biological Applications
dc.type [en]: Doctoral Thesis
dc.pending [en]: false
uws-etd.degree.department [en]: School of Computer Science
uws-etd.degree [en]: Doctor of Philosophy
uws.typeOfResource [en]: Text
uws.peerReviewStatus [en]: Unreviewed
uws.scholarLevel [en]: Graduate
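
The abstract's second contribution concerns how hidden Markov models represent the lengths of features such as exons. The record does not describe the thesis's own method, so the following is only a minimal sketch of the standard textbook construction it builds on: a single state with a self-loop probability p produces a geometric length distribution, while a chain of n such states produces a negative binomial distribution whose peak can sit away from length 1. The function names, parameters, and the target mean of 100 are illustrative assumptions, not taken from the thesis.

```python
from math import comb


def geometric_length_pmf(p: float, max_len: int) -> list[float]:
    """P(L = k) for k = 1..max_len when one HMM state has self-loop probability p.

    The state emits one symbol per visit, stays with probability p and leaves
    with probability 1 - p, so P(L = k) = p**(k - 1) * (1 - p).
    """
    return [p ** (k - 1) * (1 - p) for k in range(1, max_len + 1)]


def chain_length_pmf(n: int, p: float, max_len: int) -> list[float]:
    """P(L = k) for k = 1..max_len for a chain of n identical self-looping states.

    The total length is a sum of n independent geometric lengths (each >= 1),
    i.e. a negative binomial: P(L = k) = C(k-1, n-1) * p**(k-n) * (1-p)**n
    for k >= n, and 0 for shorter lengths.
    """
    return [
        comb(k - 1, n - 1) * p ** (k - n) * (1 - p) ** n if k >= n else 0.0
        for k in range(1, max_len + 1)
    ]


if __name__ == "__main__":
    target_mean = 100.0  # hypothetical mean feature length
    for n in (1, 5):
        # The mean of the chain is n / (1 - p), so pick p to hit target_mean.
        p = 1 - n / target_mean
        pmf = chain_length_pmf(n, p, 1000)
        mode = max(range(len(pmf)), key=pmf.__getitem__) + 1
        print(f"n={n}: p={p:.3f}, mass up to 1000={sum(pmf):.4f}, mode={mode}")
```

Adding states lets the modelled length distribution follow observed lengths more closely, but every extra state enlarges the state space that a decoder such as the Viterbi algorithm must process at each sequence position. This is one concrete form of the accuracy-versus-running-time trade-off the abstract mentions.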




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
