Improvements in the Accuracy of Pairwise Genomic Alignment

Hudek, Alexander Karl

dc.contributor.author	Hudek, Alexander Karl
dc.date.accessioned	2010-04-16T13:36:12Z
dc.date.available	2010-04-16T13:36:12Z
dc.date.issued	2010-04-16T13:36:12Z
dc.date.submitted	2010
dc.description.abstract	Pairwise sequence alignment is a fundamental problem in bioinformatics with wide applicability. This thesis presents three new algorithms for this well-studied problem. First, we present a new algorithm, RDA, which aligns sequences in small segments, rather than by individual bases. Then, we present two algorithms for aligning long genomic sequences: CAPE, a pairwise global aligner, and FEAST, a pairwise local aligner. RDA produces interesting alignments that can be substantially different in structure from traditional alignments. It is also better than traditional alignment at the task of homology detection. However, its main drawback is a very slow run time. Further, although it produces alignments with different structure, it is not clear whether the differences have practical value in genomic research. Our main success comes from our local aligner, FEAST. We describe two main improvements: a new, more descriptive model of evolution, and a new local extension algorithm that considers all possible evolutionary histories rather than only the most likely. Our new model of evolution provides improved alignment accuracy and substantially improved parameter training. In particular, we produce a new parameter set for aligning human and mouse sequences that properly describes regions of weak similarity and regions of strong similarity. The second result is our new extension algorithm. Depending on heuristic settings, our new algorithm can provide more sensitivity than existing extension algorithms, more specificity, or a combination of the two. By comparing to CAPE, our global aligner, we find that the sensitivity increase provided by our local extension algorithm is so substantial that it outperforms CAPE on sequence with 0.9 or more expected substitutions per site. CAPE itself gives improved sensitivity for sequence with 0.7 or more expected substitutions per site, but at a great run time cost. FEAST and our local extension algorithm improve on this too: the run time is only slightly slower than that of existing local alignment algorithms and asymptotically the same.	en
dc.identifier.uri	http://hdl.handle.net/10012/5074
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	bioinformatics	en
dc.subject	pairwise alignment	en
dc.subject	Hidden Markov Models	en
dc.subject.program	Computer Science	en
dc.title	Improvements in the Accuracy of Pairwise Genomic Alignment	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: akhudek-phdthesis-final.pdf
Size: 1.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 255 B
Format: Item-specific license agreed upon to submission

Collections

Theses
Computer Science