
Some string problems in computational biology

Date

2000

Authors

Lanctôt, J. Kevin

Publisher

University of Waterloo

Abstract

This thesis introduces and analyzes a collection of string algorithms that are at the core of several biological problems. First, it presents the Grammar Transform Analysis and Compression (GTAC) entropy estimator, the first entropy estimator for DNA sequences that has both proven properties and excellent entropy estimates. Additionally, the estimator uses a novel data structure to repeatedly solve the Longest Non-overlapping Pattern Problem in linear time. GTAC beats all known competitors in running time, in the low values of its entropy estimates, and in the number of properties that have been proven about it. Second, it presents the Distinguishing String Problem, which has many biological applications, such as creating diagnostic probes, universal primers, and unbiased consensus sequences, and discovering potential drug targets. All these applications reduce to the task of finding a pattern that, with some error, occurs in one set of strings (the Closest String Problem and the Closest Substring Problem) and does not occur in another set (the Farthest String Problem and the Farthest Substring Problem). The NP-hardness and approximation properties of these problems are characterized, and approximation algorithms are presented.
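
To make the problem statements in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's approximation algorithms. It assumes equal-length strings, Hamming distance as the error measure, and a DNA alphabet; the function names (`hamming`, `radius`, `closest_string_bruteforce`, `distinguishes`) and the thresholds `d_close` and `d_far` are hypothetical and chosen only for illustration. The brute-force search is exponential in the string length, which is consistent with the abstract's point that these problems are NP-hard and call for approximation.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate, strings):
    """Largest Hamming distance from the candidate to any string in the set."""
    return max(hamming(candidate, s) for s in strings)


def closest_string_bruteforce(strings, alphabet="ACGT"):
    """Try every string over the alphabet and keep one with the smallest radius.

    Exponential in the string length; for illustration only.
    """
    length = len(strings[0])
    best, best_radius = None, length + 1
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        r = radius(candidate, strings)
        if r < best_radius:
            best, best_radius = candidate, r
    return best, best_radius


def distinguishes(candidate, targets, others, d_close, d_far):
    """Assumed reading of a distinguishing string: within d_close of every
    target string and at least d_far away from every other string."""
    near = all(hamming(candidate, s) <= d_close for s in targets)
    far = all(hamming(candidate, s) >= d_far for s in others)
    return near and far


if __name__ == "__main__":
    center, r = closest_string_bruteforce(["ACGT", "ACGA", "ACCT"])
    print(center, r)
    print(distinguishes("ACGT", ["ACGT", "ACGA"], ["TTTT"], d_close=1, d_far=3))
```

The substring variants named in the abstract would additionally slide a window over longer input strings; that step is omitted here to keep the sketch short.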

Keywords

Harvested from Collections Canada
