An Approximation Algorithm for Character Compatibility and Fast Quartet-based Phylogenetic Tree Comparison
Phylogenetic analysis, or the inference of evolutionary history is done routinely by biologists and is one of the most important problems in systematic biology. In this thesis, we study two computational problems in the area. First, we study the evolutionary tree reconstruction problem under the character compatibility (CC) paradigm and give a polynomial time approximation scheme (PTAS) for a variation of the formulation called fractional character compatibility (FCC), which has been proven to be NP-hard. We also present a very simple algorithm called the Ordinal Split Method (OSM) to generate bipartitions given sequence data, which can be served as a front-end to the PTAS. The performance of the OSM and the validity of the FCC formulation are studied through simulation experiments. The second part of this thesis presents an efficient algorithm to compare evolutionary trees using the quartet metric. Different evolutionary hypothesis arises when different data sets are used or when different tree inference methods are applied to the same data set. Tree comparisons are routinely done by biologists to evaluate the quality of their tree inference experiments. The quartet metric has many desirable properties but its use has been hindered by its relatively heavy computational requirements. We address this problem by giving the first O(n^2) time algorithm to compute the quartet distance between two evolutionary trees.