dc.contributor.author: Mendler, Kerrin
dc.description.abstract: The growth of genomic information in public databases has dramatically improved our view of the tree of life and at the same time expanded our knowledge of protein diversity. Through the use of automated annotation pipelines, researchers can predict many of the functional capabilities of organisms directly from their genome sequence. Although numerous phylogenetic and protein databases exist, there have been few attempts to combine these data, a step that is essential for the study of protein evolution. The web application AnnoTree was developed as part of this thesis to facilitate the exploration and visualization of protein families (Pfams) and KEGG orthologs (KOs) on a phylogeny composed of nearly 24,000 bacterial genomes. The visualization includes an interactive tree of life, a summary of the taxonomic distribution of the query, basic taxonomic information, and annotation confidence scores. All protein sequences, visualizations, and summary information can be downloaded directly from the interface. The AnnoTree framework is open-source and can be modified to incorporate any custom tree, taxonomy, and proteome dataset. AnnoTree allows users to visualize the phylogenetic distribution of a Pfam of interest, which, in combination with the gain/loss data obtained, promotes hypothesis generation in the context of protein evolution. To identify functions that are more tightly associated with evolutionary mechanisms such as horizontal gene transfer and evolutionary conservation, the pre-computed annotation data were combined with the bacterial tree of life in a phylogenomics analysis. The phyletic patchiness of all Pfam and KO annotations was measured using the normalized consistency index (CI), a measure of disagreement between the presence/absence states of traits across the tree and the tree topology. Pfams and KOs with the highest normalized CI represent functions known to be associated with mobile genetic elements and viral defence. These annotations were most commonly found within the genomes of symbiotic and pathogenic bacteria. The most highly conserved Pfams and KOs were functions related to core processes such as transcription, DNA replication, and protein synthesis, as well as those required for oxygenic photosynthesis and sporulation. Lineage-specific Pfams and KOs were identified in many bacterial taxa, revealing many clade-defining functions in the Bacillus_A genus, the Oxyphotobacteria class, and the Actinobacteria class, among others. An additional phylogenomics analysis was performed on a phylogeny encompassing representatives from all three domains of life to identify the branches undergoing the most Pfam gain and loss events. The branches dividing the three taxonomic domains had the highest density of gain events, all of which were associated with well-known clade-defining functions. Missing data influenced the frequency of Pfam losses at lower taxonomic levels, but some characterized genome streamlining events within eukaryotes were uncovered. Ultimately, the development of AnnoTree and the accompanying analyses provide new insights into large-scale bacterial phylogenomics and the evolution and distribution of bacterial protein domains and gene families. [en]
dc.publisher: University of Waterloo [en]
dc.subject: Bacterial evolution [en]
dc.subject: Functional evolution [en]
dc.subject: Web application [en]
dc.title: Large-scale phylogenomic visualization and analysis of functional traits in bacteria [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree: Master of Science [en]
uws.contributor.advisor: Doxey, Andrew
uws.contributor.affiliation1: Faculty of Science [en]

All items in UWSpace are protected by copyright, with all rights reserved.
