UWSpace is currently experiencing technical difficulties resulting from its recent migration to a new version of its software. These technical issues are not affecting the submission and browse features of the site. UWaterloo community members may continue submitting items to UWSpace. We apologize for the inconvenience, and are actively working to resolve these technical issues.
 

Exploring functional annotation through genomic and metagenomic data mining

dc.contributor.authorLobb, Briallen
dc.date.accessioned2020-09-09T18:14:11Z
dc.date.available2020-09-09T18:14:11Z
dc.date.issued2020-09-09
dc.date.submitted2020-08-12
dc.description.abstractFunctional profiling of genomes and metagenomes, as well as data mining for novel proteins, all rely on computational methods for functional annotation of protein sequences. Standard methods assign protein function based on detected homology to reference sequences, but often leave behind a significant fraction of hypothetical sequences ("dark matter") that cannot be annotated. To maximize our ability to extract new biological insights from newly sequenced genomes, it is critical to understand the advantages and limitations of homology-based annotation, and explore alternative methods for inferring function. In this thesis, I performed a comprehensive exploration of computational protein annotation, with a focus on bacterial genomes and metagenomes. First, I applied homology-based methods to functionally annotate and analyze original datasets including newly sequenced Streptomyces strains, a wastewater metagenome, and microbial communities involved in vertebrate decomposition. These studies identified genes and functions of interest including cellulases, antibiotic resistance genes, and virulence factors. I then explored the limits of homology-based annotation by measuring annotation coverage, the fraction of annotated proteins in a proteome, across ~27,000 organisms in the microbial tree of life. This study demonstrated a wide range in annotation coverage across bacteria, from 2-86%. In addition, it revealed multiple factors including taxonomy, genome size, and research bias, as heavy influences on the degree to which proteomes could be annotated. To gain biological insights into hypothetical proteins of unknown function, I analyzed 4,049 domains of unknown function (DUFs) from Pfam. Using phylogenomic, taxonomic and metagenomic information, I detected statistical associations between domains and biological traits. Association-based methods uncovered environment, lineage, and/or pathogen associations in just under half of all DUFs and highlighted new families such as DUF4765 as intriguing virulence factor candidates. Finally, I constructed a database of "ORFan" metagenomic sequences that cannot be annotated using standard approaches, and inferred functions for tens of thousands of these sequences using profile-profile comparison approaches. Motif analysis and genomic context validated these predictions, enabling the discovery of hundreds of novel candidate metalloproteases. Protein "dark matter", which includes a large pool of unannotated coding sequences, is an incredible resource to find new proteins and functions of interest, and included are suggestions on how to prioritize these sequences for future study. A combination of homology-based and alternative annotation methods will be most effective for broad functional profiling of genomes and metagenomes, and can push the boundaries for functional interpretation of sequence data.en
dc.identifier.urihttp://hdl.handle.net/10012/16267
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectgenome/metagenome annotationen
dc.subjectfunctional annotationen
dc.subjectbacterial genomicsen
dc.subjectannotation coverageen
dc.subjectphenotype associationsen
dc.subjectremote homologyen
dc.subjectgenomic contexten
dc.subjecthomology-based annotationen
dc.subjectORFan sequencesen
dc.subjectdomains of unknown functionen
dc.titleExploring functional annotation through genomic and metagenomic data miningen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentBiologyen
uws-etd.degree.disciplineBiologyen
uws-etd.degree.grantorUniversity of Waterlooen
uws.contributor.advisorDoxey, Andrew
uws.contributor.affiliation1Faculty of Scienceen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Lobb_Briallen.pdf
Size:
63.28 MB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.4 KB
Format:
Item-specific license agreed upon to submission
Description:

Collections