A Data Mining Approach for Detecting Evolutionary Divergence in Transcriptomic Data

Date

2019-11-19

Authors

Woody, Owen

Advisor

McConkey, Brendan

Publisher

University of Waterloo

Abstract

It has become common to produce genome sequences for organisms of scientific or popular interest. Although these genome projects provide insight into the gene and protein complements of a species, including their evolutionary relationships, it remains challenging to determine gene regulatory behavior from genome sequence alone. It has also become common to produce “expression atlas” transcriptomic data sets. These atlases employ high-throughput transcript assays to survey an assortment of tissues, developmental states, and responses to stimuli that may each individually elicit or inhibit the transcription of genes. Although genomic and transcriptomic data sets are both routinely collected, they are seldom analyzed in tandem. Here I present a novel approach to combining these complementary data with a software package called BranchOut. BranchOut uses genomic information to construct gene family phylogenies, and then attempts to map gene expression activity onto each phylogeny to allow estimation of ancestral expression states. This allows the identification of specific innovations due to gene duplications that resulted in fundamental diversification in the roles of otherwise closely related genes. As a proof of concept, the BranchOut technique is first applied to a tangible small-scale example in Apis mellifera. Subsequently, the power of BranchOut to analyze complete genomes is shown for two mammalian genomes, Sus scrofa and Bos taurus. The transcriptomic data sets for these two mammals employ microarray and RNA-seq platforms, respectively, for expression analysis, demonstrating BranchOut’s applicability to both future and historic expression atlases. Potential refinements to the approach are also discussed.

Keywords

evolution, gene expression, bioinformatics, data mining, phylogenetics
