Comparative analysis and visualization of microbial gene neighborhoods: applications to pathogen genomics


Date

2023-12-20

Authors

Wei, Xin

Advisor

Doxey, Andrew

Publisher

University of Waterloo

Abstract

Gene neighborhoods are clusters of genes that are encoded together within the same genomic region and may share similarity in expression and/or function. Bioinformatic analysis of gene neighborhoods is a powerful approach that provides insights into genome structure, function, and evolution. Despite its importance, existing bioinformatics tools are limited in their ability to analyze and compare gene neighborhoods, especially on a large scale. In this thesis, I present a novel software tool called AnnoView for large-scale exploration and analysis of microbial gene neighborhoods. I then demonstrate its use by exploring several different bacterial pathogens and their virulence factors, using genomic context analysis to gain novel insights into pathogen evolution and genome function. AnnoView facilitates interactive exploration of microbial gene neighborhoods from 30,238 bacterial genomes and 1,672 archaeal genomes, with pre-computed functional annotations derived from KEGG, Pfam, and TIGRFAM. Users can also upload custom datasets in various formats for gene neighborhood visualization. As a first application of AnnoView, I analyzed the adenylate isopentenyl transferase (IPT) gene in bacteria, which encodes a cytokinin-producing enzyme found in plant pathogenic as well as plant growth-promoting bacteria (PGPB). To understand how this gene may function differently between pathogens and PGPB, I used AnnoView to explore and compare its genomic contexts across bacteria. The analysis revealed numerous distinguishing features, including the tendency of pathogen-associated adenylate IPT genes to occur in predicted virulence loci. As a second case study using AnnoView, two novel gene clusters containing putative clostridial neurotoxin (CNT) genes were discovered within the genomes of Paraclostridium ghonii and Bacillus toyonensis. 
Both gene clusters were analyzed using AnnoView and found to contain unique features not present in other CNT-containing gene clusters. In particular, analysis of the P. ghonii toxin gene neighborhood identified nearby genes indicating that the P. ghonii toxin may be specific to insect hosts. This prediction was experimentally validated in collaborative work, revealing the P. ghonii toxin to be a novel insecticidal neurotoxin. Lastly, as a third case study, I used AnnoView to perform a comparative genomic analysis of a putatively novel species of Clostridium, known as “clade X”, identified from ancient DNA. The identification of conserved, unique gene neighborhood patterns present within clade X but absent from closely related Clostridium genomes reinforces previous claims that clade X represents a distinct species. Ultimately, comparative analysis of gene neighborhoods using AnnoView and large-scale genomic information from existing databases (GTDB, AnnoTree, NCBI) is a powerful approach for microbial genomics studies. Comparative analysis of gene neighborhoods in pathogen genomes helps to expand our knowledge of virulence factors and pathogen-associated genomic traits, contributing to the characterization of pathogen biology, including even the identification of host species.
