Discovering new viral lineages and estimating their abundance in wastewater
Date
2022-09-27
Authors
Ellmen, Isaac
Publisher
University of Waterloo
Abstract
Wastewater surveillance of SARS-CoV-2 has emerged as a critical tool for tracking the spread of COVID-19. Beyond qPCR-based estimates of relative case numbers, SARS-CoV-2 genomic RNA can be extracted from wastewater and sequenced. The sequenced genomes provide information about which lineages, and in particular which variants of concern (VOCs), are present in a community. Wastewater RNA sequencing data poses two distinct challenges. First, the genomes are highly fragmented and the alignments often have poor genome coverage. Second, the samples are composed of a mixture of genomes, so mutations cannot be directly attributed to a single lineage. In this thesis, I explore methods to overcome these two challenges and extract useful information from the samples. First, I look at the problem of determining the relative abundance of VOCs. Most existing techniques only consider mutations that are unique to a particular VOC, which massively reduces the amount of usable data. I introduce a new technique that extends mean and median frequencies over shared mutations in order to make use of the large pool of shared mutations. Next, I investigate strategies for designing single-amplicon sequencing methods. I look at selecting single amplicons that are well conserved and rich in information. I also design a single amplicon that is capable of amplifying multiple coronaviruses. I conclude the SARS-CoV-2 work by providing a technique that can identify novel lineages and sublineages from wastewater sequencing runs. Finally, I show that the techniques for analyzing SARS-CoV-2 in wastewater can also be applied to an important plant pathogen, the Tomato Brown Rugose Fruit Virus.
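
To make the data-reduction point concrete, the following is a minimal sketch of the unique-mutation baseline that the abstract attributes to most existing techniques. It is not the method introduced in the thesis, and the record does not specify any implementation. The function name, the lineage labels ("A", "B"), the mutation identifiers ("m1" to "m5"), and the frequencies are all hypothetical and chosen only for illustration.

```python
from statistics import mean, median


def unique_mutation_estimates(lineage_mutations, observed_freq):
    """Estimate each lineage's relative abundance from its unique mutations only.

    lineage_mutations: dict mapping a lineage label to its set of mutations.
    observed_freq: dict mapping a mutation to its observed frequency in the sample.
    Both inputs are hypothetical structures assumed for this sketch.
    """
    estimates = {}
    for lineage, muts in lineage_mutations.items():
        # Any mutation carried by another lineage counts as shared and is discarded.
        others = set().union(
            *(m for other, m in lineage_mutations.items() if other != lineage)
        )
        freqs = [observed_freq[m] for m in muts if m not in others and m in observed_freq]
        if freqs:
            estimates[lineage] = {
                "mean": mean(freqs),
                "median": median(freqs),
                "n_used": len(freqs),
            }
        else:
            # No unique mutation was observed, so this baseline gives no estimate.
            estimates[lineage] = None
    return estimates


# Hypothetical example: m3 and m4 are shared by both lineages.
lineage_mutations = {
    "A": {"m1", "m2", "m3", "m4"},
    "B": {"m3", "m4", "m5"},
}
observed_freq = {"m1": 0.30, "m2": 0.20, "m3": 0.95, "m4": 1.00, "m5": 0.70}

estimates = unique_mutation_estimates(lineage_mutations, observed_freq)
print(estimates)
```

In this toy input, two of the five observed mutations are shared and contribute nothing to either estimate, which is the loss of usable data that motivates working with shared mutations.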
Keywords
bioinformatics, sars-cov-2, wastewater, sequencing