The Automation of Glycopeptide Discovery in High Throughput MS/MS Data
Glycosylation, the addition of one or more carbohydrate molecules to a protein, is crucial for many cellular processes. Aberrant glycosylation is a key marker for various diseases such as cancer and rheumatoid arthritis. It has also recently been discovered that glycosylation is important in the ability of the Human Immunodeficiency Virus (HIV) to evade recognition by the immune system. Given the importance of glycosylation in disease, major efforts are underway in life science research to investigate the glycome, the entire glycosylation profile of an organelle, cell or tissue type. To date, little bioinformatics research has been performed in glycomics due to the complexity of glycan structures and the low throughput of carbohydrate analysis. Recent advances in mass spectrometry (MS) have greatly facilitated the analysis of the glycome. Increasingly, this technology is preferred over traditional methods of carbohydrate analysis, which are often laborious and unsuitable for low-abundance glycoproteins. When subjected to mass spectrometry with collision-induced dissociation, glycopeptides produce characteristic MS/MS spectra that can be recognized by visual inspection. However, given the high volume of data produced by proteome studies today, manually searching for glycopeptides is impractical. In this thesis, we present a tool to automate the identification of glycopeptide spectra from MS/MS data. Further, we discuss methodologies to automate the elucidation of the structure of the carbohydrate moiety of glycopeptides by adapting traditional MS/MS ion searching techniques employed in peptide sequence determination. MS/MS ion searching, a common technique in proteomics, aims to interpret MS/MS spectra by correlating structures from a database with the patterns represented in the spectrum. The tool was tested on high-throughput proteomics data and was shown to identify 97% of all glycopeptides present in the test data.
Further, the tool assigned correct carbohydrate structures to many of these glycopeptide MS/MS spectra. Applications of the tool in a proteomics environment for the analysis of glycopeptide expression in cancer tissue are also presented.
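The two steps described above, flagging candidate glycopeptide spectra and then scoring database structures against a spectrum, can be illustrated with a minimal Python sketch. It is not the tool presented in this thesis. It assumes that the "characteristic" pattern is the presence of low-mass carbohydrate oxonium ions, a commonly used signature that the abstract itself does not specify, and it scores a candidate by the simple fraction of its theoretical fragment m/z values that are matched. The function names, the m/z tolerance, the intensity threshold, and the minimum hit count are all hypothetical choices made for illustration.

```python
# Monoisotopic m/z of commonly cited singly charged oxonium ions.
# Using these as the glycopeptide signature is an assumption of this sketch.
OXONIUM_IONS = {
    "HexNAc": 204.0867,
    "Hex": 163.0601,
    "HexHexNAc": 366.1395,
    "NeuAc": 292.1027,
    "NeuAc-H2O": 274.0921,
}


def matched_oxonium_ions(peaks, tol=0.02, min_rel_intensity=0.05):
    """Return the names of oxonium ions observed in a spectrum.

    peaks is a list of (mz, intensity) pairs. A peak counts as a match
    when it lies within tol of the target m/z and reaches at least
    min_rel_intensity of the base peak intensity.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    hits = []
    for name, target in OXONIUM_IONS.items():
        for mz, intensity in peaks:
            if (abs(mz - target) <= tol
                    and intensity >= min_rel_intensity * base_intensity):
                hits.append(name)
                break
    return hits


def is_candidate_glycopeptide(peaks, min_hits=2, tol=0.02):
    """Flag a spectrum when at least min_hits oxonium ions are present."""
    return len(matched_oxonium_ions(peaks, tol=tol)) >= min_hits


def score_candidate(peaks, theoretical_mzs, tol=0.02):
    """Fraction of a candidate's theoretical fragment m/z values that are
    matched by an observed peak (a deliberately simple ion-search score)."""
    if not theoretical_mzs:
        return 0.0
    observed = [mz for mz, _ in peaks]
    matched = sum(
        1 for target in theoretical_mzs
        if any(abs(mz - target) <= tol for mz in observed)
    )
    return matched / len(theoretical_mzs)


# Made-up peak lists, for illustration only.
glyco_spectrum = [(204.087, 1000.0), (366.140, 500.0), (500.300, 200.0)]
plain_spectrum = [(175.119, 800.0), (500.300, 1000.0)]
```

In such a scheme, only spectra for which `is_candidate_glycopeptide` returns true would be passed to `score_candidate`, once per database structure, and the highest-scoring structure would be reported. A practical scoring function would also weight peak intensities and account for charge states, which this sketch omits.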