Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning

Zhang, Shuyang

dc.contributor.author	Zhang, Shuyang
dc.date.accessioned	2023-08-28 13:01:08 (GMT)
dc.date.available	2023-08-28 13:01:08 (GMT)
dc.date.issued	2023-08-28
dc.date.submitted	2023-08-23
dc.identifier.uri	http://hdl.handle.net/10012/19763
dc.description.abstract	Sequencing proteins and glycans has important clinical applications, as glycosylation has been shown to play a significant role in cellular communication and immune response. Certain glycans are linked to the diagnosis of cancer as well as to targeted immunotherapy. Mass spectrometry is a powerful tool that helps us gain insight into peptide sequences and glycan structures through database search, spectral library search, or de novo sequencing. Spectrum and retention time prediction using deep learning has gained popularity in studies of non-glycosylated peptides and has been shown to improve database search results via rescoring. This thesis proposes deep learning models to predict spectra and retention times for N-glycopeptides and then discusses the applications of these models to glycopeptide sequencing. Chapter 3 presents a graph deep learning model to predict fragment ion intensities of observed spectra and defines a spectrum representation for glycan fragments with up to three cleavages. The spectrum prediction model has a median cosine similarity of 0.921, which is 20% higher than previous attempts at glycopeptide spectrum prediction. For retention time prediction in Chapter 4, we propose a model with two parallel encoders, one for the peptide input and one for the glycan input, and apply transfer learning to the sequence encoder. The retention time prediction model has a Pearson correlation of 1.0, which is higher than the 0.98 and 0.96 of previous attempts. We also introduce the 95th percentile delta as an evaluation metric and discuss the interpretability of our model. Finally, in Chapter 5, we apply our spectrum and retention time prediction models in glycopeptide sequencing pipelines, including database search and de novo search. We show that our model improves identification by rescoring and has the potential to be used as a filter for false positives. We also demonstrate that our model improves de novo identification when used in the scoring function.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.subject	deep learning	en
dc.subject	bioinformatics	en
dc.subject	glycomics	en
dc.title	Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning	en
dc.type	Master Thesis	en
dc.pending	false
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Master of Mathematics	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Li, Ming
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en

Files in this item

Name: Zhang_Shuyang.pdf
Size: 2.552 MB
Format: PDF
