Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning

dc.contributor.advisor: Li, Ming
dc.contributor.author: Zhang, Shuyang
dc.date.accessioned: 2023-08-28T13:01:08Z
dc.date.available: 2023-08-28T13:01:08Z
dc.date.issued: 2023-08-28
dc.date.submitted: 2023-08-23
dc.description.abstract: Sequencing proteins and glycans has important clinical applications, as glycosylation has been shown to play a significant role in cellular communication and immune response. Certain glycans are linked to cancer diagnosis and to targeted immunotherapy. Mass spectrometry is a powerful tool for gaining insight into peptide sequences and glycan structures through database search, spectral library search, or de novo sequencing. Spectrum and retention time prediction using deep learning has gained popularity in studies of non-glycosylated peptides and has been shown to improve database search results via rescoring. This thesis proposes deep learning models to predict spectra and retention times for N-glycopeptides and then discusses the applications of these models to glycopeptide sequencing. Chapter 3 presents a graph deep learning model that predicts fragment ion intensities of observed spectra and defines a spectrum representation for glycan fragments with up to three cleavages. The spectrum prediction model has a median cosine similarity of 0.921, which is 20% higher than previous attempts at glycopeptide spectrum prediction. For retention time prediction in Chapter 4, we propose a model with two parallel encoders, one for the peptide input and one for the glycan input, and apply transfer learning to the sequence encoder. The retention time prediction model has a Pearson correlation of 1.0, which is higher than the 0.98 and 0.96 reported by previous attempts. We also introduce the 95th percentile delta as an evaluation metric and discuss the interpretability of our model. Finally, in Chapter 5, we apply our spectrum and retention time prediction models in glycopeptide sequencing pipelines, including database search and de novo search. We show that our model improves identification by rescoring and has the potential to be used as a filter for false positives. We also demonstrate that our model improves de novo identification when used in the scoring function. [en]
dc.identifier.uri: http://hdl.handle.net/10012/19763
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: deep learning [en]
dc.subject: bioinformatics [en]
dc.subject: glycomics [en]
dc.title: Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.comment.hidden: The previous deposit from me was accidentally an older version of my thesis. The only difference in this new deposit is in the acknowledgment section. I would appreciate it if this could be used instead of the other deposit. [en]
uws.contributor.advisor: Li, Ming
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Zhang_Shuyang.pdf
Size: 2.55 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission