dc.contributor.author | Zhang, Shuyang | |
dc.date.accessioned | 2023-08-28 13:01:08 (GMT) | |
dc.date.available | 2023-08-28 13:01:08 (GMT) | |
dc.date.issued | 2023-08-28 | |
dc.date.submitted | 2023-08-23 | |
dc.identifier.uri | http://hdl.handle.net/10012/19763 | |
dc.description.abstract | Sequencing proteins and glycans has important clinical applications, as glycosylation is
shown to play a significant role in cellular communication and immune response. Certain
glycans are linked to the diagnosis of cancer as well as targeted immunotherapy. Mass
spectrometry is a powerful tool that helps us gain insight into peptide sequences and glycan
structures through database search, spectral library search, or de novo sequencing. Spectrum
and retention time prediction using deep learning has gained popularity with studies on
non-glycosylated peptides and has been shown to improve database search results via
rescoring. This thesis proposes deep learning models to predict spectrum and retention
time for N-glycopeptides and then discusses the applications of these models with respect
to glycopeptide sequencing.
Chapter 3 presents a graph deep learning model to predict fragment ion intensities of
observed spectra and defines a spectrum representation for glycan fragments with up to
three cleavages. The spectrum prediction model has a median cosine similarity of 0.921,
which is 20% higher than previous attempts at glycopeptide spectrum prediction.
For retention time prediction in Chapter 4, we propose a model with two parallel
encoders for both peptide and glycan input and apply transfer learning for the sequence
encoder. The retention time prediction model has a Pearson correlation of 1.0, which is
higher than the 0.98 and 0.96 reported in previous attempts. We also introduce the 95th percentile
delta as an evaluation metric and discuss the interpretability of our model.
Finally, in Chapter 5, we apply our spectrum and retention time prediction models
in glycopeptide sequencing pipelines, including database search and de novo search. We
show that our model improves identification by rescoring and has the potential to be
used as a filter for false positives. We also demonstrate that our model improves de novo
identification when used in the scoring function. | en |
dc.language.iso | en | en |
dc.publisher | University of Waterloo | en |
dc.subject | deep learning | en |
dc.subject | bioinformatics | en |
dc.subject | glycomics | en |
dc.title | Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning | en |
dc.type | Master Thesis | en |
dc.pending | false | |
uws-etd.degree.department | David R. Cheriton School of Computer Science | en |
uws-etd.degree.discipline | Computer Science | en |
uws-etd.degree.grantor | University of Waterloo | en |
uws-etd.degree | Master of Mathematics | en |
uws-etd.embargo.terms | 0 | en |
uws.contributor.advisor | Li, Ming | |
uws.contributor.affiliation1 | Faculty of Mathematics | en |
uws.published.city | Waterloo | en |
uws.published.country | Canada | en |
uws.published.province | Ontario | en |
uws.typeOfResource | Text | en |
uws.peerReviewStatus | Unreviewed | en |
uws.scholarLevel | Graduate | en |