Peptide Sequencing with Deep Learning
Date
2020-09-30
Authors
Qiao, Rui
Advisor
Ghodsi, Ali
Publisher
University of Waterloo
Abstract
In shotgun proteomics, de novo peptide sequencing from tandem mass spectrometry
data is the key technology for finding new peptide or protein sequences. It has been applied successfully to assembling monoclonal antibody sequences and has great potential for
identifying neoantigens for personalized cancer vaccines. In this thesis, I propose a novel
deep neural network-based de novo peptide sequencing model: PointNovo. The proposed
PointNovo model not only outperforms the previous state-of-the-art model by a significant
margin but also solves the long-standing accuracy–speed/memory trade-off problem that
exists in previous de novo peptide sequencing tools. Further, our experimental results show
that even though PointNovo is not trained to distinguish between true and false
peptide-spectrum matches, its log probability score can be used as a scoring function
to perform database searching. On several different datasets, we show that PointNovo,
when used as a database search engine, can achieve an identification rate that is at least
comparable to that of popular existing database search software.
We also extend and adapt an existing model to process Data Independent Acquisition
(DIA) data and propose the first de novo peptide sequencing algorithm for DIA tandem
mass spectra.
Finally, we develop a workflow that can identify tumor-specific antigens directly and
purely from mass spectrometry data of tumor tissues and test it on a published dataset of
tumor samples from melanoma patients. Our workflow applies de novo peptide sequencing
to detect mutated endogenous peptides, in contrast to the prevalent indirect approach of
combining exome sequencing, somatic mutation calling, and epitope prediction in existing
methods. More importantly, we develop machine learning models that are tailored to each
patient based on their own MS data. Such a personalized approach enables accurate identification of neoantigens for the development of personalized cancer vaccines. We applied
the workflow to datasets of five melanoma patients and expanded their immunopeptidomes
by 5% to 15%. We subsequently discovered 17 neoantigens of both HLA-I and HLA-II classes,
including some with validated T cell responses and novel neoantigens that had not
been reported in previous studies.
Keywords
de novo peptide sequencing, deep learning, mass spectrometry, neoantigen identification