Peptide Sequencing with Deep Learning

Date

2020-09-30

Authors

Qiao, Rui

Advisor

Ghodsi, Ali

Publisher

University of Waterloo

Abstract

In shotgun proteomics, de novo peptide sequencing from tandem mass spectrometry data is the key technology for finding new peptide or protein sequences. It has been applied successfully to assembling monoclonal antibody sequences and has great potential for identifying neoantigens for personalized cancer vaccines. In this thesis, I propose a novel deep neural network-based de novo peptide sequencing model: PointNovo. The proposed PointNovo model not only outperforms the previous state-of-the-art model by a significant margin but also resolves the long-standing trade-off between accuracy and speed/memory in previous de novo peptide sequencing tools. Further, our experimental results show that even though PointNovo is not trained to distinguish between true and false peptide-spectrum matches, its log probability score can be used as a scoring function for database searching. On several different datasets, we show that PointNovo, when used as a database search engine, achieves an identification rate at least comparable to that of popular existing database search software. We also extend and adapt an existing model to process Data Independent Acquisition (DIA) data and propose the first de novo peptide sequencing algorithm for DIA tandem mass spectra. Finally, we develop a workflow that identifies tumor-specific antigens directly and purely from mass spectrometry data of tumor tissues, and we test it on a published dataset of tumor samples from melanoma patients. Our workflow applies de novo peptide sequencing to detect mutated endogenous peptides, in contrast to the prevalent indirect approach in existing methods, which combines exome sequencing, somatic mutation calling, and epitope prediction. More importantly, we develop machine learning models that are tailored to each patient based on that patient's own MS data. Such a personalized approach enables accurate identification of neoantigens for the development of personalized cancer vaccines.
We applied the workflow to datasets from five melanoma patients and expanded their immunopeptidomes by 5% to 15%. Subsequently, we discovered 17 neoantigens across both HLA-I and HLA-II, including some with validated T cell responses and novel neoantigens that had not been reported in previous studies.

Keywords

de novo peptide sequencing, deep learning, mass spectrometry, neoantigen identification
