Deep Learning for Peptide Feature Detection from Liquid Chromatography - Mass Spectrometry Data
Abstract
Proteins are the main workhorses of biological function: they catalyze metabolic reactions, support DNA replication, and provide structure to cells and organisms. Comparative analysis of protein samples from a healthy person and a disease-afflicted person can reveal disease biomarkers, which can be diagnostic or prognostic of the respective disease. Liquid chromatography with tandem mass spectrometry (LC-MS/MS) is the cutting-edge technology for protein identification and quantification. In this thesis, we target the first step in LC-MS/MS analysis: peptide feature detection from the LC-MS map, which is promising for disease biomarker discovery and protein quantification. An LC-MS map is usually a three-dimensional plot (m/z, retention time, and intensity) in which peptide features form multi-isotopic patterns. Each map may contain hundreds of thousands of peptide features, which frequently overlap, are tiny relative to the background, and are often mixed with feature-like noise. All of these characteristics make peptide feature detection very challenging. Deep learning, however, has produced groundbreaking results in many pattern recognition settings. In this thesis, we therefore investigate deep learning models for the peptide feature detection problem.
Existing tools for peptide feature detection rely on domain-specific parameters whose settings strongly affect the outcome, which makes them prone to human error. Moreover, they are rarely updated despite the vast amount of newly available proteomics data. As a solution, we develop, for the first time, a foundation for applying deep learning to automate peptide feature detection. The main strength of our approach is that it achieves higher sensitivity than existing tools by learning the necessary parameters through training on an appropriate dataset, and newly available information can be integrated easily by fine-tuning the model. We first propose DeepIso, which combines a convolutional neural network (CNN) with a recurrent neural network (RNN) and provides higher sensitivity for peptide feature detection than other existing models. We then present PointIso, a deep learning model with attention-based segmentation that operates on point clouds (sets of data points in space); it is three times faster than DeepIso and also improves feature detection. PointIso's sensitivity for detecting identified spiked peptides on a benchmark dataset is about 98%, which is 5% higher than other existing models. Next, we assess the quality of the peptide features generated by PointIso, showing its potential for biomarker discovery. We also apply PointIso to relative peptide abundance calculation across multiple samples, demonstrating its utility in label-free quantification. Finally, we adapt our 3D PointIso model to handle 4D data, achieving 4-6% higher sensitivity than other algorithms on the human proteome dataset. This suggests that our model is transferable to various contexts. We believe our research makes a notable contribution to accelerating the progress of deep learning in proteomics, as well as in general pattern recognition.
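The sensitivity figures quoted above can be read as the fraction of reference (identified) peptides that are matched by at least one detected feature. The following is a minimal sketch of that calculation, not the evaluation procedure used in the thesis: the `Feature` class, the `is_match` and `sensitivity` helpers, and the matching tolerances (10 ppm in m/z, 0.2 minutes in retention time, equal charge) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal description of a peptide feature."""
    mz: float      # monoisotopic m/z
    rt: float      # retention time in minutes
    charge: int    # charge state


def is_match(ref, det, ppm_tol=10.0, rt_tol=0.2):
    """Return True if a detected feature matches a reference peptide.

    The tolerances are illustrative assumptions, not values from the thesis.
    """
    if ref.charge != det.charge:
        return False
    ppm_error = abs(ref.mz - det.mz) / ref.mz * 1e6
    return ppm_error <= ppm_tol and abs(ref.rt - det.rt) <= rt_tol


def sensitivity(reference, detected, ppm_tol=10.0, rt_tol=0.2):
    """Fraction of reference peptides matched by at least one detection."""
    if not reference:
        raise ValueError("reference list is empty")
    hits = sum(
        any(is_match(ref, det, ppm_tol, rt_tol) for det in detected)
        for ref in reference
    )
    return hits / len(reference)


# Toy data: the second detection has the wrong charge state, so only
# two of the three reference peptides count as detected.
reference = [
    Feature(500.2500, 12.30, 2),
    Feature(650.8000, 25.10, 3),
    Feature(720.4000, 40.00, 2),
]
detected = [
    Feature(500.2510, 12.35, 2),
    Feature(650.8000, 25.10, 2),
    Feature(720.4020, 40.05, 2),
]

print(f"sensitivity = {sensitivity(reference, detected):.3f}")
```

The nested loop is quadratic in the number of features; for maps with hundreds of thousands of features, sorting by m/z and searching within a window would be the more practical choice.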
Cite this version of the work
Fatema Zohora (2022). Deep Learning for Peptide Feature Detection from Liquid Chromatography - Mass Spectrometry Data. UWSpace. http://hdl.handle.net/10012/18181
Related items
Showing items related by title, author, creator and subject.
A Machine-Learning-Based Algorithm for Peptide Feature Detection from Protein Mass Spectrometry Data
Zeng, Xiangyuan (University of Waterloo, 2021-05-13) Liquid chromatography with tandem mass spectrometry (LC-MS/MS) has been widely used in proteomics. Two types of data, MS and MS/MS data, are produced in an LC-MS/MS experiment. The MS data contains signal peaks corresponding ...
Discovery of New Features for Peptide Sequencing with Mass Spectrometry
Wang, Tiancong (University of Waterloo, 2017-09-21) Bioinformaticians have been working on peptide sequencing with tandem mass spectrometry (MS/MS) for decades. However, the results are still not perfect. A lot of research have been carried on two peptide sequencing methods, ...
Peptide-Driven Tri-Modal Gene Delivery Systems (PDTMG): Novel Versatile Peptide-Based Lipopolyplexes Incorporating Peptide-Functionalized Gemini Surfactants for Targeted Gene Therapy- Implementation of RGD Motifs as a Means for Endosomal Escape
Rafiee, Amirreza (University of Waterloo, 2018-09-04) The development of non-viral gene delivery vectors is highly challenging and aims to provide a safe while cost-effective manufacturing alternative to viral vectors. Eleven novel gemini surfactants (G4-G14) were designed ...