Discovery of New Features for Peptide Sequencing with Mass Spectrometry
Bioinformaticians have been working on peptide sequencing with tandem mass spectrometry (MS/MS) for decades, yet the results are still not perfect. Much research has been carried out on two peptide sequencing methods, database search and de novo sequencing. However, due to the quality of spectra and the inherent difficulty of the problem itself, both methods have had difficulty improving their results further. The publication of the NIST peptide library in May 2014 brought fresh ideas to this long-standing problem. This peptide library contains a large number of MS/MS spectra and their corresponding peptide sequences. Taking advantage of this high-quality dataset, a growing number of studies have since begun to look for internal patterns in MS/MS spectra. In this thesis, we examine this peptide library more closely and use statistical and machine learning ideas to find new features that help improve peptide sequencing results. Two main contributions are made. First, a general scoring feature is presented that can be incorporated into the scoring functions of other peptide sequencing software. The scoring feature is based on the intensity ratios between two adjacent y-ions in the spectrum. A method is proposed to obtain the probability distributions of such ratios and to calculate the scoring feature from those distributions. To demonstrate the performance of the method, the new feature is incorporated into X!Tandem and Novor, and it significantly improves the performance of each on test data. Second, a machine learning model that predicts the appearance of internal fragment ions in MS/MS spectra is presented. Even though this is, to the best of our knowledge, the first model on this topic, it achieves fairly good results. Several possible applications of this model are also discussed to show that the topic is valuable for peptide sequencing and thus worth further research.
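As a rough illustration of the first contribution, the sketch below shows one way a score could be derived from adjacent y-ion intensity ratios. It is not the method used in the thesis: the use of log2 ratios, the fixed histogram bins, the pseudocount smoothing, and all function names (`log_ratios`, `bin_index`, `fit_distribution`, `ratio_score`) are assumptions made only for this example.

```python
import math
from collections import Counter

def log_ratios(y_intensities):
    """Log2 intensity ratios between adjacent y-ions, skipping missing peaks."""
    return [
        math.log2(b / a)
        for a, b in zip(y_intensities, y_intensities[1:])
        if a > 0 and b > 0
    ]

def bin_index(value, width=1.0, lo=-5.0, hi=5.0):
    """Clamp value to [lo, hi] and return its histogram bin (0..10 by default)."""
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) // width)

def fit_distribution(training_spectra, n_bins=11, pseudocount=1.0):
    """Estimate a smoothed probability per ratio bin from library spectra.

    n_bins must match the number of bins produced by bin_index.
    """
    counts = Counter(
        bin_index(r) for ys in training_spectra for r in log_ratios(ys)
    )
    total = sum(counts.values()) + pseudocount * n_bins
    return [(counts[i] + pseudocount) / total for i in range(n_bins)]

def ratio_score(y_intensities, probs):
    """Sum of log-probabilities of the observed adjacent-ion ratios (assumed score)."""
    return sum(math.log(probs[bin_index(r)]) for r in log_ratios(y_intensities))

# Toy data: each list holds the intensities of consecutive y-ions in one spectrum.
library = [[100, 200, 400], [50, 100, 200], [10, 20, 40]]
probs = fit_distribution(library)
typical = ratio_score([100, 200, 400], probs)
atypical = ratio_score([100, 25, 400], probs)
```

In this toy setup, a candidate whose adjacent-ion ratios resemble those seen in the library receives a higher score than one whose ratios fall in rarely observed bins. A real scoring feature would presumably condition the distributions on more context than this sketch does.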
Cite this version of the work
Tiancong Wang (2017). Discovery of New Features for Peptide Sequencing with Mass Spectrometry. UWSpace. http://hdl.handle.net/10012/12421