Mass Spectrometry Based De Novo Peptide Sequencing Error Correction

Yao, Chenyu

Date

2017-09-26

Authors

Yao, Chenyu

Advisor

Ma, Bin

Publisher

University of Waterloo

Abstract

Peptide sequence identification by mass spectrometry has been studied extensively. With advances in computer hardware and algorithms, de novo sequencing has attracted researchers' attention for many years. Because it does not require a protein database, de novo sequencing can serve either as a complement to database searching or as a standalone method. As shown by Novor, de novo sequencing is significantly faster than protein database searching. Improving its accuracy is therefore essential. Overlapping peptides occur frequently in a typical heavy chain proteomics sample. In this thesis, we propose an algorithm that detects overlapping peptides efficiently and reliably. In addition, we design two strategies, labeling and voting, that use overlapping peptides to improve the accuracy of de novo sequencing. According to the results, the labeling strategy yields no significant improvement with the current version of Novor, although we still demonstrate the potential of the method. The voting strategy, however, performs notably well: it significantly improves the accuracy of de novo sequencing with little running time.