SPIDER: Reconstructive Protein Homology Search with De Novo Sequencing Tags
In proteomic mass spectrometry, proteins can be sequenced by two independent yet complementary approaches: de novo sequencing, which uses no prior knowledge, and database search, which relies on existing protein databases. For cases where an organism’s protein database is not available, the software Spider was developed to search sequence tags produced by de novo sequencing against a database from a related organism while accounting for both errors in the sequence tags and mutations. This thesis further develops Spider by using the concept of reconstruction to predict the real sequence from both the sequence tags and their matched homologous peptides. The significant value of these reconstructed sequences is demonstrated. Additionally, the search is separated into independent caching and matching steps, and the runtime is greatly reduced. This new approach allows for the development of an efficient search algorithm. In addition, the algorithm’s output can be used for new applications, as illustrated by a contribution to a complete protein sequencing application.