Towards Automating Protein Structure Determination from NMR Data
MetadataShow full item record
Nuclear magnetic resonance (NMR) spectroscopy technique is becoming exceedingly significant due to its capability of studying protein structures in solution. However, NMR protein structure determination has remained a laborious and costly process until now, even with the help of currently available computer programs. After the NMR spectra are collected, the main road blocks to the fully automated NMR protein structure determination are peak picking from noisy spectra, resonance assignment from imperfect peak lists, and structure calculation from incomplete assignment and ambiguous nuclear Overhauser enhancements (NOE) constraints. The goal of this dissertation is to propose error-tolerant and highly-efficient methods that work well on real and noisy data sets of NMR protein structure determination and the closely related protein structure prediction problems. One major contribution of this dissertation is to propose a fully automated NMR protein structure determination system, AMR, with emphasis on the parts that I contributed. AMR only requires an input set with six NMR spectra. We develop a novel peak picking method, PICKY, to solve the crucial but tricky peak picking problem. PICKY consists of a noise level estimation step, a component forming step, a singular value decomposition-based initial peak picking step, and a peak refinement step. The first systematic study on peak picking problem is conducted to test the performance of PICKY. An integer linear programming (ILP)-based resonance assignment method, IPASS, is then developed to handle the imperfect peak lists generated by PICKY. IPASS contains an error-tolerant spin system forming method and an ILP-based assignment method. The assignment generated by IPASS is fed into the structure calculation step, FALCON-NMR. FALCON-NMR has a threading module, an ab initio module, an all-atom refinement module, and an NOE constraints-based decoy selection module. The entire system, AMR, is successfully tested on four out of five real proteins with practical NMR spectra, and generates 1.25A, 1.49A, 0.67A, and 0.88A to the native reference structures, respectively. Another contribution of this dissertation is to propose novel ideas and methods to solve three protein structure prediction problems which are closely related to NMR protein structure determination. We develop a novel consensus contact prediction method, which is able to eliminate server correlations, to solve the protein inter-residue contact prediction problem. We also propose an ultra-fast side chain packing method, which only uses local backbone information, to solve the protein side chain packing problem. Finally, two complementary local quality assessment methods are proposed to solve the local quality prediction problem for comparative modeling-based protein structure prediction methods.