New Approaches to Protein NMR Automation
Alipanahi Ramandi, Babak
The three-dimensional structure of a protein molecule is the key to understanding its biological and physiological properties. A major problem in bioinformatics is to efficiently determine the three-dimensional structures of query proteins. Protein NMR structure determination is one of the main experimental methods and comprises three steps: (i) protein sample production and isotope labelling, (ii) collecting NMR spectra, and (iii) analysis of the spectra to produce the protein structure. In protein NMR, the three-dimensional structure is determined by exploiting a set of distance restraints between spatially proximate atoms. Currently, no practical automated protein NMR method exists that runs without human intervention. We first propose a complete automated protein NMR pipeline, which can efficiently be used to determine the structures of moderate-sized proteins. Second, we propose a novel and efficient semidefinite programming-based (SDP) protein structure determination method. The proposed automated protein NMR pipeline consists of three modules: (i) an automated peak picking method, called PICKY, (ii) a backbone chemical shift assignment method, called IPASS, and (iii) a protein structure determination method, called FALCON-NMR. When tested on four real protein data sets, this pipeline produces structures with reasonable accuracy, starting from NMR spectra. This general method can be applied to structure determination for other macromolecules. For example, a promising application is RNA NMR-assisted secondary structure determination. In the second part of this thesis, due to the shortcomings of FALCON-NMR, we propose a novel SDP-based protein structure determination method from NMR data, called SPROS. Most of the existing prominent protein NMR structure determination methods are based on molecular dynamics coupled with a simulated annealing schedule.
In these methods, an objective function representing the error between observed and given distance restraints is minimized; these objective functions are highly non-convex and difficult to optimize. Euclidean distance geometry methods based on SDP provide a natural formulation for realizing a three-dimensional structure from a set of given distance constraints. However, the complexity of SDP solvers increases cubically with the input matrix size, i.e., the number of atoms in the protein, and with the number of constraints. In fact, this complexity is a major obstacle to applying SDP solvers to the protein NMR problem. To overcome these limitations, the SPROS method models the protein molecule as a set of intersecting two- and three-dimensional cliques. We adapt and extend a technique called semidefinite facial reduction to reduce the SDP matrix size, which makes the SDP problem approximately one quarter of its original size. The reduced problem is solved nearly one hundred times faster and is more robust against numerical problems. Reasonably accurate results were obtained when SPROS was applied to a set of 20 real protein data sets.
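
As a rough illustration of why distance restraints fit naturally into a matrix formulation, the sketch below uses the standard identity d_ij^2 = G_ii + G_jj - 2 G_ij, where G is the Gram matrix of the atom coordinates. SDP-based distance geometry methods generally optimize over such a positive semidefinite matrix G rather than over the coordinates directly. This is not the SPROS formulation: the function names, the toy coordinates, the upper bounds, and the squared-violation penalty are all assumptions chosen for illustration only.

```python
from itertools import combinations
import math

def gram_matrix(coords):
    """Gram matrix G[i][j] = <x_i, x_j> of the centroid-centered coordinates."""
    n, dim = len(coords), len(coords[0])
    centroid = [sum(p[k] for p in coords) / n for k in range(dim)]
    centered = [[p[k] - centroid[k] for k in range(dim)] for p in coords]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered]
            for ci in centered]

def squared_distance_from_gram(G, i, j):
    """Squared distance between points i and j, read off the Gram matrix."""
    return G[i][i] + G[j][j] - 2.0 * G[i][j]

def restraint_violation(G, upper_bounds):
    """Sum of squared violations of upper-bound distance restraints.

    upper_bounds maps an atom index pair (i, j) to a maximum allowed
    distance (hypothetical restraint format, for illustration only).
    """
    total = 0.0
    for (i, j), bound in upper_bounds.items():
        # Clamp at zero to guard against tiny negative round-off values.
        d = math.sqrt(max(squared_distance_from_gram(G, i, j), 0.0))
        total += max(d - bound, 0.0) ** 2
    return total

# Toy "molecule" of four atoms (made-up coordinates, arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0), (0.0, 2.0, 1.0)]
G = gram_matrix(atoms)

# The Gram-matrix distances should agree with direct Euclidean distances.
max_err = max(
    abs(squared_distance_from_gram(G, i, j) - math.dist(atoms[i], atoms[j]) ** 2)
    for i, j in combinations(range(len(atoms)), 2)
)

# Made-up upper bounds; only the pair (0, 2) is violated (2.5 > 2.0).
bounds = {(0, 1): 2.0, (0, 2): 2.0, (1, 3): 3.0}
violation = restraint_violation(G, bounds)
```

Each restraint is linear in the entries of G once it is written in terms of squared distances, which is what makes the semidefinite formulation convex, in contrast to the non-convex objectives over coordinates mentioned above.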