New Algorithms for Predicting Conformational Polymorphism and Inferring Direct Couplings for Side Chains of Proteins
Soltan Ghoraie, Laleh
Protein crystals populate diverse conformational ensembles. Despite much evidence of widespread conformational polymorphism in protein side chains, most X-ray crystallography data in the Protein Data Bank are modelled by single conformations. The ability to extract or predict these conformational polymorphisms is of crucial importance, as it facilitates a deeper understanding of protein dynamics and functionality. This dissertation describes a computational strategy capable of predicting side-chain polymorphisms. The approach extends a particular class of side-chain prediction algorithms by modelling the side-chain dihedral angles more appropriately as continuous rather than discrete variables. Employing a new inferential technique known as particle belief propagation (PBP), we predict residue-specific distributions that encode information about side-chain polymorphisms. The predicted polymorphisms are in relatively close agreement with results from a state-of-the-art approach based on X-ray crystallography data. That approach characterizes the conformational polymorphisms of side chains using electron density information, and has successfully discovered previously unmodelled conformations. Furthermore, it is known that coupled fluctuations and concerted motions of residues can reveal the pathways of communication used for information propagation in a molecule, and hence can help in understanding the phenomenon of "allostery" in proteins. To characterize the coupled motions, most existing methods infer structural dependencies among a protein's residues. However, recent studies have highlighted the role of coupled side-chain fluctuations alone in the allosteric behaviour of proteins, in contrast to the common belief that backbone motions play the main role in allostery.
These studies and the aforementioned recent discoveries about prevalent alternate side-chain conformations (conformational polymorphism) accentuate the need to devise new computational approaches that acknowledge the roles of side chains. These approaches must also consider the polymorphic nature of the side chains, and incorporate the effects of this polymorphism in the study of information transmission and functional interactions of residues in a molecule. Such frameworks can provide a more accurate understanding of allosteric behaviour. Hence, as a topic related to conformational polymorphism, this dissertation also addresses the problem of inferring directly coupled side chains. First, we present a novel approach to generate an ensemble of conformations and an efficient computational method to extract direct couplings of side chains in allosteric proteins. These direct couplings are used to provide sparse network representations of the coupled side chains. The framework is based on a fairly new statistical method, named graphical lasso (GLASSO), devised for sparse graph estimation. The proposed GLASSO-based framework takes side-chain conformational polymorphism into account. It is shown that by studying the intrinsic dynamics of an inactive structure alone, we are able to construct a network of functionally crucial residues. Second, we show that the proposed method is capable of providing a magnified view of the coupled and conformationally polymorphic side chains. This model reveals couplings between the alternate conformations of a coupled residue pair. To the best of our knowledge, this is the first computational method for extracting networks of side chains' alternate conformations. Such networks help in providing a detailed image of side-chain dynamics in functionally important and conformationally polymorphic sites, such as binding and/or allosteric sites. This information may assist in developing new drug-design alternatives.
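The distinction between a direct coupling and a mere correlation can be illustrated with a minimal sketch. It is not the dissertation's framework: it uses synthetic Gaussian data rather than side-chain conformations, and it inverts the sample covariance directly instead of running GLASSO, which would additionally place an L1 penalty on the precision matrix so that weak entries become exactly zero. The function name `partial_correlations` and the three-variable chain are assumptions made for illustration only.

```python
import numpy as np

def partial_correlations(samples):
    """Estimate partial correlations from samples (rows are observations).

    Entry (i, j) measures the coupling between variables i and j after
    conditioning on all other variables. It is read off the inverse
    covariance (precision) matrix.
    """
    cov = np.cov(samples, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical chain x -> y -> z: x influences z only through y.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
samples = np.column_stack([x, y, z])

corr = np.corrcoef(samples, rowvar=False)
pcorr = partial_correlations(samples)

print("correlation(x, z):        ", round(float(corr[0, 2]), 3))
print("partial correlation(x, z):", round(float(pcorr[0, 2]), 3))
print("partial correlation(x, y):", round(float(pcorr[0, 1]), 3))
```

In this toy chain, x and z are strongly correlated, yet their partial correlation is close to zero because y mediates the whole dependence. A sparse estimator such as GLASSO would be expected to drop the x-z edge and keep only x-y and y-z, which is the kind of sparse network of direct couplings described above.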
Side-chain conformations are commonly represented by multivariate angular variables. However, GLASSO and other existing methods that can be applied to the aforementioned inference task are not capable of handling multivariate angular data. This dissertation further proposes a novel method to infer direct couplings from this type of data, and shows that the method is useful for identifying functional regions and their interactions in allosteric proteins. The proposed framework is a novel extension of canonical correlation analysis (CCA), which we call "kernelized partial CCA" (or simply KPCCA). Using the conformational information and fluctuations of the inactive structure alone for allosteric proteins in the Ras and other Ras-like families, the KPCCA method identified allosterically important residues not only as strongly coupled ones but also as members of densely connected regions of the interaction graph formed by the inferred couplings. The results were in good agreement with other empirical findings and outperformed those obtained by the GLASSO-based framework. By studying distinct members of the Ras, Rho, and Rab sub-families, we show further that KPCCA is capable of inferring common allosteric characteristics in the small G protein super-family.
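Why angular data need special treatment can be seen in a small sketch. It is only an illustration of periodicity, not the KPCCA method: the (cos, sin) embedding used here is a common generic device, and the abstract does not state which representation or kernel KPCCA uses. The function names and the example angles are hypothetical.

```python
import numpy as np

def circular_difference(a, b):
    """Smallest signed difference between angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def embed_angles(chi_deg):
    """Map dihedral angles to (cos, sin) coordinates so periodicity is respected."""
    rad = np.deg2rad(np.asarray(chi_deg, dtype=float))
    return np.concatenate([np.cos(rad), np.sin(rad)], axis=-1)

# Two hypothetical side-chain conformations, each with two dihedral angles.
# The first angles, 179 and -179 degrees, are almost the same rotation.
a = np.array([179.0, 60.0])
b = np.array([-179.0, 60.0])

linear_gap = np.abs(a - b)                         # treats angles as plain numbers
circular_gap = np.abs(circular_difference(a, b))   # respects the wrap-around
embedded_gap = np.linalg.norm(embed_angles(a) - embed_angles(b))

print("linear gap (degrees):  ", linear_gap)
print("circular gap (degrees):", circular_gap)
print("embedded distance:     ", round(float(embedded_gap), 4))
```

Treating the angles as ordinary real numbers reports a gap of 358 degrees between two nearly identical conformations, while the circular difference is 2 degrees. Covariance-based methods applied to raw angles inherit this distortion, which is one reason a method designed for angular variables is needed.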