A Study of Random Duplication Graphs and Degree Distribution Pattern of Protein-Protein Interaction Networks
The motivation of this thesis is to explain why protein-protein interaction networks exhibit a distinctive degree distribution pattern, in which the majority of proteins are sparsely connected while a small number of densely connected proteins also exist. Since this pattern arises through a long evolutionary process of gene duplication, we introduce the random duplication graph model to describe protein-protein interaction networks mathematically. Specifically, we derive the degree distribution function of protein-protein interaction networks by modeling them as a special case of the random duplication graph. The random duplication graph model mimics the gene duplication process: at every timestep t, one vertex is chosen uniformly at random to duplicate, and the new vertex preserves all the edges of the original vertex. We derive the expected degree distribution function of the model from the probability master function. Furthermore, we learn from the Erdős-Rényi random graph model that the degree distribution function does not necessarily converge in a single random duplication graph. Consequently, we define the n-fold of random duplication graphs, a combination of n independent random duplication graphs, for which we are able to prove that the degree distribution function converges. We then model protein-protein interaction networks as a special case of the random duplication graph with a sparse initial graph, and derive the corresponding degree distribution function. We compare this degree distribution function with degree distribution data from reconstructed protein-protein interaction networks, and we show that it indeed resembles the degree distribution pattern observed in protein-protein interaction networks.
Our model gives a theoretical analysis of the self-organization process of protein-protein interaction networks. Moreover, we show that it is the gene duplication process, combined with the sparsely connected initial condition, that leads to the distinctive degree distribution pattern in protein-protein interaction networks. Our analysis also yields a further prediction: as the gene duplication process proceeds, the percentage of densely connected proteins will increase.
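
The duplication step described above (choose one vertex uniformly at random, then add a new vertex that inherits all of its edges) can be illustrated with a short simulation. The following Python code is only a minimal sketch under stated assumptions, not the implementation used in this thesis: the function names, the adjacency-set representation, the integer vertex labels 0..n-1, and the small path graph used as a sparse initial graph are all hypothetical choices made for illustration. In particular, the sketch assumes that the copy is not connected to the original vertex.

```python
import random
from collections import Counter


def duplicate_step(adj, rng):
    """Duplicate one uniformly chosen vertex; the copy inherits all its edges.

    Assumption: vertices are labeled 0..n-1, so the next free label is len(adj).
    Assumption: the copy is NOT joined to the original vertex.
    """
    original = rng.choice(list(adj))
    new = len(adj)
    adj[new] = set(adj[original])      # copy the neighbor set of the original
    for neighbor in adj[new]:
        adj[neighbor].add(new)         # keep the adjacency symmetric
    return new


def random_duplication_graph(initial_adj, steps, seed=None):
    """Run `steps` duplication steps starting from a copy of `initial_adj`."""
    rng = random.Random(seed)
    adj = {v: set(nbrs) for v, nbrs in initial_adj.items()}
    for _ in range(steps):
        duplicate_step(adj, rng)
    return adj


def degree_distribution(adj):
    """Return {degree: fraction of vertices with that degree}."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sparse initial graph: a path on three vertices.
    initial = {0: {1}, 1: {0, 2}, 2: {1}}
    graph = random_duplication_graph(initial, steps=1000, seed=0)
    for degree, fraction in degree_distribution(graph).items():
        print(degree, round(fraction, 4))
```

Running the script prints the empirical degree distribution of a single simulated graph. Because a single run need not be representative, one would repeat the simulation with different seeds before comparing it with any analytical result.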