|dc.description.abstract||Over the past decade, there has been an explosion of network data in many settings, such as the World Wide Web, social networks, gene interactions, and economic networks. Scientific analysis of networks is important for revealing the laws governing complex systems. Community detection, one of the fundamental problems in network analysis, discovers the underlying cluster structure of nodes in a network. The Stochastic Block Model (SBM) is an influential framework for model-based community detection. In this thesis, we first propose a Continuous-time Stochastic Block Model (CSBM). We then develop a multistate CSBM and use it to analyze basketball games. Finally, we propose a novel variable selection approach that constructs networks among variables and applies SBM techniques to them.
Various Stochastic Block Models have been developed for static networks, such as a network of Facebook users, and the theoretical properties of these models have been studied recently. However, research on transactional networks, such as a network of email users who frequently send emails to each other, is relatively limited. Most existing work either does not take time into account or treats time in a discrete manner (as in discrete-time Markov chains). In contrast, we propose a Continuous-time Stochastic Block Model (CSBM) for transactional networks. Transactions between pairs of nodes are modeled as inhomogeneous Poisson processes, where the rate function of each Poisson process is characterized by the underlying cluster labels of the corresponding pair of nodes. The CSBM is capable of not only detecting communities but also capturing how transaction patterns evolve among communities.
As an important application, a multistate CSBM is developed and applied to basketball games. Basketball data analysis has gained enormous attention from enthusiasts and professionals in various fields. We argue that basketball games can be analyzed as transactional networks. The multistate CSBM models basketball plays as continuous-time Markov chains. The model clusters players according to their playing styles and performance. It also provides cluster-specific estimates of players' effectiveness at scoring, rebounding, stealing, etc., and captures player interaction patterns within and between clusters. Moreover, the model reveals subtle differences in the offensive strategies of different teams. Applications to NBA basketball games illustrate the performance of the multistate CSBM.
The SBM framework can also be employed for variable selection. In the past two decades, variable selection has become one of the central topics in statistical learning and high-dimensional data analysis, and numerous methods have been successfully developed. In general, there are three main types of approaches: penalized likelihood methods, variable screening methods, and Bayesian variable selection methods. In this thesis, we propose a novel variable selection method, Variable Selection Networks, which belongs to a new framework called Variable Selection Ensembles. Given a regression model with p covariates, we consider the ensemble of all p(p-1)/2 submodels with two covariates. We construct networks in which the nodes are the p variables and each edge carries a measure of the importance of the pair of variables it connects. We show that such networks have block structures. Variable selection is conducted by applying SBM techniques to these networks.||en