Statistical Foundations for Learning on Graphs
dc.contributor.author | Baranwal, Aseem | |
dc.date.accessioned | 2024-11-27T19:18:07Z | |
dc.date.available | 2024-11-27T19:18:07Z | |
dc.date.issued | 2024-11-27 | |
dc.date.submitted | 2024-11-22 | |
dc.description.abstract | Graph Neural Networks (GNNs) are among the most popular architectures for solving classification problems on data where entities carry attribute information alongside relational information; Graph Convolutional Networks and Graph Attention Networks are two of the most widely used GNN variants. In this thesis, I present a statistical framework for understanding node classification on feature-rich relational data. First, I use the framework to study the generalization error and the effects of two existing architectural components, namely graph convolutions and graph attention, on the Contextual Stochastic Block Model (CSBM) in the regime where the average degree of a node is at least of order log^2(n), where n is the number of nodes. Second, I propose a notion of asymptotic local optimality for node classification tasks and design a GNN architecture that is provably optimal in this sense in the sparse regime, i.e., average degree O(1).

In the first part, I develop a rigorous theoretical understanding of the effects of graph convolutions in neural networks through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. I first identify two quantities corresponding to the signal from the two sources of information, the graph and the node features, and then show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of up to 1/sqrt(deg), where deg is the expected degree of a node. I then show that, with slightly denser graphs, two graph convolutions improve this factor to up to 1/sqrt(n). These results provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a neural network, concluding that the performance is similar for all placements.

In the second part, I analyze graph attention. The main result states that in a well-defined "hard" regime, every attention mechanism fails to distinguish intra-class edges from inter-class edges. Moreover, if the signal in the node attributes is sufficiently weak, graph attention convolution cannot perfectly classify the nodes even when the intra-class edges are separable from the inter-class edges.

In the third part, I study the node classification problem on feature-decorated graphs in the sparse setting, where the expected degree of a node is O(1), and in the fixed-dimensional asymptotic regime, where the dimension of the feature data is fixed while the number of nodes grows. Such graphs are typically locally tree-like. Here, I introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier under this criterion for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. The optimal classifier is implementable using a message-passing graph neural network architecture. I then precisely compute the generalization error of this optimal classifier and compare its performance statistically against existing learning methods on a well-studied data model with naturally identifiable signal-to-noise ratios (SNRs). I find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical graph convolutional layer in the regime of high graph signal. Finally, I provide a corresponding non-asymptotic result that demonstrates the practical potential of the asymptotically optimal classifier. | |
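The abstract refers to the CSBM and to the variance-reduction effect of a single graph convolution. The following is a minimal illustrative sketch, not code from the thesis: it samples a two-class CSBM and applies one neighborhood-averaging convolution, showing how class separation improves by roughly a factor of sqrt(expected degree). All parameter values (n, d, p, q, mu, sigma) are assumptions chosen for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)

    # Contextual Stochastic Block Model (CSBM): two balanced classes.
    n, d = 1000, 2             # number of nodes, feature dimension
    p, q = 0.05, 0.01          # intra-/inter-class edge probabilities
    mu = np.array([1.0, 0.0])  # classes centered at +mu and -mu
    sigma = 2.0                # feature noise level

    y = rng.integers(0, 2, size=n) * 2 - 1  # labels in {-1, +1}
    X = np.outer(y, mu) + sigma * rng.standard_normal((n, d))

    # Sample the SBM adjacency: edge probability p within a class, q across.
    same = np.equal.outer(y, y)
    probs = np.where(same, p, q)
    A = (rng.random((n, n)) < probs).astype(float)
    A = np.triu(A, 1)
    A = A + A.T              # symmetric, no self-loops yet
    A = A + np.eye(n)        # add self-loops, as is common in GCN layers

    # One graph convolution: D^{-1} A X, i.e., average features over neighbors.
    deg = A.sum(axis=1, keepdims=True)
    X_conv = (A @ X) / deg

    # Heuristic signal-to-noise ratio: distance between the empirical class
    # means divided by the within-class spread.
    def snr(Z):
        m_pos, m_neg = Z[y == 1].mean(axis=0), Z[y == -1].mean(axis=0)
        spread = Z[y == 1].std() + Z[y == -1].std()
        return np.linalg.norm(m_pos - m_neg) / spread

    print(f"SNR before convolution: {snr(X):.2f}")
    print(f"SNR after  convolution: {snr(X_conv):.2f}")  # noticeably larger

Averaging over neighbors keeps the class means roughly in place while shrinking the within-class noise, which is the intuition behind the 1/sqrt(deg) improvement stated in the abstract; the thesis itself works with a non-linearly separable mixture and multi-layer networks rather than this linear heuristic.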
dc.identifier.uri | https://hdl.handle.net/10012/21209 | |
dc.language.iso | en | |
dc.pending | false | |
dc.publisher | University of Waterloo | en |
dc.relation.uri | https://github.com/opallab/Graph-Convolution-for-Semi-Supervised-Classification-Improved-Linear-Separability-and-OoD-Gen. | |
dc.relation.uri | https://github.com/opallab/Effects-of-Graph-Convs-in-Deep-Nets | |
dc.relation.uri | https://github.com/opallab/Graph-Attention-Retrospective | |
dc.relation.uri | https://github.com/opallab/optimality-mp-archs-sparse-graphs | |
dc.title | Statistical Foundations for Learning on Graphs | |
dc.type | Doctoral Thesis | |
uws-etd.degree | Doctor of Philosophy | |
uws-etd.degree.department | David R. Cheriton School of Computer Science | |
uws-etd.degree.discipline | Computer Science | |
uws-etd.degree.grantor | University of Waterloo | en |
uws-etd.embargo.terms | 0 | |
uws.contributor.advisor | Fountoulakis, Kimon | |
uws.contributor.advisor | Jagannath, Aukosh | |
uws.contributor.affiliation1 | Faculty of Mathematics | |
uws.peerReviewStatus | Unreviewed | en |
uws.published.city | Waterloo | en |
uws.published.country | Canada | en |
uws.published.province | Ontario | en |
uws.scholarLevel | Graduate | en |
uws.typeOfResource | Text | en |