Statistical Foundations for Learning on Graphs
Date
2024-11-27
Authors
Advisor
Fountoulakis, Kimon
Jagannath, Aukosh
Publisher
University of Waterloo
Abstract
Graph Neural Networks (GNNs) are among the most popular architectures for classification problems on data where entities carry attribute information alongside relational information. Among them, Graph Convolutional Networks and Graph Attention Networks are two of the most widely used.
In this thesis, I present a statistical framework for understanding node classification on feature-rich relational data. First, I use the framework to study the generalization error and the effects of existing architectures, namely graph convolutions and graph attention, on the Contextual Stochastic Block Model in the regime where the average degree of a node is at least of order log^2(n) in the number of nodes n.
Second, I propose a notion of asymptotic local optimality for node classification tasks and design a GNN architecture that is provably optimal in this sense in the sparse regime, i.e., when the average degree is O(1).
In the first part, I present a rigorous theoretical understanding of the effects of graph convolutions in neural networks, through the node classification problem on data drawn from a non-linearly separable Gaussian mixture model coupled with a stochastic block model.
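For context, a standard way to write down such a model (the notation below is illustrative; the thesis studies a non-linearly separable variant in which each class is itself a Gaussian mixture) couples class-conditional Gaussian features with class-dependent edge probabilities:

\[
y_i \sim \mathrm{Unif}\{\pm 1\}, \qquad
X_i \mid y_i \sim \mathcal{N}(y_i \mu,\ \sigma^2 I_d), \qquad
\Pr[A_{ij} = 1 \mid y_i, y_j] =
\begin{cases} p & \text{if } y_i = y_j, \\ q & \text{if } y_i \neq y_j. \end{cases}
\]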
First, I identify two quantities that capture the signal from the two sources of information: the graph and the node features. I then show that a single graph convolution expands the regime of the distance between the means in which multi-layer networks can classify the data, by a factor of up to 1/sqrt{deg}, where deg denotes the expected degree of a node.
Second, I show that under a slightly stronger graph density assumption, two graph convolutions improve this factor to up to 1/sqrt{n}, where n is the number of nodes in the graph.
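A back-of-the-envelope heuristic (not the argument of the thesis) for where a factor like 1/sqrt{deg} comes from: a graph convolution replaces each node's feature by an average over roughly deg neighbours, and averaging deg independent noise terms shrinks the noise standard deviation by about 1/sqrt{deg}, while in a sufficiently homophilous graph the gap between the class means is only mildly reduced:

\[
\tilde{X}_i = \frac{1}{|N(i)|} \sum_{j \in N(i)} X_j
\quad \Longrightarrow \quad
\operatorname{Var}\big(\text{noise in } \tilde{X}_i\big) \approx \frac{\sigma^2}{\mathbb{E}[\deg]},
\]

so the effective noise scale drops from \sigma to roughly \sigma/\sqrt{\mathbb{E}[\deg]}, allowing correspondingly closer class means to be separated.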
This set of results provides both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a neural network, concluding that the performance is comparable across all placements.
The second part provides an analysis of graph attention. The main result states that in a well-defined ``hard'' regime, every attention mechanism fails to distinguish the intra-class edges from the inter-class edges. In addition, if the signal in the node attributes is sufficiently weak, graph attention convolution cannot perfectly classify the nodes even when the intra-class edges are separable from the inter-class edges.
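For reference, a standard parameterization of the attention coefficients in this setting (the GAT-style form; the result above applies to arbitrary attention mechanisms) is

\[
\alpha_{ij} = \frac{\exp\big(\Psi(X_i, X_j)\big)}{\sum_{k \in N(i)} \exp\big(\Psi(X_i, X_k)\big)}, \qquad
\tilde{X}_i = \sum_{j \in N(i)} \alpha_{ij} X_j,
\]

where \Psi is a learnable score function, e.g., \Psi(X_i, X_j) = \mathrm{LeakyReLU}\big(a^\top [W X_i \,\|\, W X_j]\big) in the original GAT. In the ``hard'' regime, no choice of \Psi produces coefficients that reliably up-weight intra-class edges over inter-class edges.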
In the third part, I study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is O(1) in the number of nodes, and in the fixed-dimensional asymptotic regime, i.e., the dimension of the feature data is fixed while the number of nodes is large. Such graphs are typically known to be locally tree-like. Here, I introduce a notion of Bayes optimality for node classification tasks, called asymptotic local Bayes optimality, and compute the optimal classifier in this sense for a fairly general statistical data model with arbitrary distributions of the node features and edge connectivity. The optimal classifier is implementable using a message-passing graph neural network architecture. This is followed by a result that precisely computes the generalization error of this optimal classifier and compares its performance statistically against existing learning methods on a well-studied data model with naturally identifiable signal-to-noise ratios (SNRs). I find that the optimal message-passing architecture interpolates between a standard MLP in the regime of low graph signal and a typical graph convolutional layer in the regime of high graph signal. Furthermore, I provide a corresponding non-asymptotic result that demonstrates the practical potential of the asymptotically optimal classifier.
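To illustrate the kind of architecture this describes, here is a minimal, hypothetical sketch in Python/NumPy; the function name, the mixing weight gamma, and the mean aggregation are my own illustrative choices, not the construction derived in the thesis.

import numpy as np

def interpolating_layer(X, A, W_self, W_neigh, gamma):
    """One message-passing layer blending a node's own features with an
    aggregate of its neighbours' features.
    X       : (n, d) node feature matrix
    A       : (n, n) adjacency matrix (0/1, symmetric)
    W_self  : (d, k) weights for the node's own features (MLP part)
    W_neigh : (d, k) weights for the aggregated neighbour features
    gamma   : scalar in [0, 1]; gamma -> 0 recovers an MLP on X,
              gamma -> 1 emphasises the neighbour average (graph-convolution-like)
    """
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)   # avoid dividing by zero for isolated nodes
    neigh_mean = (A @ X) / deg                           # mean over each node's neighbours
    return (1 - gamma) * (X @ W_self) + gamma * (neigh_mean @ W_neigh)

# Toy usage with random data and a hypothetical mixing weight.
rng = np.random.default_rng(0)
n, d, k = 8, 4, 2
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                           # symmetric, no self-loops
H = interpolating_layer(X, A, rng.normal(size=(d, k)), rng.normal(size=(d, k)), gamma=0.5)

In this caricature, gamma plays the role of the graph signal: a small gamma corresponds to the low-signal regime, where the layer behaves like a standard MLP on the node features, and a large gamma to the high-signal regime, where it behaves like a graph convolutional layer; the thesis derives the optimal form of this trade-off from the data model rather than fixing a scalar weight.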