Recovery Guarantees for Graph Clustering Problems

Loading...
Thumbnail Image

Date

2021-12-06

Authors

Majmudar, Jimit

Advisor

Vavasis, Stephen

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

Graph clustering is widely-studied unsupervised learning problem in which the task is to group similar entities together based on observed pairwise entity interactions. This problem has applications in diverse domains such as social network analysis and computational biology. There are multiple ways to formalize a graph clustering problem. In this thesis, using tools from convex optimization, we develop algorithms for two specific graph clustering formulations \emph{Overlapping Community Detection} and \emph{Correlation Clustering}. We study these formulations using the provable recovery paradigm which requires establishing theoretical guarantees for recovery of a certain ground truth clustering as posited by a chosen generative model. In the Overlapping Community Detection problem, we expect clusters in the input graph to potentially overlap, i.e. share some common nodes. For this problem, often a \emph{pure nodes} assumption is made in literature which requires each cluster to have a node that belongs exclusively to that cluster. This assumption, however, may not be satisfied in practice. We propose a linear-programming-based algorithm to provably recover overlapping communities in weighted graphs without explicitly making the pure nodes assumption. We demonstrate the success of our algorithm on synthetic and real-world datasets. In the Correlation Clustering problem, we wish to determine non-overlapping clusters in the input graph without any prior knowledge of the number of clusters. We introduce a new graph generative model based on generating feature vectors/embeddings for the nodes in the graph which are interpreted as latent variables in the model, and propose a tuning-parameter-free semidefinite-programming-based algorithm to recover nodes with sufficiently strong cluster membership. We make progress towards showing that the proposed algorithm is provably robust.

Description

Keywords

LC Subject Headings

Citation