Fast Multi-Level Co-Clustering

Date

2013

Authors

Xu, Haifeng

Abstract

We present a new multilevel method for hierarchical co-clustering. The fast multilevel co-clustering method (FMCC) implements a bi-coarsening process on the bipartite graph induced by the feature matrix. It does so recursively, producing a hierarchy of overlapping co-clusters together with their connections to each other, as encoded in the co-cluster membership matrices and the coarse feature matrices obtained in the graph coarsening procedure. FMCC is inspired by principles of the algebraic multigrid (AMG) method for solving linear systems of equations, which uses fast and scalable heuristic grouping criteria based on strength of connection in the operator matrix. Compared with other co-clustering algorithms, FMCC has the following advantages: it is computationally efficient (almost linear in the data size); the number of clusters does not need to be specified, since FMCC finds it automatically; and the clustering gives a hierarchical structure for both the row and the column variables. FMCC produces interpretable co-clusters on several recursive levels, along with information on how they are connected, and thus makes it possible to investigate the potential multilevel co-cluster structure of complex real data. The method is accurate, fast and scalable, as demonstrated by numerical tests on co-clustering problems with synthetic data and with real data from gene expression studies and online social networks.
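
The idea of coarsening a bipartite graph by strength of connection can be illustrated with a minimal sketch. This is not the FMCC algorithm itself, whose details the abstract does not give. It is a simplified, single-level stand-in that assumes a hard (non-overlapping) assignment, whereas FMCC produces overlapping co-clusters. The function name `coarsen_once`, the threshold `theta`, and the rule "an entry is strong if it is at least `theta` times the largest magnitude in its row" are all assumptions made for illustration. Rows and columns joined by strong entries are grouped together, so the number of co-clusters is not specified in advance, and the coarse feature matrix is formed from the membership matrices.

```python
import numpy as np


def coarsen_once(A, theta=0.5):
    """One illustrative coarsening level on the bipartite graph of A.

    Hypothetical simplification: rows and columns linked by "strong"
    entries are merged into the same co-cluster (connected components).
    Returns row labels, column labels, and the coarse feature matrix.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape

    # Assumed strength criterion: |A[i, j]| >= theta * max_j |A[i, j]|.
    mag = np.abs(A)
    row_max = mag.max(axis=1, keepdims=True)
    strong = (mag >= theta * row_max) & (mag > 0)

    # Union-find over m row nodes followed by n column nodes.
    parent = list(range(m + n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in zip(*np.nonzero(strong)):
        ri, rj = find(int(i)), find(m + int(j))
        if ri != rj:
            parent[rj] = ri

    # Relabel components 0..k-1 in order of first appearance.
    # A column with no strong entry ends up as its own singleton cluster.
    ids = {}
    labels = np.array([ids.setdefault(find(x), len(ids)) for x in range(m + n)])
    k = len(ids)
    row_labels, col_labels = labels[:m], labels[m:]

    # Hard membership matrices and the coarse feature matrix.
    P_r = np.eye(k)[row_labels]   # shape (m, k)
    P_c = np.eye(k)[col_labels]   # shape (n, k)
    A_coarse = P_r.T @ A @ P_c    # shape (k, k)
    return row_labels, col_labels, A_coarse


if __name__ == "__main__":
    # Small feature matrix with two dense blocks and weak cross-links.
    A = np.array([[5.0, 4.0, 0.1, 0.0],
                  [4.0, 5.0, 0.0, 0.1],
                  [0.1, 0.0, 6.0, 5.0],
                  [0.0, 0.1, 5.0, 6.0]])
    rows, cols, Ac = coarsen_once(A, theta=0.5)
    print(rows, cols)
    print(Ac)
```

Applying the same function to the coarse matrix again would give a second level, which is the sense in which such a procedure is recursive. A practical method would also need a criterion for when to stop coarsening, which this sketch leaves out.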
