Grammar-Based Representations of Large Sparse Binary Matrices

Bian, Jingyun

Grammar-Based Representations of Large Sparse Binary Matrices

Files

Bian_Jingyun.pdf (1.05 MB)

Date

2016-10-26

Authors

Bian, Jingyun

Advisor

Yang, En-hui
Yang, En-hui

Publisher

University of Waterloo

Abstract

Large sparse matrices representation is a fundamental problem in big data processing and analysis. In some applications dealing with large sparse matrices, the I/O of these sparse matrices is the bottleneck of the whole system. To reduce the requirement of memory bandwidth in this scenario, it is important to develop alternative compact representations of large sparse matrices, while facilitating, if possible, matrix operations. In this thesis, we propose two grammar-based methods to compactly represent a sparse binary matrix with the capability of random accessing an element in the matrix. The first approach combines dimension coding (proposed by Yang[12]) with one of raster scan or Hilbert scan, where the so-called directionless grammar is applied. With the power of scanning, dimension coding’s capability of representing 1-D sparse signals can be extended to 2-D sparse matrices. This approach inherits the random accessibility of dimension coding. In the second approach, we will introduce a new concept called Context-free Bipartite Grammar (CFBG) and present a framework wherein large sparse binary matrices can be represented by CFBG. Similar to the traditional concept of Context-free Grammar (CFG), a CFBG consists of a set of production rules. Unlike CFGs, however, the right member of each production rule in a CFBG is a labeled bipartite graph with each edge labeled either as a variable or terminal symbol. As the right-hand side of a production rule is an ordered edge set, CFBG is also directionless. Two bipartite grammar transforms, a Sequential D-Neighborhood Pairing Transform (SNPT) and an Iterative Pairing Transform (IPT), are further presented to convert any binary matrix into a CFBG representing it. Experiments show that compared with popular sparse matrix storage methods such as compressed row storage and quadtree, grammar-based sparse binary matrix representations can reduce the storage requirement of sparse matrices significantly (by a factor of as much as 70).