Scalable Informative Rule Mining

Feng, Guoyao

Files

Feng_Guoyao.pdf (1.23 MB)

Date

2016-08-10

Authors

Feng, Guoyao

Advisor

Golab, Lukasz
Keshav, Srinivasan

Publisher

University of Waterloo

Abstract

In this thesis we present SIRUM: a system for Scalable Informative RUle Mining from multi-dimensional data. Informative rules have recently been studied in several contexts, including data summarization, data cube exploration and data quality. The objective is to produce a concise set of rules (patterns) over the values of the dimension attributes that provide the most information about the distribution of a numeric measure attribute. SIRUM optimizes this task for big, wide and distributed datasets. We implemented SIRUM in Spark and observed significant performance improvements on real data due to our optimizations.
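To make the notion of a rule over dimension values concrete, the following is a minimal sketch in plain Python. It is not SIRUM's algorithm and does not use Spark. It only enumerates every wildcard pattern that covers at least one row of a toy table and reports, for each pattern, how many rows it covers and the average of the measure over those rows. The wildcard symbol `*`, the function names, and the sample data are all assumptions made for illustration. Selecting the most informative rules, which is the task the abstract describes, would require a scoring step that is not shown here.

```python
from collections import defaultdict
from itertools import product

WILDCARD = "*"  # hypothetical marker meaning "any value" for a dimension


def candidate_rules(dims):
    """Return every generalization of a tuple of dimension values.

    Each position either keeps its value or becomes a wildcard, so a row
    with d dimensions is covered by 2**d candidate rules.
    """
    options = [(value, WILDCARD) for value in dims]
    return product(*options)


def summarize(rows):
    """Map each rule to (number of covered rows, mean of the measure).

    rows is an iterable of (dims, measure) pairs, where dims is a tuple
    of dimension values and measure is a number.
    """
    totals = defaultdict(lambda: [0, 0.0])  # rule -> [count, sum]
    for dims, measure in rows:
        for rule in candidate_rules(dims):
            totals[rule][0] += 1
            totals[rule][1] += measure
    return {rule: (n, s / n) for rule, (n, s) in totals.items()}


# Toy data (illustrative only): dimensions are (day, city), measure is numeric.
rows = [
    (("Mon", "Toronto"), 10.0),
    (("Mon", "Waterloo"), 20.0),
    (("Tue", "Toronto"), 30.0),
    (("Tue", "Waterloo"), 40.0),
]

summary = summarize(rows)
for rule, (count, mean) in sorted(summary.items()):
    print(rule, count, mean)
```

The number of candidate rules grows exponentially with the number of dimension attributes, which hints at why wide datasets are singled out in the abstract as a scalability concern.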