High-order pattern discovery and analysis of discrete-valued data sets

Authors

Wang, Yang

Publisher

University of Waterloo

Abstract

Automatic pattern discovery from data collections, and the analysis of the discovered patterns for useful information, are common and important tasks in science and engineering today. They are especially demanding in industrial and business applications, where the sheer volume of data makes manual analysis virtually impossible. This research addresses the following problems of pattern discovery and analysis: 1) the discovery of polythetic patterns; 2) the discovery of patterns in the presence of noise and uncertainty; 3) schemes for representing patterns of different orders; 4) an inference process for flexible pattern prediction; and 5) the application of pattern discovery to large-database analysis and data mining. This thesis presents the design and development of a system for discovering and analyzing patterns in categorical (discrete-valued) data. The system first detects event association patterns of different orders and then provides a probabilistic inference mechanism for flexible classification and prediction. Here a pattern is defined as a significant event association in a problem domain, and residual analysis from statistics is used to detect such associations. Insights gained from analyzing event associations of different orders, together with the properties of the residuals, lead to a general pattern discovery paradigm that detects patterns according to the deviations of the observed patterns from a default model. Along with this paradigm, techniques are developed that avoid exhaustive search when discovering high-order patterns in a large data set. An attributed hypergraph is proposed to represent and operate on the discovered patterns, which can be of different orders, so that the pattern discovery process can be viewed as a hypergraph generation process. The attributed hypergraph thus serves as a bridge linking the pattern discovery process with the inference process.
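The residual-based detection step can be illustrated for the simplest (second-order) case. The sketch below is an assumption, not the thesis's formulation: it takes statistical independence between two attributes as the default model and uses Haberman's adjusted residual for a two-way contingency table, flagging an event pair as a pattern when the residual exceeds a threshold (1.96, roughly the two-sided 95% level under normality, chosen here for illustration). The function names `adjusted_residuals` and `significant_pairs` are hypothetical, and the higher-order residuals and search-pruning techniques described above are not reproduced.

```python
import math
from collections import Counter


def adjusted_residuals(rows, col_a, col_b):
    """Haberman adjusted residual for every (value_a, value_b) event pair.

    Assumed default model: attributes col_a and col_b are independent.
    rows is a list of dicts mapping attribute names to discrete values.
    """
    n = len(rows)
    count_a = Counter(r[col_a] for r in rows)
    count_b = Counter(r[col_b] for r in rows)
    count_ab = Counter((r[col_a], r[col_b]) for r in rows)
    result = {}
    for va, na in count_a.items():
        for vb, nb in count_b.items():
            observed = count_ab[(va, vb)]  # Counter returns 0 for unseen pairs
            expected = na * nb / n  # frequency expected under independence
            variance = expected * (1 - na / n) * (1 - nb / n)
            if variance == 0:  # an attribute with a single value carries no information
                continue
            result[(va, vb)] = (observed - expected) / math.sqrt(variance)
    return result


def significant_pairs(rows, col_a, col_b, threshold=1.96):
    """Keep only event pairs whose deviation from the default model is significant."""
    return {
        pair: d
        for pair, d in adjusted_residuals(rows, col_a, col_b).items()
        if abs(d) > threshold
    }
```

A positive residual marks an event pair that occurs more often than independence predicts; a negative one marks a pair that occurs less often, which can also be informative.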
For pattern analysis and inference, a generalized reasoning process based on the weight of evidence is introduced, which makes flexible prediction possible. The thesis also covers the implementation of the major ideas of the pattern discovery framework in an integrated software system, and it concludes with a discussion of experimental results of pattern discovery and analysis on data from various sources, including synthetic and real-world data. Compared with existing systems, the methodology presented here shows significant advantages in both pattern discovery and pattern analysis.
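Weight-of-evidence reasoning can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the standard definition W(Y=y / Y≠y | x) = log P(x | y) / P(x | Y≠y), treats each observed attribute value as an independent piece of evidence and sums the weights (the thesis instead draws its evidence from significant patterns of different orders), and adds a hypothetical smoothing constant `eps` to avoid taking the logarithm of zero. The names `weight_of_evidence` and `predict` are hypothetical.

```python
import math


def weight_of_evidence(rows, target, y, attr, value, eps=0.5):
    """W(target=y / target!=y | attr=value) = log P(x | y) / P(x | not y).

    eps is a hypothetical smoothing constant keeping both probabilities
    away from zero on small samples.
    """
    pos = [r for r in rows if r[target] == y]
    neg = [r for r in rows if r[target] != y]
    hits_pos = sum(1 for r in pos if r[attr] == value)
    hits_neg = sum(1 for r in neg if r[attr] == value)
    p_pos = (hits_pos + eps) / (len(pos) + 2 * eps)
    p_neg = (hits_neg + eps) / (len(neg) + 2 * eps)
    return math.log(p_pos / p_neg)


def predict(rows, target, observation):
    """Return the target value with the largest summed weight, plus all scores.

    Summing first-order weights assumes the observed events are
    conditionally independent given the target; this is a simplification.
    """
    classes = sorted({r[target] for r in rows})
    scores = {
        y: sum(
            weight_of_evidence(rows, target, y, attr, value)
            for attr, value in observation.items()
        )
        for y in classes
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because `target` is an ordinary parameter, the same data can be used to predict any attribute from the others, which loosely mirrors the flexible prediction the abstract describes.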
