Efficient Pattern Search in Large, Partial-Order Data Sets

Nichols, Matthew

Efficient Pattern Search in Large, Partial-Order Data Sets

Files

uw-ethesis.pdf (1.1 MB)

Date

2008-09-12T20:21:59Z

Authors

Nichols, Matthew

Publisher

University of Waterloo

Abstract

The behaviour of a large, distributed system is inherently complex. One step towards making this behaviour more understandable to a user involves instrumenting the system and collecting data about its execution. We can model the data as traces (representing various sequential entities in the system such as single-threaded processes) that contain both events local to the trace and communication events involving another trace. Visualizing this data provides a modest benefit to users as it makes basic interactions in the system clearer and, with some user effort, more complex interactions can be determined. Unfortunately, visualization by itself is not an adequate solution, especially for large numbers of events and complex interactions among traces. A search facility has the ability to make this event data more useful. Work has been done previously on various frameworks and algorithms that could form the core of such a search facility; however, various shortcomings in the completeness of the frameworks and in the efficiency of the algorithms resulted in an inconsistent, incomplete, and inefficient solution. This thesis takes steps to remedy this situation. We propose a provably-complete framework for determining precedence between sets of events and propose additions to a previous pattern-specification language so it can specify a wider variety of search patterns. We improve the efficiency of the existing search algorithm, and provide a new, more efficient, algorithm that processes a pattern in a fundamentally different way. Furthermore, the various proposed improvements have been implemented and are analysed empirically.