Log Event Filtering Using Clustering Techniques
Date
2009-10-01T18:05:40Z
Authors
Wasfy, Ahmed
Publisher
University of Waterloo
Abstract
Large software systems are composed of various run-time components, partner
applications, and processes. When such systems operate, they are monitored so that audits can be
performed once a failure occurs or when maintenance operations are performed. However, log files
are usually sizeable and require filtering and reduction to be processed efficiently. Furthermore, there
is no apparent correspondence between logged events and the particular use cases the system may be
performing. In this thesis, we have developed a framework that is based on heuristic clustering
algorithms to achieve log filtering, log reduction, and log interpretation. More specifically, we define
the concept of the Event Dependency Graph, and we present event filtering and use case
identification techniques that are based on event clustering. The clustering process groups together
all events that relate to a collection of initial significant events that relate to a use case. We refer to
these significant events as beacon events. Beacon events can be identified automatically or semi-automatically
by examining log event types or event names against event types or event names in the
corresponding specification of a use case being considered (e.g. events in sequence diagrams).
Furthermore, the user can select other or additional initial clustering conditions based on his or her
domain knowledge of the system. The clustering technique can be used in two possible ways. The
first is for large logs to be reduced or sliced with respect to a particular use case, so that operators can
better focus their attention on specific events that relate to specific operations. The second is for the
determination of active use cases where operators select particular seed events of interest and then
examine the resulting reduced logs against events or event types stemming from different alternative
known use cases being considered, in order to identify the best match and consequently provide
insights on which of these alternative use cases may be running at any given time. The approach has
shown promising results in identifying executing use cases among various
alternatives across various runs of the Session Initiation Protocol.
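
The two uses described above (slicing a log around beacon events, then matching the reduced log against candidate use cases) can be illustrated with a minimal sketch. It is not the thesis's algorithm. It assumes the Event Dependency Graph can be approximated by an undirected adjacency map, that clustering is a bounded breadth-first expansion from the beacons, and that matching uses Jaccard similarity over event types. All function names, the `max_hops` parameter, and the sample log are hypothetical.

```python
from collections import deque


def cluster_from_beacons(dependencies, beacons, max_hops=2):
    """Collect every event reachable from a beacon within max_hops edges.

    `dependencies` is an assumed stand-in for an event dependency graph:
    a dict mapping an event id to the ids of events it is related to.
    """
    seen = set(beacons)
    queue = deque((b, 0) for b in beacons)
    while queue:
        event_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in dependencies.get(event_id, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen


def slice_log(log, cluster):
    """Keep only the clustered events, preserving the original log order."""
    return [event for event in log if event["id"] in cluster]


def best_use_case(reduced_log, use_cases):
    """Score each candidate use case by Jaccard similarity of event types."""
    observed = {event["type"] for event in reduced_log}

    def jaccard(expected):
        union = observed | expected
        return len(observed & expected) / len(union) if union else 0.0

    scores = {name: jaccard(set(types)) for name, types in use_cases.items()}
    return max(scores, key=scores.get), scores


# Hypothetical log: a SIP call set-up interleaved with unrelated events.
log = [
    {"id": 1, "type": "INVITE"},
    {"id": 2, "type": "HEARTBEAT"},
    {"id": 3, "type": "180 Ringing"},
    {"id": 4, "type": "200 OK"},
    {"id": 5, "type": "DISK_CHECK"},
    {"id": 6, "type": "ACK"},
]
# Hypothetical dependency edges between event ids.
dependencies = {1: {3}, 3: {1, 4}, 4: {3, 6}, 6: {4}, 2: {5}, 5: {2}}
# Hypothetical use case specifications, given as expected event types.
use_cases = {
    "call_setup": {"INVITE", "180 Ringing", "200 OK", "ACK"},
    "registration": {"REGISTER", "200 OK"},
}

beacons = [event["id"] for event in log if event["type"] == "INVITE"]
cluster = cluster_from_beacons(dependencies, beacons, max_hops=3)
reduced = slice_log(log, cluster)
winner, scores = best_use_case(reduced, use_cases)
print(winner, scores)
```

In this sketch the unrelated HEARTBEAT and DISK_CHECK events are dropped because no dependency path connects them to the beacon, and the reduced log is then compared against each candidate use case to pick the closest match.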
Keywords
log filtering, root cause analysis