Log Event Filtering Using Clustering Techniques
Large software systems are composed of various different run-time components, partner applications and, processes. When such systems operate they are monitored so that audits can be performed once a failure occurs or when maintenance operations are performed. However, log files are usually sizeable, and require filtering and reduction to be processed efficiently. Furthermore, there is no apparent correspondence of how logged events relate to particular use cases the system may be performing. In this thesis, we have developed a framework that is based on heuristic clustering algorithms to achieve log filtering, log reduction and, log interpretation. More specifically we define the concept of the Event Dependency Graph, and we present event filtering and use case identification techniques, that are based on event clustering. The clustering process groups together all events that relate to a collection of initial significant events that relate to a use case. We refer to these significant events as beacon events. Beacon events can be identified automatically or semiautomatically by examining log event types or event names against event types or event names in the corresponding specification of a use case being considered (e.g. events in sequence diagrams). Furthermore, the user can select other or additional initial clustering conditions based on his or her domain knowledge of the system. The clustering technique can be used in two possible ways. The first is for large logs to be reduced or sliced, with respect to a particular use case so that, operators can better focus their attention to specific events that relate to specific operations. The second is for the determination of active use cases where operators select particular seed events of interest and then examine the resulting reduced logs against events or event types stemming from different alternative known use cases being considered, in order to identify the best match and consequently provide insights on which of these alternative use cases may be running at any given time. The approach has shown very promising results towards the identification of executing use cases among various alternative ones in various runs of the Session Initiation Protocol.