Problem Determination In Message-Flow Internet Services Based On Statistical Analysis of Event Logs
In a message-flow Internet service, where messages travel through multiple nodes, event log analysis is one of the most important methods for identifying the root causes of problems. Traditional approaches to event log analysis have largely been based on expert systems that build static dependency models from rules and patterns defined by human experts. However, the semantic complexity and varied formats of event logs make them difficult to model, and maintaining such static models for constantly evolving Internet services is time-consuming. Recent research has focused on building statistical models instead, but all of these models rely on trace information provided by the J2EE or .NET frameworks, which is not available to all Internet services. In this thesis, we propose a framework for problem determination based on statistical analysis of event logs. We assume that a unique message ID is logged in multiple log lines, so that the flow of each message through the system can be traced. A generic log adaptor is defined to extract valuable information from the log entries. We also develop an algorithm for log event clustering and log pattern clustering. Frequency analysis is then performed on the log patterns to build a statistical model of the system's behavior. Once the system is modeled, we determine problems by running a chi-square goodness-of-fit test using a sliding window approach. As event logs are available on all major operating systems, we believe our framework is a generic solution for problem determination in message-flow Internet services. Our solution has been validated with log data collected from the BlackBerry Internet Service (BIS) engine, a wireless email service that serves millions of users across the world. According to the test results, our solution shows high accuracy in problem determination.
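The detection step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the event names, the baseline pattern frequencies, and the window size are all hypothetical, and the log adaptor and clustering stages are reduced to grouping events by message ID. The critical value 5.991 is the standard chi-square threshold at the 0.05 significance level with 2 degrees of freedom (three patterns).

```python
from collections import Counter, defaultdict


def patterns_by_message(log_entries):
    """Group (message_id, event) pairs into one event sequence per message.

    Assumes entries arrive in log order and that every line carries the
    message ID, as the abstract assumes.
    """
    flows = defaultdict(list)
    for message_id, event in log_entries:
        flows[message_id].append(event)
    return {mid: tuple(events) for mid, events in flows.items()}


def chi_square_statistic(observed, expected_probs):
    """Pearson statistic: sum over patterns of (O - E)^2 / E.

    Patterns absent from the baseline are ignored in this sketch.
    """
    total = sum(observed.values())
    stat = 0.0
    for pattern, prob in expected_probs.items():
        expected = total * prob
        stat += (observed.get(pattern, 0) - expected) ** 2 / expected
    return stat


def sliding_window_alerts(patterns, expected_probs, window, step, critical_value):
    """Return (start_index, statistic) for each window that fails the test."""
    alerts = []
    for start in range(0, len(patterns) - window + 1, step):
        observed = Counter(patterns[start:start + window])
        stat = chi_square_statistic(observed, expected_probs)
        if stat > critical_value:
            alerts.append((start, stat))
    return alerts


# Hypothetical baseline learned from a healthy period.
OK = ("recv", "deliver")
RETRY = ("recv", "retry", "deliver")
FAIL = ("recv", "fail")
BASELINE = {OK: 0.80, RETRY: 0.15, FAIL: 0.05}

if __name__ == "__main__":
    healthy = [OK] * 16 + [RETRY] * 3 + [FAIL] * 1
    faulty = [OK] * 8 + [RETRY] * 2 + [FAIL] * 10
    print(sliding_window_alerts(healthy + faulty, BASELINE,
                                window=20, step=20, critical_value=5.991))
```

In this toy stream the first window matches the baseline frequencies and the second contains far more failed flows than expected, so only the second window is flagged. A smaller step than the window size would make the windows overlap and localize the onset of a problem more finely, at the cost of more tests.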