Fault Detection and Identification in Computer Networks: A Soft Computing Approach

Mohamed, Abduljalil

Files

NFMS_PhD_Thesis_Jan7.pdf (1.19 MB)

Date

2010-01-07T21:19:55Z

Authors

Mohamed, Abduljalil

Publisher

University of Waterloo

Abstract

Governmental and private institutions rely heavily on reliable computer networks for their everyday business transactions. Downtime of their network infrastructure may cost millions of dollars. Fault management systems are used to keep today’s complex networks running without significant downtime cost, using either active or passive techniques. Active techniques impose excessive management traffic, whereas passive techniques often ignore the uncertainty inherent in network alarms, leading to unreliable fault identification. In this research work, new algorithms are proposed for both types of techniques so as to address these handicaps.

Active techniques use probing technology so that the managed network can be tested periodically and suspected malfunctioning nodes can be effectively identified and isolated. However, the diagnostic probes introduce extra management traffic and require extra storage space. To address this issue, two new CSP (Constraint Satisfaction Problem)-based algorithms are proposed to minimize management traffic while effectively maintaining the diagnostic power of the available probes. The first algorithm is based on the standard CSP formulation, which aims at significantly reducing the available dependency matrix as a means of reducing the number of probes. The obtained probe set is used for fault detection and fault identification. The second algorithm is a fuzzy CSP-based algorithm. It is adaptive in the sense that an initial reduced fault detection probe set is utilized to determine the minimum set of probes used for fault identification. Based on the extensive experiments conducted in this research, both algorithms have demonstrated advantages over existing methods in terms of the overall management traffic needed to successfully monitor the targeted network system.

Passive techniques employ alarms emitted by network entities. However, the fault evidence provided by these alarms can be ambiguous, inconsistent, incomplete, and random. To address these limitations, alarms are correlated using a distributed Dempster-Shafer Evidence Theory (DSET) framework, in which the managed network is divided into a cluster of disjoint management domains. Each domain is assigned an intelligent agent for collecting and analyzing the alarms generated within that domain. These agents are coordinated by a single higher-level entity, i.e., an agent manager that combines the partial views of these agents into a global one. Each agent employs a DSET-based algorithm that utilizes the probabilistic knowledge encoded in the available fault propagation model to construct a local composite alarm. Dempster’s rule of combination is then used by the agent manager to correlate these local composite alarms. Furthermore, an adaptive fuzzy DSET-based algorithm is proposed to utilize the fuzzy information provided by the observed cluster of alarms so as to accurately identify the malfunctioning network entities. In this way, inconsistency among the alarms is removed by weighing each received alarm against the others, while randomness and ambiguity of the fault evidence are addressed within a soft computing framework.

The effectiveness of this framework has been investigated through extensive experiments. The proposed fault management system is able to detect malfunctioning behavior in the managed network with considerably less management traffic. Moreover, it effectively manages the uncertainty intrinsic to network alarms, thereby reducing its negative impact and significantly improving the overall performance of the fault management system.
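As an illustration of the combination step mentioned in the abstract, the following is a minimal sketch of the textbook form of Dempster's rule for two evidence sources. It is not the thesis's distributed or fuzzy algorithm. The function name `combine`, the fault hypotheses (`router`, `switch`, `link`), and the mass values are hypothetical and chosen only for the example.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of fault hypotheses to a mass,
    with the masses summing to 1. This is the generic two-source rule,
    used here only as an illustration.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the combined masses sum to 1.
    return {hypothesis: mass / norm for hypothesis, mass in combined.items()}


# Hypothetical frame of discernment: the candidate faulty entities.
FRAME = frozenset({"router", "switch", "link"})

# Hypothetical local composite alarms from two agents (assumed values).
agent_a = {frozenset({"router"}): 0.6, FRAME: 0.4}
agent_b = {frozenset({"switch"}): 0.5, FRAME: 0.5}

global_view = combine(agent_a, agent_b)
for hypothesis, mass in sorted(global_view.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 4))
```

In this example the two agents partly disagree, so the conflicting mass (0.6 × 0.5 = 0.3) is discarded and the remaining masses are rescaled by 1 / (1 − 0.3). Mass left on the full frame represents evidence that does not single out any one entity.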