A Novel Framework of Board-Level Failure Localization in Optical Transport Networks
Date
2024-06-21
Authors
Jiao, Yan
Advisor
Ho, Pin-Han
Publisher
University of Waterloo
Abstract
Optical transport networks (OTNs) play a pivotal role in Internet backbones thanks to their support for multi-tenant and multi-service environments with high reliability and low cost. A failure event may affect one or multiple boards in an OTN and trigger a vast number of alarms, which significantly increases the complexity of failure localization and alarm analysis. Accordingly, there is an urgent need for a systematic framework that uses the known network state and the received alarms to achieve effective failure localization.
Alarm correlation is a representative approach that identifies the dependencies among alarms, aiming to eliminate as many descendant alarms as possible and thereby perform failure localization with much lower complexity. Nevertheless, existing alarm correlation methods suffer from the following issues. First, they ignore the fact that alarm propagation mostly takes place along certain connections, so the network topology and traffic distribution can firmly underpin the alarm correlation process. Second, they require initial parameters to be set heuristically but lack a general rule for adjusting those values to different network characteristics. Lastly, they generalize poorly across network environments: a result obtained for a specific network state may not transfer to another.
Motivated by the significance of this problem and its stringent requirements, this thesis proposes a novel framework for board-level failure localization in OTNs, called Failure-Alarm Correlation Tree based Failure Localization (FACT-FL). It aims to construct one or multiple FACTs that achieve both failure localization and alarm correlation, where each FACT takes a failed board as the tree root and its associated alarms as the leaves. We have designed three methodologies to obtain viable FACTs. The first scheme, FACT-FL-Heuristic, uses a learned binary classifier that captures the historical correlations of the form board → alarm and alarm → alarm, and then heuristically creates the feasible FACT(s). To further improve on FACT-FL-Heuristic's performance, a method termed FACT-FL-Chain treats each FACT as a set of correlation chains with different order values and generates viable FACT(s) by solving an integer linear programming (ILP) problem. Moreover, to reduce the computational complexity that FACT-FL-Chain incurs by enumerating all chain candidates, an approach dubbed FACT-FL-GNN leverages a graph neural network (GNN) to evaluate the edge weights of potential FACT(s), which allows an alternative, simplified ILP to be formulated that yields the most likely FACT(s). The three methods share the same functional blocks, namely feature extraction, binary classifier training, and FACT formation, but each method realizes these blocks with a different strategy. Extensive case studies demonstrate the proposed methods' advantage over their counterparts in terms of metrics that assess the recognized failed boards and root alarms. We also explore their performance under varying conditions such as diverse failure scenarios, network topologies, traffic distributions, and noise alarms.
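As a purely illustrative sketch of the tree structure described above (not the thesis's implementation), a FACT can be pictured as a failed board at the root with alarms attached through board → alarm and alarm → alarm edges. The class name `FACT`, its methods, the board identifier, and the alarm labels below are hypothetical placeholders chosen for this example; how the edges are actually selected (heuristic, ILP, or GNN-weighted ILP) is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FACT:
    """Hypothetical container for one tree: a failed board at the root, alarms below it."""
    board: str
    # Maps each alarm to its parent, which is either the board or another alarm.
    parent: Dict[str, str] = field(default_factory=dict)

    def add(self, alarm: str, parent: str) -> None:
        """Attach an alarm under the board or under an alarm already in the tree."""
        if parent != self.board and parent not in self.parent:
            raise ValueError(f"unknown parent {parent!r}")
        if alarm == self.board or alarm in self.parent:
            raise ValueError(f"{alarm!r} is already in the tree")
        self.parent[alarm] = parent

    def root_alarms(self) -> List[str]:
        """Alarms attached directly to the failed board."""
        return [a for a, p in self.parent.items() if p == self.board]

    def chain(self, alarm: str) -> List[str]:
        """Path from the failed board down to the given alarm."""
        path = [alarm]
        while path[-1] != self.board:
            path.append(self.parent[path[-1]])
        return path[::-1]


# Illustrative data only: one board and three alarms forming a single chain.
fact = FACT(board="board-7")
fact.add("LOS", "board-7")   # board -> alarm edge
fact.add("AIS", "LOS")       # alarm -> alarm edge
fact.add("BDI", "AIS")       # alarm -> alarm edge
print(fact.root_alarms())
print(fact.chain("BDI"))
```

Because an alarm can only be attached under a node that is already present, the structure stays acyclic by construction, and tracing any alarm back to the root identifies both the failed board and the root alarm it descends from.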
Keywords
optical transport network, failure localization, alarm correlation, integer linear programming