Automated Analysis and Optimization of Distributed Self-Stabilizing Algorithms
MetadataShow full item record
Self-stabilization  is a versatile technique for recovery from erroneous behavior due to transient faults or wrong initialization. A system is self-stabilizing if (1) starting from an arbitrary initial state it can automatically reach a set of legitimate states in a finite number of steps and (2) it remains in legitimate states in the absence of faults. Weak-stabilization  and probabilistic-stabilization  were later introduced in the literature to deal with resource consumption of self-stabilizing algorithms and impossibility results. Since the system perturbed by fault may deviate from correct behavior for a finite amount of time, it is paramount to minimize this time as much as possible, especially in the domain of robotics and networking. This type of fault tolerance is called non-masking because the faulty behavior is not completely masked from the user . Designing correct stabilizing algorithms can be tedious. Designing such algorithms that satisfy certain average recovery time constraints (e.g., for performance guarantees) adds further complications to this process. Therefore, developing an automatic technique that takes as input the specification of the desired system, and synthesizes as output a stabilizing algorithm with minimum (or other upper bound) average recovery time is useful and challenging. In this thesis, our main focus is on designing automated techniques to optimize the average recovery time of stabilizing systems using model checking and synthesis techniques. First, we prove that synthesizing weak-stabilizing distributed programs from scratch and repairing stabilizing algorithms with average recovery time constraints are NP-complete in the state-space of the program. To cope with this complexity, we propose a polynomial-time heuristic that compared to existing stabilizing algorithms, provides lower average recovery time for many of our case studies. Second, we study the problem of fine tuning of probabilistic-stabilizing systems to improve their performance. We take advantage of the two properties of self-stabilizing algorithms to model them as absorbing discrete-time Markov chains. This will reduce the computation of average recovery time to finding the weighted sum of elements in the inverse of a matrix. Finally, we study the impact of scheduling policies on recovery time of stabilizing systems. We, in particular, propose a method to augment self-stabilizing programs with k-central and k-bounded schedulers to study dierent factors, such as geographical distance of processes and the achievable level of parallelism.