The Library will be performing maintenance on UWSpace on September 4th, 2024. UWSpace will be offline for all UW community members during this time.
 

An Analysis of Partial Network Partitioning Failures in Modern Distributed Systems

dc.contributor.authorAlfatafta, Mohammed
dc.date.accessioned2020-01-03T16:37:58Z
dc.date.available2021-01-04T05:50:06Z
dc.date.issued2020-01-03
dc.date.submitted2019-12-19
dc.description.abstractWe present a comprehensive study of system failures from 12 popular systems caused by a peculiar type of network partitioning faults: partial partitions. Partial partitions isolate a set of nodes from some, but not all, nodes in the cluster. Our study reveals the studied failures are catastrophic; they lead to data loss, complete system unavailability, or stale and dirty reads. Furthermore, our study reveals that these failures are easy to manifest, they are deterministic, they can be triggered by isolating a single node, and without any interaction with the system’s clients. We dissected the implemented fault tolerance techniques in eight popular systems. We identified four principled approaches for building a fault tolerance mechanism for partial partitions and identified the shortcomings of the current approaches. The currently implemented fault tolerance techniques are either specific to a particular protocol or implementation or may lead to a complete cluster shut down despite the availability of alternative network paths between the nodes. Finally, we present NIFTY, a generic communication layer that leverages the capabilities of modern software-defined networking to monitor and recover the connectivity of the cluster in case of partial network partitions. NIFTY is transparent to the application running on top of it. We built NiftyDB, a database system atop NIFTY. NiftyDB implements a set of optimizations that reduce the network overhead and operation latency in case of partial network partitioning. Our analysis and evaluation show that the proposed approach can effectively mask partial network partitioning faults without incurring additional overheads.en
dc.identifier.urihttp://hdl.handle.net/10012/15385
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectnetwork partitionsen
dc.subjectfault toleranceen
dc.titleAn Analysis of Partial Network Partitioning Failures in Modern Distributed Systemsen
dc.typeMaster Thesisen
uws-etd.degreeMaster of Mathematicsen
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws.contributor.advisorAl-Kiswany, Samer
uws.contributor.affiliation1Faculty of Mathematicsen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Alfatafta_Mohammed.pdf
Size:
471.55 KB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.08 KB
Format:
Item-specific license agreed upon to submission
Description: