On Improving Distributed Pregel-like Graph Processing Systems

dc.contributor.authorHan, Minyang
dc.date.accessioned2015-07-24T18:22:29Z
dc.date.embargountil2015-11-21T18:22:29Z
dc.date.issued2015-07-24
dc.date.submitted2015
dc.description.abstractThe considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from two major issues: (1) poor performance due to frequent global synchronization barriers and limited scalability; and (2) lack of support for graph algorithms that require serializability, the guarantee that parallel executions of an algorithm produce the same results as some serial execution of that algorithm. Many graph processing systems use the bulk synchronous parallel (BSP) model, which allows graph algorithms to be easily implemented and reasoned about. However, BSP suffers from poor performance due to stale messages and frequent global synchronization barriers. While asynchronous models have been proposed to alleviate these overheads, existing systems that implement such models have limited scalability or retain frequent global barriers and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that overcomes the limitations of existing asynchronous models by reducing both message staleness and global synchronization while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate it at scale to demonstrate that BAP provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously. Secondly, very few systems provide serializability, despite the fact that many graph algorithms require it for accuracy, correctness, or termination. To address this deficiency, we provide a complete solution that can be implemented on top of existing graph processing systems to provide serializability. Our solution formalizes the notion of serializability and the conditions under which it can be provided for graph processing systems. We propose a partition-based synchronization technique that enforces these conditions efficiently to provide serializability. We implement this technique into Giraph and GiraphUC to demonstrate that it is configurable, transparent to algorithm developers, and more performant than existing techniques.en
dc.description.embargoterms4 monthsen
dc.identifier.urihttp://hdl.handle.net/10012/9484
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterloo
dc.subjectgraph processingen
dc.subjectBSPen
dc.subjectBAPen
dc.subjectGiraphUCen
dc.subjectserializabilityen
dc.subjectpartition-based distributed lockingen
dc.subjectPregel-likeen
dc.subject.programComputer Scienceen
dc.titleOn Improving Distributed Pregel-like Graph Processing Systemsen
dc.typeMaster Thesisen
uws-etd.degreeMaster of Mathematicsen
uws-etd.degree.departmentComputer Science (David R. Cheriton School of)en
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Han_Minyang.pdf
Size:
1.74 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.17 KB
Format:
Item-specific license agreed upon to submission
Description: