
Implementations of iterative algorithms in Hadoop and Spark

dc.contributor.author: Lai, Junyu
dc.date.accessioned: 2014-07-29T14:16:10Z
dc.date.available: 2014-11-27T06:30:07Z
dc.date.issued: 2014-07-29
dc.date.submitted: 2014
dc.description.abstract: Facing the challenges of large amounts of data generated by various companies (such as Facebook, Amazon, and Twitter), cloud computing frameworks such as Hadoop are used to store and process the Big Data. Hadoop, an open source cloud computing framework, is popular because of its scalability and fault tolerance. However, by frequently writing and reading data from the Hadoop Distributed File System (HDFS), Hadoop is quite slow in many applications. Apache Spark, a new cloud computing framework developed at AMPLab of UC Berkeley, solves this problem by caching data in memory. Spark develops a new abstraction called resilient distributed dataset (RDD) which is both scalable and fault-tolerant. In this thesis, we describe the architecture of Hadoop and Spark and discuss their differences. Properties of RDDs and how they work in Spark are discussed in detail, which gives a guide on how to use them efficiently. The main contribution of the thesis is to implement the PageRank algorithm and Conjugate Gradient (CG) method in Hadoop and Spark, and show how Spark out-performs Hadoop by taking advantage of memory caching. [en]
dc.description.embargoterms: 4 months [en]
dc.identifier.uri: http://hdl.handle.net/10012/8586
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: Hadoop [en]
dc.subject: Spark [en]
dc.subject: Resilient Distributed Datasets [en]
dc.subject: Conjugate Gradient method [en]
dc.subject.program: Applied Mathematics [en]
dc.title: Implementations of iterative algorithms in Hadoop and Spark [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: Applied Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
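
The abstract names PageRank as one of the iterative algorithms implemented in the thesis. As a rough illustration of why such algorithms benefit from in-memory caching, the following is a minimal single-machine sketch of PageRank by power iteration in plain Python. It is not the thesis's Hadoop or Spark code; the function name `pagerank`, the `links` dictionary, the damping factor of 0.85, and the fixed iteration count are all assumptions made for this example. The point to notice is that the same `links` structure is read on every pass of the loop, which is the kind of reuse that, according to the abstract, Spark serves from memory while Hadoop rereads from HDFS.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank by power iteration (hypothetical helper).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {page: base for page in pages}

        # The same link structure is traversed on every iteration.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        ranks = new_ranks

    return ranks


if __name__ == "__main__":
    example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(example_links).items()):
        print(page, round(rank, 4))
```

In this toy graph, page `c` receives links from both `a` and `b`, so it ends up with the highest rank, and `b`, which is linked only from `a`, ends up with the lowest.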

Files

Original bundle
Name: Lai_Junyu.pdf
Size: 2.84 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.67 KB
Format: Item-specific license agreed upon to submission