Show simple item record

dc.contributor.author: Lai, Junyu
dc.date.accessioned: 2014-07-29 14:16:10 (GMT)
dc.date.available: 2014-11-27 06:30:07 (GMT)
dc.date.issued: 2014-07-29
dc.date.submitted: 2014
dc.identifier.uri: http://hdl.handle.net/10012/8586
dc.description.abstract: To handle the large volumes of data generated by companies such as Facebook, Amazon, and Twitter, cloud computing frameworks such as Hadoop are used to store and process Big Data. Hadoop, an open-source cloud computing framework, is popular because of its scalability and fault tolerance. However, because it frequently writes data to and reads data from the Hadoop Distributed File System (HDFS), Hadoop is slow in many applications. Apache Spark, a newer cloud computing framework developed at the AMPLab at UC Berkeley, addresses this problem by caching data in memory. Spark introduces an abstraction called the resilient distributed dataset (RDD), which is both scalable and fault-tolerant. In this thesis, we describe the architectures of Hadoop and Spark and discuss their differences. We discuss in detail the properties of RDDs and how they work in Spark, as a guide to using them efficiently. The main contribution of the thesis is an implementation of the PageRank algorithm and the Conjugate Gradient (CG) method in both Hadoop and Spark, showing how Spark outperforms Hadoop by taking advantage of memory caching. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Hadoop [en]
dc.subject: Spark [en]
dc.subject: Resilient Distributed Datasets [en]
dc.subject: Conjugate Gradient method [en]
dc.title: Implementations of iterative algorithms in Hadoop and Spark [en]
dc.type: Master Thesis [en]
dc.pending: false
dc.subject.program: Applied Mathematics [en]
dc.description.embargoterms: 4 months [en]
uws-etd.degree.department: Applied Mathematics [en]
uws-etd.degree: Master of Mathematics [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
