Implementations of iterative algorithms in Hadoop and Spark

Lai, Junyu

dc.contributor.author	Lai, Junyu
dc.date.accessioned	2014-07-29T14:16:10Z
dc.date.available	2014-11-27T06:30:07Z
dc.date.issued	2014-07-29
dc.date.submitted	2014
dc.description.abstract	Companies such as Facebook, Amazon, and Twitter generate large amounts of data, and cloud computing frameworks such as Hadoop are used to store and process this Big Data. Hadoop, an open-source cloud computing framework, is popular because of its scalability and fault tolerance. However, because it frequently writes data to and reads data from the Hadoop Distributed File System (HDFS), Hadoop is slow in many applications. Apache Spark, a newer cloud computing framework developed at the AMPLab of UC Berkeley, addresses this problem by caching data in memory. Spark introduces an abstraction called the resilient distributed dataset (RDD), which is both scalable and fault-tolerant. In this thesis, we describe the architecture of Hadoop and Spark and discuss their differences. We discuss the properties of RDDs and how they work in Spark in detail, as a guide to using them efficiently. The main contribution of the thesis is to implement the PageRank algorithm and the Conjugate Gradient (CG) method in Hadoop and Spark, and to show how Spark outperforms Hadoop by taking advantage of memory caching.	en
dc.description.embargoterms	4 months	en
dc.identifier.uri	http://hdl.handle.net/10012/8586
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	Hadoop	en
dc.subject	Spark	en
dc.subject	Resilient Distributed Datasets	en
dc.subject	Conjugate Gradient method	en
dc.subject.program	Applied Mathematics	en
dc.title	Implementations of iterative algorithms in Hadoop and Spark	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	Applied Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
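The abstract's argument rests on one property of iterative algorithms such as PageRank: the large input that never changes (the link structure) is consumed again on every iteration, while only a small piece of state (the rank vector) is updated. The sketch below illustrates that pattern in plain Python. It is not the thesis's code and does not use the Spark or Hadoop APIs; the function name `pagerank`, its parameters, and the toy graph are hypothetical. It assumes a small graph that fits in one process and uses the simple unnormalized update r = (1 - d) + d * (sum of incoming contributions), with dangling pages' rank mass dropped.

```python
from collections import defaultdict


def pagerank(links, iterations=10, damping=0.85):
    """Simplified PageRank over an in-memory adjacency list (illustrative sketch).

    links: dict mapping every page to the list of pages it links to; every
    link target must also appear as a key. The dict is built once and reused
    unchanged on every iteration, which is the role a cached RDD plays in
    Spark. Only `ranks` is recomputed each iteration.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Map-like step: each page splits its current rank evenly among its
        # out-links; contributions to the same target are summed, as a
        # reduceByKey-style step would do.
        contribs = defaultdict(float)
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling page: its rank mass is dropped in this sketch
            share = ranks[page] / len(outlinks)
            for dest in outlinks:
                contribs[dest] += share
        # Damping step: r = (1 - d) + d * (sum of incoming contributions).
        ranks = {page: (1 - damping) + damping * contribs[page] for page in links}
    return ranks


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(pagerank(graph, iterations=100))
```

With the toy graph above, page c, which receives links from both a and b, ends up with the highest rank. In a MapReduce formulation, each iteration would typically run as a separate job that rereads the link table from HDFS. Keeping `links` in memory across iterations is the kind of saving the abstract attributes to Spark's caching.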

Files

Original bundle

Name:: Lai_Junyu.pdf
Size:: 2.84 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 2.67 KB
Format:: Item-specific license agreed upon to submission

Collections

Theses
Applied Mathematics