Serverless Data Analytics with Flint

Kim, Youngbin

Files

Kim_Youngbin.pdf (2.43 MB)

Date

2018-08-30

Authors

Kim, Youngbin

Publisher

University of Waterloo

Abstract

Serverless architectures organized around loosely-coupled function invocations represent an emerging design for many applications. Recent work mostly focuses on user-facing products and event-driven processing pipelines. In this thesis, we explore a completely different part of the application space and examine the feasibility of analytical processing on big data using a serverless architecture. We present Flint, a prototype Spark execution engine that takes advantage of AWS Lambda to provide a pure pay-as-you-go cost model. With Flint, a developer uses PySpark exactly as before, but without needing a Spark cluster and only paying for the execution of individual Spark programs. We describe the design, implementation, and performance of Flint, along with the challenges associated with serverless analytics.

URI

http://hdl.handle.net/10012/13681

Collections

Theses
Computer Science
