An Empirical Evaluation of the Viability of the Serverless Paradigm for Scientific Workflows
Date
2023-12-22
Authors
Elshamy, Abdallah
Advisor
Al-Kiswany, Samer
Publisher
University of Waterloo
Abstract
Scientific workflows are typically data-intensive. They consist of many stages, each of which may contain hundreds or even thousands of tasks. Traditionally, scientific workflows have been executed using the serverful computing model. Serverless computing presents an attractive alternative to the serverful computing model, as it frees developers from managing and provisioning resources and offers a fine-grained billing model. In this work, we study the viability and evaluate the trade-offs of using the serverless paradigm to run scientific workflows. We follow an empirical approach to evaluate the performance and cost benefits of this paradigm and to study the suitability of the current serverless software stack to support complex data-intensive scientific workflows. Specifically, we discuss, implement, and evaluate three orchestration approaches for executing scientific workflows: serverful-centralized, serverless-centralized, and serverless-decentralized. This work is the first to implement and evaluate a purely serverless orchestration approach that does not require deploying a dedicated workflow manager for scientific workflows. Our evaluation shows that serverless orchestration approaches cause a noticeable performance overhead for some workflow patterns (e.g., reduce stages) due to accessing a large amount of remote data. We propose two optimizations (i.e., prefetching file privileges and container placement) that exploit data locality to mitigate that impact. Our evaluation with three scientific workflows (Montage, 1000Genomes, and SRA Search) shows that serverless-centralized and serverless-decentralized achieve performance comparable to a serverful approach. Our results also show that the prefetching file privileges and container placement optimizations improve performance by 32.6% and 44%, respectively, compared to an unoptimized version for the Montage application.
We also introduce a cost model to estimate which approach is cheaper for a given application and cloud provider. Our cost analysis shows that the answer depends on the characteristics of the workflow and the pricing of the cloud provider.
Keywords
Serverless Computing, Cloud Computing, Scientific Workflows