Network-accelerated Scheduling for Large Clusters

Kettaneh, Ibrahim

Files

Kettaneh_Ibrahim.pdf (1.99 MB)

Date

2020-05-04

Authors

Kettaneh, Ibrahim

Advisor

Al-Kiswany, Samer

Publisher

University of Waterloo

Abstract

We explore a novel design approach for accelerating schedulers for large-scale clusters. Our approach follows a centralized design and leverages recent programmable switches to accelerate scheduling operations. We demonstrate the feasibility and benefits of this approach by building two schedulers: one for data analytics workloads and one for key-value stores. First, we present a scheduler designed for low-latency data analytics workloads. The proposed scheduler receives job descriptions, maintains a task queue in the switch memory, and assigns each task to the next available worker. The core of this design is a novel pipeline-based scheduling logic that can schedule tasks at line rate. Our prototype evaluation on a cluster with a Barefoot Tofino switch shows that the proposed approach can reduce scheduling overhead by an order of magnitude compared to state-of-the-art schedulers. Second, we present a network-accelerated scheduler for linearizable key-value stores. The proposed design exploits programmable switches to keep track of write requests and responses, and to identify where the latest version of each object is stored. Our prototype evaluation shows that the proposed design achieves up to 42% higher throughput and 35-97% lower latency than current state-of-the-art approaches.
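
The centralized scheduling idea in the abstract (a task queue plus dispatch to the next available worker) can be illustrated with a minimal host-side sketch. This is not the thesis implementation, which runs in a programmable switch pipeline; the class name `CentralScheduler`, its methods, and the worker names are hypothetical and chosen only for illustration.

```python
from collections import deque


class CentralScheduler:
    """Toy model (an assumption, not the thesis design): a FIFO task queue
    and a pool of idle workers, matched whenever both are non-empty."""

    def __init__(self, workers):
        self.idle_workers = deque(workers)  # workers ready to take a task
        self.task_queue = deque()           # tasks waiting for a worker
        self.assignments = []               # (task, worker) pairs, in dispatch order

    def submit_job(self, job_id, num_tasks):
        # A job description is reduced here to a job id and a task count.
        for index in range(num_tasks):
            self.task_queue.append((job_id, index))
        self._dispatch()

    def worker_done(self, worker):
        # A worker reports completion and becomes available again.
        self.idle_workers.append(worker)
        self._dispatch()

    def _dispatch(self):
        # Pair the oldest queued task with the next available worker.
        while self.task_queue and self.idle_workers:
            task = self.task_queue.popleft()
            worker = self.idle_workers.popleft()
            self.assignments.append((task, worker))


scheduler = CentralScheduler(["w0", "w1"])
scheduler.submit_job("job-A", 3)   # two tasks start, one waits in the queue
scheduler.worker_done("w0")        # the waiting task is dispatched to w0
print(scheduler.assignments)
```

In this sketch the third task waits in the queue until a worker reports completion, which mirrors the "next available worker" behavior described above; the in-switch version would have to express the same matching within the constraints of the switch pipeline.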