Multi-agent Learning for Cooperative Scheduling of Microsecond-scale Services at Rack Scale

Date

2022-01-25

Authors

Hossein Abbasi Abyaneh, Ali

Publisher

University of Waterloo

Abstract

This work considers the load-balancing problem in dense racks running microsecond-scale services. In such a system, balancing the load among hundreds to thousands of cores requires making millions of scheduling decisions per second. Achieving this throughput while providing microsecond-scale tail latency and high availability is extremely challenging. To address this challenge, we design a fully distributed load-balancing framework. In this framework, servers cooperatively balance the load in the system. We model the interactions among servers as a cooperative stochastic game. In this game, servers make scheduling decisions upon receiving and completing tasks. When a server receives a task, it decides whether to keep the task or migrate it to another server. Moreover, when a server completes a task, it decides whether to steal a task from another server. We propose a distributed multi-agent learning algorithm to find the game's parametric Nash equilibrium. Our proposed algorithm enables servers to make scheduling decisions in tens of nanoseconds based on (possibly outdated) estimates of the load on other servers. We implement and deploy our distributed load-balancing algorithm on a rack-scale computer with 264 physical cores. We compare our load-balancing algorithm with state-of-the-art load-balancing disciplines. Our proposed solution provides up to 20% more throughput at low tail latency than widely used load-balancing policies.
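
The two decision points described in the abstract (keep or migrate on arrival, steal or not on completion) can be illustrated with a minimal sketch. This is not the learned policy from the thesis: it substitutes a fixed threshold rule for the learned parametric policy, and the class name, fields, and threshold values are all hypothetical. It only shows how a server might act on possibly outdated load estimates without consulting a central scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Server:
    """One scheduling agent. All names and thresholds are hypothetical."""
    server_id: int
    queue_len: int = 0
    # Possibly outdated estimates of other servers' queue lengths.
    load_estimates: Dict[int, int] = field(default_factory=dict)
    migrate_threshold: int = 2  # assumed parameter, not from the thesis
    steal_threshold: int = 2    # assumed parameter, not from the thesis

    def on_task_arrival(self) -> Optional[int]:
        """Return None to keep the task, or the id of the migration target."""
        if self.load_estimates:
            # Pick the peer that looks least loaded according to local estimates.
            target = min(self.load_estimates, key=self.load_estimates.get)
            gap = self.queue_len - self.load_estimates[target]
            if gap >= self.migrate_threshold:
                return target  # migrate: the task never enters the local queue
        self.queue_len += 1  # keep: enqueue locally
        return None

    def on_task_completion(self) -> Optional[int]:
        """Return None to do nothing, or the id of the server to steal from."""
        self.queue_len = max(0, self.queue_len - 1)
        if self.queue_len > 0 or not self.load_estimates:
            return None  # still busy, or no information about peers
        # Idle: consider the peer that looks most loaded.
        victim = max(self.load_estimates, key=self.load_estimates.get)
        if self.load_estimates[victim] >= self.steal_threshold:
            return victim
        return None
```

In the thesis, the decision rule is obtained by multi-agent learning rather than fixed by hand; the thresholds above merely stand in for whatever parameters such a policy would carry.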
