Multi-agent Learning for Cooperative Scheduling of Microsecond-scale Services at Rack Scale

Hossein Abbasi Abyaneh, Ali

dc.contributor.advisor	Zahedi, Seyed Majid
dc.contributor.author	Hossein Abbasi Abyaneh, Ali
dc.date.accessioned	2022-01-25T19:15:50Z
dc.date.available	2023-01-26T05:50:07Z
dc.date.issued	2022-01-25
dc.date.submitted	2022-01-19
dc.description.abstract	This work considers the load-balancing problem in dense racks running microsecond-scale services. In such a system, balancing the load among hundreds to thousands of cores requires making millions of scheduling decisions per second. Achieving this throughput while providing microsecond-scale tail latency and high availability is extremely challenging. To address this challenge, we design a fully distributed load-balancing framework in which servers cooperatively balance the load in the system. We model the interactions among servers as a cooperative stochastic game. In this game, servers make scheduling decisions upon receiving and completing tasks. When a server receives a task, it decides whether to keep the task or migrate it to another server. When a server completes a task, it decides whether to steal a task from another server. We propose a distributed multi-agent learning algorithm to find the game's parametric Nash equilibrium. Our proposed algorithm enables servers to make scheduling decisions in tens of nanoseconds based on (possibly outdated) estimates of the load on other servers. We implement and deploy our distributed load-balancing algorithm on a rack-scale computer with 264 physical cores and compare it with state-of-the-art load-balancing disciplines. Our proposed solution provides up to 20% more throughput at low tail latency than widely used load-balancing policies.	en
dc.identifier.uri	http://hdl.handle.net/10012/17968
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.title	Multi-agent Learning for Cooperative Scheduling of Microsecond-scale Services at Rack Scale	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	1 year	en
uws.contributor.advisor	Zahedi, Seyed Majid
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
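The abstract describes two decision points per server: on receiving a task (keep it or migrate it) and on completing a task (steal one or not), each based on possibly outdated estimates of other servers' load. The Python sketch below illustrates that decision structure only. It assumes a simple threshold policy whose two parameters, the hypothetical `migrate_threshold` and `steal_threshold`, stand in for the parameters a learning algorithm would tune. It is not the thesis's algorithm, performs no learning, and makes no claim about the nanosecond-scale decision times reported in the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ThresholdPolicy:
    """Hypothetical parametric scheduling policy for a single server.

    Both thresholds are illustrative assumptions standing in for the
    parameters a multi-agent learning algorithm would tune.
    """
    migrate_threshold: int
    steal_threshold: int

    def on_receive(self, local_load: int,
                   remote_estimates: Sequence[int]) -> Optional[int]:
        """Return None to keep the new task, or the index of a server to migrate it to."""
        if not remote_estimates:
            return None
        # Choose the server whose (possibly stale) load estimate is lowest.
        target = min(range(len(remote_estimates)), key=remote_estimates.__getitem__)
        # Migrate only if the estimated imbalance exceeds the threshold.
        if local_load - remote_estimates[target] > self.migrate_threshold:
            return target
        return None

    def on_complete(self, local_load: int,
                    remote_estimates: Sequence[int]) -> Optional[int]:
        """Return None to do nothing, or the index of a server to steal a task from."""
        # Only an idle server with known peers considers stealing.
        if local_load > 0 or not remote_estimates:
            return None
        # Choose the server whose (possibly stale) load estimate is highest.
        victim = max(range(len(remote_estimates)), key=remote_estimates.__getitem__)
        # Steal only if that server appears sufficiently loaded.
        if remote_estimates[victim] >= self.steal_threshold:
            return victim
        return None
```

For example, with `migrate_threshold=2`, a server holding 5 tasks whose peer estimates are `[4, 1, 3]` migrates the new task to server 1, whose estimated load is 4 lower; a server that has just gone idle steals from server 0 because that server's estimate of 4 meets `steal_threshold=3`.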

Files

Original bundle

Name:: HosseinAbbasiAbyaneh_Ali.pdf
Size:: 818.89 KB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 6.4 KB
Format:: Item-specific license agreed to upon submission

Collections

Theses
Electrical and Computer Engineering