CLPush: Proactive Cache Transfers in NUMA Applications

Date

2023-09-26

Authors

Pathak, Gautam

Publisher

University of Waterloo

Abstract

Modern Non-Uniform Memory Access (NUMA) systems support as many as 128 threads to run high-performance applications. These systems usually employ a scalable, directory-based cache-coherence mechanism to ensure that every core observes the most up-to-date data, and invalidate-based protocols are common in them. NUMA applications incur substantial overhead when data is absent from a socket's cache and must be fetched from a cache in another socket. For example, in producer-consumer applications whose threads reside in two different sockets, consuming data in a socket other than the one where it was produced can be extremely expensive, because coherence messages must cross sockets whenever the consumer threads request the shared data. In this thesis, I present a cache manipulation instruction, named CLPush, which proactively transfers data to a predetermined destination so as to reduce cache demand misses and improve performance. The optimization is exposed as an instruction hint that directs a cache to send data to that destination. I present several variants of CLPush, which transfer the data to one or more destinations. I also discuss potential use cases of this instruction in different applications, such as the producer-consumer problem and Futures and Promises, and I analyse the performance of CLPush in two variants of the producer-consumer problem.
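
To make the producer-consumer use case concrete, the following C++ sketch shows where a push hint of this kind might sit in a single-slot handoff between two threads. It is an illustration only and not the thesis's implementation: `clpush_hint`, its `dest_socket` parameter, the 64-byte line size, and `run_producer_consumer` are all hypothetical names and assumptions, and the hint is a no-op stub because no such instruction is assumed to exist on current hardware.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// HYPOTHETICAL stand-in for a CLPush-style hint. On hardware with such an
// instruction this would ask the local cache to forward the line holding
// `line` to `dest_socket`. Here it does nothing, so the program behaves
// exactly like an ordinary producer-consumer handoff.
inline void clpush_hint(const void* line, int dest_socket) {
    (void)line;
    (void)dest_socket;
}

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// Payload and flag share one cache line, so a single push would carry both.
struct alignas(kCacheLine) Slot {
    std::uint64_t payload = 0;
    std::atomic<bool> ready{false};
};

// Hands the values 1..items from a producer thread to a consumer thread
// through one slot and returns the sum the consumer observed.
std::uint64_t run_producer_consumer(std::uint64_t items, int consumer_socket) {
    Slot slot;
    std::uint64_t sum = 0;  // written only by the consumer, read after join

    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= items; ++i) {
            // Wait until the consumer has drained the previous item.
            while (slot.ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            slot.payload = i;
            slot.ready.store(true, std::memory_order_release);
            // Proactively offer the freshly written line to the consumer's
            // socket instead of waiting for its demand miss.
            clpush_hint(&slot, consumer_socket);
        }
    });

    std::thread consumer([&] {
        for (std::uint64_t i = 0; i < items; ++i) {
            // Without a push, this load would typically miss locally and
            // fetch the line from the producer's socket.
            while (!slot.ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            sum += slot.payload;
            slot.ready.store(false, std::memory_order_release);
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Calling `run_producer_consumer(100, 1)` hands 100 items across the slot. Since the stub has no effect, the sketch only marks the point in the producer's loop where a push would be issued; it says nothing about the performance results reported in the thesis.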

Keywords

cache, coherence, multicore, NUMA, non uniform memory access
