Redesigning Datacenter Systems to Leverage Hardware-Acceleration

dc.contributor.author: Udayashankar, Sreeharsha
dc.date.accessioned: 2026-05-27T18:36:00Z
dc.date.available: 2026-05-27T18:36:00Z
dc.date.issued: 2026-05-27
dc.date.submitted: 2026-05-22
dc.description.abstract: The exponential growth of digital data generation imposes severe performance and efficiency demands on modern datacenter infrastructure, creating several interrelated challenges. Datacenter infrastructure must offer high storage capacity, achieve high throughput, and support modern workloads that require low-latency data processing. While hardware accelerators, such as CPUs supporting SIMD vector instruction sets and network switches supporting P4-based programmability, have the potential to help meet these requirements, their adoption in large-scale systems is hindered by restrictive programming models and resource constraints. This thesis addresses these challenges by redesigning deduplicated storage systems and cluster schedulers to leverage hardware acceleration effectively. It enables high-throughput data reduction in deduplicated storage systems (Chapters 3, 4, and 5) through two approaches: redesigning these systems to use the SIMD capabilities of modern CPUs, and reducing the computation needed to achieve data reduction. It enables low-latency data processing by leveraging in-network acceleration for cluster scheduling (Chapter 6). The thesis first presents VectorCDC (Chapter 3), a method for accelerating data deduplication by restructuring hashless content-defined chunking (CDC) algorithms to exploit vector instructions. By identifying and optimizing two processing patterns common to these algorithms, Extreme Byte Searches and Range Scans, VectorCDC significantly improves their chunking throughput. VRAM, the fastest VectorCDC-accelerated algorithm, achieves throughput improvements of 8.35×–26.2× over existing vector-accelerated techniques and up to 207.2× over unaccelerated baselines. Importantly, VectorCDC maintains its throughput advantages across x86, ARM, and IBM CPU architectures.

While generally competitive with their hash-based counterparts, these hashless CDC algorithms achieve lower deduplication efficiency on datasets with specific pathological patterns. To address this, this thesis presents WideCDC (Chapter 4). WideCDC improves the deduplication efficiency of hashless CDC algorithms by basing chunk-boundary decisions on wide regions of multiple bytes instead of on individual byte values. To achieve high throughput, WideCDC uses vector-compatible Accumulated Extreme Byte Searches and Gated Range Scans. WideCDC improves deduplication efficiency on pathological datasets by 2.95× and further improves throughput by 2.04× over VectorCDC.

To address the throughput degradation that CDC algorithms suffer at the large chunk sizes favored by production systems, this thesis also presents SeqCDC (Chapter 5). SeqCDC is a chunking algorithm that combines a novel lightweight boundary detection mechanism, content-defined data skipping, and a design centered on vector instructions. SeqCDC improves chunking throughput by 10× over unaccelerated algorithms and by 25–30% over the fastest vector-accelerated alternatives, with minimal impact on deduplication efficiency.

Finally, this thesis proposes Draconis (Chapter 6), a network-accelerated scheduler built on P4 programmable switches and designed to support microsecond-scale workloads. Draconis forgoes the inefficient design adopted by prior switch-based schedulers; instead, it implements a switch-compatible task queue with delayed pointer correction, which eliminates the latency penalties caused by node-level blocking. Evaluation results show that Draconis reduces the 99th-percentile scheduling delay by 3×–200× compared with state-of-the-art network-accelerated solutions and increases scheduling throughput by 52×–116× compared with state-of-the-art server-based solutions.
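The two patterns named in the abstract, Extreme Byte Search and Range Scan, can be illustrated with a minimal scalar sketch of RAM (Rapid Asymmetric Maximum), a published hashless CDC algorithm. This sketch assumes RAM's commonly described rule: take the maximum byte value in a fixed window at the start of the chunk, then cut the chunk at the first later byte whose value is greater than or equal to that maximum. The function name `ram_chunk_boundaries` and the parameters `window` and `max_chunk` are hypothetical. This is not VectorCDC's code, and it uses no vector instructions. It only marks the two loops that a SIMD version would process many bytes at a time, for example with wide max and compare operations.

```python
def ram_chunk_boundaries(data: bytes, window: int = 32, max_chunk: int = 8192) -> list[int]:
    """Return exclusive end offsets of chunks under a RAM-style hashless CDC rule.

    Hypothetical sketch: `window` is the fixed window at the start of each chunk,
    `max_chunk` caps chunk length. Not the thesis's implementation.
    """
    boundaries = []
    start = 0
    n = len(data)
    while start < n:
        # Extreme Byte Search: maximum byte value in the fixed window at chunk start.
        win_end = min(start + window, n)
        threshold = max(data[start:win_end])

        # Range Scan: first byte after the window whose value is >= the maximum.
        # If none is found before the size cap (or end of input), cut there instead.
        limit = min(start + max_chunk, n)
        end = limit
        for i in range(win_end, limit):
            if data[i] >= threshold:
                end = i + 1  # the qualifying byte closes the chunk
                break

        boundaries.append(end)
        start = end
    return boundaries
```

Consider `bytes([1, 5, 3, 2, 9, 0, 0])` with `window=2`. The first chunk's threshold is 5, the maximum of the first two bytes. The scan skips 3 and 2 and stops at 9, so the first boundary is offset 5. The remaining two bytes form a final chunk ending at offset 7, which gives `[5, 7]`. Because the threshold comes from the data itself, an edit early in a stream changes only the nearby boundaries. This locality is the property that CDC relies on for deduplication.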
dc.identifier.uri: https://hdl.handle.net/10012/23417
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: distributed systems
dc.subject: networking
dc.subject: vector instructions
dc.subject: data deduplication
dc.subject: content defined chunking
dc.subject: programmable switches
dc.subject: AVX
dc.title: Redesigning Datacenter Systems to Leverage Hardware-Acceleration
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.comment.hidden: I have made all the requested changes except those to the Statement of Contributions. The "Summary of other projects conducted during my PhD" section was specifically requested and approved by my advisor to position the contributions described in my thesis and weigh them against my other contributions. Thus, it is directly related to the thesis. Please let me know if any more changes are required. Thanks!
uws.contributor.advisor: Al-Kiswany, Samer
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Udayashankar_Sreeharsha.pdf
Size: 18.4 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission