Show simple item record

dc.contributor.authorRizvi, Syed Muhammad Sajjad
dc.date.accessioned2021-01-08 21:13:09 (GMT)
dc.date.available2021-01-08 21:13:09 (GMT)
dc.date.issued2021-01-08
dc.date.submitted2021-01-07
dc.identifier.urihttp://hdl.handle.net/10012/16626
dc.description.abstractBig data applications put significant latency and throughput demands on distributed storage systems. Meeting these demands requires storage systems to use a significant amount of infrastructure resources, such as network capacity and storage devices. Resource demands largely depend on the workloads and can vary significantly over time. Moreover, demand hotspots can move rapidly between different infrastructure locations. Existing storage systems are largely infrastructure-oblivious as they are designed to support a broad range of hardware and deployment scenarios. Most only use basic configuration information about the infrastructure to make important placement and routing decisions. In the case of cloud-based storage systems, cloud services have their own infrastructure-specific limitations, such as minimum request sizes and maximum number of concurrent requests. By ignoring infrastructure-specific details, these storage systems are unable to react to resource demand changes and may have additional inefficiencies from performing redundant network operations. As a result, provisioning enough resources for these systems to address all possible workloads and scenarios would be cost prohibitive. This thesis studies the performance problems in commonly used distributed storage systems and introduces novel infrastructure-aware design methods to improve their performance. First, it addresses the problem of slow reads due to network congestion that is induced by disjoint replica and path selection. Selecting a read replica separately from the network path can perform poorly if all paths to the pre-selected endpoints are congested. Second, this thesis looks at scalability limitations of consensus protocols that are commonly used in geo-distributed key value stores and distributed ledgers. Due to their network-oblivious designs, existing protocols redundantly communicate over highly oversubscribed WAN links, which poorly utilize network resources and limits consistent replication at large scale. Finally, this thesis addresses the need for a cloud-specific realtime storage system for capital market use cases. Public cloud infrastructures provide feature-rich and cost-effective storage services. However, existing realtime timeseries databases are not built to take advantage of cloud storage services. Therefore, they do not effectively utilize cloud services to provide high performance while minimizing deployment cost. This thesis presents three systems that address these problems by using infrastructure-aware design methods. Our performance evaluation of these systems shows that infrastructure-aware design is highly effective in improving the performance of large scale distributed storage systems.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectinfrastructure-awareen
dc.subjectdistributed storage systemsen
dc.subjectdistributed consensus protocolsen
dc.subject.lcshDatabase managementen
dc.subject.lcshDistributed databasesen
dc.subject.lcshBig dataen
dc.titleOn the Importance of Infrastructure-Awareness in Large-Scale Distributed Storage Systemsen
dc.typeDoctoral Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeDoctor of Philosophyen
uws-etd.embargo.terms0en
uws.contributor.advisorWong, Bernard
uws.contributor.advisorKeshav, Srinivasan
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages