Adaptive Data Storage and Placement in Distributed Database Systems

Abebe, Michael

Adaptive Data Storage and Placement in Distributed Database Systems

Files

Abebe_Michael.pdf (11.39 MB)

Date

2022-09-01

Authors

Abebe, Michael

Publisher

University of Waterloo

Abstract

Distributed database systems are widely used to provide scalable storage, update and query facilities for application data. Distributed databases primarily use data replication and data partitioning to spread load across nodes or sites. The presence of hotspots in workloads, however, can result in imbalanced load on the distributed system resulting in performance degradation. Moreover, updates to partitioned and replicated data can require expensive distributed coordination to ensure that they are applied atomically and consistently. Additionally, data storage formats, such as row and columnar layouts, can significantly impact latencies of mixed transactional and analytical workloads. Consequently, how and where data is stored among the sites in a distributed database can significantly affect system performance, particularly if the workload is not known ahead of time. To address these concerns, this thesis proposes adaptive data placement and storage techniques for distributed database systems. This thesis demonstrates that the performance of distributed database systems can be improved by automatically adapting how and where data is stored by leveraging online workload information. A two-tiered architecture for adaptive distributed database systems is proposed that includes an adaptation advisor that decides at which site(s) and how transactions execute. The adaptation advisor makes these decisions based on submitted transactions. This design is used in three adaptive distributed database systems presented in this thesis: (i) DynaMast that efficiently transfers data mastership to guarantee single-site transactions while maintaining well-understood and established transactional semantics, (ii) MorphoSys that selectively and adaptively replicates, partitions and remasters data based on a learned cost model to improve transaction processing, and (iii) Proteus that uses learned workload models to predictively and adaptively change storage layouts to support both high transactional throughput and low latency analytical queries. Collectively, this thesis is a concrete step towards autonomous database systems that allow users to specify only the data to store and the queries to execute, leaving the system to judiciously choose the storage and execution mechanisms to deliver high performance.

Keywords

distributed databases, data storage, data placement, adaptive databases

URI

http://hdl.handle.net/10012/18704

Collections

Theses
Computer Science

Full item page

Adaptive Data Storage and Placement in Distributed Database Systems

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

URI

Collections

Endorsement

Review

Supplemented By

Referenced By