Sampling-based Predictive Database Buffer Management
Date
2023-09-25
Authors
Vanderkooy, Theodore
Advisor
Daudjee, Khuzaima
Publisher
University of Waterloo
Abstract
This thesis presents a database buffer caching policy that uses information about long-running scans to estimate when each buffer will next be accessed. These estimates are used to approximate the optimal caching policy, which requires knowledge of future accesses. The policy must also be efficient, with low CPU overhead. This is achieved with sampling: buffer eviction considers only a small random sample of buffers, and the access time estimates are used to select a victim from that sample. The design is easily tuned by adjusting the sample size, and easily modified to improve the access time estimates and to expand the set of workload types that can be predicted effectively.
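The sampling idea described above can be illustrated with a minimal sketch. This is not the thesis's PostgreSQL implementation. The function names, the representation of a scan as a (position, end, rate) tuple, and the rule that a page outside every active scan has an unknown (infinite) next access time are all assumptions made for illustration only.

```python
import math
import random


def estimate_next_access(page, scans):
    """Estimate the time until `page` is next read by any active scan.

    `scans` is a hypothetical list of (position, end, pages_per_second)
    tuples describing sequential scans that are currently running.
    Pages that no scan is going to reach get an infinite estimate.
    """
    best = math.inf
    for position, end, rate in scans:
        if rate > 0 and position <= page < end:
            best = min(best, (page - position) / rate)
    return best


def choose_victim(buffers, scans, sample_size, rng=random):
    """Pick an eviction victim from a small random sample of buffers.

    Only `sample_size` buffers are examined, which bounds the CPU cost.
    Among the sample, the buffer whose next access is estimated to be
    furthest in the future is evicted.
    """
    if not buffers:
        raise ValueError("no buffers to evict")
    k = min(sample_size, len(buffers))
    sample = rng.sample(buffers, k)
    return max(sample, key=lambda page: estimate_next_access(page, scans))


if __name__ == "__main__":
    cached_pages = list(range(100))
    active_scans = [(10, 50, 5.0)]  # one scan at page 10, ending before page 50
    print(choose_victim(cached_pages, active_scans, sample_size=8))
```

In this sketch a larger sample_size makes the choice closer to the best available victim at the cost of more work per eviction, which mirrors the tuning knob described in the abstract.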
This approach is implemented in PostgreSQL and evaluated in a series of experiments based on TPC-H. In these experiments, it works very well for workloads consisting mainly of sequential scans, reducing I/O volume by up to 38% relative to PostgreSQL’s Clock-sweep implementation, and it is competitive with standard approaches for workloads that mix sequential scans and index accesses.
Keywords
database, caching, buffer management