Domain Ordering and Box Cover Problems for Beyond Worst-Case Join Processing


Date

2019-09-17

Authors

Alway, Kaleb

Advisor

Salihoglu, Semih
Blais, Eric

Publisher

University of Waterloo

Abstract

Join queries are a fundamental computational task in relational database management systems. For decades, complex joins were most often computed by decomposing the query into a query plan made of a sequence of binary joins. However, for cyclic queries, this type of query plan is sub-optimal: the worst-case run time of any such plan exceeds the largest number of output tuples that any instance of the query can produce. Recent theoretical developments in join query processing have led to join algorithms which are worst-case optimal, meaning that they run in time proportional to the worst-case output size for any query with the same shape and the same number of input tuples. Building on these results is a class of algorithms whose bounds go beyond this worst-case output size by exploiting the structure of the input instance rather than just the query shape. One such algorithm, Tetris, is worst-case optimal and also provides an upper bound on its run time which depends on the minimum size of a geometric box certificate for the input query. A box cover is a set of n-dimensional boxes which cover all of the tuples not contained in the input relations. A box certificate is a subset of a box cover whose union covers every tuple which is not present in the query output. Many query instances admit different box certificates and box covers when the values in the attributes' domains are ordered differently. If we permute the input query according to a domain ordering which admits a smaller box certificate, use the permuted query as input to Tetris, then transform the result back with the inverse domain ordering, we can compute the query faster than would be possible with a fixed domain ordering. If we can efficiently compute an optimal domain ordering for a query, then we can state a beyond worst-case bound that is stronger than what is provided by Tetris.
This paper defines several optimization problems over the space of domain orderings where the objective is to minimize the size of either the minimum box certificate or the minimum box cover for the given input query. We show that most of these problems are NP-hard. We also provide approximation algorithms for several of these problems. The most general version of the box cover minimization problem we study, BoxMinPDomF, is shown to be NP-hard, but we can compute an approximation only a poly-logarithmic factor larger than K^(a*r), where K is the minimum box cover size under any domain ordering and r is the maximum number of attributes in a relation. This result allows us to compute join queries in time N+K^(a*r*(w+1))+Z, times a poly-logarithmic factor in N, where N is the number of input tuples, w is the treewidth of the query, and Z is the number of output tuples. This is a new beyond worst-case bound. There are queries for which this bound is exponentially smaller than any bound provided by Tetris. The most general version of the box certificate minimization problem we study, CertMinPDomF, is also shown to be NP-hard. It can be solved exactly if the minimum box certificate size is at most 3, but no approximation algorithm for an arbitrary minimum size is known. Finding such an approximation algorithm is an important direction for future research.
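
The reordering idea in the abstract can be illustrated with a small Python sketch. It is not the thesis's algorithm and does not implement Tetris. It only shows, on a toy instance, that (1) the number of gap boxes needed to cover the missing values can shrink when a domain is reordered, and (2) joining the reordered relations and applying the inverse ordering gives the same output as joining the originals. All names here (`gap_intervals`, `reorder`, `naive_join`, `evens_first`) are hypothetical, a nested-loop join stands in for Tetris, and gaps are counted as arbitrary one-dimensional intervals, which is a simplification of the boxes used in the actual analysis.

```python
def gap_intervals(values, domain_size):
    """Count maximal runs of domain positions that are absent from `values`.

    In one dimension, this is the fewest intervals that cover the complement.
    """
    present = set(values)
    gaps = 0
    in_gap = False
    for pos in range(domain_size):
        if pos not in present:
            if not in_gap:
                gaps += 1  # a new run of missing positions starts here
            in_gap = True
        else:
            in_gap = False
    return gaps


def reorder(relation, attrs, orderings):
    """Apply a per-attribute domain ordering (value -> new position) to a relation."""
    return {
        tuple(orderings[a][v] for a, v in zip(attrs, t))
        for t in relation
    }


def invert(orderings):
    """Build the inverse ordering (new position -> original value) per attribute."""
    return {a: {new: old for old, new in m.items()} for a, m in orderings.items()}


def naive_join(r, s):
    """Nested-loop join of R(A, B) and S(B, C); a placeholder for Tetris."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


DOMAIN_SIZE = 8

# Hypothetical ordering: move even values to the front, odd values to the back.
evens_first = {v: (v // 2 if v % 2 == 0 else DOMAIN_SIZE // 2 + v // 2)
               for v in range(DOMAIN_SIZE)}

# (1) A unary relation holding only even values.
unary = {0, 2, 4, 6}
gaps_before = gap_intervals(unary, DOMAIN_SIZE)                            # 4
gaps_after = gap_intervals({evens_first[v] for v in unary}, DOMAIN_SIZE)   # 1

# (2) Round trip: reorder, join, map back with the inverse ordering.
R = {(0, 2), (2, 4), (4, 6)}   # R(A, B)
S = {(2, 0), (4, 2), (6, 6)}   # S(B, C)
orderings = {"A": evens_first, "B": evens_first, "C": evens_first}

joined_reordered = naive_join(
    reorder(R, ("A", "B"), orderings),
    reorder(S, ("B", "C"), orderings),
)
round_trip = reorder(joined_reordered, ("A", "B", "C"), invert(orderings))
direct = naive_join(R, S)
```

In this toy instance the complement of the unary relation needs four intervals under the natural ordering and one interval after reordering, while `round_trip` and `direct` contain the same tuples. The sketch fixes the ordering by hand; finding a good ordering is the optimization problem the abstract describes as NP-hard in general.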

Keywords

data systems, databases, algorithms, join queries, box covers
