dc.contributor.author: Alway, Kaleb
dc.description.abstract: Join queries are a fundamental computational task in relational database management systems. For decades, complex joins were most often computed by decomposing the query into a query plan made of a sequence of binary joins. However, for cyclic queries, this type of query plan is sub-optimal: the worst-case run time of any such query plan exceeds the largest number of output tuples that any query instance can produce. Recent theoretical developments in join query processing have led to join algorithms which are worst-case optimal, meaning that they run in time proportional to the worst-case output size for any query with the same shape and the same number of input tuples. Building on these results is a class of algorithms providing bounds which go beyond this worst-case output size by exploiting the structure of the input instance rather than just the query shape. One such algorithm, Tetris, is worst-case optimal and also provides an upper bound on its run time which depends on the minimum size of a geometric box certificate for the input query. A box certificate is a subset of a box cover whose union covers every tuple which is not present in the query output. A box cover is a set of n-dimensional boxes which cover all of the tuples not contained in the input relations. Many query instances admit different box certificates and box covers when the values in the attributes' domains are ordered differently. If we permute the input query according to a domain ordering which admits a smaller box certificate, use the permuted query as input to Tetris, then transform the result back with the inverse domain ordering, we can compute the query faster than would be possible with a fixed domain ordering. If we can efficiently compute an optimal domain ordering for a query, then we can state a beyond worst-case bound that is stronger than what is provided by Tetris.
This paper defines several optimization problems over the space of domain orderings where the objective is to minimize the size of either the minimum box certificate or the minimum box cover for the given input query. We show that most of these problems are NP-hard. We also provide approximation algorithms for several of these problems. The most general version of the box cover minimization problem we study, BoxMinPDomF, is shown to be NP-hard, but we can compute an approximation only a poly-logarithmic factor larger than K^(a*r), where K is the minimum box cover size under any domain ordering and r is the maximum number of attributes in a relation. This result allows us to compute join queries in time N+K^(a*r*(w+1))+Z, times a poly-logarithmic factor in N, where N is the number of input tuples, w is the treewidth of the query, and Z is the number of output tuples. This is a new beyond worst-case bound. There are queries for which this bound is exponentially smaller than any bound provided by Tetris. The most general version of the box certificate minimization problem we study, CertMinPDomF, is also shown to be NP-hard. It can be solved exactly if the minimum box certificate size is at most 3, but no approximation algorithm for an arbitrary minimum size is known. Finding such an approximation algorithm is an important direction for future research. (en)
dc.publisher: University of Waterloo (en)
dc.subject: data systems (en)
dc.subject: join queries (en)
dc.subject: box covers (en)
dc.title: Domain Ordering and Box Cover Problems for Beyond Worst-Case Join Processing (en)
dc.type: Master Thesis (en)
dc.pending: false
R. Cheriton School of Computer Science (en)
Science (en)
of Waterloo (en)
uws-etd.degree: Master of Mathematics (en)
uws.contributor.advisor: Salihoglu, Semih
uws.contributor.advisor: Blais, Eric
uws.contributor.affiliation1: Faculty of Mathematics (en)
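
The abstract's claim that reordering an attribute's domain can shrink a box cover can be illustrated with a minimal one-dimensional sketch. The simplifications here are assumptions for illustration and are not taken from the thesis: a single unary relation, boxes modelled as arbitrary contiguous intervals in the chosen domain order, and a hypothetical helper named `min_interval_cover_size`. Under those assumptions, the minimum number of boxes needed to cover the absent values equals the number of maximal runs of absent values.

```python
def min_interval_cover_size(domain_order, relation):
    """Count maximal runs of absent values along the given domain order.

    In one dimension, with boxes taken to be arbitrary intervals (an
    assumption of this sketch), each maximal run of absent values needs
    exactly one interval, so the run count is the minimum cover size.
    """
    present = set(relation)
    runs = 0
    in_gap = False
    for value in domain_order:
        if value in present:
            in_gap = False      # a present tuple ends the current gap
        elif not in_gap:
            runs += 1           # first absent value of a new gap
            in_gap = True
    return runs


# Illustrative data only: even values are present, odd values are absent.
relation = {0, 2, 4, 6}
natural_order = list(range(8))
reordered = [0, 2, 4, 6, 1, 3, 5, 7]   # present values first

print(min_interval_cover_size(natural_order, relation))  # 4
print(min_interval_cover_size(reordered, relation))      # 1
```

Under the natural order every absent value is isolated, so four intervals are needed; placing the present values first makes the absent values contiguous, so one interval suffices. The multi-attribute problems named in the abstract are NP-hard in general, so this run count should not be read as the thesis's algorithm.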

All items in UWSpace are protected by copyright, with all rights reserved.
