Query Answering over Functional Dependency Repairs

Loading...
Thumbnail Image

Date

2013-09-23T15:48:49Z

Authors

Galiullin, Artur

Advisor

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

Inconsistency often arises in real-world databases and, as a result, critical queries over dirty data may lead users to make ill-informed decisions. Functional dependencies (FDs) can be used to specify intended semantics of the underlying data and aid with the cleaning task. Enumerating and evaluating all the possible repairs to FD violations is infeasible, while approaches that produce a single repair or attempt to isolate the dirty portion of data are often too destructive or constraining. In this thesis, we leverage a recent advance in data cleaning that allows sampling from a well-defined space of reasonable repairs, and provide the user with a data management tool that gives uncertain query answers over this space. We propose a framework to compute probabilistic query answers as though each repair sample were a possible world. We show experimentally that queries over many possible repairs produce results that are more useful than other approaches and that our system can scale to large datasets.

Description

Keywords

Data Cleaning, Probabilistic Databases

LC Keywords

Citation