Workload Matters: A Robust Approach to Physical RDF Database Design
MetadataShow full item record
Recent advances in Information Extraction, Linked Data Management and the Semantic Web have led to a rapid increase in both the volume and the variety of publicly available graph-structured data. As more and more businesses start to capitalize on graph-structured data, data management systems are being exposed to workloads that are far more diverse and dynamic than what they were designed to handle. In particular, most systems rely on a workload-oblivious physical layout with a fixed-schema and are adaptive only if the changes in the schema are minor. Thus, they are unable to perform consistently well across different types of workloads. This thesis introduces fundamental techniques for supporting diverse and dynamic workloads in RDF data management systems. Instead of assuming anything about the workload upfront, these techniques allow systems to adjust their physical designs as queries are executed. This includes changing the way (i) records are clustered in the storage system, (ii) data are organized and indexed, and (iii) queries are optimized, all at runtime. The thesis proceeds with a discussion of the challenges that have been encountered in implementing these ideas in a proof-of-concept prototype called chameleon-db, and it concludes with a thorough experimental evaluation.