Automated Storage Layout for Database Systems
Modern storage systems are complex. Simple direct-attached storage devices are giving way to storage systems that are flexible, network-attached, consolidated and virtualized. Today, storage systems have their own administrators, who use specialized tools and expertise to configure and manage storage resources. As a result, database administrators are no longer in direct control of the design and configuration of their database systems' underlying storage resources. This introduces problems because database physical design and storage configuration are closely related tasks, and the separation makes it more difficult to achieve a good end-to-end design. For instance, the performance of a database system depends strongly on the storage layout of database objects, such as tables and indexes, and the separation makes it hard to design a storage layout that is tuned to the I/O workload generated by the database system. In this thesis we attempt to close this information gap between the database and storage tiers by predicting the storage (I/O) workload that a database management system will generate. Specifically, we show how to translate a database workload description, together with a database physical design, into a characterization of the I/O workload that will result. Such a characterization can be used directly by a storage configuration tool and thus enables effective end-to-end design and configuration spanning both the database and storage tiers. We then introduce our storage layout optimization tool, which leverages such workload characterizations to generate an optimized layout for a given set of database objects. We formulate the layout problem as a non-linear programming (NLP) problem and use the I/O characterization as input to an NLP solver.
We have incorporated our I/O estimation technique into the PostgreSQL database management system and our layout optimization technique into a database layout advisor. We present an empirical assessment of the cost of both tools as well as the efficacy and accuracy of their results.