Query Evaluation in the Presence of Fine-grained Access Control
Access controls are mechanisms that enhance security by protecting data from unauthorized access. In contrast to traditional access controls, which grant access rights at the granularity of whole tables or views, fine-grained access controls specify access rights at a finer granularity, e.g., individual nodes in XML databases or individual tuples in relational databases. While there is a voluminous literature on specifying and modeling fine-grained access controls, less work has been done on the performance of database systems that enforce them. This thesis addresses these performance issues and proposes corresponding solutions. In particular, the following issues are addressed: effective storage of massive access controls, efficient query planning for secure query evaluation, and accurate cardinality estimation for access controlled data. Because fine-grained access controls specify the access rights of each user to each piece of data in the system, they are effectively a massive matrix whose size is the product of the number of users and the size of the data. Therefore, fine-grained access controls require a very compact encoding to be feasible. The storage system proposed in this thesis achieves an unprecedented level of compactness by leveraging the high correlation of access controls found in real system data. This correlation has two sources: the structural similarity of access rights between data items, and the similarity of access patterns across different users. The encoding can be embedded into a linearized representation of XML data, so that a query evaluation framework can compute the answer to an access controlled query with minimal disk I/O for reading the access controls. Query optimization is a crucial component of database systems. This thesis proposes an intelligent query plan caching mechanism that lowers the amortized cost of query planning in the presence of fine-grained access controls. 
The rationale behind this query plan caching mechanism is that queries customized by the access controls of different users may share common upper-level join trees in their optimal query plans. Since join plan generation is an expensive step in query optimization, reusing the upper-level join trees reduces the cost of query optimization significantly. The proposed caching mechanism is able to match efficient query plans to access controlled query plans with minimal runtime cost. On a query plan cache miss, the optimizer must optimize an access controlled query from scratch. Doing so depends on accurate estimation of the cardinality of the intermediate query results. This thesis proposes a novel sampling scheme that is more accurate than traditional cardinality estimation techniques.
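
The storage argument above can be made concrete with a small illustration. The following Python sketch is not the encoding proposed in this thesis; it is a minimal example, under stated assumptions, of how the two kinds of correlation can shrink a user-by-data access matrix. It assumes a boolean access bit per (user, node) pair, with nodes listed in document order. Neighboring nodes with identical rights are collapsed by run-length encoding, and users with identical access patterns share one stored row. The names `compress_acl` and `can_access` are hypothetical.

```python
from itertools import groupby


def compress_acl(matrix):
    """Compress a user -> [bool per node, in document order] access matrix.

    Illustrative assumption: one boolean access right per (user, node).
    Returns (runs_by_id, user_to_id), where each stored pattern is a tuple
    of (bit, run_length) pairs and users with equal rows share a pattern id.
    """
    patterns = {}    # run-length-encoded row -> shared pattern id
    user_to_id = {}  # user -> pattern id
    for user, row in matrix.items():
        # Structural similarity: adjacent nodes with equal rights form one run.
        runs = tuple((bit, sum(1 for _ in grp)) for bit, grp in groupby(row))
        # User similarity: identical rows are stored only once.
        user_to_id[user] = patterns.setdefault(runs, len(patterns))
    runs_by_id = {pid: runs for runs, pid in patterns.items()}
    return runs_by_id, user_to_id


def can_access(runs_by_id, user_to_id, user, node):
    """Look up the access bit for a node position by walking the user's runs."""
    end = 0
    for bit, length in runs_by_id[user_to_id[user]]:
        end += length
        if node < end:
            return bit
    raise IndexError(node)


if __name__ == "__main__":
    acl = {
        "alice": [True, True, True, False, False, True],
        "bob":   [True, True, True, False, False, True],
        "carol": [False] * 6,
    }
    runs_by_id, user_to_id = compress_acl(acl)
    print(len(runs_by_id), "stored patterns for", len(acl), "users")
    print(can_access(runs_by_id, user_to_id, "bob", 3))
```

In this toy input, eighteen matrix cells reduce to two stored patterns with four runs in total. A linear scan over runs is used for the lookup only to keep the sketch short; it says nothing about how the thesis embeds access controls into the linearized XML representation.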