Efficient Structure-aware OLAP Query Processing over Large Property Graphs

Zhang, Yan

dc.contributor.advisor	Özsu, Tamer
dc.contributor.author	Zhang, Yan
dc.date.accessioned	2017-12-14T16:07:55Z
dc.date.available	2017-12-14T16:07:55Z
dc.date.issued	2017-12-14
dc.date.submitted	2017
dc.description.abstract	The property graph model is a semantically rich model for real-world applications that represent their data as graphs, e.g., communication networks, social networks, and financial transaction networks. On-Line Analytical Processing (OLAP) is an important data analysis tool that lets users aggregate data over different combinations of dimensions. For example, given a Q&A forum dataset, to study whether a poster's age correlates with his or her post quality, one may ask for the average age of users grouped by post score. As another example, in the music industry, one may ask for the total record sales by music company and year in order to analyze market activity. Surprisingly, current graph databases do not support OLAP aggregation queries efficiently. In most cases, such queries are translated into a sequence of join operations, and the system computes everything from scratch. For example, Neo4j, a state-of-the-art graph database system, processes each OLAP query in two steps: first, it expands the nodes and edges that satisfy the query constraints; then it aggregates over all the valid substructures returned by the first step. However, repeated queries are common in data warehousing workloads, so computing everything from scratch each time is highly inefficient. Materialization and view maintenance techniques developed for traditional RDBMSs have proven efficient for OLAP workloads. Following this general materialization methodology, in this thesis we develop a structure-aware cuboid caching solution that efficiently supports OLAP aggregation queries over property graphs. Structure-aware means that our solution considers both heterogeneous attributes and graph topology. 
The essential idea is to precompute and materialize a selection of views based on statistics of the historical workload, so that future queries can be processed faster. We implement a prototype system on top of Neo4j. Empirical studies over real-world property graphs show that, under a reasonable space budget, our solution achieves an average speedup of 15-30x over native Neo4j.	en
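The workload-driven materialization idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it ignores graph topology entirely (the "structure-aware" part), treats the output of the expand step as a flat list of records, and uses a hypothetical greedy rule that caches the most frequent group-by dimension sets from a query history until a row budget is exhausted. The names `compute_cuboid`, `CuboidCache`, `budget_rows`, `last_hit` and the sample records are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record type: one row per matched substructure (the output of the
# "expand" step), flattened into dimension and measure attributes.
Record = Dict[str, float]
# A cuboid maps a tuple of dimension values to (SUM, COUNT) of the measure,
# which is enough to roll up AVG to any coarser set of dimensions.
Cuboid = Dict[Tuple, Tuple[float, int]]


def compute_cuboid(records: List[Record], dims: Tuple[str, ...], measure: str) -> Cuboid:
    """Aggregate from scratch: group records by `dims`, keeping SUM and COUNT."""
    acc: Dict[Tuple, List[float]] = defaultdict(lambda: [0.0, 0])
    for r in records:
        key = tuple(r[d] for d in dims)
        acc[key][0] += r[measure]
        acc[key][1] += 1
    return {k: (s, int(c)) for k, (s, c) in acc.items()}


class CuboidCache:
    """Hypothetical workload-driven cache of group-by cuboids under a row budget."""

    def __init__(self, records: List[Record], measure: str, budget_rows: int):
        self.records = records
        self.measure = measure
        self.budget_rows = budget_rows
        self.cache: Dict[Tuple[str, ...], Cuboid] = {}
        self.last_hit = False  # whether the last query was answered from the cache

    def materialize(self, workload: List[Tuple[str, ...]]) -> None:
        """Greedily cache the most frequent dimension sets until the budget is used."""
        freq = Counter(tuple(sorted(q)) for q in workload)
        used = 0
        for dims, _ in freq.most_common():
            cuboid = compute_cuboid(self.records, dims, self.measure)
            if used + len(cuboid) <= self.budget_rows:
                self.cache[dims] = cuboid
                used += len(cuboid)

    def query_avg(self, dims: Tuple[str, ...]) -> Dict[Tuple, float]:
        """Return AVG(measure) grouped by `dims`, using a cached cuboid if possible."""
        dims = tuple(sorted(dims))
        # Any cached cuboid whose dimensions cover the query can be rolled up.
        candidates = [d for d in self.cache if set(dims) <= set(d)]
        self.last_hit = bool(candidates)
        if candidates:
            src = min(candidates, key=lambda d: len(self.cache[d]))  # fewest rows
            idx = [src.index(d) for d in dims]
            base = defaultdict(lambda: [0.0, 0])
            for key, (s, c) in self.cache[src].items():
                k = tuple(key[i] for i in idx)
                base[k][0] += s
                base[k][1] += c
        else:
            base = compute_cuboid(self.records, dims, self.measure)  # cache miss
        return {k: s / c for k, (s, c) in base.items()}


# Usage with made-up Q&A-forum rows: average poster age grouped by post score.
records = [
    {"score": 5, "year": 2016, "age": 30},
    {"score": 5, "year": 2017, "age": 40},
    {"score": 2, "year": 2016, "age": 20},
]
cache = CuboidCache(records, measure="age", budget_rows=10)
cache.materialize([("score", "year"), ("score", "year"), ("year",)])
print(cache.query_avg(("score",)))  # → {(5,): 35.0, (2,): 20.0}
```

Caching SUM and COUNT rather than AVG is what makes the roll-up from the (score, year) cuboid to (score) correct, since averaging per-group averages would weight the groups incorrectly. A structure-aware design would presumably also have to account for the graph pattern being matched, which this sketch omits.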
dc.identifier.uri	http://hdl.handle.net/10012/12724
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	OLAP	en
dc.subject	Graph Database	en
dc.subject	Property Graph	en
dc.subject	Materialized View	en
dc.title	Efficient Structure-aware OLAP Query Processing over Large Property Graphs	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Özsu, Tamer
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Zhang_Yan.pdf
Size: 18.35 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.08 KB
Format: Item-specific license agreed to upon submission

Collections

Theses
Computer Science