Integrating Column-Oriented Storage and Query Processing Techniques into Graph Database Management Systems

Gupta, Pranjal

dc.contributor.advisor	Salihoglu, Semih
dc.contributor.author	Gupta, Pranjal
dc.date.accessioned	2020-08-14T16:46:31Z
dc.date.available	2020-08-14T16:46:31Z
dc.date.issued	2020-08-14
dc.date.submitted	2020-08-10
dc.description.abstract	Column-oriented RDBMSs, which support traditional read-heavy analytics workloads, employ a specific set of storage and query processing techniques for scalability and performance, such as positional tuple IDs, column-specific compression, and block-oriented processing. We revisit these techniques in the context of contemporary graph database management systems (GDBMSs). GDBMSs support a new set of analytics workloads, such as fraud detection in financial transaction networks or recommendations in social networks, that are also read-heavy but have fundamentally different access patterns from traditional analytics workloads. We first review the data characteristics and query access patterns in GDBMSs to identify the components of GDBMSs where existing columnar techniques can and cannot be used directly. We then present the physical data layout of columnar data structures, new columnar compression techniques, and query processing techniques that are optimized for GDBMSs. Our techniques include a new compact vertex and edge ID scheme, a new null and empty list compression scheme based on prefix sums, and list-based query processing. We have integrated our techniques into GraphflowDB, an in-memory GDBMS. Compared to uncompressed storage, our compression techniques have scaled the system by 3.55x with minimal performance overheads. Our null compression scheme outperforms existing columnar schemes in query performance with a minor loss in compression rate, and achieves both a higher compression rate and better query performance than the row-oriented storage techniques adopted by existing GDBMSs. Finally, our list-based query processing techniques improve query performance by 2.7x on a variety of path queries and significantly outperform their corresponding conventional versions.	en
dc.identifier.uri	http://hdl.handle.net/10012/16122
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	graph database	en
dc.subject	graph database management system	en
dc.subject	databases	en
dc.subject	relational database management system	en
dc.subject	columnar storage	en
dc.subject	column stores	en
dc.subject	compression	en
dc.subject	query processing	en
dc.subject	list-based query processing	en
dc.subject	null compression	en
dc.subject	adjacency lists	en
dc.subject	property lists	en
dc.subject	vertex columns	en
dc.subject	property pages	en
dc.title	Integrating Column-Oriented Storage and Query Processing Techniques into Graph Database Management Systems	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Salihoglu, Semih
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Gupta_Pranjal.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission

Collections

Theses
Computer Science