Integrating Column-Oriented Storage and Query Processing Techniques into Graph Database Management Systems

dc.contributor.author: Gupta, Pranjal
dc.date.accessioned: 2020-08-14T16:46:31Z
dc.date.available: 2020-08-14T16:46:31Z
dc.date.issued: 2020-08-14
dc.date.submitted: 2020-08-10
dc.description.abstract: Column-oriented RDBMSs, which support traditional read-heavy analytics workloads, employ a specific set of storage and query processing techniques for scalability and performance, such as positional tuple IDs, column-specific compression, and block-oriented processing. We revisit these techniques in the context of contemporary graph database management systems (GDBMSs). GDBMSs support a new set of analytics workloads, such as fraud detection in financial transaction networks or recommendations in social networks, that are also read-heavy but have fundamentally different access patterns than traditional analytics workloads. We first review the data characteristics and query access patterns in GDBMSs to identify the components of GDBMSs where existing columnar techniques can and cannot be used directly. We then present the physical data layout of columnar data structures, new columnar compression techniques, and query processing techniques that are optimized for GDBMSs. Our techniques include a new compact vertex and edge ID scheme, a new null and empty-list compression scheme based on prefix sums, and list-based query processing. We have integrated our techniques into GraphflowDB, an in-memory GDBMS. Compared to uncompressed storage, our compression techniques scale the system by 3.55x with minimal performance overhead. Our null compression scheme outperforms existing columnar schemes in query performance with a minor loss in compression rate, and it achieves both a higher compression rate and better query performance than the row-oriented storage techniques adopted by existing GDBMSs. Finally, our list-based query processing techniques improve query performance by 2.7x on a variety of path queries and significantly outperform their corresponding conventional versions.
dc.identifier.uri: http://hdl.handle.net/10012/16122
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: graph database
dc.subject: graph database management system
dc.subject: databases
dc.subject: relational database management system
dc.subject: columnar storage
dc.subject: column stores
dc.subject: compression
dc.subject: query processing
dc.subject: list-based query processing
dc.subject: null compression
dc.subject: adjacency lists
dc.subject: property lists
dc.subject: vertex columns
dc.subject: property pages
dc.title: Integrating Column-Oriented Storage and Query Processing Techniques into Graph Database Management Systems
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws.contributor.advisor: Salihoglu, Semih
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Gupta_Pranjal.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format
