Show simple item record

dc.contributor.authorAnzum, Nafisa
dc.date.accessioned2020-09-28 13:46:20 (GMT)
dc.date.available2020-09-28 13:46:20 (GMT)
dc.date.issued2020-09-28
dc.date.submitted2020-09-23
dc.identifier.urihttp://hdl.handle.net/10012/16380
dc.description.abstractConnections amongst real-world entities provide significant insights for numerous real-life applications in social networks, semantic web, road maps, finance, among others. Graphs are perhaps the most natural way to model such connections in application data. However, in many enterprises, an application data is still primarily stored in an RDBMS in a tabular format and users extract graphs out of an RDBMS and store them in specialized graph processing systems. As a result, many users face two major challenges before conducting any graph analysis. First, extracting graphs from an RDBMS requires building an ETL pipeline, which can require a significant amount of time. Second, keeping the extracted graph in the graph processing system, such as a graph database management system (GDBMS), in sync with the original data in the RDBMS requires developing additional non-trivial synchronization code. In this thesis, we study and address these two challenges and present two software systems, GraphWrangler and R2GSync, that we have developed to solve these challenges. GraphWrangler is an interactive system that streamlines the ETL pipeline. Users connect to an RDBMS using GraphWrangler and with several simple interactions, such as dragging and dropping of rows and columns and drawing edges on the screen, they describe table-to-graph mappings. This way, users can describe the graphs they would like to extract without writing any custom scripts. In addition, GraphWrangler allows user to immediately visualize their tables in the form of a graph. Our second system, R2GSync, uses the mappings of an extracted graph and maintains a consistent, i.e., in sync, copy of this graph in a GDBMS as updates happen to the original RDBMS from which the graph was extracted. Querying the extracted graph inside the GDBMS requires a new querying functionality inside the GDBMS that we call edge views. We describe our implementation of edge views and several optimizations to make queries that contain edge views more efficient.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectgraph databaseen
dc.subjectgraph transformationen
dc.subjecttabular dataen
dc.subjectrelational database management systemsen
dc.subjectedge viewsen
dc.subjectdata transformationen
dc.titleSystems for Graph Extraction from Tabular Dataen
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Mathematicsen
uws.contributor.advisorSalihoglu, Semih
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages