Systems for Graph Extraction from Tabular Data

Anzum, Nafisa

dc.contributor.advisor	Salihoglu, Semih
dc.contributor.author	Anzum, Nafisa
dc.date.accessioned	2020-09-28T13:46:20Z
dc.date.available	2020-09-28T13:46:20Z
dc.date.issued	2020-09-28
dc.date.submitted	2020-09-23
dc.description.abstract	Connections among real-world entities provide significant insights for numerous real-life applications in social networks, the semantic web, road maps, and finance, among others. Graphs are perhaps the most natural way to model such connections in application data. However, in many enterprises, application data is still primarily stored in an RDBMS in a tabular format, and users extract graphs out of the RDBMS and store them in specialized graph processing systems. As a result, many users face two major challenges before conducting any graph analysis. First, extracting graphs from an RDBMS requires building an ETL pipeline, which can require a significant amount of time. Second, keeping the extracted graph in the graph processing system, such as a graph database management system (GDBMS), in sync with the original data in the RDBMS requires developing additional non-trivial synchronization code. In this thesis, we study these two challenges and present two software systems, GraphWrangler and R2GSync, that we have developed to address them. GraphWrangler is an interactive system that streamlines the ETL pipeline. Users connect to an RDBMS using GraphWrangler and, with several simple interactions such as dragging and dropping rows and columns and drawing edges on the screen, describe table-to-graph mappings. This way, users can describe the graphs they would like to extract without writing any custom scripts. In addition, GraphWrangler allows users to immediately visualize their tables in the form of a graph. Our second system, R2GSync, uses the mappings of an extracted graph and maintains a consistent, i.e., in-sync, copy of this graph in a GDBMS as updates happen to the original RDBMS from which the graph was extracted. Querying the extracted graph inside the GDBMS requires a new querying functionality inside the GDBMS that we call edge views. We describe our implementation of edge views and several optimizations to make queries that contain edge views more efficient.	en
dc.identifier.uri	http://hdl.handle.net/10012/16380
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	graph database	en
dc.subject	graph transformation	en
dc.subject	tabular data	en
dc.subject	relational database management systems	en
dc.subject	edge views	en
dc.subject	data transformation	en
dc.title	Systems for Graph Extraction from Tabular Data	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Salihoglu, Semih
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Anzum_Nafisa.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format
Description: MMath Thesis

Collections

Theses
Computer Science