Entity Matching and Disambiguation Across Multiple Knowledge Graphs

Farag, Michael

Files

Farag_Michael.pdf (4.06 MB)

Date

2019-06-10

Authors

Farag, Michael

Advisor

Ilyas, Ihab

Publisher

University of Waterloo

Abstract

Knowledge graphs are an important representation that lies between free text on the one hand and fully structured relational data on the other. They are the backbone of many applications on the Web. With the rise of large-scale open-domain knowledge graphs such as Freebase, DBpedia, and Yago, applications including document retrieval, question answering, and data integration have come to rely on them. In this thesis, we are primarily interested in knowledge graphs from the perspective of integrating disparate heterogeneous sources, with an eye towards applications such as document retrieval and question answering. Integrating different knowledge graphs is important for enriching the knowledge shared among them. The core of this integration process is matching entities across the knowledge graphs, and the biggest challenge to entity matching is ambiguity. An obvious solution is to use the graph structure and entity neighbourhoods to match and disambiguate entities. We formalize the entity matching problem and present the first large-scale dataset for this task, Ambiguous DBpedia-Wikidata, which is based on existing cross-ontology links between DBpedia and Wikidata and focuses on several hundred thousand ambiguous entities. We propose an entity matching framework that is capable of disambiguating entities across different knowledge graphs. The framework consists of a fuzzy string matcher and a graph embedding-based matcher. Using a classification-based approach, we find that a simple multi-layered perceptron based on representations derived from RDF2VEC graph embeddings of entities in each knowledge graph is sufficient to achieve high accuracy, with only limited training data. The contribution of our work is both a large dataset for examining this problem and strong baselines on which future work can be based.
We also present SimpleDBpediaQA, a new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia. We show how entity matching using manual annotations can be used to migrate datasets across knowledge graphs. Although this mapping is conceptually straightforward, a number of nuances make the task non-trivial, owing to the different conceptual organizations of the two knowledge graphs. Finally, when manual annotations are scarce, we show how our entity matching framework can be used to generate free annotations to train our model, which is then used for disambiguation. In that spirit, we introduce SimpleQuestions++, a new question answering benchmark that has all questions linked to Freebase, DBpedia, and Wikidata.
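
As a rough illustration of why the abstract pairs a fuzzy string matcher with a second, embedding-based matcher, the sketch below generates match candidates from labels alone. It is not the thesis implementation: the entity identifiers, labels, the 0.8 threshold, and the use of Python's difflib as the string matcher are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical target-graph entities; identifiers and labels are made up.
TARGET_LABELS = {
    "Q_city": "Paris",
    "Q_person": "Paris",
    "Q_film": "Paris, Texas",
    "Q_other": "Berlin",
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(source_label, target_labels, threshold=0.8):
    """Return (entity_id, score) pairs whose label is close to source_label.

    The threshold is an assumed value chosen for this toy example.
    """
    scored = [
        (entity_id, similarity(source_label, label))
        for entity_id, label in target_labels.items()
    ]
    kept = [(entity_id, score) for entity_id, score in scored if score >= threshold]
    # Best score first; ties broken by identifier for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


result = candidates("Paris", TARGET_LABELS)
print(result)
```

In this toy data two distinct entities share the label "Paris", so string matching alone leaves the match ambiguous. A second stage is needed to choose among the candidates, which in the thesis is a classifier over RDF2VEC graph embeddings.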

URI

http://hdl.handle.net/10012/14750

Collections

Theses
Computer Science
