An In-Depth Exploration of the High-Quality Entity Linking for Information Retrieval: MMEAD

dc.contributor.author: Lin, Luyun
dc.date.accessioned: 2023-12-19T20:31:37Z
dc.date.available: 2023-12-19T20:31:37Z
dc.date.issued: 2023-12-19
dc.date.submitted: 2023-12-14
dc.description.abstract: Entity linking has grown in importance with the explosion of digital information, where it provides context and meaning for large amounts of unstructured data. Traditional information retrieval relied primarily on keyword-based search, which often yields ambiguous results that lack context. Entity linking connects mentions in text to specific entities in a knowledge base. This process resolves ambiguities and enriches the contextual understanding of the data by associating named entities with their corresponding entries in the knowledge graph. To harness the potential of entity linking, we must study it within specific contexts and scenarios, which requires robust benchmark datasets that reflect the complexities of real-world information. MS MARCO serves as a key benchmark and resource for developing deep learning models for information retrieval. It offers a distinctive opportunity to observe entity linking in practice and to test its effectiveness across a variety of situations. Against this background, we introduce MS MARCO Entity Annotations and Disambiguations (MMEAD), a framework designed to bridge the MS MARCO collections with state-of-the-art entity linkers. Grounded in Wikipedia knowledge graphs, MMEAD prioritizes user-friendliness, precision, extensibility, and comprehensive metadata. Its data is represented as JSONL files, which makes the entity links straightforward to use. To address challenges in information retrieval, this research uses MMEAD to explore expansion with entity-linked terms, fine-tuning both sparse and dense retrieval techniques. This entity expansion approach to data augmentation aims to align more closely with users' real intentions.
Furthermore, this work integrates MMEAD with systems such as Faiss and DuckDB, combining structured and unstructured data search in a single search framework. The integration supports enhanced data categorization, entity frequency assessment, and more, and points toward a shift in how data retrieval and management systems are built. The work also demonstrates merging MMEAD with Wikidata, capitalizing on the strengths of open linked data to offer a rich, synthesized view of global information. (en)
dc.identifier.uri: http://hdl.handle.net/10012/20184
dc.language.iso: en (en)
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.subject: Information Retrieval (en)
dc.subject: Entity Linking (en)
dc.subject: MSMARCO (en)
dc.title: An In-Depth Exploration of the High-Quality Entity Linking for Information Retrieval: MMEAD (en)
dc.type: Master Thesis (en)
uws-etd.degree: Master of Mathematics (en)
uws-etd.degree.department: David R. Cheriton School of Computer Science (en)
uws-etd.degree.discipline: Computer Science (en)
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0 (en)
uws.contributor.advisor: Lin, Jimmy
uws.contributor.affiliation1: Faculty of Mathematics (en)
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Lin_Luyun.pdf
Size: 2.52 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission