Similarity and Diversity in Information Retrieval

Akinyemi, John

Similarity and Diversity in Information Retrieval

Files

Akinyemi_John.pdf (2.13 MB)

Date

2012-04-30T20:08:43Z

Authors

Akinyemi, John

Publisher

University of Waterloo

Abstract

Inter-document similarity is used for clustering, classification, and other purposes within information retrieval. In this thesis, we investigate several aspects of document similarity. In particular, we investigate the quality of several measures of inter-document similarity, providing a framework suitable for measuring and comparing the effectiveness of inter-document similarity measures. We also explore areas of research related to novelty and diversity in information retrieval. The goal of diversity and novelty is to be able to satisfy as many users as possible while simultaneously minimizing or eliminating duplicate and redundant information from search results. In order to evaluate the effectiveness of diversity-aware retrieval functions, user query logs and other information captured from user interactions with commercial search engines are mined and analyzed in order to uncover various informational aspects underlying queries, which are known as subtopics. We investigate the suitability of implicit associations between document content as an alternative to subtopic mining. We also explore subtopic mining from document anchor text and anchor links. In addition, we investigate the suitability of inter-document similarity as a measure for diversity-aware retrieval models, with the aim of using measured inter-document similarity as a replacement for diversity-aware evaluation models that rely on subtopic mining. Finally, we investigate the suitability and application of document similarity for requirements traceability. We present a fast algorithm that uncovers associations between various versions of frequently edited documents, even in the face of substantial changes.

Keywords

URI

http://hdl.handle.net/10012/6687

Collections

Theses
Computer Science

Full item page

Similarity and Diversity in Information Retrieval

Files

Date

Authors

Advisor

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

LC Subject Headings

Citation

URI

Collections