Determining the Utility of Key-term Highlighting for High Recall Information Retrieval Systems

Wang, Xue Jun

Files

Wang_XueJun.pdf.pdf (5.01 MB)

Date

2021-09-28

Authors

Wang, Xue Jun

Advisor

Grossman, Maura

Publisher

University of Waterloo

Abstract

High-recall information retrieval (HRIR) is an important tool used in tasks such as electronic discovery ("eDiscovery") and systematic review of medical research. Applications of HRIR often use a human as their oracle to determine the relevance of immense numbers of documents, which is expensive in both time and money. Various methods for reducing the amount of time spent per assessment and improving the quality of assessors have been proposed to improve these systems. In this thesis, we examine the method of presenting documents with key-terms highlighted in place of plain-text documents. Highlighting is commonly accepted as a positive feature that achieves both of the previously mentioned improvements, but there is currently a lack of empirical evidence to support its effectiveness. We describe a user study in which participants are assigned to one of two variations of an HRIR system (key-term highlighting vs. plain-text), followed by a post-task questionnaire. Our results failed to show a statistically significant improvement for labelling documents with key-term highlighting over plain-text on any of the measures recall, precision, and F1, but suggest that key-term highlighting may negatively affect retention of concepts. Our study provides empirical evidence for how the use of key-term highlighting affects an assessor's ability to label documents and provides insight into when including this feature may be harmful rather than helpful.