An Investigation of Preference Judging Consistency

Phan Minh, Linh Nhi

Files

PhanMinh_LinhNhi.pdf (4.45 MB)

Date

2023-04-12

Authors

Phan Minh, Linh Nhi

Advisor

Smucker, Mark

Publisher

University of Waterloo

Abstract

Preference judging has been proposed as an effective method for identifying the most relevant documents for a given search query. In this thesis, we investigate the degree to which assessors using a preference judging system consistently find the same top documents, and how consistent they are in their own preferences. We also examine to what extent variability in assessor preferences affects the evaluation of information retrieval systems. We designed and conducted a user study in which 40 participants were recruited to preference judge 30 topics taken from the 2021 TREC Health Misinformation track. The study found that the number of judgments needed to find the top-10 preferred documents using preference judging is about twice the number of documents in the topic. It also suggests that relying on a single non-professional assessor to do preference judging is not sufficient for evaluating information retrieval systems. Additionally, the study showed that preference judging to find the top-10 documents significantly changes the rankings of runs compared to the rankings reported in the TREC 2021 Health Misinformation track, with most changes occurring among the lower-ranked runs rather than the top-ranked runs. Overall, this thesis provides insights into assessor behaviour and assessor agreement when preference judgments are used to evaluate information retrieval systems.
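The abstract does not say which statistic was used to compare run rankings. The sketch below assumes Kendall's tau, a measure commonly used in IR to compare two system rankings. It shows how agreement between the official TREC ranking and a ranking derived from preference judgments could be quantified. The function name and the run names are hypothetical, and nothing here should be read as the thesis's actual procedure.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two rankings of the same runs (assumed: no ties).

    Hypothetical helper for illustration; returns 1.0 for identical order,
    -1.0 for fully reversed order.
    """
    if len(set(ranking_a)) != len(ranking_a) or sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("rankings must contain the same distinct runs")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two runs to compare")
    # Position of each run in each ranking (0 = best).
    pos_a = {run: i for i, run in enumerate(ranking_a)}
    pos_b = {run: i for i, run in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical run names: a swap between the two lowest-ranked runs.
official = ["run1", "run2", "run3", "run4"]
preference_based = ["run1", "run2", "run4", "run3"]
print(kendall_tau(official, preference_based))
```

With four runs there are six pairs. One swap among the lower-ranked runs makes one pair discordant, so tau drops from 1.0 to (5 - 1) / 6. A swap near the top would give the same tau. For that reason, a rank-sensitive variant such as AP correlation may better reflect where in the ranking the changes occur.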

URI

http://hdl.handle.net/10012/19272

Collections

Theses
Computer Science
