Information Retrieval Evaluation Measures Based on Preference Graphs
Offline evaluation for web search has mostly used graded judgments to measure the performance of information retrieval systems. Graded judgments suffer from several known problems, whereas preference judgments simply compare one item against another, avoiding the need to define complex relevance scores. Previous research on evaluation measures for preference judgments either translates preferences into relevance scores for use in traditional evaluation measures, or weights and counts the agreements between the actual ranking produced by a system and an ideal ranking derived from users’ preferences. However, these measures lack clear theoretical foundations, and their values have no obvious interpretation. Moreover, although preference judgments have been studied extensively for general web search, there is limited research on applying them to web image search. This thesis addresses these questions by proposing a preference-based evaluation measure that computes the maximum similarity between the actual ranking produced by a system and an ideal ranking derived from users’ preferences. Specifically, the measure constructs a directed multigraph over the judged items and computes the ordering of vertices, which we call the ideal ranking, that has maximum similarity to the actual ranking under a rank similarity measure. The measure can accept an arbitrary collection of preferences, which may include conflicts, redundancies, incompleteness, and results of mixed type (documents or images). Our results show that a greedy computation of this measure, Greedy PGC, matches or exceeds the performance of evaluation measures proposed in previous research.
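As an illustration of the kind of construction the abstract describes, the sketch below builds a weighted directed multigraph from raw preference pairs (tolerating conflicts and repeats) and greedily produces an ordering that tries to satisfy as much preference weight as possible. The function names, the greedy tie-breaking, and the agreement score are assumptions for illustration only, not the thesis's actual Greedy PGC algorithm or its rank similarity measure.

```python
from collections import defaultdict

def greedy_order(prefs):
    """Greedily order items to satisfy as much preference weight as possible.

    prefs: list of (a, b) pairs meaning "a is preferred over b".
    Conflicting and repeated pairs are allowed; parallel edges are
    collapsed into integer weights. This is an illustrative greedy
    heuristic, not the thesis's exact Greedy PGC procedure.
    """
    weight = defaultdict(int)   # weight[(a, b)] = times a was preferred over b
    items = set()
    for a, b in prefs:
        weight[(a, b)] += 1
        items.update((a, b))

    order, remaining = [], set(items)
    while remaining:
        # Placing v next violates every unplaced preference edge into v.
        def violation(v):
            return sum(w for (a, b), w in weight.items()
                       if b == v and a != v and a in remaining)
        best = min(sorted(remaining), key=violation)  # deterministic tie-break
        order.append(best)
        remaining.remove(best)
    return order

def agreement(ranking, prefs):
    """Fraction of preference pairs whose order the ranking satisfies."""
    pos = {item: i for i, item in enumerate(ranking)}
    satisfied = sum(1 for a, b in prefs
                    if a in pos and b in pos and pos[a] < pos[b])
    return satisfied / len(prefs) if prefs else 1.0
```

For example, with the conflicting preference set `[("d1","d2"), ("d1","d3"), ("d2","d3"), ("d3","d2")]`, no ordering can satisfy all four pairs; the greedy ordering `["d1","d2","d3"]` satisfies three of them, so its agreement score is 0.75.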
Cite this version of the work:
Chengxi Luo (2021). Information Retrieval Evaluation Measures Based on Preference Graphs. UWSpace. http://hdl.handle.net/10012/17178