Determining the Effectiveness of Multi-user, Hybrid, Human-Computer Assessments for High Recall Information Retrieval Systems
Abstract
Electronic Discovery (eDiscovery), a use-case of High-Recall Information Retrieval
(HRIR), seeks to obtain substantially all and only the relevant documents responsive to a
request for production in litigation. Applications of HRIR typically use a human as their
oracle to determine the relevance of a large number of documents, which is expensive in
both time and cost. HRIR experts suggest that Continuous Active Learning (CAL)
systems, the state-of-the-art information retrieval (IR) tools used for eDiscovery, have the
potential to achieve superior results, and that achieving them is limited primarily by the
fallibility of human relevance assessments.
In this research, we seek to understand the impact of human relevance-feedback error
on CAL systems and attempt to mitigate it using six distinct multi-user, hybrid,
human-computer assessment strategies. In contrast to the widely used single-user, hybrid,
human-computer assessment strategy, these multi-user strategies reallocate resources to
re-reviewing documents that a reviewer may have misjudged, rather than to examining
more documents, in pursuit of mitigating human relevance-feedback error while still
achieving a high-recall, high-precision review. Within the constraints of a specified review
budget, we seek to determine which review strategy has the best chance of retrieving
more relevant documents with high precision.
Our results show that a multi-user review strategy that “efficiently” uses three
reviewers to review documents (CAL QC–Type 1) and a multi-user review strategy that
uses the CAL system itself as one of three reviewers (CAL QC–Type 2) enable the
end-to-end CAL system to achieve significantly higher recall and higher precision than a
single-user review strategy operating under the same review budget. This research
provides evidence that CAL systems can better accommodate the needs of HRIR
applications by incorporating multi-user review strategies.
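The abstract does not detail how the multi-user strategies aggregate conflicting judgments. As a minimal illustrative sketch only (the function names and simulation below are not from the thesis), assuming each reviewer independently mislabels a document with a fixed error rate, a three-reviewer majority vote, one plausible core mechanism of a strategy like CAL QC–Type 1, drives the effective error rate below that of a single reviewer:

```python
import random

def majority_vote(votes):
    # Majority of an odd number of binary relevance judgments.
    return sum(votes) > len(votes) // 2

def noisy_judgment(true_label, error_rate, rng):
    # A reviewer flips the true label with probability `error_rate`.
    return (not true_label) if rng.random() < error_rate else true_label

def review_accuracy(error_rate, n_reviewers, n_docs=10_000, seed=0):
    # Monte Carlo estimate of how often the aggregated judgment is correct.
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_docs):
        true_label = rng.random() < 0.5
        votes = [noisy_judgment(true_label, error_rate, rng)
                 for _ in range(n_reviewers)]
        if majority_vote(votes) == true_label:
            correct += 1
    return correct / n_docs

# With a 20% per-reviewer error rate, analytically the majority of three
# errs with probability 3(0.2)^2(0.8) + (0.2)^3 ≈ 0.104, versus 0.2 alone.
single = review_accuracy(error_rate=0.2, n_reviewers=1)
triple = review_accuracy(error_rate=0.2, n_reviewers=3)
```

This illustrates why re-reviewing possibly misjudged documents can be a better use of a fixed budget than reviewing more documents: redundant judgments compound per-reviewer accuracy, which matters most when the CAL system's training signal is noisy.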
Cite this version of the work
Solaiappan Alagappan (2022). Determining the Effectiveness of Multi-user, Hybrid, Human-Computer Assessments for High Recall Information Retrieval Systems. UWSpace. http://hdl.handle.net/10012/18622