Determining the Effectiveness of Multi-user, Hybrid, Human-Computer Assessments for High Recall Information Retrieval Systems

dc.contributor.advisor: Grossman, Maura
dc.contributor.author: Alagappan, Solaiappan
dc.date.accessioned: 2022-08-23T18:08:12Z
dc.date.available: 2022-08-23T18:08:12Z
dc.date.issued: 2022-08-23
dc.date.submitted: 2022-08-12
dc.description.abstract: Electronic Discovery (eDiscovery), a use case of High-Recall Information Retrieval (HRIR), seeks to obtain substantially all and only the relevant documents responsive to a request for production in litigation. Applications of HRIR typically use a human as their oracle to determine the relevance of a large number of documents, which is expensive in both time and cost. HRIR experts suggest that Continuous Active Learning (CAL) systems, the state-of-the-art information retrieval (IR) tools used for eDiscovery, have the potential to achieve superior results, and that this potential is limited primarily by the fallibility of human relevance assessments. In this research, we seek to understand the impact of the error rate in human relevance feedback on CAL systems and attempt to mitigate that error using six distinct multi-user-based, hybrid, human-computer assessment strategies. In contrast to the widely used single-user-based, hybrid, human-computer assessment strategy, these multi-user strategies reallocate resources to re-reviewing documents that the user may have misjudged, rather than to examining more documents, in order to mitigate human relevance feedback error while also achieving a high-recall, high-precision review (a toy illustration of this error-mitigation idea appears after the metadata record below). Within the constraints of a specified review budget, we seek to determine which review strategy has the best chance of retrieving the most relevant documents with high precision. Our results show that a multi-user review strategy that “efficiently” uses three reviewers to review documents (CAL QC–Type 1) and a multi-user review strategy that uses the CAL system as one of the three reviewers (CAL QC–Type 2) enable the end-to-end CAL system to achieve significantly higher recall and higher precision than a single-user-based review strategy employing the same review budget. This research provides evidence that CAL systems have the potential to better accommodate the needs of HRIR applications by incorporating multi-user review strategies.
dc.identifier.uri: http://hdl.handle.net/10012/18622
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.title: Determining the Effectiveness of Multi-user, Hybrid, Human-Computer Assessments for High Recall Information Retrieval Systems
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Grossman, Maura
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
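
The multi-user strategies described in the abstract trade reviewing additional documents for re-reviewing documents that may have been misjudged. The following is a toy Python sketch, not the thesis's actual CAL QC–Type 1 or CAL QC–Type 2 protocol, of why redundant, independent assessments can suppress individual reviewer error: it compares the misjudgment rate of a single reviewer with that of a three-reviewer majority vote, at three times the per-document assessment cost. The 15% error rate, the independent-error model, and the majority-vote rule are assumptions chosen only for illustration.

import random

random.seed(0)

ERROR_RATE = 0.15   # assumed probability that one reviewer misjudges a document
TRIALS = 100_000    # number of simulated document assessments


def majority_misjudges(num_reviewers: int) -> bool:
    """Return True if the majority-vote judgment differs from the true label."""
    # Each reviewer independently errs with probability ERROR_RATE.
    errors = sum(random.random() < ERROR_RATE for _ in range(num_reviewers))
    return errors > num_reviewers / 2


for reviewers in (1, 3):
    wrong = sum(majority_misjudges(reviewers) for _ in range(TRIALS))
    print(f"{reviewers} reviewer(s) per document: "
          f"empirical error rate {wrong / TRIALS:.3f}, "
          f"cost {reviewers} assessment(s) per document")

# Analytically, a three-reviewer majority errs with probability
# 3*e**2*(1 - e) + e**3, roughly 0.061 for e = 0.15,
# versus 0.15 for a single reviewer.

Under this independence assumption, tripling the per-document cost cuts the effective error rate from about 15% to about 6%; whether that trade is worthwhile under a fixed total review budget is the kind of question on which the review strategies above are compared.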

Files

Original bundle
Name: Alagappan_Solaiappan.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission