Refresh Strategies in Continuous Active Learning

dc.contributor.author: Ghelani, Nimesh
dc.date.accessioned: 2018-08-27T17:08:01Z
dc.date.available: 2018-08-27T17:08:01Z
dc.date.issued: 2018-08-27
dc.date.submitted: 2018
dc.description.abstract: High recall information retrieval is crucial to tasks such as electronic discovery and systematic review. Continuous Active Learning (CAL) is a technique where a human assessor works in a loop with a machine learning model; the model presents a set of documents likely to be relevant and the assessor provides relevance feedback. Our focus in this thesis is on one particular aspect of CAL: refreshing, which is a crucial and recurring event in the CAL process. During a refresh, the machine learning model is trained with the relevance judgments and a new list of likely-to-be-relevant documents is produced for the assessor to judge. It is also computationally the most expensive step in CAL. In this thesis, we investigate the effects of the default and alternative refresh strategies on the effectiveness and efficiency of CAL. We find that more frequent refreshes can significantly reduce the human effort required to achieve a given level of recall. For moderately sized datasets, the high computation cost of frequent refreshes can be reduced through a careful implementation. For dealing with resource constraints and large datasets, we propose alternative refresh strategies which provide the benefits of frequent refreshes at a lower computation cost. In this thesis, we also discuss the design of a modern implementation of the CAL algorithm which is efficient and extensible. Our implementation can be used as a research tool as well as for practical applications. [en]
dc.identifier.uri: http://hdl.handle.net/10012/13669
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: High Recall Information Retrieval [en]
dc.title: Refresh Strategies in Continuous Active Learning [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws.contributor.advisor: Smucker, Mark
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Ghelani_Nimesh.pdf
Size: 759.27 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.08 KB
Format: Item-specific license agreed upon to submission