Test collections for web-scale datasets using Dynamic Sampling

dc.contributor.author: Singh, Anmol
dc.date.accessioned: 2022-01-17T17:20:10Z
dc.date.available: 2022-01-17T17:20:10Z
dc.date.issued: 2022-01-17
dc.date.submitted: 2022-01-06
dc.description.abstract: Dynamic Sampling is a non-uniform statistical sampling strategy based on S-CAL, a high-recall retrieval algorithm. It is used for the construction of statistical test collections for evaluating information retrieval systems. Dynamic Sampling has been shown to yield test collections comparable to or better than those built with pooling methods, at a fraction of the assessment effort. In this work, we adapt a high-recall retrieval system to run a Dynamic Sampling protocol for web-scale datasets. We use this to create relevance assessments for 30 topics from the TREC 2019 Medical Misinformation Track. We compare our relevance assessments to qrels created using two pooling-based approaches. We also compare the official NIST qrels, which were based on ClueWeb12B (7% of the full dataset), to qrels based on the full ClueWeb12 dataset. Our results suggest Dynamic Sampling yields a reasonably good test collection, with comparable or lower variance for most evaluation measures. For fixed-depth measures like Precision@K, the NIST qrels based on ClueWeb12B appear to have higher bias with respect to the other qrels, suggesting that it might be better to use qrels based on the full collection when possible. [en]
dc.identifier.uri: http://hdl.handle.net/10012/17889
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/kshanmol/2019-med-misinfo-qrels [en]
dc.subject: information retrieval evaluation [en]
dc.title: Test collections for web-scale datasets using Dynamic Sampling [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Cormack, Gordon
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Singh_Anmol.pdf
Size: 322.21 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission