Show simple item record

dc.contributor.authorOgundepo, Odunayo
dc.date.accessioned2023-04-28 17:47:36 (GMT)
dc.date.available2023-04-28 17:47:36 (GMT)
dc.date.issued2023-04-28
dc.date.submitted2023-04-28
dc.identifier.urihttp://hdl.handle.net/10012/19361
dc.description.abstractLanguage diversity in NLP is critical in enabling the development of tools for a wide range of users. However, there are limited resources for building such tools for many languages, particularly those spoken in Africa. For search, most existing datasets feature few to no African languages, directly impacting researchers’ ability to build and improve information access capabilities in those languages. Motivated by this, we created AfriCLIRMatrix, a test collection for cross-lingual information retrieval research in 15 diverse African languages automatically created from Wikipedia. The dataset comprises 6 million queries in English and 23 million relevance judgments automatically extracted from Wikipedia inter-language links. We extract 13,050 test queries with relevant judgments across 15 languages, covering a significantly broader range of African languages than other existing information retrieval test collections. In addition to providing a much-needed resource for researchers, we also release BM25, dense retrieval, and sparse-dense hybrid baselines to establish a starting point for the development of future systems. We hope that our efforts will stimulate further research in information retrieval for African languages and lead to the creation of more effective tools for the benefit of users.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.relation.urihttps://github.com/castorini/africlirmatrixen
dc.subjectInformation Retrievalen
dc.subjectAfrican Languagesen
dc.subjectNLPen
dc.subjectNatural Language Processingen
dc.titleEnabling Cross-lingual Information Retrieval for African Languagesen
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Mathematicsen
uws-etd.embargo.terms0en
uws.contributor.advisorJimmy, Lin
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages