
dc.contributor.author: Adeyemi, Mofetoluwa
dc.date.accessioned: 2024-04-30 15:19:55 (GMT)
dc.date.available: 2024-04-30 15:19:55 (GMT)
dc.date.issued: 2024-04-30
dc.date.submitted: 2024-04-17
dc.identifier.uri: http://hdl.handle.net/10012/20520
dc.description.abstract: Web resources are increasingly available in many languages, which raises the importance of cross-lingual information retrieval (CLIR) for accessing information written in a different language. To support CLIR studies, test collections are actively curated in the information retrieval (IR) field for the evaluation of methods and systems. Resources that support the evaluation of CLIR for African languages exist; however, they are few and are mostly curated synthetically or through translation, making them biased towards certain retrieval methods or prone to “Translationese” issues. Current resources also draw their document collections from sources with scarce content in African languages, potentially limiting the supply of documents relevant to a search query. To address these limitations, we present CIRAL, a test collection covering retrieval between English and four African languages: Hausa, Somali, Swahili and Yoruba. With its corpora developed from African news and blogs, which are rich sources of textual data for these languages, CIRAL was formulated for the passage ranking task with queries in English and passages in the African languages. Native speakers of the African languages developed the queries and provided query-passage relevance assessments. As is often done in IR to curate test collections and promote research participation in CLIR, CIRAL was hosted as a shared task at the Forum for Information Retrieval and Evaluation (FIRE) 2023, where pools were collected for a subset of the collection. In this thesis, we provide a detailed description of CIRAL as a body of work, covering its curation process and shared task. Additionally, we conduct retrieval and reranking experiments, evaluating the effectiveness of systems in CLIR for African languages and demonstrating the utility of CIRAL. The experiments include BM25 baselines with query and document translations and dense retrieval baselines with multilingual dense passage retrievers. We also examine the zero-shot reranking capabilities of T5 cross-encoder models and Large Language Models (LLMs) such as GPT and Zephyr in CLIR for African languages. We hope CIRAL fosters CLIR evaluation and research in African languages, and hence the development of retrieval systems that are well-suited for such tasks. (en)
dc.language.iso: en (en)
dc.publisher: University of Waterloo (en)
dc.relation.uri: https://github.com/ciralproject/ciral (en)
dc.subject: cross-lingual information retrieval (en)
dc.subject: African languages (en)
dc.subject: test collections (en)
dc.subject: large language models (en)
dc.subject: evaluations (en)
dc.title: Facilitating Cross-Lingual Information Retrieval Evaluations for African Languages (en)
dc.type: Master Thesis (en)
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science (en)
uws-etd.degree.discipline: Computer Science (en)
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.degree: Master of Mathematics (en)
uws-etd.embargo.terms: 0 (en)
uws.contributor.advisor: Lin, Jimmy
uws.contributor.affiliation1: Faculty of Mathematics (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.typeOfResource: Text (en)
uws.peerReviewStatus: Unreviewed (en)
uws.scholarLevel: Graduate (en)




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
