Multilingual Grammatical Error Detection And Its Applications to Prompt-Based Correction

Sutter Pessurno de Carvalho, Gustavo

dc.contributor.advisor	Poupart, Pascal
dc.contributor.author	Sutter Pessurno de Carvalho, Gustavo
dc.date.accessioned	2024-01-05T16:56:10Z
dc.date.available	2024-01-05T16:56:10Z
dc.date.issued	2024-01-05
dc.date.submitted	2023-12-16
dc.description.abstract	Grammatical Error Correction (GEC) and Grammatical Error Detection (GED) are two important tasks in the study of writing assistant technologies. Given an input sentence, the former aims to output a corrected version of the sentence, while the latter's goal is to indicate which words of the sentence contain errors. Both tasks are relevant for real-world applications that help native speakers and language learners write better. Naturally, these two areas have attracted the attention of the research community and have been studied in the context of modern neural networks. This work focuses on the study of multilingual GED models and how they can be used to improve GEC performed by large language models (LLMs). We study the difference in performance between GED models trained on a single language and models that undergo multilingual training. We expand the list of datasets used for multilingual GED to further experiment with cross-dataset and cross-lingual generalization of detection models. Our results go against previous findings and indicate that multilingual GED models are as good as monolingual ones when evaluated on in-domain languages. Furthermore, multilingual models show better generalization to novel languages seen only at test time. Making use of the GED models we study, we propose two methods to improve the corrections of prompt-based GEC using LLMs. The first method aims to mitigate overcorrection by using a detection model to determine whether a sentence has any mistakes before feeding it to the LLM. The second method uses the sequence of GED tags to select the in-context examples provided in the prompt. We perform experiments in English, Czech, German and Russian, using Llama2 and GPT3.5. The results show that both methods increase the performance of prompt-based GEC and point to a promising direction of using GED models as part of the correction pipeline performed by LLMs.	en
dc.identifier.uri	http://hdl.handle.net/10012/20216
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	grammatical error correction	en
dc.subject	grammatical error detection	en
dc.subject	natural language processing	en
dc.subject	machine learning	en
dc.subject	deep learning	en
dc.title	Multilingual Grammatical Error Detection And Its Applications to Prompt-Based Correction	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Poupart, Pascal
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
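
Illustrative sketch (not taken from the thesis): the two methods described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions. The detector is assumed to return one binary tag per token, the similarity between tag sequences is an assumed metric based on Python's difflib, and the names `detect`, `llm_correct`, `select_examples` and `correct` are hypothetical placeholders rather than the author's implementation.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: a detector maps tokens to one binary tag per token
# (1 = error, 0 = correct); a corrector stands in for a prompted LLM.
Detector = Callable[[Sequence[str]], List[int]]
Corrector = Callable[[str, List[Tuple[str, str]]], str]
PoolEntry = Tuple[str, str, List[int]]  # (source, correction, source tags)


def tag_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two GED tag sequences in [0, 1] (an assumed metric)."""
    return SequenceMatcher(None, list(a), list(b)).ratio()


def select_examples(tags: Sequence[int], pool: List[PoolEntry],
                    k: int = 2) -> List[Tuple[str, str]]:
    """Pick the k pool entries whose tag sequences best match `tags`."""
    ranked = sorted(pool, key=lambda ex: tag_similarity(tags, ex[2]),
                    reverse=True)
    return [(src, tgt) for src, tgt, _ in ranked[:k]]


def correct(sentence: str, detect: Detector, llm_correct: Corrector,
            pool: List[PoolEntry], k: int = 2) -> str:
    tokens = sentence.split()
    tags = detect(tokens)
    # Method 1: leave the sentence untouched when no token is flagged,
    # so the LLM has no chance to overcorrect it.
    if not any(tags):
        return sentence
    # Method 2: choose in-context examples by tag-sequence similarity.
    return llm_correct(sentence, select_examples(tags, pool, k))
```

In this sketch the detection gate and the example selection share a single detector call, so the detection model runs only once per sentence.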

Files

Original bundle

Name: Sutter_Pessurno_de_Carvalho_Gustavo.pdf
Size: 678.42 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Computer Science