Enhancing Large Language Model Fine-Tuning for Classification Using Conditional Mutual Information

dc.contributor.author: Sivakaran, Thanushon
dc.date.accessioned: 2025-04-16T13:52:28Z
dc.date.available: 2025-04-16T13:52:28Z
dc.date.issued: 2025-04-16
dc.date.submitted: 2025-04-15
dc.description.abstract: Large language models (LLMs) have achieved impressive advancements in recent years, showcasing their versatility and effectiveness in tasks such as natural language understanding, generation, and translation. Despite these advancements, the potential of information theory (IT) to further enhance the development of LLMs has yet to be fully explored. This thesis aims to bridge this gap by introducing the information-theoretic concept of Conditional Mutual Information (CMI) and applying it to the fine-tuning of LLMs for classification tasks. We explore the promise of CMI in two primary ways: minimizing CMI to optimize a model's standalone performance, and maximizing CMI to improve knowledge distillation (KD) and produce more capable student models. To implement CMI in LLM fine-tuning, we adapt the recently proposed CMI-constrained deep learning framework, originally developed for image classification, with the modifications necessary for LLMs. Our experiments apply CMI to LLM fine-tuning and knowledge distillation on the GLUE benchmark, a widely used suite of classification tasks for evaluating language models. By minimizing CMI during fine-tuning, we achieve superior performance on 6 of the 8 GLUE classification tasks compared to the baseline BERT model. We further explore the use of CMI to maximize information transfer during KD, where a smaller "student" model is trained to mimic the behavior of a larger, more powerful "teacher" model. By maximizing the teacher's CMI, richer semantic information is passed to the student, improving its performance. Our results show that maximizing CMI during KD yields substantial improvements on 6 of the 8 GLUE classification tasks compared to DistilBERT, a popular distilled version of BERT.
dc.identifier.uri: https://hdl.handle.net/10012/21594
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: LLMs
dc.subject: machine learning
dc.subject: NLP
dc.subject: information theory
dc.title: Enhancing Large Language Model Fine-Tuning for Classification Using Conditional Mutual Information
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Yang, En-Hui
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
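
The abstract above refers to Conditional Mutual Information (CMI) both as a quantity to be minimized during fine-tuning and as one to be maximized on the teacher side of knowledge distillation. As a point of reference only, the sketch below gives a standard formulation of CMI for a classifier and one plausible Lagrangian reading of the constrained fine-tuning objective; the symbols X (input), Y (ground-truth label), \hat{Y} (the model's soft output), \theta (model parameters), and the weight \lambda are illustrative assumptions rather than notation taken from the thesis itself.

\[
I\big(X; \hat{Y} \mid Y\big)
  = \mathbb{E}_{Y}\!\left[ D_{\mathrm{KL}}\!\left( P_{X,\hat{Y}\mid Y} \,\big\Vert\, P_{X\mid Y}\, P_{\hat{Y}\mid Y} \right) \right]
\]

% One hedged reading of "minimizing CMI during fine-tuning": trade off the usual
% cross-entropy loss against the CMI term with an assumed multiplier \lambda > 0.
\[
\min_{\theta} \;\; \mathbb{E}\!\left[ \ell_{\mathrm{CE}}\big(\hat{Y}, Y\big) \right]
  \; + \; \lambda \, I\big(X; \hat{Y} \mid Y\big)
\]

Under this reading, a small CMI concentrates the model's output distributions within each class, while the KD setting described in the abstract would instead train the teacher with the sign of the CMI term flipped (maximizing its CMI) before distilling its outputs into the student; the precise objectives used in the thesis may differ.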

Files

Original bundle

Name: Sivakaran_Thanushon.pdf
Size: 3.39 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission