Efficient Learning for Large Language Models

dc.contributor.author: Rajabzadeh, Hossein
dc.date.accessioned: 2026-01-20T18:27:29Z
dc.date.available: 2026-01-20T18:27:29Z
dc.date.issued: 2026-01-20
dc.date.submitted: 2026-01-16
dc.description.abstract: Artificial Intelligence (AI) systems have become indispensable across domains such as healthcare, finance, robotics, and scientific discovery. At the heart of this revolution, Large Language Models (LLMs) have emerged as the central paradigm, demonstrating remarkable reasoning, generalization, and multi-domain adaptability. However, their rapid growth in scale introduces severe computational bottlenecks in training, fine-tuning, and inference, limiting accessibility, sustainability, and real-world deployment. This dissertation advances the efficiency of LLMs across all lifecycle stages by introducing a suite of five frameworks that significantly reduce compute, memory, and latency costs with minimal or no loss in accuracy. First, Quantized Dynamic Low-Rank Adaptation (QDyLoRA) enables memory-efficient fine-tuning across multiple LoRA ranks in a single training pass, achieving performance competitive with QLoRA while reducing GPU memory usage by up to 65% and supporting flexible rank selection at inference time. Second, Sorted-LoRA introduces a stochastic-depth-aware fine-tuning framework that co-trains multiple sub-models of varying depths within a single training cycle. On LLaMA2-7B, it produces sub-models up to 40% smaller that retain over 98% of task accuracy, with the largest variant even surpassing the base model by +0.34%. Third, LoRA-Drop accelerates autoregressive inference by dynamically substituting computationally redundant layers with lightweight low-rank modules during decoding. It delivers up to 2.6× faster decoding and a 50% reduction in KV-cache memory with less than 0.5% degradation in accuracy, offering latency-aware adaptability for real-world deployment. Fourth, EchoAtt exploits redundancy in attention maps by sharing attention matrices among similar layers. On TinyLLaMA-1.1B, it achieves 15% faster inference, 25% faster training, and a 4% parameter reduction while improving zero-shot accuracy, showing that structural compression can enhance rather than degrade model generalization. Finally, ECHO-LLaMA introduces cross-layer Key-Value (KV) and Query-Key (QK) sharing to reduce redundant attention computation. This approach achieves up to 77% higher tokens-per-second throughput during training, 16% higher Model FLOPs Utilization (MFU), and 7% higher test-time throughput, while preserving language modeling performance. On the mechanical-domain RoboEval benchmark, ECHO-CodeLLaMA-7B raises average accuracy from 62.15% to 63.01% with only 50% KV sharing, confirming its robustness under domain adaptation. Together, these contributions form a coherent research program on the efficiency of large-scale Transformers. They demonstrate that intelligently exploiting representational redundancy through quantization, low-rank structure, cross-layer sharing, and adaptive computation can yield substantial compute savings with minimal trade-offs.
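
Note: as a purely illustrative sketch, and not code from the thesis, the PyTorch snippet below shows how a LoRA-style adapter can expose a selectable rank at inference time, the kind of flexibility the abstract attributes to QDyLoRA. The class name, shapes, scaling, and initialization here are assumptions, and the 4-bit quantization of the frozen base weights is omitted.

import torch
import torch.nn as nn

class DynamicRankLoRALinear(nn.Module):
    """Hypothetical LoRA-style layer whose low-rank update can be truncated to any rank."""
    def __init__(self, in_features, out_features, max_rank=8, alpha=16.0):
        super().__init__()
        # Frozen base projection (in QDyLoRA this would be the quantized pretrained weight).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Trainable low-rank factors: A is (max_rank x in), B is (out x max_rank).
        self.A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.scale = alpha / max_rank
        self.max_rank = max_rank

    def forward(self, x, rank=None):
        # Using only the first r components yields a valid rank-r adapter,
        # so one trained module can serve several ranks at inference time.
        r = self.max_rank if rank is None else min(rank, self.max_rank)
        delta = (x @ self.A[:r].T) @ self.B[:, :r].T
        return self.base(x) + self.scale * delta

# Example: the same trained adapter queried at two different ranks.
layer = DynamicRankLoRALinear(64, 64, max_rank=8)
x = torch.randn(2, 64)
y_full = layer(x, rank=8)   # full-rank update
y_small = layer(x, rank=2)  # cheaper, truncated update

In the framework described by the abstract, the rank used at each training step is also varied so that all truncations are trained jointly in a single pass; that training loop is not shown here.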
dc.identifier.uri: https://hdl.handle.net/10012/22858
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.title: Efficient Learning for Large Language Models
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: Mechanical and Mechatronics Engineering
uws-etd.degree.discipline: Mechanical Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kwon, Hyock Ju
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Rajabzadeh_Hossein.pdf
Size: 9.66 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission
