Approaching Memorization in Large Language Models
Date
2025-10-08
Advisor
Shang, Weiyi
Publisher
University of Waterloo
Abstract
Large Language Models (LLMs) risk memorizing and reproducing sensitive or proprietary information from their training data. In this thesis, we investigate the behavior and mitigation of memorization in LLMs, adopting a pipeline that combines membership inference and data extraction attacks and evaluating memorization across multiple models. Through systematic experiments, we analyze how memorization varies with model size, architecture, and content category. We observe memorization rates ranging from 42% to 64% across the investigated models, demonstrating that memorization remains a persistent issue and that the existing memorization-revealing pipeline remains effective on these models. Certain content categories are more prone to memorization, and realistic usage scenarios can still trigger it. Finally, we explore knowledge distillation as a mitigation approach: distilling Llama3-8B reduces the extraction rate by approximately 20%, suggesting that distillation is a viable mitigation option. This work contributes a novel dataset and a BLEU-based evaluation pipeline, providing practical insights for research on LLM memorization.
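
The BLEU-based evaluation step described above can be pictured as scoring a model's continuation of a training-data prefix against the true continuation, and flagging high-similarity outputs as memorized. Below is a minimal Python sketch of such a check using NLTK's sentence-level BLEU; the whitespace tokenization and the 0.75 threshold are illustrative assumptions, not the thesis's exact pipeline.

    # Minimal sketch of a BLEU-based memorization check (illustrative only).
    # Assumptions not taken from the thesis: whitespace tokenization and the
    # 0.75 decision threshold are hypothetical choices.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    def bleu_memorization_score(reference: str, generated: str) -> float:
        """Score how closely a model continuation matches the true continuation."""
        smoothing = SmoothingFunction().method1  # avoids zero scores on short texts
        return sentence_bleu(
            [reference.split()],   # list of reference token lists
            generated.split(),     # hypothesis tokens
            smoothing_function=smoothing,
        )

    def is_memorized(reference: str, generated: str, threshold: float = 0.75) -> bool:
        """Flag an extraction as memorized when BLEU meets a chosen threshold."""
        return bleu_memorization_score(reference, generated) >= threshold

    if __name__ == "__main__":
        true_continuation = "the quick brown fox jumps over the lazy dog"
        model_output = "the quick brown fox jumps over the lazy dog"
        print(is_memorized(true_continuation, model_output))  # True: verbatim match

In this framing, an extraction rate is simply the fraction of probed training prefixes for which is_memorized returns True, which is how a distilled model's rate can be compared against its teacher's.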
Keywords
large language model, memorization