Approaching Memorization in Large Language Models
Date
2025-10-08
Advisor
Shang, Weiyi
Publisher
University of Waterloo
Abstract
Large Language Models (LLMs) risk memorizing and reproducing sensitive or proprietary information from their training data. In this thesis, we investigate the behavior and mitigation of memorization in LLMs, adopting a pipeline that combines membership inference and data extraction attacks and evaluating memorization across multiple models. Through systematic experiments, we analyze how memorization varies with model size, architecture, and content category. We observe memorization rates ranging from 42% to 64% across the investigated models, demonstrating that memorization remains a persistent issue and that the existing memorization-revealing pipeline remains effective on these models. Certain content categories are more prone to memorization, and realistic usage scenarios can still trigger it. Finally, we explore knowledge distillation as a mitigation approach: distilling Llama3-8B reduces the extraction rate by approximately 20%, suggesting that distillation is a viable mitigation option. This work contributes a novel dataset and a BLEU-based evaluation pipeline, providing practical insights for research on LLM memorization.
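
The BLEU-based evaluation step described above can be pictured as scoring a model's continuation of a training-data prefix against the true continuation, and flagging high-similarity outputs as memorized. Below is a minimal Python sketch of such a check using NLTK's sentence-level BLEU; the whitespace tokenization and the 0.75 threshold are illustrative assumptions, not the thesis's exact pipeline.

    # Minimal sketch of a BLEU-based memorization check (illustrative only).
    # Assumptions not taken from the thesis: whitespace tokenization and the
    # 0.75 decision threshold are hypothetical choices.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    def bleu_memorization_score(reference: str, generated: str) -> float:
        """Score how closely a model continuation matches the true continuation."""
        smoothing = SmoothingFunction().method1  # avoids zero scores on short texts
        return sentence_bleu(
            [reference.split()],   # list of reference token lists
            generated.split(),     # hypothesis tokens
            smoothing_function=smoothing,
        )

    def is_memorized(reference: str, generated: str, threshold: float = 0.75) -> bool:
        """Flag an extraction as memorized when BLEU meets a chosen threshold."""
        return bleu_memorization_score(reference, generated) >= threshold

    if __name__ == "__main__":
        true_continuation = "the quick brown fox jumps over the lazy dog"
        model_output = "the quick brown fox jumps over the lazy dog"
        print(is_memorized(true_continuation, model_output))  # True: verbatim match

In this framing, an extraction rate is simply the fraction of probed training prefixes for which is_memorized returns True, which is how a distilled model's rate can be compared against its teacher's.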
Keywords
large language model, memorization