Browsing by Author "Cheng, Xiaoyu"

Item
Approaching Memorization in Large Language Models
(University of Waterloo, 2025-10-08) Cheng, Xiaoyu
Large Language Models (LLMs) risk memorizing and reproducing sensitive or proprietary information from their training data. In this thesis, we investigate the behavior and mitigation of memorization in LLMs using a pipeline that combines membership inference and data extraction attacks, and we evaluate memorization across multiple models. Through systematic experiments, we analyze how memorization varies with model size, architecture, and content category. We observe memorization rates ranging from 42% to 64% across the investigated models, demonstrating that memorization remains a persistent issue and that the existing attack pipeline remains effective on these models. Certain content categories are more prone to memorization, and realistic usage scenarios can still trigger it. Finally, we explore knowledge distillation as a mitigation approach: distilling Llama3-8B reduces the extraction rate by approximately 20%, suggesting distillation is a viable mitigation option. This work contributes a novel dataset and a BLEU-based evaluation pipeline, providing practical insights for research on LLM memorization.
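
To make the BLEU-based evaluation concrete, the following is a minimal Python sketch of a memorization check of the general shape the abstract describes: prompt the model with a prefix of a candidate training document and score its continuation against the true continuation. The prefix length, continuation length, threshold, and the generate() stub are illustrative assumptions, not parameters taken from the thesis.

    # Minimal sketch of a BLEU-based memorization check (assumed shape,
    # not the thesis's exact pipeline).
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    def generate(prompt: str) -> str:
        """Stub for the LLM under test (e.g., a local Llama3-8B call);
        replace with real inference code."""
        raise NotImplementedError

    def is_memorized(document: str, prefix_len: int = 50,
                     cont_len: int = 50, threshold: float = 0.75) -> bool:
        """Flag a document as memorized when the model reproduces its
        continuation near-verbatim (BLEU at or above the threshold)."""
        tokens = document.split()
        prefix = " ".join(tokens[:prefix_len])
        reference = tokens[prefix_len:prefix_len + cont_len]
        candidate = generate(prefix).split()[:cont_len]
        score = sentence_bleu([reference], candidate,
                              smoothing_function=SmoothingFunction().method1)
        return score >= threshold

Under these assumptions, an extraction rate over a corpus is simply the fraction of documents for which is_memorized returns True.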
