Approaching Memorization in Large Language Models

dc.contributor.author: Cheng, Xiaoyu
dc.date.accessioned: 2025-10-08T12:45:20Z
dc.date.available: 2025-10-08T12:45:20Z
dc.date.issued: 2025-10-08
dc.date.submitted: 2025-10-06
dc.description.abstract: Large Language Models (LLMs) risk memorizing and reproducing sensitive or proprietary information from their training data. In this thesis, we investigate the behavior and mitigation of memorization in LLMs by adopting a pipeline that combines membership inference and data extraction attacks, and we evaluate memorization across multiple models. Through systematic experiments, we analyze how memorization varies with model size, architecture, and content category. We observe memorization rates ranging from 42% to 64% across the investigated models, demonstrating that memorization remains a persistent issue and that the existing memorization-revealing pipeline remains valid on these models. Certain content categories are more prone to memorization, and realistic usage scenarios can still trigger it. Finally, we explore knowledge distillation as a mitigation approach: distilling Llama3-8B reduces the extraction rate by approximately 20%, suggesting a viable mitigation option. This work contributes a novel dataset and a BLEU-based evaluation pipeline, providing practical insights for research on LLM memorization.
dc.identifier.uri: https://hdl.handle.net/10012/22561
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: large language model
dc.subject: memorization
dc.title: Approaching Memorization in Large Language Models
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Shang, Weiyi
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
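The abstract describes a BLEU-based evaluation pipeline for deciding whether a model's completion reproduces its training data. The thesis's actual implementation is not shown here; the following is a minimal illustrative sketch of the general idea, comparing a generated suffix against the ground-truth suffix with a sentence-level BLEU score. The function names and the 0.75 threshold are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    so any empty n-gram level zeroes the score."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clip candidate n-gram counts by reference counts (modified precision).
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(len(cand) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages artificially short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def is_memorized(true_suffix, generated_suffix, threshold=0.75):
    """Flag an extraction attempt as memorized when the generated
    continuation is near-verbatim (hypothetical threshold)."""
    return bleu(true_suffix, generated_suffix) >= threshold
```

In such a pipeline, each training sample is split into a prefix (fed to the model as a prompt) and a suffix (held out as the reference); the extraction rate is then the fraction of samples flagged by a check like `is_memorized`.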

Files

Original bundle

Name: Cheng_Xiaoyu.pdf
Size: 2.37 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Description: Item-specific license agreed upon to submission