On the Generalizability of AI-Generated Text Detection

David, Amir

dc.contributor.author	David, Amir
dc.date.accessioned	2026-01-20T14:43:32Z
dc.date.available	2026-01-20T14:43:32Z
dc.date.issued	2026-01-20
dc.date.submitted	2026-01-14
dc.description.abstract	As large language models (LLMs) become ubiquitous, reliably distinguishing their outputs from human writing is critical for academic integrity, content moderation, and preventing model collapse from synthetic training data. This thesis examines the generalizability of LLM-text detectors across evolving model families and domains. We compiled a comprehensive evaluation dataset from commonly used human corpora and generated corresponding samples using recent OpenAI and Anthropic models spanning multiple generations. Comparing the state-of-the-art zero-shot detector (Binoculars) against supervised RoBERTa/DeBERTa classifiers, we arrive at four main findings. First, zero-shot detection fails on newer models. Second, supervised detectors maintain a high true positive rate (TPR) in-distribution but exhibit asymmetric cross-generation transfer. Third, commonly reported metrics such as the area under the ROC curve (AUROC) can obscure poor performance at deployment-relevant thresholds: detectors achieving high AUROC yield near-zero TPR at a low false positive rate (FPR), and existing low-FPR evaluations often lack statistical reliability due to small sample sizes. Fourth, through tail-focused training and calibration, we reduce FPR by up to 4× (from ~1% to ~0.25%) while maintaining 90% TPR. Our results suggest that robust detection requires continually re-calibrated, model-aware pipelines rather than static universal detectors.
dc.identifier.uri	https://hdl.handle.net/10012/22854
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	artificial intelligence
dc.subject	deep learning
dc.subject	large language model
dc.subject	OpenAI
dc.subject	ChatGPT
dc.subject	Anthropic
dc.subject	Claude
dc.subject	detection
dc.subject	robustness
dc.subject	SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science::Computer and systems science
dc.subject	TECHNOLOGY::Information technology::Computer science::Software engineering
dc.subject	TECHNOLOGY::Information technology::Computer science
dc.subject	machine learning
dc.subject	zero-shot
dc.subject	supervised learning
dc.subject	state of the art
dc.subject	llm
dc.subject	bert
dc.title	On the Generalizability of AI-Generated Text Detection
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Kerschbaum, Florian
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
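
Illustrative note on the abstract's third finding: the sketch below shows one way TPR at a fixed low FPR could be computed from detector scores, and why a small human-text sample limits what can be claimed about low FPR. It is not taken from the thesis. The function names, the "higher score means more likely AI" convention, and the 95% confidence level are assumptions made for illustration only.

```python
import math


def tpr_at_fpr(human_scores, ai_scores, target_fpr):
    """Return (threshold, tpr, achieved_fpr) for the rule: flag as AI if score > threshold.

    Assumes higher scores mean "more likely AI-generated" (an illustrative
    convention, not necessarily the one used by any particular detector).
    """
    ranked = sorted(human_scores, reverse=True)
    # Largest number of human texts we may flag without exceeding target_fpr.
    # The small epsilon guards against floating-point error in the product.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    # Using the (allowed + 1)-th largest human score as the threshold means
    # at most `allowed` human scores lie strictly above it.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    false_pos = sum(s > threshold for s in human_scores)
    true_pos = sum(s > threshold for s in ai_scores)
    return threshold, true_pos / len(ai_scores), false_pos / len(human_scores)


def fpr_upper_bound_zero_fp(n_human, confidence=0.95):
    """One-sided upper confidence bound on the true FPR when 0 of n_human
    human texts were flagged: solves (1 - p) ** n_human = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_human)


def min_humans_to_certify(fpr_limit, confidence=0.95):
    """Smallest human sample size for which observing zero false positives
    bounds the true FPR below fpr_limit at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fpr_limit))
```

Under these assumptions, observing zero false positives on 100 human texts only bounds the true FPR below roughly 3% at 95% confidence, so a claim about FPR near 0.25% needs on the order of a thousand or more human samples.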

Files

Original bundle

Name: David_Amir.pdf
Size: 476.89 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Computer Science