On the Generalizability of AI-Generated Text Detection
| dc.contributor.author | David, Amir | |
| dc.date.accessioned | 2026-01-20T14:43:32Z | |
| dc.date.available | 2026-01-20T14:43:32Z | |
| dc.date.issued | 2026-01-20 | |
| dc.date.submitted | 2026-01-14 | |
| dc.description.abstract | As large language models (LLMs) become ubiquitous, reliably distinguishing their outputs from human writing is critical for academic integrity, content moderation, and preventing model collapse from synthetic training data. This thesis examines the generalizability of LLM-text detectors across evolving model families and domains. We compiled a comprehensive evaluation dataset from commonly used human corpora and generated corresponding samples using recent OpenAI and Anthropic models spanning multiple generations. Comparing the state-of-the-art zero-shot detector (Binoculars) against supervised RoBERTa/DeBERTa classifiers, we arrive at four main findings. First, zero-shot detection fails on newer models. Second, supervised detectors maintain a high true positive rate (TPR) in-distribution but exhibit asymmetric cross-generation transfer. Third, commonly reported metrics such as the area under the ROC curve (AUROC) can obscure poor performance at deployment-relevant thresholds: detectors achieving high AUROC yield near-zero TPR at a low false positive rate (FPR), and existing low-FPR evaluations often lack statistical reliability due to small sample sizes. Fourth, through tail-focused training and calibration, we reduce FPR by up to 4× (from ~1% to ~0.25%) while maintaining 90% TPR. Our results suggest that robust detection requires continually re-calibrated, model-aware pipelines rather than static universal detectors. | |
| dc.identifier.uri | https://hdl.handle.net/10012/22854 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.subject | artificial intelligence | |
| dc.subject | deep learning | |
| dc.subject | large language model | |
| dc.subject | OpenAI | |
| dc.subject | ChatGPT | |
| dc.subject | Anthropic | |
| dc.subject | Claude | |
| dc.subject | detection | |
| dc.subject | robustness | |
| dc.subject | SOCIAL SCIENCES::Statistics, computer and systems science::Informatics, computer and systems science::Computer and systems science | |
| dc.subject | TECHNOLOGY::Information technology::Computer science::Software engineering | |
| dc.subject | TECHNOLOGY::Information technology::Computer science | |
| dc.subject | machine learning | |
| dc.subject | zero-shot | |
| dc.subject | supervised learning | |
| dc.subject | state of the art | |
| dc.subject | llm | |
| dc.subject | bert | |
| dc.title | On the Generalizability of AI-Generated Text Detection | |
| dc.type | Master Thesis | |
| uws-etd.degree | Master of Mathematics | |
| uws-etd.degree.department | David R. Cheriton School of Computer Science | |
| uws-etd.degree.discipline | Computer Science | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Kerschbaum, Florian | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |
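
The abstract's third finding concerns TPR measured at a fixed low FPR rather than AUROC. As a minimal sketch of that metric, and not the thesis's own evaluation code, the Python below picks a threshold from human-written scores and reports the share of AI-generated texts flagged at that threshold. The function names are hypothetical, and the convention that a higher score means "more likely AI-generated" is an assumption.

```python
import math


def threshold_at_fpr(human_scores, target_fpr):
    """Return a threshold whose empirical FPR on human_scores is <= target_fpr.

    Assumption: a text is flagged as AI-generated when its score is
    strictly greater than the threshold.
    """
    ranked = sorted(human_scores, reverse=True)
    # Largest number of human texts we may flag without exceeding target_fpr.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every human text may be flagged
    # At most `allowed` human scores are strictly above ranked[allowed].
    return ranked[allowed]


def tpr_at_fpr(human_scores, ai_scores, target_fpr):
    """Fraction of AI-generated texts flagged at the threshold chosen above."""
    threshold = threshold_at_fpr(human_scores, target_fpr)
    flagged = sum(1 for score in ai_scores if score > threshold)
    return flagged / len(ai_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the thesis.
    human = [0.1, 0.2, 0.3, 0.4]
    ai = [0.35, 0.5, 0.9, 0.2]
    print(tpr_at_fpr(human, ai, target_fpr=0.25))
    print(tpr_at_fpr(human, ai, target_fpr=0.0))
```

With only n human samples, `allowed` is zero for any target FPR below 1/n, so the threshold collapses to the maximum human score. This is one way to see why low-FPR estimates from small evaluation sets are statistically fragile.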