SQLyzr: A Comprehensive Benchmark and Framework for Evaluating Text-to-SQL Systems

Abedini, Sepideh

dc.contributor.author	Abedini, Sepideh
dc.date.accessioned	2026-04-23T17:58:42Z
dc.date.available	2026-04-23T17:58:42Z
dc.date.issued	2026-04-23
dc.date.submitted	2026-04-15
dc.description.abstract	Natural language–to–SQL (text-to-SQL) systems aim to enable users to interact with relational databases using natural language instead of SQL. Recent advances in large language models have significantly improved the performance of these systems, making them increasingly practical for real-world applications. With the rapid pace of progress and the growing adoption of text-to-SQL systems, robust benchmarking has become essential. However, existing benchmarks typically rely on a single correctness metric, lack alignment with real-world query usage patterns, and do not evaluate the scalability of generated queries, which limits their ability to provide realistic and practically meaningful evaluation. This thesis introduces SQLyzr, a comprehensive text-to-SQL benchmark and evaluation framework designed to address these limitations. SQLyzr incorporates a fine-grained taxonomy of SQL queries and reports evaluation results at the level of query categories and subcategories, enabling detailed insights into system performance across different query types. In addition, SQLyzr extends traditional evaluation by introducing complementary metrics that assess not only the correctness but also the efficiency and structural complexity of generated SQL queries. To better reflect real-world usage, SQLyzr aligns the distribution of query categories with empirical SQL workload distributions and supports dataset scaling to enable evaluation on larger databases. Building on these ideas, we also introduce a configurable text-to-SQL benchmarking framework that allows users to customize and extend benchmark components such as workloads, datasets, and evaluation metrics. The framework further provides novel features such as detailed error analysis for identifying incorrect queries with minor issues and workload augmentation for synthesizing additional question-SQL pairs that target weaknesses of a given text-to-SQL system. 
We use SQLyzr to evaluate two state-of-the-art text-to-SQL systems with similar overall correctness scores. Our results demonstrate that SQLyzr enables clearer comparison between systems and reveals deeper insights into their relative strengths and weaknesses.
dc.identifier.uri	https://hdl.handle.net/10012/23045
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://github.com/sepideh-abedini/SQLyzr
dc.title	SQLyzr: A Comprehensive Benchmark and Framework for Evaluating Text-to-SQL Systems
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Özsu, M. Tamer
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Abedini_Sepideh.pdf
Size: 6.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses
Computer Science