SQLyzr: A Comprehensive Benchmark and Framework for Evaluating Text-to-SQL Systems

dc.contributor.author: Abedini, Sepideh
dc.date.accessioned: 2026-04-23T17:58:42Z
dc.date.available: 2026-04-23T17:58:42Z
dc.date.issued: 2026-04-23
dc.date.submitted: 2026-04-15
dc.description.abstract: Natural language–to–SQL (text-to-SQL) systems aim to enable users to interact with relational databases using natural language instead of SQL. Recent advances in large language models have significantly improved the performance of these systems, making them increasingly practical for real-world applications. With the rapid pace of progress and the growing adoption of text-to-SQL systems, robust benchmarking has become essential. However, existing benchmarks typically rely on a single correctness metric, lack alignment with real-world query usage patterns, and do not evaluate the scalability of generated queries, which limits their ability to provide realistic and practically meaningful evaluation. This thesis introduces SQLyzr, a comprehensive text-to-SQL benchmark and evaluation framework designed to address these limitations. SQLyzr incorporates a fine-grained taxonomy of SQL queries and reports evaluation results at the level of query categories and subcategories, enabling detailed insights into system performance across different query types. In addition, SQLyzr extends traditional evaluation with complementary metrics that assess not only the correctness but also the efficiency and structural complexity of generated SQL queries. To better reflect real-world usage, SQLyzr aligns the distribution of query categories with empirical SQL workload distributions and supports dataset scaling to enable evaluation on larger databases. Building on these ideas, we also introduce a configurable text-to-SQL benchmarking framework that allows users to customize and extend benchmark components such as workloads, datasets, and evaluation metrics. The framework further provides novel features such as detailed error analysis, which identifies incorrect queries that contain only minor issues, and workload augmentation, which synthesizes additional question-SQL pairs targeting the weaknesses of a given text-to-SQL system.
We use SQLyzr to evaluate two state-of-the-art text-to-SQL systems with similar overall correctness scores. Our results demonstrate that SQLyzr enables clearer comparison between systems and reveals deeper insights into their relative strengths and weaknesses.
dc.identifier.uri: https://hdl.handle.net/10012/23045
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/sepideh-abedini/SQLyzr
dc.title: SQLyzr: A Comprehensive Benchmark and Framework for Evaluating Text-to-SQL Systems
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.contributor.advisor: Özsu, M. Tamer
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
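The abstract notes that SQLyzr reports results per query category and aligns category proportions with empirical workload distributions, and that two systems with similar overall correctness can still differ meaningfully. The following is a minimal Python sketch of that idea, assuming a simple weighted average of per-category accuracy; SQLyzr's actual scoring may differ. The function names, category labels ("join", "aggregation"), and workload weights are hypothetical and are not taken from SQLyzr.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_category_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of correct predictions per query category.

    `results` holds (category, is_correct) pairs. The category labels are
    hypothetical stand-ins for a benchmark's query taxonomy.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


def workload_weighted_accuracy(acc: Dict[str, float], workload: Dict[str, float]) -> float:
    """Average per-category accuracy, weighted by an assumed workload distribution.

    Categories absent from `workload` are ignored, and the weights are
    renormalized over the categories present in both dictionaries.
    """
    shared = [c for c in acc if c in workload]
    total_weight = sum(workload[c] for c in shared)
    if total_weight <= 0:
        raise ValueError("no positive workload weight for the evaluated categories")
    return sum(acc[c] * workload[c] for c in shared) / total_weight


if __name__ == "__main__":
    # Two hypothetical systems, each correct on 10 of 20 questions overall.
    system_a = ([("join", True)] * 8 + [("join", False)] * 2
                + [("aggregation", True)] * 2 + [("aggregation", False)] * 8)
    system_b = ([("join", True)] * 2 + [("join", False)] * 8
                + [("aggregation", True)] * 8 + [("aggregation", False)] * 2)
    # Hypothetical workload in which join queries dominate.
    workload = {"join": 0.7, "aggregation": 0.3}
    for name, results in (("A", system_a), ("B", system_b)):
        acc = per_category_accuracy(results)
        print(name, acc, round(workload_weighted_accuracy(acc, workload), 2))
```

Both hypothetical systems have an unweighted accuracy of 0.5, but under this assumed workload System A scores about 0.62 and System B about 0.38, because A is stronger on the category the workload emphasizes. A plain average over the dataset would hide this difference.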

Files

Original bundle

Name: Abedini_Sepideh.pdf
Size: 6.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission