Program Reduction: Versatility, Insights, and Efficacy

Loading...
Thumbnail Image

Date

2025-05-08

Advisor

Sun, Chengnian

Journal Title

Journal ISSN

Volume Title

Publisher

University of Waterloo

Abstract

Given a program P and a property ψ it preserves, the goal of program reduction is to minimize this program while ensuring that the minimized version Pmin still preserves ψ. Program reduction is a widely used technique to facilitate compiler debugging, as most compiler bugs are triggered by program inputs, which often contain a significant amount of code unrelated to the bug. Although program reduction automates eliminating such bug-irrelevant components from a program, it can take hours to complete due to its trial-and-error strategy. Furthermore, even if the reduction process produces a smaller program after a long wait, the resulting program is not guaranteed to be optimal, as program reduction is an NP-complete problem. As a result, compiler developers consistently seek program reduction approaches that offer faster reduction speeds or produce smaller outputs. It is critical to design superior program reduction approaches to further explore the potential of this field. This thesis aims to help enhance program reduction approaches in the following three ways. The first study aims to enhance the versatility of program reduction approaches. While existing techniques, such as C-Reduce and Perses, can effectively reduce a bug-triggering program as a whole, they fail to consider the varied degrees of relevance each remaining element has to the bug. To address this limitation, this study proposes PPR, a new program reduction technique designed to minimize a pair of programs w.r.t. certain properties. Given a seed program Ps and its variant Pv with differing properties (e.g., Pv crashes a compiler while Ps does not), PPR produces a pair of minimized programs that preserve these properties separately, with reduced differences highlighting bug-related elements. Evaluation results demonstrate that PPR effectively reduces pairs of programs, producing outputs of comparably small size to those generated by Perses and C-Reduce. In addition, PPR significantly outperforms the Delta Debugging algorithm in isolating bug-related differences. The second study concentrates on the understanding of a state-of-the-art program reduction approach ProbDD. ProbDD, a variant of ddmin, employs a probabilistic model to estimate the likelihood of each element being relevant to ψ, achieving superior performance compared to the Delta Debugging algorithm. However, the theoretical probabilistic model underlying ProbDD is intricate and not fully understood. To address this, the study conducts the first in-depth theoretical analysis of ProbDD, comprehending and demystifying this state-of-the-art approach from multiple perspectives. Building upon these insights, the study further proposes CDD, a simplified counter-based model that retains the core strengths of ProbDD. The evaluation demonstrates that CDD can achieve the same performance as ProbDD, despite being much simplified. The third study integrates large language models (LLMs) to improve the efficacy of program reduction. Existing program reduction techniques typically fall into two categories: generic approaches applicable to a wide range of languages but lacking domain-specific knowledge, and language-specific approaches that exploit detailed language knowledge but fail to generalize across multiple languages. However, effectively combining both language generality and language-specific expertise in program reduction is yet to be explored. To this end, this chapter proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to synergize the language generality of existing program reducers, such as Perses, with the language-specific semantics learned by LLMs. The evaluation shows that LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs, while reducing the reduction time by 10.77%, 34.88%, 36.96% on benchmarks in C, Rust and JavaScript, separately. Collectively, these studies advance the field of program reduction by enhancing its versatility, insights, and efficacy. By making reduction more effective, faster, and easier to understand, this thesis facilitates efficient debugging and robust compiler development.

Description

Keywords

TECHNOLOGY::Information technology::Computer science::Software engineering, program reduction, test input minimization, delta debugging

LC Subject Headings

Citation