Slice Based Fuzzing of Software

Murali, Aniruddhan

Files

Murali_Aniruddhan.pdf (2.16 MB)

Date

2026-05-26

Authors

Murali, Aniruddhan

Advisor

Nagappan, Meiyappan

Publisher

University of Waterloo

Abstract

Modern software systems are increasingly complex, and static analysis tools are widely used to identify potentially vulnerable code by issuing warnings. However, developers must often inspect these warnings manually to determine whether the reported issues are genuine, which makes validation time-consuming and error-prone. Similarly, ensuring that individual commits (i.e., code changes) do not introduce bugs remains a fundamental challenge in software maintenance. Directed fuzzing has emerged as a powerful automated testing technique for bug detection. Yet applying directed fuzzing to an entire project for each warning or modified code location is computationally expensive, often requiring days of execution while yielding only incremental coverage improvements. We present a unified framework for bug detection based on constructing and fuzzing compiled code slices centered on either static analysis warnings or functions modified in code commits. Unlike prior approaches that extract slices from the program entry point or impose restrictive slice-size limits, our framework constructs slices of arbitrary size and compiles them into standalone, testable units.

For static analysis warning validation, we directly fuzz slices of the function containing the warning, enabling rapid elimination of false positives. We implement this approach in a tool called FuzzSlice. Our key insight is that a warning that does not yield a bug when fuzzed at the function level within a given time budget is most likely a false positive. Evaluation on the Juliet benchmark shows that FuzzSlice detects all 864 known false positives in the ground truth. For open-source repositories, developers from tmux and OpenSSH independently labeled the reported warnings. In these projects, FuzzSlice automatically identified 33 of 53 developer-confirmed false positives, reducing false positives by 62.26%. These results demonstrate that FuzzSlice substantially reduces manual validation effort, achieving complete elimination of false positives in Juliet and significant reductions in real-world codebases.

To further strengthen static analysis warning validation, we introduce SnipTest, a framework that heuristically identifies true positives among warnings. SnipTest employs a layer-by-layer slicing strategy that incrementally expands the slice context around the target location prior to fuzzing. Unlike FuzzSlice, which tests only the function containing the warning, SnipTest can incrementally grow the boundary of the code slice to include more calling context, enabling the validation of potential bugs with progressively increasing precision. We evaluate SnipTest on a benchmark comprising 97 true bugs and 97 false alarms across three real-world projects. SnipTest triggered 53 true bugs (54.6%), consistently across three slice levels, while the remaining cases were determined to be unreachable. Compared to state-of-the-art directed fuzzers, unseeded SnipTest confirms more bugs than unseeded baselines and matches the effectiveness of seeded fuzzers. Moreover, SnipTest significantly improves efficiency, achieving a 5.5–10.6× speedup in fuzzing time relative to both seeded and unseeded directed fuzzers. We further demonstrate its practical relevance by discovering three previously unknown bugs in vim and libpcap, leading to the disclosure of CVE-2025-11964.

Finally, we extend our slice-based approach to commit verification. We introduce CommitGuard, a novel commit-aware differential fuzzing framework for detecting bugs introduced by code changes. Rather than fuzzing entire program versions, our method automatically identifies the functions modified by a code change and generates dedicated slices for those functions. For each modified function, we create two slices: one from the updated version after the code change, and another from the older version before the code change. These slices are fuzzed independently, and their runtime behaviors are systematically compared to uncover divergences indicative of commit-induced bugs. We show the practicality of commit-level differential fuzzing by detecting five previously unknown bugs across 300 commits in widely used projects, including OpenSSL, libpcap, and Leptonica. Additionally, CommitGuard exhibits a low false positive rate, with only 2 false positives among these 300 commits. CommitGuard is also efficient, requiring 32 minutes per commit, while achieving up to 75.36% code coverage of modified functions on average. Overall, we aim to show that slice-based differential fuzzing is both effective and computationally efficient, making it well-suited for integration into modern, fast-paced software development workflows, such as Continuous Integration (CI).
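
The differential comparison described in the abstract can be illustrated with a toy sketch. This is not CommitGuard's implementation: the two `parse_len_*` functions, the `observe` helper, and the `differential_fuzz` driver are hypothetical names invented for illustration, and uniform random byte strings stand in for a real coverage-guided fuzzer. As a further simplification, the sketch feeds the same inputs to both versions instead of fuzzing each slice independently.

```python
import random


def parse_len_old(data: bytes) -> int:
    """Hypothetical pre-commit version: bounds-checks the length byte."""
    if not data:
        return 0
    n = data[0]
    if n > len(data) - 1:
        raise ValueError("length exceeds payload")
    return n


def parse_len_new(data: bytes) -> int:
    """Hypothetical post-commit version: the bounds check was dropped."""
    if not data:
        return 0
    return data[0]


def observe(func, data: bytes):
    """Run one version on one input and summarize its observable behavior."""
    try:
        return ("ok", func(data))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def differential_fuzz(old, new, trials=2000, max_len=8, seed=0):
    """Feed identical random inputs to both versions and collect divergences."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    divergences = []
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        before, after = observe(old, data), observe(new, data)
        if before != after:
            divergences.append((data, before, after))
    return divergences


if __name__ == "__main__":
    found = differential_fuzz(parse_len_old, parse_len_new)
    print(f"{len(found)} diverging inputs")
    for data, before, after in found[:3]:
        print(data.hex(), before, after)
```

A divergence here is only a candidate bug. A commit that intentionally changes behavior also produces divergences, so each report would still need triage before being treated as a commit-induced defect.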