
Software Bug Detection Using the N-gram Language Model

Date

2015-04-22

Authors

Chollak, Devin

Publisher

University of Waterloo

Abstract

Over the years, many techniques have been proposed to infer programming rules in order to improve software reliability. These techniques treat violations of the inferred rules as potential software defects. This thesis introduces an approach, NGDetection, which models a project's source code using the n-gram language model in order to find bugs and refactoring opportunities in a number of open source Java projects. Inferring programming rules for software defect detection is a new application domain for the n-gram model. In addition to the n-gram model, NGDetection leverages two further techniques to address limitations of existing defect detection techniques. First, the approach infers combined programming rules, which combine infrequent programming rules with their related programming rules, to detect defects that other approaches cannot. Second, the approach integrates control flow into the n-gram model, which increases the accuracy of defect detection. The approach is evaluated on 14 open source Java projects, which range from 36 thousand lines of code (KLOC) to 1 million lines of code (MLOC). The approach detected 310 violations in the latest versions of the projects, 108 of which are useful violations, i.e., 43 bugs and 65 refactoring opportunities. Of the 43 bugs, 32 were reported to the developers and the remaining 11 are in the process of being reported. Among the reported bugs, 2 have been confirmed by the developers, while the rest await confirmation. Of the 108 useful violations, at least 26 cannot be detected by existing techniques.
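
The general idea of using n-gram statistics to infer programming rules and flag violations can be illustrated with a minimal sketch. This is not NGDetection itself: the abstract does not specify its tokenization, thresholds, combined rules, or control-flow handling. The sketch assumes each method body has already been reduced to a flat sequence of call names, and the function names (`ngram_counts`, `find_violations`) and the parameters `min_support` and `min_confidence` are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def ngram_counts(sequences, n):
    """Count how often each token follows each (n-1)-token prefix."""
    follow = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            prefix = tuple(seq[i:i + n - 1])
            follow[prefix][seq[i + n - 1]] += 1
    return follow


def find_violations(sequences, n=3, min_support=3, min_confidence=0.8):
    """Return (prefix, expected_token, observed_token) triples.

    A prefix followed by the same token often enough (min_support) and
    consistently enough (min_confidence) is treated as an inferred rule.
    Any other token observed after that prefix is reported as a violation.
    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    violations = []
    for prefix, nexts in ngram_counts(sequences, n).items():
        total = sum(nexts.values())
        expected, count = nexts.most_common(1)[0]
        if count < min_support or count / total < min_confidence:
            continue  # too rare or too inconsistent to count as a rule
        for token in nexts:
            if token != expected:
                violations.append((prefix, expected, token))
    return violations


# Hypothetical call sequences: five methods release the lock, one does not.
corpus = [["lock", "read", "unlock"]] * 5 + [["lock", "read", "close"]]
print(find_violations(corpus))
# prints [(('lock', 'read'), 'unlock', 'close')]
```

In this toy corpus, `unlock` follows `lock, read` in 5 of 6 sequences, so the pair is accepted as a rule and the one sequence ending in `close` is flagged. A flagged sequence is only a candidate; as the abstract notes, many detected violations turn out to be neither bugs nor refactoring opportunities.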

Keywords

Bug Detection, Static Analysis, Programming Rules, Natural Language Models, N-gram
