Software Bug Detection Using the N-gram Language Model

Chollak, Devin

Date

2015-04-22

Authors

Chollak, Devin

Publisher

University of Waterloo

Abstract

Many techniques have been proposed over the years to infer programming rules in order to improve software reliability. These techniques treat violations of the inferred rules as potential software defects. This thesis introduces an approach, NGDetection, which models a project's source code using the n-gram language model in order to find bugs and refactoring opportunities in a number of open source Java projects. Using the n-gram model to infer programming rules for software defect detection is a new application domain for the model. In addition to the n-gram model, NGDetection uses two further techniques to address limitations of existing defect detection techniques. First, the approach infers combined programming rules, which combine infrequent programming rules with their related programming rules, to detect defects that other approaches cannot. Second, the approach integrates control flow into the n-gram model, which increases the accuracy of defect detection. The approach is evaluated on 14 open source Java projects ranging from 36 thousand lines of code (KLOC) to 1 million lines of code (MLOC). The approach detected 310 violations in the latest versions of the projects, 108 of which are useful violations, i.e., 43 bugs and 65 refactoring opportunities. Of the 43 bugs, 32 have been reported to the developers and the remaining ones are in the process of being reported. Among the reported bugs, 2 have been confirmed by the developers, while the rest await confirmation. Of the 108 useful violations, at least 26 cannot be detected by existing techniques.
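
As a rough illustration of the general idea described in the abstract, and not of NGDetection itself, the following Python sketch infers simple rules from n-grams over token sequences and flags rare departures from them. It assumes each method body has already been reduced to a list of call names. The function name `find_violations`, the thresholds `min_support` and `min_confidence`, and the sample data are all hypothetical. The sketch leaves out the combined rules and the control flow integration that the thesis describes.

```python
from collections import Counter, defaultdict


def find_violations(sequences, n=2, min_support=3, min_confidence=0.8):
    """Flag n-grams whose last token departs from a dominant continuation.

    Illustrative sketch only; thresholds and rule form are assumptions.
    """
    # Count how often each token follows each (n-1)-token prefix.
    followers = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            prefix = tuple(seq[i:i + n - 1])
            followers[prefix][seq[i + n - 1]] += 1

    violations = []
    for prefix, counts in followers.items():
        total = sum(counts.values())
        expected, support = counts.most_common(1)[0]
        # Treat "prefix is followed by expected" as an inferred rule only
        # when that continuation is both frequent and dominant.
        if support < min_support or support / total < min_confidence:
            continue
        # Every other continuation of the same prefix violates the rule.
        for token, count in counts.items():
            if token != expected:
                violations.append((prefix, expected, token, count))
    return violations


# Hypothetical call sequences, one list per method body.
sequences = [["open", "read", "close"]] * 5 + [["open", "close"]]
violations = find_violations(sequences)
```

Here `open` is followed by `read` in five of six sequences, so the single sequence that goes straight from `open` to `close` is reported as a candidate violation. Whether such a report is a real bug, a refactoring opportunity, or a false positive still has to be judged by a person.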

Keywords

Bug Detection, Static Analysis, Programming Rules, Natural Language Models, N-gram

URI

http://hdl.handle.net/10012/9250

Collections

Theses
Computer Science
