Software Bug Detection Using the N-gram Language Model

dc.contributor.author: Chollak, Devin
dc.date.accessioned: 2015-04-22T15:00:27Z
dc.date.available: 2015-04-22T15:00:27Z
dc.date.issued: 2015-04-22
dc.date.submitted: 2015
dc.description.abstract: Over the years, many techniques have been proposed to infer programming rules in order to improve software reliability. These techniques use violations of the inferred programming rules to detect software defects. This thesis introduces an approach, NGDetection, which models software source code using the n-gram language model in order to find bugs and refactoring opportunities in a number of open source Java projects. Using the n-gram model to infer programming rules for software defect detection is a new application domain for the model. In addition to the n-gram model, NGDetection leverages two further techniques to address limitations of existing defect detection techniques. First, the approach infers combined programming rules, which combine infrequent programming rules with their related programming rules, to detect defects in a way other approaches cannot. Second, the approach integrates control flow into the n-gram model, which increases the accuracy of defect detection. The approach is evaluated on 14 open source Java projects that range from 36 thousand lines of code (KLOC) to 1 million lines of code (MLOC). The approach detected 310 violations in the latest versions of the projects, 108 of which are useful violations, i.e., 43 bugs and 65 refactoring opportunities. Of the 43 bugs, 32 were reported to the developers and the remaining 11 are in the process of being reported. Among the reported bugs, 2 have been confirmed by the developers, while the rest await confirmation. Of the 108 useful violations, at least 26 cannot be detected by existing techniques. [en]
dc.identifier.uri: http://hdl.handle.net/10012/9250
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: Bug Detection [en]
dc.subject: Static Analysis [en]
dc.subject: Programming Rules [en]
dc.subject: Natural Language Models [en]
dc.subject: N-gram [en]
dc.subject.program: Computer Science (Software Engineering) [en]
dc.title: Software Bug Detection Using the N-gram Language Model [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Chollak_Devin.pdf
Size: 447.27 KB
Format: Adobe Portable Document Format
