Case Studies of a Machine Learning Process for Improving the Accuracy of Static Analysis Tools

Zhao, Peng

Files

Zhao_Peng.pdf (801.68 KB)

Date

2016-10-18

Authors

Zhao, Peng

Advisor

Godfrey, Michael

Publisher

University of Waterloo

Abstract

Static analysis tools analyze source code and report suspected problems as warnings to the user. The use of these tools is a key feature of most modern software development processes; however, the tools tend to generate large result sets that can be hard to process and prioritize in an automated way. Two particular problems are (a) a high false positive rate, where warnings are generated for code that is not problematic, and (b) a high rate of non-actionable true positives, where the warnings are not acted on or do not represent significant risks to the quality of the source code as perceived by the developers. Previous work has explored the use of machine learning to build models that predict legitimate warnings, using logistic regression [38] on Google's Java codebase. Heckman [19] experimented with 15 machine learning algorithms on two open source projects to classify actionable static analysis alerts. In our work, we seek to replicate these ideas on different target systems, using different static analysis tools along with more machine learning techniques, and with an emphasis on security-related warnings. Our experiments indicate that these models can achieve high accuracy in actionable warning classification. We found that in most cases, our models outperform those of Heckman [19].

Keywords

machine learning, static analysis tool

URI

http://hdl.handle.net/10012/11004

Collections

Theses
Computer Science
