Case Studies of a Machine Learning Process for Improving the Accuracy of Static Analysis Tools

Godfrey, Michael; Zhao, Peng
2016-10-18; 2016-10-13
http://hdl.handle.net/10012/11004

Static analysis tools analyze source code and report suspected problems to the user as warnings. The use of these tools is a key feature of most modern software development processes; however, the tools tend to generate large result sets that can be hard to process and prioritize in an automated way. Two particular problems are (a) a high false positive rate, where warnings are generated for code that is not problematic, and (b) a high rate of non-actionable true positives, where the warnings are not acted on or do not represent significant risks to the quality of the source code as perceived by the developers. Previous work has used logistic regression to build models that predict legitimate warnings in Google's Java codebase [38]. Heckman [19] experimented with 15 machine learning algorithms on two open source projects to classify actionable static analysis alerts. In our work, we seek to replicate these ideas on different target systems, using different static analysis tools along with additional machine learning techniques, and with an emphasis on security-related warnings. Our experiments indicate that these models can achieve high accuracy in actionable warning classification. We found that in most cases, our models outperform those of Heckman [19].

Language: en
Keywords: machine learning; static analysis tool
Type: Master Thesis