Commit-Level vs. File-Level Vulnerability Prediction

Chong, Michael

Files

Chong_Michael.pdf (2.52 MB)

Date

2016-09-19

Authors

Chong, Michael

Advisor

Tan, Lin

Publisher

University of Waterloo

Abstract

Helping software development teams find and repair vulnerabilities before they are released and exploited can prevent costs due to loss of data, availability, and reputation. However, while general defect prediction models exist to help developers find bugs, vulnerability prediction models do not yet achieve prediction performance high enough for use in industry [43]. Previous work has explored predicting vulnerabilities in both commits and files. Commit-level prediction, being finer-grained, may offer more useful results, but no clear comparison of predictive performance exists to justify this assumption. To inform further research in vulnerability prediction, we compare commit-level and file-level prediction across 7 projects, using 6 classifiers and 8 different training dates. We evaluate each prediction model using ‘online prediction’ so that the evaluation is in line with practical usage of the model. We assess each model with four metrics, which we interpret as representing two practical usage scenarios. We also analyze the data and the techniques used for evaluating prediction models. We find that, although absolute prediction performance is low, file-level prediction generally outperforms commit-level prediction; in a few notable cases, however, commit-level prediction performs better.