Evaluating the Effectiveness of Code2Vec for Bug Prediction When Considering That Not All Bugs Are the Same

Date

2020-09-21

Authors

Baron, Kilby

Advisor

Nagappan, Meiyappan
Godfrey, Michael

Publisher

University of Waterloo

Abstract

Bug prediction is an area of research focused on predicting where in a software project future bugs will occur. The purpose of bug prediction models is to help companies spend their quality assurance resources more efficiently by prioritizing the testing of the most defect-prone entities. Most bug prediction models predict only whether an entity has a bug, or how many bugs an entity will have, which implies that all bugs are equally important. In reality, bugs can have vastly different origins, impacts, priorities, and costs; bug prediction models could therefore be improved if they indicated which bugs to prioritize based on an organization’s needs. This paper evaluates a possible method for predicting bug attributes related to cost by analyzing over 33,000 bugs from 11 different projects. If bug attributes related to cost can be predicted, then bug prediction models can use the approach to improve the granularity of their results. The cost metrics in this study are bug priority, the experience of the developer who fixed the bug, and the size of the bug fix. First, we show that bugs differ along each cost metric, and that prioritizing buggy entities along each of these metrics produces very different results. We then evaluate two methods of predicting cost metrics: traditional deep learning models and semantic learning models. The analysis found evidence that traditional independent variables show potential as predictors of cost metrics. The semantic learning model was not as successful, but may prove more effective in future iterations.

Keywords

bug prediction, code2vec
