Evaluating the Effectiveness of Code2Vec for Bug Prediction When Considering That Not All Bugs Are the Same
Date
2020-09-21
Authors
Baron, Kilby
Advisor
Nagappan, Meiyappan
Godfrey, Michael
Publisher
University of Waterloo
Abstract
Bug prediction is an area of research focused on predicting where in a software project
future bugs will occur. The purpose of bug prediction models is to help companies spend
their quality assurance resources more efficiently by prioritizing the testing of the most
defect-prone entities. Most bug prediction models are concerned only with predicting
whether an entity has a bug, or how many bugs it will have, which implies that all
bugs are equally important. In reality, bugs can have vastly different origins, impacts,
priorities, and costs; therefore, bug prediction models could potentially be improved if they
were able to give an indication of which bugs to prioritize based on an organization’s needs.
This thesis evaluates a possible method for predicting bug attributes related to cost by
analyzing over 33,000 bugs from 11 different projects. If such attributes can be predicted,
bug prediction models could use the approach to produce finer-grained results.
their results. The cost metrics in this study are bug priority, the experience of the developer
who fixed the bug, and the size of the bug fix. First, we show that bugs differ along each
cost metric, and that prioritizing buggy entities by each of these metrics produces very
different results. We then evaluate two methods of predicting cost metrics: traditional deep
learning models and semantic learning models. The analysis found evidence that traditional
independent variables show potential as predictors of cost metrics. The semantic learning
model was less successful, but may prove more effective in future iterations.
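To make the claim about prioritization concrete, the following Python sketch ranks a few buggy entities by each of the three cost metrics and measures how much the top-k lists overlap. Everything in it is a hypothetical assumption for illustration only: the entity names, the metric values, the units (prior commits for developer experience, lines changed for fix size) and the ranking directions (priority 1 is most urgent; larger fixes and more experienced fixers are treated as costlier). None of it is data or code from the thesis.

```python
from itertools import combinations

# HYPOTHETICAL toy data for illustration only; not drawn from the thesis.
# Each buggy entity has three cost metrics: bug priority (1 = most urgent,
# an assumed Bugzilla-style scale), the fixing developer's experience
# (assumed: number of prior commits), and fix size (assumed: lines changed).
ENTITIES = {
    "Parser.java": {"priority": 1, "fixer_experience": 120, "fix_size": 4},
    "Cache.java": {"priority": 3, "fixer_experience": 15, "fix_size": 250},
    "Config.java": {"priority": 2, "fixer_experience": 300, "fix_size": 30},
    "Session.java": {"priority": 4, "fixer_experience": 40, "fix_size": 90},
}

# Assumed ranking direction per metric: True means larger values rank first.
DESCENDING = {"priority": False, "fixer_experience": True, "fix_size": True}


def rank(entities, metric, descending):
    """Order entity names from first to last to inspect under one metric."""
    return sorted(entities, key=lambda name: entities[name][metric], reverse=descending)


def top_k_overlap(a, b, k):
    """Fraction of the top-k entities that two rankings have in common."""
    return len(set(a[:k]) & set(b[:k])) / k


# One ranking per cost metric.
rankings = {m: rank(ENTITIES, m, d) for m, d in DESCENDING.items()}

for metric, order in rankings.items():
    print(f"{metric:>16}: {order}")

# Compare every pair of metrics by how many of their top-2 entities agree.
for m1, m2 in combinations(rankings, 2):
    overlap = top_k_overlap(rankings[m1], rankings[m2], k=2)
    print(f"top-2 overlap, {m1} vs {m2}: {overlap:.2f}")
```

With these made-up values, the priority and experience rankings agree on their top two entities, while the fix-size ranking shares neither of them. A real analysis would use measured metrics and possibly a rank-correlation statistic instead of this simple overlap.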
Keywords
bug prediction, code2vec