An Effort Prediction Framework for Software Defect Correction
Developers apply changes and updates to software systems to adapt to emerging environments and address new requirements. In turn, these changes introduce additional software defects, usually caused by our inability to comprehend the full scope of the modified code. As a result, software practitioners have developed tools to aid in the detection and prediction of imminent software defects, in addition to the effort required to correct them. Although software development effort prediction has been in use for many years, research into defect-correction effort prediction is relatively new. The increasing complexity, integration and ubiquitous nature of current software systems have sparked renewed interest in this field. Effort prediction now plays a critical role in the planning activities of managers. Accurate predictions help corporations budget, plan and distribute available resources effectively and efficiently. In particular, early defect-correction effort predictions could be used by testers to set schedules, and by managers to plan costs and provide earlier feedback to customers about future releases. In this work, we address the problem of predicting the effort needed to resolve a software defect. More specifically, our study is concerned with defects or issues that are reported on an Issue Tracking System or any other defect repository. Current approaches use one prediction method or technique to produce effort predictions. This approach usually suffers from the weaknesses of the chosen prediction method, and consequently the accuracy of the predictions is affected. To address this problem, we present a composite prediction framework. Rather than using one prediction approach for all defects, we propose the use of multiple integrated methods that compensate for one another's weaknesses. Our framework is divided into two sub-categories, Similarity-Score Dependent and Similarity-Score Independent.
The Similarity-Score Dependent method utilizes the power of Case-Based Reasoning, also known as Instance-Based Reasoning, to compute predictions. It relies on matching target issues to similar historical cases, then combines their known effort for an informed estimate. On the other hand, the Similarity-Score Independent method makes use of other defect-related information with some statistical manipulation to produce the required estimate. To measure similarity between defects, some method of distance calculation must be used. In some cases, this method might produce misleading results due to observed inconsistencies in history, and the fact that current similarity-scoring techniques cannot account for all the variability in the data. In this case, the Similarity-Score Independent method can be used to estimate the effort, where the effect of such inconsistencies can be reduced. We have performed a number of experimental studies on the proposed framework to assess the effectiveness of the presented techniques. We extracted the data sets from an operational Issue Tracking System in order to test the validity of the model on real project data. These studies involved the development of multiple tools in both the Java programming language and PHP, each for a certain stage of data analysis and manipulation. The results show that our proposed approach produces significant improvements when compared to current methods.
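
To make the two-path idea concrete, the following is a minimal illustrative sketch, not the framework's actual implementation. The abstract does not specify the similarity measure, the number of neighbours, the fallback statistic, or the switching rule, so everything here is an assumption chosen for illustration: Jaccard similarity over issue keywords, a similarity-weighted mean of the `k` nearest historical cases, a hypothetical `min_similarity` threshold, and the median historical effort as the similarity-independent fallback. The names `jaccard` and `predict_effort` are likewise hypothetical.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (an assumed metric)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_effort(target_tokens, history, k=3, min_similarity=0.3):
    """Estimate effort for a new issue from resolved historical issues.

    history: list of (keyword_set, effort_hours) pairs.
    Returns (estimate, method_name).
    """
    if not history:
        raise ValueError("history must contain at least one resolved issue")

    # Score every historical case against the target, best match first.
    scored = sorted(
        ((jaccard(target_tokens, tokens), effort) for tokens, effort in history),
        key=lambda pair: pair[0],
        reverse=True,
    )

    # Similarity-dependent path: keep the k nearest cases that are similar
    # enough, then combine their known effort, weighted by similarity.
    top = [(s, e) for s, e in scored[:k] if s >= min_similarity]
    if top:
        total_weight = sum(s for s, _ in top)
        estimate = sum(s * e for s, e in top) / total_weight
        return estimate, "similarity-dependent"

    # Similarity-independent path (assumed fallback): no trustworthy match,
    # so use a simple statistic over all historical effort values.
    return median(e for _, e in history), "similarity-independent"
```

Under these assumptions, an issue that shares most of its keywords with past issues receives a weighted average of their recorded effort, while an issue with no sufficiently similar match falls back to the median of all recorded effort, which is less sensitive to misleading similarity scores.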