UWSpace is currently experiencing technical difficulties resulting from its recent migration to a new version of its software. These technical issues are not affecting the submission and browse features of the site. UWaterloo community members may continue submitting items to UWSpace. We apologize for the inconvenience, and are actively working to resolve these technical issues.
 

Online Defect Prediction for Imbalanced Data

dc.contributor.authorTan, Ming
dc.date.accessioned2015-08-14T15:47:30Z
dc.date.available2015-08-14T15:47:30Z
dc.date.issued2015-08-14
dc.date.submitted2015-08-14
dc.description.abstractMany defect prediction techniques are proposed to improve software reliability. Change classification predicts defects at the change level, where a change is a collection of the modifications to one file in a commit. In this thesis, we conduct the first study of applying change classification in practice and share the lessons we learned. We identify two issues in the prediction process, both of which contribute to the low prediction performance. First, the data are imbalanced—there are much fewer buggy changes than clean changes. Second, the commonly used cross-validation approach is inappropriate for evaluating the performance of change classification. To address these challenges, we apply and adapt online change classification to evaluate the prediction and use resampling, updatable classification techniques as well as remove the testing-related changes to improve the classification performance. We perform the improved change classification techniques on one proprietary and six open source projects. Our results show that resampling and updatable classification techniques improve the precision of change classification by 12.2–89.5% or 6.4–34.8 percentage points (pp.) on the seven projects. Additionally, removing testing-related changes improves F1 by 62.2–3411.1% or 19.4–61.4 pp. on the six open source projects with a comparable value of precision achieved. Furthermore, we integrate change classification in the development process of the proprietary project. We have learned the following lessons: 1 ) new solutions are needed to convince developers to use and believe prediction results, and prediction results need to be actionable, 2 ) new and improved classification algorithms are needed to explain the prediction results, and insensible and unactionable explanations need to be filtered or refined, and 3 ) new techniques are needed to improve the relatively low precision.en
dc.identifier.urihttp://hdl.handle.net/10012/9532
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterloo
dc.subjectDefect Predictionen
dc.subjectCross-validationen
dc.subjectImbalanced Dataen
dc.subjectResamplingen
dc.subject.programElectrical and Computer Engineeringen
dc.titleOnline Defect Prediction for Imbalanced Dataen
dc.typeMaster Thesisen
uws-etd.degreeMaster of Applied Scienceen
uws-etd.degree.departmentElectrical and Computer Engineeringen
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Tan_Ming.pdf
Size:
309.2 KB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
6.17 KB
Format:
Item-specific license agreed upon to submission
Description: