Show simple item record

dc.contributor.author: Wei, Moshi
dc.date.accessioned: 2019-09-23 17:03:32 (GMT)
dc.date.available: 2019-09-23 17:03:32 (GMT)
dc.date.issued: 2019-09-23
dc.date.submitted: 2019-09-17
dc.identifier.uri: http://hdl.handle.net/10012/15119
dc.description.abstract: Bug fixing is a time-consuming task in software development. Automated bug repair tools are created to fix programs with little or no human effort. Many existing tools are based on the generate-and-validate (G&V) approach, an automated program repair technique that generates a list of repair candidates and then selects the correct candidates as output. Another approach is to learn the repair process with machine learning models and then generate the candidates. One machine-learning-based approach is the end-to-end approach, which passes the input source code directly to the machine learning model and generates the repair candidates in source code as output. This approach faces several challenges, such as the large vocabulary size, the high rate of out-of-vocabulary (OOV) tokens, and difficulties in embedding learning. We propose an abstraction-and-reconstruction technique on top of end-to-end approaches that converts the training source code to templates to alleviate the problems of the traditional end-to-end approach. We train the machine learning model with abstracted bug-fix pairs from open-source projects. The abstraction process converts the source code to templates before passing it to the model. After training is complete, we use the trained model to predict the fix templates of new bugs. The output of the model is passed to the reconstruction layer to obtain the source code patch candidates. We train the machine learning model with 470,085 bug-fix pairs collected from the top 1,000 Python projects on GitHub. We use the QuixBugs dataset as the test set to evaluate the result; each generated fix is verified by the test cases provided by the QuixBugs dataset. We choose the traditional end-to-end approach as the baseline and compare it with the abstraction model. The accuracy of generating correct bug fixes increases from 25% to 57.5%, while the training time is reduced from 5.7 hours to 1.63 hours. The overhead introduced by the reconstruction model is 218 milliseconds on average, or 23.32%, which is negligible compared to the time saved in training, which is 4.07 hours or 71.4%. We perform a deep analysis of the results and identify three reasons that may explain why the abstraction model outperforms the baseline. Compared to existing work, our approach includes a complete reconstruction process that converts templates back to source code. The results show that adding a layer of abstraction increases the accuracy and reduces the training time of machine-learning-based automated bug repair tools.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: deep learning
dc.subject: automated program repair
dc.subject: neural machine translation
dc.subject: seq2seq model
dc.title: Abstraction Mechanism on Neural Machine Translation Models for Automated Program Repair
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws.contributor.advisor: Tan, Lin
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
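
A minimal sketch of the abstraction step described in the abstract above, assuming a simple token-level scheme in which identifiers and literals are replaced by placeholder tokens and a mapping is kept for the reconstruction layer. The placeholder names (VAR_n, STR_n, NUM_n), the abstract_source helper, and the use of Python's tokenize module are illustrative assumptions, not the exact scheme used in the thesis.

import io
import keyword
import tokenize

def abstract_source(code):
    """Convert Python source into a template plus a mapping used for reconstruction."""
    mapping = {}                               # placeholder -> original token
    counters = {"VAR": 0, "STR": 0, "NUM": 0}
    out_tokens = []

    def placeholder(kind, value):
        # Reuse the same placeholder when the same identifier or literal appears again.
        for ph, orig in mapping.items():
            if orig == value and ph.startswith(kind):
                return ph
        counters[kind] += 1
        ph = "%s_%d" % (kind, counters[kind])
        mapping[ph] = value
        return ph

    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out_tokens.append(placeholder("VAR", tok.string))   # identifiers -> VAR_n
        elif tok.type == tokenize.STRING:
            out_tokens.append(placeholder("STR", tok.string))   # string literals -> STR_n
        elif tok.type == tokenize.NUMBER:
            out_tokens.append(placeholder("NUM", tok.string))   # numeric literals -> NUM_n
        elif tok.type in (tokenize.NEWLINE, tokenize.NL):
            out_tokens.append("<NEWLINE>")                      # keep statement boundaries
        elif tok.string.strip():
            out_tokens.append(tok.string)                       # keywords and operators kept as-is
    return " ".join(out_tokens), mapping

template, mapping = abstract_source("total = price * 3 + tax")
print(template)   # VAR_1 = VAR_2 * NUM_1 + VAR_3 <NEWLINE>
print(mapping)    # {'VAR_1': 'total', 'VAR_2': 'price', 'NUM_1': '3', 'VAR_3': 'tax'}

In this setup the model is trained on templates rather than raw source, which keeps the vocabulary small and limits OOV tokens; the stored mapping is what a reconstruction layer would use to turn a predicted fix template back into a concrete source code patch.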

