Trade-Off Exploration for Acceleration of Continuous Integration

Zeng, Zhili

Trade-Off Exploration for Acceleration of Continuous Integration

dc.contributor.advisor	McIntosh, Shane
dc.contributor.author	Zeng, Zhili
dc.date.accessioned	2023-06-21T12:36:34Z
dc.date.available	2023-06-21T12:36:34Z
dc.date.issued	2023-06-21
dc.date.submitted	2023-06-13
dc.description.abstract	Continuous Integration (CI) is a popular software development practice that allows developers to quickly verify modifications to their projects. To cope with the ever-increasing demand for faster software releases, CI acceleration approaches have been proposed to expedite the feedback that CI provides. However, adoption of CI acceleration is not without cost. The trade-off in duration and trustworthiness of a CI acceleration approach determines the practicality of the CI acceleration process. Indeed, if a CI acceleration approach takes longer to prime than to run the accelerated build, the benefits of acceleration are unlikely to outweigh the costs. Moreover, CI acceleration techniques may mislabel change sets (e.g., a build labelled as failing that passes in an unaccelerated setting or vice versa) or produce results that are inconsistent with an unaccelerated build (e.g., the underlying reason for failure does not match with the unaccelerated build). These inconsistencies call into question the trustworthiness of CI acceleration products. We first evaluate the time trade-off of two CI acceleration products — one based on program analysis (PA) and the other on machine learning (ML). After replaying the CI process of 100,000 builds spanning ten open-source projects, we find that the priming costs (i.e., the extra time spent preparing for acceleration) of the program analysis product are substantially less than that of the machine learning product (e.g., average project-wise median cost difference of 148.25 percentage points). Furthermore, the program analysis product generally provides more time savings than the machine learning product (e.g., average project-wise median savings improvement of 5.03 percentage points). Given their deterministic nature, and our observations about priming costs and benefits, we recommend that organizations consider the adoption of program analysis based acceleration. Next, we study the trustworthiness of the same PA and ML CI acceleration products. We re-execute 50 failing builds from ten open-source projects in non-accelerated (baseline), program analysis accelerated, and machine learning accelerated settings. We find that when applied to known failing builds, program analysis accelerated builds more often (43.83 percentage point difference across ten projects) align with the non-accelerated build results. Accordingly, we conclude that while there is still room for improvement for both CI acceleration products, the selected program analysis product currently provides a more trustworthy signal of build outcomes than the machine learning product. Finally, we propose a mutation testing approach to systematically evaluate the trustworthiness of CI acceleration. We apply our approach to the deterministic PA-based CI acceleration product and uncover issues that hinder its trustworthiness. Our analysis consists of three parts: we first study how often the same build in accelerated and unaccelerated CI settings produce different mutation testing outcomes. We call mutants with different outcomes in the two settings “gap mutants”. Next, we study the code locations where gap mutants appear. Finally, we inspect gap mutants to understand why acceleration causes them to survive. Our analysis of ten thriving open-source projects uncovers 2,237 gap mutants. We find that: (1) the gap in mutation outcomes between accelerated and unaccelerated settings varies from 0.11%–23.50%; (2) 88.95% of gap mutants can be mapped to specific source code functions and classes using the dependency representation of the studied CI acceleration product; (3) 69% of gap mutants survive CI acceleration due to deterministic reasons that can be classified into six fault patterns. Our results show that deterministic CI acceleration suffers from trustworthiness limitations, and highlights the ways in which trustworthiness could be improved in a pragmatic manner. This thesis demonstrates that CI acceleration techniques, whether PA or ML-based, present time trade-offs and can reduce software build trustworthiness. Our findings lead us to encourage users of CI acceleration to carefully weigh both the time costs and trustworthiness of their chosen acceleration technique. This study also demonstrates that the following improvements for PA-based CI acceleration approaches would improve their trustworthiness: (1) depending on the size and complexity of the codebase, it may be necessary to manually refine the dependency graph, especially by concentrating on class properties, global variables, and constructor components; and (2) solutions should be added to detect and bypass flaky test during CI acceleration to minimize the impact of flakiness.	en
dc.identifier.uri	http://hdl.handle.net/10012/19575
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://zenodo.org/record/7042150	en
dc.relation.uri	https://zenodo.org/record/7641215	en
dc.relation.uri	https://zenodo.org/record/7872663	en
dc.subject	CI Acceleration	en
dc.subject	Software Build	en
dc.subject	Mutation Testing	en
dc.subject	Continuous Integration	en
dc.title	Trade-Off Exploration for Acceleration of Continuous Integration	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	McIntosh, Shane
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Zeng_Zhili.pdf
Size:: 912.67 KB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 6.4 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Theses
Computer Science