FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests

Lin, Shizhe

dc.contributor.author	Lin, Shizhe
dc.date.accessioned	2023-01-26 15:46:47 (GMT)
dc.date.available	2023-01-26 15:46:47 (GMT)
dc.date.issued	2023-01-26
dc.date.submitted	2023-01-25
dc.identifier.uri	http://hdl.handle.net/10012/19125
dc.description.abstract	Flaky tests can pass or fail non-deterministically, without alterations to a software system. Such tests are frequently encountered by developers and undermine the credibility of test suites. Flaky tests have therefore caught the attention of researchers in recent years. Numerous approaches have been published on defining, locating, and categorizing flaky tests, along with auto-repairing strategies for specific types of flakiness. Practitioners have developed several techniques to detect flaky tests automatically. The most traditional approaches rely on repeated execution of test suites, accompanied by techniques such as shuffled execution order and random distortion of the environment. State-of-the-art research also incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy. Moreover, automated strategies for repairing flaky tests have been published for specific flaky test categories. However, there is a research gap between flaky test detection and category-specific flakiness repair. To address this gap, this thesis proposes a novel categorization framework, called FlaKat, which uses machine-learning classifiers for fast and accurate categorization of a given flaky test case. FlaKat first parses and converts raw flaky tests into vector embeddings. The dimensionality of the embeddings is reduced, and the reduced embeddings are then used for training machine learning classifiers. Sampling techniques are applied to address the imbalance between flaky test categories in the dataset. FlaKat was evaluated to determine its performance under different combinations of configurations, using known flaky tests from 108 open-source Java projects. 
Notably, Implementation-Dependent and Order-Dependent flaky tests, which represent almost 75% of the total dataset, achieved F1 scores (harmonic mean of precision and recall) of 0.94 and 0.90 respectively, while the overall macro average (no weight difference between categories) is 0.67. This research work also proposes a new evaluation metric, called Flakiness Detection Capacity (FDC), for measuring the accuracy of classifiers from the perspective of information theory, and provides proof of its effectiveness. The final FDC results also align with the F1 score regarding which classifier yields the best flakiness classification.	en
dc.language.iso	en	en
dc.publisher	University of Waterloo	en
dc.relation.uri	https://git.uwaterloo.ca/s222lin/flakytestcategorization	en
dc.subject	empirical	en
dc.subject	technological	en
dc.subject	software testing	en
dc.subject	flaky test	en
dc.subject	software quality assessment	en
dc.subject	machine learning	en
dc.subject	source code representation	en
dc.title	FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests	en
dc.type	Master Thesis	en
dc.pending	false
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.degree	Master of Applied Science	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Tahvildari, Ladan
uws.contributor.affiliation1	Faculty of Engineering	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.typeOfResource	Text	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
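
The abstract reports per-category F1 scores alongside an unweighted macro average. As an illustration only, the following minimal Python sketch shows how those two figures relate. The labels (`ID`, `OD`, `Other`), the toy data, and the function names `per_class_f1` and `macro_f1` are assumptions made for this example; they are not taken from the thesis or its repository.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, where F1 is the harmonic mean of precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean form.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every category counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels: two frequent categories and one rare one.
y_true = ["ID", "ID", "ID", "ID", "OD", "OD", "OD", "Other"]
y_pred = ["ID", "ID", "ID", "ID", "OD", "OD", "ID", "OD"]

scores = per_class_f1(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
```

In this toy data the two frequent categories score well while the rare one scores zero, so the macro average falls well below either of the dominant per-category scores. This is the same general effect the abstract describes when strong F1 scores on the two largest categories coexist with a lower macro average.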

Files in this item

Name: Lin_Shizhe.pdf
Size: 1.773Mb
Format: PDF
