Predicting Test Suite Effectiveness for Java Programs

Inozemtseva, Laura Michelle McLean

Files

Inozemtseva_Laura.pdf (1.1 MB)

Date

2012-07-26T15:12:48Z

Authors

Inozemtseva, Laura Michelle McLean

Publisher

University of Waterloo

Abstract

The coverage of a test suite is often used as a proxy for its effectiveness. However, previous studies that investigated the influence of code coverage on test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these two characteristics. Moreover, many of the studies used small or synthetic programs, making it unclear whether their results generalize to larger programs. In addition, some of the studies did not account for the confounding influence of test suite size. We have extended these studies by evaluating the relationship between test suite size, block coverage, and effectiveness for large Java programs. Our test subjects were four Java programs from different application domains: Apache POI, HSQLDB, JFreeChart, and Joda Time. All four are actively developed open source programs; they range from 80,000 to 284,000 source lines of code. For each test subject, we generated between 5,000 and 7,000 test suites by randomly selecting test methods from the program's entire test suite. The suites ranged in size from 3 to 3,000 methods. We used the coverage tool Emma to measure the block coverage of each suite and the mutation testing tool Javalanche to evaluate the effectiveness of each suite. We found that the correlation between block coverage and effectiveness is low when the number of tests in the suite is controlled for. This suggests that block coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
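
The procedure summarized in the abstract (drawing random suites of fixed sizes, then relating coverage to effectiveness while holding suite size constant) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `sample_suites` and `correlation_by_size` are hypothetical, the coverage and effectiveness numbers would in practice come from tools such as Emma and Javalanche, and Pearson's coefficient is used only as a stand-in, since the abstract does not say which correlation statistic the thesis uses.

```python
import math
import random


def sample_suites(test_methods, sizes, suites_per_size, seed=0):
    """Draw random suites: for each size, pick that many distinct test methods."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    suites = []
    for size in sizes:
        for _ in range(suites_per_size):
            # random.sample selects without replacement
            suites.append(rng.sample(test_methods, size))
    return suites


def pearson(xs, ys):
    """Pearson correlation coefficient; returns None if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_size(records):
    """Group (size, coverage, effectiveness) records by suite size and
    correlate coverage with effectiveness within each group, so that
    suite size is held constant in every comparison."""
    by_size = {}
    for size, coverage, effectiveness in records:
        by_size.setdefault(size, []).append((coverage, effectiveness))
    result = {}
    for size, pairs in by_size.items():
        if len(pairs) < 2:
            continue  # a correlation needs at least two suites
        r = pearson([c for c, _ in pairs], [e for _, e in pairs])
        if r is not None:
            result[size] = r
    return result
```

Grouping by size before correlating is one simple way to control for the number of tests: a raw correlation over all suites would mix the effect of coverage with the effect of larger suites tending to have both higher coverage and higher effectiveness.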