|dc.description.abstract||Maintaining and evolving modern software systems is a difficult task: their scope and
complexity mean that seemingly inconsequential changes can have far-reaching consequences.
Most software development companies attempt to reduce the number of faults introduced by
adopting maintenance processes. These processes can be developed in various ways. In this
thesis, we argue that data science techniques can be used to support process development.
Specifically, we claim that robust development processes are necessary to minimize the
number of faults introduced when evolving complex software systems. These processes
should be based on empirical research findings. Data science techniques allow software
engineering researchers to develop research insights that may be difficult or impossible to
obtain with other research methodologies. These research insights support the creation of
development processes. Thus, data science techniques support the creation of empirically-based development processes.
We support this argument with three examples. First, we present insights into automated
malicious Android application (app) detection. Many of the prior studies done on this topic
used small corpora that may provide insufficient variety to create a robust app classifier.
Currently, no empirically established guidelines for corpus size exist, meaning that previous
studies have used anywhere from tens of apps to hundreds of thousands of apps to draw
their conclusions. This variability makes it difficult to judge if the findings of any one study
generalize. We attempted to establish such guidelines and found that 1,000 apps may be
sufficient for studies that are concerned with what the majority of apps do, while more than
a million apps may be required in studies that want to identify outliers. Moreover, many
prior studies of malicious app detection used outdated malware corpora in their experiments
that, combined with the rapid evolution of the Android API, may have influenced the
accuracy of the studies. We investigated this problem by studying 1.3 million apps and
showed that the evolution of the API does affect classifier accuracy, but not in the way we
originally predicted. We also used our API usage data to identify the most infrequently used
API methods. The use of data science techniques allowed us to study an order of magnitude
more apps than previous work in the area; additionally, our insights into infrequently used
methods illustrate how data science can be used to guide API deprecation.
Second, we present insights into the costs and benefits of regression testing. Regression
test suites grow over time, and while a comprehensive suite can detect faults that are
introduced into the system, such a suite can be expensive to write, maintain, and execute.
These costs may or may not be justified, depending on the number and severity of faults
the suite can detect. By studying 61 projects that use Travis CI, a continuous integration
system, we were able to characterize the cost/benefit tradeoff of their test suites. For
example, we found that only 74% of non-flaky test failures were caused by defects in the
system under test; the other 26% were caused by incorrect or obsolete tests and thus
represent a maintenance cost rather than a benefit of the suite. Data about the costs
and benefits of testing can help system maintainers understand whether their test suite
is a good investment, shaping their subsequent maintenance decisions. The use of data
science techniques allowed us to study a large number of projects, increasing the external
generalizability of the study and making the insights gained more useful.
Third, we present insights into the use of mutants to replace real faulty programs in
testing research. Mutants are programs that contain deliberately injected faults, where
the faults are generated by applying mutation operators. Applying an operator means
making a small change to the program source code, such as replacing a constant with
another constant. The use of mutants is appealing because large numbers of mutants can
be automatically generated and used when known faults are unavailable or insufficient in
number. However, prior to this work, there was little experimental evidence to support the
use of mutants as a replacement for real faults. We studied this problem and found that, in
general, mutants are an adequate substitute for faults when conducting testing research.
That is, a test suite’s ability to detect mutants is correlated with its ability to detect real
faults that developers have fixed, for both developer-written and automatically-generated
test suites. However, we also found that additional mutation operators should be developed
and that some classes of faults cannot be generated via mutation. The use of data science
techniques was an essential part of generating the set of real faults used in the study.
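The mutation process described above (applying an operator that makes a small source-code change, such as replacing one constant with another) can be sketched as follows. This is a minimal, hypothetical illustration of a single constant-replacement operator, not the tooling used in the study; the class and function names are invented for this sketch.

```python
import ast

class ConstantReplacer(ast.NodeTransformer):
    """A minimal mutation operator: replace every integer constant with 0.
    Real mutation tools apply many such operators, one small change at a time."""
    def visit_Constant(self, node):
        # Skip booleans, which Python represents as a subclass of int.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=0), node)
        return node

def mutate(source):
    """Parse the program, apply the operator, and emit the mutant's source."""
    tree = ast.parse(source)
    mutated = ConstantReplacer().visit(tree)
    ast.fix_missing_locations(mutated)
    return ast.unparse(mutated)  # requires Python 3.9+

original = "def threshold(x):\n    return x > 10"
print(mutate(original))  # the injected fault: 10 becomes 0
```

A test suite that exercises `threshold` near its boundary would detect this mutant; the study's question is whether such detection correlates with detecting the real faults that developers fix.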
Taken together, the results of these three studies provide evidence that data science
techniques allow software engineering researchers to develop insights that are difficult or
impossible to obtain using other research methodologies.||en