Open Source Software Evolution and Its Dynamics
This thesis presents an empirical study of software evolution based on the analysis of open source software (OSS) systems. Its main purpose is to improve our understanding of OSS evolution. The work centers on collecting large quantities of structural data cost-effectively and analyzing that data to understand software evolution <em>dynamics</em> (the mechanisms and causes of change or growth). <br /><br /> We propose a multipurpose, systematic approach to extracting program facts (<em>e.g.</em>, function calls). This approach is supported by a suite of C and C++ program extractors, which cover different steps in the program build process and handle both source and binary code. We present several heuristics for linking facts extracted from individual files into a combined system model of reasonable accuracy. We extract historical sequences of system models to support software evolution analysis. <br /><br /> We propose that software evolution can be viewed as <em>Punctuated Equilibrium</em> (<em>i.e.</em>, long periods of small changes interrupted occasionally by large avalanche changes). We develop two approaches to study such dynamical behavior. One approach uses the evolution spectrograph to visualize file-level changes to the implemented system structure. The other relies on automated software clustering techniques to recover system design changes. We discuss lessons learned from using these approaches. <br /><br /> We present a new perspective on software evolution dynamics. From this perspective, an evolving software system responds to external events (<em>e.g.</em>, new functional requirements) according to <em>Self-Organized Criticality</em> (SOC). SOC dynamics are characterized by two properties: (1) the probability distribution of change sizes follows a power law; and (2) the time series of changes exhibits long-range correlations with power-law behavior. We present empirical evidence that SOC occurs in open source software systems.
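As a rough illustration of property (1) above, the following minimal sketch estimates the exponent of a power-law distribution of change sizes, P(s) ∝ s^(-alpha) for s ≥ s_min. It uses the standard continuous maximum-likelihood estimator, which is an assumption made for illustration and not necessarily the method used in the thesis. The function names `powerlaw_alpha_mle` and `sample_powerlaw` are hypothetical, and the change sizes are synthetic rather than extracted from any real system.

```python
import math
import random


def powerlaw_alpha_mle(sizes, s_min=1.0):
    """Estimate alpha for P(s) ~ s^(-alpha), s >= s_min.

    Uses the continuous maximum-likelihood estimator
    alpha = 1 + n / sum(ln(s_i / s_min)) over the n sizes >= s_min.
    This is an illustrative choice, not the thesis's own procedure.
    """
    tail = [s for s in sizes if s >= s_min]
    if len(tail) < 2:
        raise ValueError("need at least two observations >= s_min")
    log_sum = sum(math.log(s / s_min) for s in tail)
    if log_sum <= 0.0:
        raise ValueError("all observations equal s_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, s_min=1.0, seed=0):
    """Draw n synthetic change sizes by inverse-transform sampling.

    Hypothetical helper used only to produce test data (requires alpha > 1).
    """
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    # 1 - u lies in (0, 1], so the power is always well defined.
    return [s_min * (1.0 - rng.random()) ** exponent for _ in range(n)]


if __name__ == "__main__":
    synthetic_sizes = sample_powerlaw(20000, alpha=2.5)
    print(powerlaw_alpha_mle(synthetic_sizes))
```

On real data, the choice of `s_min` matters, because only the tail of the change-size distribution is expected to follow a power law. Property (2), the long-range correlation of the change time series, would need a separate analysis and is not covered by this sketch.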