|dc.description.abstract||Highly configurable software systems often leverage variability modeling to achieve systematic reuse and mass customization. Although variability models facilitate variability management, they do not eliminate the variability in other artifacts. In fact, evolving a system's variability is far from trivial, as variation points spread across different artifacts, possibly at multiple locations, so that evolving a single feature may affect many variation points. To make matters worse, existing approaches for variability evolution have been largely criticized in practice, with industry-based reports describing them as ineffective.
Ineffective support appears to be a direct consequence of lacking an in-depth understanding of how variability evolution happens in practice. For instance, most of the existing research focuses on variability evolution as it happens in variability models only, ignoring how that evolution relates to other artifacts (e.g., build and implementation files). Moreover, when validating new variability evolution approaches, researchers often rely on randomly generated models or, in some situations, even on fictitious cases. Studies that do account for variability evolution across different artifacts do so in the context of small systems, which are unlikely to be representative of the complexity typically found in large-scale subjects.
Understanding variability evolution is a prerequisite for properly supporting it in practice. As this understanding is still immature, we seek to advance it by performing an in-depth analysis of variability evolution in large, complex, real-world systems in the systems software domain.
As a starting point, we perform an exploratory analysis over a sample of the evolution of the Linux kernel, one of the largest and longest-living configurable systems. Motivated by the impact of pattern analysis in modern software engineering (e.g., refactoring patterns), we set out to mine evolution patterns from the Linux kernel commit history. Specifically, our patterns focus on the variability evolution induced by adding or removing features in the variability model, capturing how other artifacts (e.g., Makefiles and code) coevolve as a consequence. We identify 23 variability-coevolution patterns and cross-check their properties against the current literature, evidencing limitations in existing approaches, as well as providing insights for improving existing tools and helping to shape future ones. We also observe how developers implement new features, finding feature scattering to be a recurrent practice. This is particularly interesting, as feature scattering is often criticized in practice. We argue that scattering is not necessarily bad if used with care; in fact, existing systems such as the Linux kernel have shown that it is possible to achieve long-term evolution while accepting some level of feature scattering. The limits of feature scattering, however, are currently unknown. This is not surprising, as no empirical study investigates feature scattering across the evolution of large and long-lived software systems.
Building on our exploratory analysis of the Linux kernel, we perform two further assessments to strengthen our understanding.
First, we set out to increase the external validity of our patterns by validating them in the context of three other systems: axTLS, Toybox, and uClibc. We find that our patterns cover as much as 64% of all feature addition and removal cases across the evolution of our three chosen subjects; altogether, our validation spans a period of over 20 years of evolution. Moreover, we find 14 patterns whose use goes beyond Linux. In fact, we claim them as general cases within the systems software domain.
Second, seeking a better understanding of the limits of feature scattering, we return our attention to the Linux kernel evolution. Unlike the mining of patterns, this analysis considers the entire snapshot of the Linux kernel commit history, covering almost eight years of evolution. Scoped to the scattering of device-driver features, the most common feature type in the Linux kernel, we set out to identify empirical limits within the codebase, including the proportion of scattered features and typical scattering degrees. We also note specific feature types that appear to be more prone to scattering. While we do not claim the limits we find as universal, our study provides evidence that scattering can go as far as the limits we observe in the Linux kernel implementation.||en