Toward Improved Understanding and Management of Software Clones
Code cloning is a controversial development practice. Empirical studies of its long-term effects on software quality and maintainability have produced mixed results. Some studies have found that cloning harms code readability, propagates bugs, and may indicate wider problems in software design and management. Other studies have found that cloned code is less likely to contain defects, and is thus arguably more stable, better designed, and better maintained. These results suggest that the effect of cloning on software quality and maintainability may be determinable only on a case-by-case basis, which makes it harder to establish a principled framework for understanding and managing clones. This thesis aims to improve the understanding and management of clones within software systems. There are two main contributions. First, we have conducted an empirical study of cloning in one of the major device driver families of the Linux kernel. Unlike many previous empirical studies of cloning, we incorporate knowledge about the development style and architecture of the subject system into our study. Our findings address the evolution of clones. We have also found that the presence of cloning is a strong predictor (87\% accuracy) of one aspect of underlying hardware similarity when compared to a vendor-based model (55\% accuracy) and a randomly chosen model (9\% accuracy). The effectiveness of using the presence of cloning to infer high-level similarity suggests a new perspective on using cloning information to assist program comprehension, aspect mining, and software product-line engineering.
Second, we have devised a triage-oriented taxonomy of clones to help developers prioritize which kinds of clones are most likely to be problematic and require attention; a preliminary validation of the utility of this taxonomy has been performed against a large open-source system. The cloning-based software quality assurance (QA) framework built on our taxonomy adds a new dimension to traditional software QA processes: by exploiting clone detection results within a guided framework, the developer can evaluate which instances of cloning are most likely to require urgent attention.