Evolution and Architecture of Open Source Software Collections: A Case Study of Debian
MetadataShow full item record
Software has been studied at a variety of granularities. Code, classes, groups of classes, programs and finally large scale applications have been examined in detail. What lies beyond is the study of software collections that group together many individual applications. Collecting software and distributing it via a central repository has been popular for a while in the open source world, and only recently caught on commercially with Apple’s Mac app store and Microsoft’s Windows store. In many of these software collections, there is normally a complex process that must be followed in order to fully integrate new applications into the system. Moreover, in the case of open source software collections, applications frequently rely on each other for functionality and their interactions can be complex. We know that there are thousands of applications in these software collections that people depend on worldwide, but research in this area has been limited compared to other areas and granularities of software. In this thesis, we explore the evolution and architecture of a large open source software collections by using Debian as a case study. Debian is a software collection based off the Linux kernel with a large number of packages spread over multiple hardware platforms. Each package provides a particular service or application and is actively maintained by one or more developers. This thesis investigates how these packages evolve through time and their interactions with one another. The first half of the thesis describes the life cycle of a package from inception to end by carrying out a longitudinal study using the Ultimate Debian Database (UDD). The birth of packages is examined to see how Debian is growing. Conversely, package death is also analyzed to determine the lifespan of these packages. Moreover, four different package attributes are examined. They are package age, package bugs, package maintainers and package popularity. These four attributes combine to give us the overall biography of Debian packages. Debian’s architecture is explored in the second part of the thesis, where we analyze how packages interact with each other by examining the package dependencies in detail. The dependencies within Debian are extensive, which makes for an interesting architecture, but they are complex to analyze. This thesis provides a close look at the layered pattern. This pattern categorizes each package into one of five layers based on how they are used. These layers may also be visualized to give a concise view of how an application is structured. Using these views, we define five architectural subpatterns and anti-subpatterns which can aid developers in creating and maintaining packages.