The Separate Compilation Assumption
Call graphs are an essential requirement for almost all inter-procedural analyses. This has motivated the development of many tools and frameworks that generate the call graph of a given program. However, the majority of these tools focus on generating the call graph of the whole program (i.e., both the application and the libraries that the application depends on). A popular compromise that avoids the excessive cost of building a call graph for the whole program is to build an application-only call graph. To achieve this, all the effects of the library code, and any calls that the library makes back into the application, are usually ignored. This results in potential unsoundness in the generated call graph and therefore in the analyses that use it. Additionally, the scope of the application classes to be analyzed by such an algorithm has often been defined arbitrarily. In this thesis, we define the separate compilation assumption, which clearly defines the division between the application and the library based on the fact that the code of the library has to be compiled without access to the code of the application. We then use this assumption to define more specific constraints on how the library code can interact with the application code. These constraints make it possible to generate sound and reasonably precise call graphs without analyzing libraries. We investigate whether the separate compilation assumption can be encoded universally in Java bytecode, such that all existing whole-program analysis frameworks can easily take advantage of it. We present and evaluate Averroes, a tool that generates a placeholder library that over-approximates the possible behaviour of an original library. The placeholder library can be constructed quickly without analyzing the whole program, and is typically on the order of 80 kB of class files (by comparison, the Java standard library is 25 MB).
Any existing whole-program call graph construction framework can use the placeholder library as a replacement for the actual libraries to efficiently construct a sound and precise application call graph. Averroes improves the analysis time of whole-program call graph construction by a factor of 3.5 to 8, and reduces memory requirements by a factor of 8.4 to 12. In addition, Averroes makes it easier for whole-program frameworks to handle reflection soundly in two ways: it is based on conservative assumptions about all behaviour within the library, including reflection, and it provides analyses and tools to model reflection in the application. We also evaluate the precision of the call graphs built with Averroes in existing whole-program frameworks. Finally, we provide a correctness proof for Averroes based on Featherweight Java.
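
To illustrate the kind of library-to-application call that an application-only call graph can miss, here is a minimal Java sketch. It is not taken from the thesis or from Averroes: the class names `CallbackExample` and `Recorder` are hypothetical, and `java.util.ArrayList.forEach` simply stands in for "the library".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class CallbackExample {
    // Application class (hypothetical): records each value the library hands back.
    static class Recorder implements Consumer<String> {
        final List<String> seen = new ArrayList<>();

        @Override
        public void accept(String value) {
            seen.add(value);
        }
    }

    static List<String> run() {
        List<String> items = new ArrayList<>();
        items.add("a");
        items.add("b");
        Recorder recorder = new Recorder();
        // The application never calls Recorder.accept directly;
        // the call happens inside the library's forEach implementation.
        items.forEach(recorder);
        return recorder.seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, b]
    }
}
```

In this sketch, no call site in the application code targets `Recorder.accept`, yet the method clearly executes. A call graph builder that ignores all library effects would omit the edge into `Recorder.accept`, whereas a sound application-only call graph must still account for it.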