Source Language Representation of Function Summaries in Static Analysis
Static analysis is a popular method for finding bugs. In context-sensitive static analysis, the analyzer considers the calling context when evaluating a function call, which makes it possible to find bugs that span multiple functions. To find such issues, the analyzer engine requires information about both the calling context and the callee. Unfortunately, the implementation of the callee might only be available in a separate translation unit or module. In these scenarios, the analyzer either makes assumptions about the behavior of the callee (which may be unsound) or conservatively creates a program state in which every value that might be affected by the function call is marked. Each marked value becomes unknown, which implies a significant loss of precision.
To mitigate this overapproximation, a common approach is to assign summaries to some of the functions and, whenever the implementation is not available, use the summary to analyze the effect of the function call. These summaries are approximations of the function implementations that model some of the behavior of the called functions in a given context. The most appropriate way to represent summaries, however, remains an open question.
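As an illustration only, the following minimal C++ sketch shows what a summary written in the source language might look like. The names `scale_summary`, `safe_ratio`, and `demo` are hypothetical and are not taken from the implementation described in this paper. The sketch assumes a function `int scale(int x)` defined in another translation unit, and models only one property of it: the result is zero exactly when the argument is zero.

```cpp
#include <cassert>

// Hypothetical summary for a function `int scale(int x)` whose real
// definition is assumed to live in another translation unit.
// It approximates the callee by modeling a single property:
// the result is 0 exactly when x is 0.
int scale_summary(int x) {
    return x == 0 ? 0 : 1;
}

// Caller under analysis. With the summary in place of the missing
// definition, the result `d` is no longer an unknown value, so an
// analyzer could conclude that the division never sees a zero divisor.
int safe_ratio(int x) {
    int d = scale_summary(x);
    if (d == 0)
        return 0;
    return 100 / d;
}

// Small sanity check of the sketch itself.
void demo() {
    assert(scale_summary(0) == 0);
    assert(scale_summary(7) != 0);
    assert(safe_ratio(0) == 0);
}
```

The summary deliberately does not compute what the real function would return. It keeps only the behavior relevant to the property being checked, which is what makes it an approximation rather than a reimplementation.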
This paper describes a method of representing summaries of C/C++ functions in the same language. We evaluate the advantages and disadvantages of this approach. Using a source language representation efficiently is challenging due to the compilation model of C/C++; we propose an efficient solution. The emphasis of the paper is on using static analysis to find errors in programs; however, the same approach can be used for program optimization or any other task to which static analysis is applicable. Our proof-of-concept implementation is available in the upstream version of the Clang compiler.